allenai
✓ VerifiedResearch LabAllen Institute for AI, nonprofit AI research
longformer-base-4096
--- language: en license: apache-2.0 ---
OLMo-2-0425-1B
--- license: apache-2.0 language: - en library_name: transformers ---
unifiedqa-t5-small
--- language: en ---
scibert_scivocab_uncased
--- language: en ---
specter2_base
--- license: apache-2.0 datasets: - allenai/scirepeval language: - en ---
olmOCR-7B-0825-FP8
--- language: - en license: apache-2.0 datasets: - allenai/olmOCR-mix-0225 base_model: - Qwen/Qwen2.5-VL-7B-Instruct library_name: transformers new_version: allenai/olmOCR-2-7B-1025-FP8 ---
olmOCR-2-7B-1025-FP8
Quantized to FP8 Version of olmOCR-2-7B-1025, using llmcompressor. This is a release of the olmOCR model that's fine tuned from Qwen2.5-VL-7B-Instruct using the olmOCR-mix-1025 dataset. It has been additionally fine tuned using GRPO RL training to boost its performance at math equations, tables, and other tricky OCR cases. Quick links: - 📃 Paper - 🤗 SFT Dataset - 🤗 RL Dataset - 🛠️ Code - 🎮 Demo The best way to use this model is via the olmOCR toolkit. The toolkit comes with an efficient inference setup via VLLM that can handle millions of documents at scale. This model scores the following scores on olmOCR-bench when used with the olmOCR toolkit toolkit which automatically renders, rotates, and retries pages as needed. Model ArXiv Old Scans Math Tables Old Scans Headers and Footers Multi column Long tiny text Base Overall olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025 82.9 82.1 84.3 48.3 95.7 84.3 81.4 99.7 82.3 ± 1.1 olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025-FP8 83.0 82.3 84.9 47.7 96.1 83.7 81.9 99.7 82.4 ± 1.1 This model expects as input a single document image, rendered such that the longest dimension is 1288 pixels. The prompt must then contain the additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit. If you want to prompt this model manually instead of using the olmOCR toolkit, please see the code below. In normal usage, the olmOCR toolkit builds the prompt by rendering the PDF page, and extracting relevant text blocks and image metadata. To duplicate that you will need to This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
OLMo-1B-0724-hf
OLMo 1B July 2024 is the latest version of the original OLMo 1B model rocking a 4.4 point increase in HellaSwag, among other evaluations improvements, from an improved version of the Dolma dataset and staged training. This version is for direct use with HuggingFace Transformers from v4.40 on. OLMo is a series of Open Language Models designed to enable the science of language models. The OLMo models are trained on the Dolma dataset. We release all code, checkpoints, logs, and details involved in training these models. The core models released in this batch are the following: | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length | |------|--------|---------|-------------|-----------------|----------------| | OLMo 1B July 2024 | 3.05 Trillion | 16 | 2048 | 16 | 4096 | | OLMo 7B July 2024 | 2.75 Trillion | 32 | 4096 | 32 | 4096 | [Coming soon] We are releasing many checkpoints for these models, for every 1000 training steps. The naming convention is `stepXXX-tokensYYYB`. To load a specific model revision with HuggingFace, simply add the argument `revision`: All revisions/branches are listed in the file `revisions.txt`. Or, you can access all the revisions for the models via the following code snippet: - Developed by: Allen Institute for AI (AI2) - Supported by: Databricks, Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University, AMD, CSC (Lumi Supercomputer), UW - Model type: a Transformer style autoregressive language model. - Language(s) (NLP): English - License: The code and model are released under Apache 2.0. - Contact: Technical inquiries: `olmo at allenai dot org`. Press: `press at allenai dot org` - Date cutoff: Oct. 2023, with most data from Feb./March 2023 based on Dolma dataset version. - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo - Evaluation code: https://github.com/allenai/OLMo-Eval - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: Link Install Transformers. Then proceed as usual with HuggingFace: Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.frompretrained("allenai/OLMo-1B-0724-hf", torchdtype=torch.float16, loadin8bit=True)` (requires `bitsandbytes`). The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as `inputs.inputids.to('cuda')` to avoid potential issues. Fine-tuning Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available. 1. Fine-tune with the OLMo repository: 2. Further fine-tuning support is being developing in AI2's Open Instruct repository. Details are here. Core model results for the new and original 7B model are found below. | Task | Llama-7b | Llama2-7b | Falcon-7b | Mpt-7b | OLMo-7B | Llama2-13b | OLMo 7B 0424 | |-------------------|----------|-----------|-----------|--------|---------|------------|-------------| | arcc | 44.5 | 48.5 | 47.5 | 46.5 | 48.5 | 52.8 | 42.5 | | arce | 67.9 | 69.5 | 70.4 | 70.5 | 65.4 | 73.7 | 67.2 | | boolq | 75.4 | 80.2 | 74.6 | 74.2 | 73.4 | 82.2 | 83.7 | | copa | 91.0 | 86.0 | 86.0 | 85.0 | 90.0 | 90.0 | 86.0 | | hellaswag | 76.2 | 76.8 | 75.9 | 77.6 | 76.4 | 78.6 | 75.5 | | openbookqa | 51.2 | 48.4 | 53.0 | 48.6 | 50.4 | 51.8 | 50.0 | | piqa | 77.2 | 76.7 | 78.5 | 77.3 | 78.4 | 79.0 | 77.5 | | sciq | 93.9 | 94.5 | 93.9 | 93.7 | 93.8 | 95.5 | 96.7 | | winogrande | 70.5 | 69.4 | 68.9 | 69.9 | 67.9 | 73.5 | 69.8 | | truthfulQA (MC2) | 33.9 | 38.5 | 34.0 | 33.0 | 36.0 | 36.8 | 35.8 | | MMLU (5 shot MC) | 31.5 | 45.0 | 24.0 | 30.8 | 28.3 | 55.5 | 52.0 | | GSM8k | 10.0 | 12.0 | 4.0 | 4.5 | 8.5 | 25.0 | 29.0 | | Full average | 60.3 | 62.1 | 59.2 | 59.3 | 59.8 | 66.2 | 63.8 | | task | random | StableLM 2 1.6b\ | Pythia 1B | TinyLlama 1.1B | OLMo 1B | OLMo 1B 0724 (ours) | | ------------- | ------ | ----------------- | --------- | -------------------------------------- | ------- | ---- | | arcchallenge | 25 | 43.8 | 33.1 | 34.8 | 34.5 | 36.5 | | arceasy | 25 | 63.7 | 50.2 | 53.2 | 58.1 | 55.3 | | boolq | 50 | 76.6 | 61.8 | 64.6 | 60.7 | 67.5 | | copa | 50 | 84.0 | 72.0 | 78.0 | 79.0 | 83.0 | | hellaswag | 25 | 68.2 | 44.7 | 58.7 | 62.5 | 66.9 | | openbookqa | 25 | 45.8 | 37.8 | 43.6 | 46.4 | 46.4 | | piqa | 50 | 74.0 | 69.1 | 71.1 | 73.7 | 74.9 | | sciq | 25 | 94.7 | 86.0 | 90.5 | 88.1 | 93.4 | | winogrande | 50 | 64.9 | 53.3 | 58.9 | 58.9 | 61.4 | | Average | 36.1 | 68.4 | 56.4 | 61.5 | 62.4 | 65.0 | \Unlike OLMo, Pythia, and TinyLlama, StabilityAI has not disclosed yet the data StableLM was trained on, making comparisons with other efforts challenging. Data For training data details, please see the Dolma documentation. This model uses the new 1.7 version with more data sources, better deduplication, and quality filtering. During the annealing phase we use a higher quality subset of Dolma with a linearly decaying learning rate to 0. In contrast to the first OLMo, we trained OLMo 7B 0424 with a two-stage curriculum: In the first stage, we trained the model from scratch on the Dolma 1.7 dataset. We set a cosine learning rate schedule with a warmup of 2500 steps, a peak learning rate of 3e-4, and a cosine decay to 3e-5 after 3T tokens. We cut off this stage after 2T tokens, when the learning rate is still high. At this point we switch to the second stage, in which we train on a higher-quality subset of Dolma 1.7 (see below) for another 50B tokens, while linearly decaying the learning rate to 0. Our high-quality subset includes (1) using all available Wikipedia, OpenWebMath and Flan data, (2) removing Dolma CC, CC News, and Megawika, and (3) rebalancing remaining sources to achieve approximately equal proportions of each. See exact token counts and relative proportions of this second stage mix below. Both stages contribute equally to the final performance of the OLMo model. After the first stage, OLMo 7B 0424 already outperforms the older OLMo. The second stage consistently adds 2 to 3 points of performance on top. OLMo 7B architecture with peer models for comparison. | | OLMo 7B | Llama 2 7B | OpenLM 7B | Falcon 7B | PaLM 8B | |------------------------|-------------------|---------------------|--------------------|--------------------|------------------| | dmodel | 4096 | 4096 | 4096 | 4544 | 4096 | | num heads | 32 | 32 | 32 | 71 | 16 | | num layers | 32 | 32 | 32 | 32 | 32 | | MLP ratio | ~8/3 | ~8/3 | ~8/3 | 4 | 4 | | LayerNorm type | non-parametric LN | RMSNorm | parametric LN | parametric LN | parametric LN | | pos embeddings | RoPE | RoPE | RoPE | RoPE | RoPE | | attention variant | full | GQA | full | MQA | MQA | | biases | none | none | in LN only | in LN only | none | | block type | sequential | sequential | sequential | parallel | parallel | | activation | SwiGLU | SwiGLU | SwiGLU | GeLU | SwiGLU | | sequence length | 2048 | 4096 | 2048 | 2048 | 2048 | | batch size (instances) | 2160 | 1024 | 2048 | 2304 | 512 | | batch size (tokens) | ~4M | ~4M | ~4M | ~4M | ~1M | | weight tying | no | no | no | no | yes | | Size | Peak LR | Betas | Epsilon | Weight Decay | |------|------------|-----------------|-------------|--------------| | 1B | 4.0E-4 | (0.9, 0.95) | 1.0E-5 | 0.1 | | 7B | 3.0E-4 | (0.9, 0.99) | 1.0E-5 | 0.1 | | | OLMo 7B | Llama 2 7B | OpenLM 7B | Falcon 7B | |-----------------------|------------------|---------------------|--------------------|--------------------| | warmup steps | 5000 | 2000 | 2000 | 1000 | | peak LR | 3.0E-04 | 3.0E-04 | 3.0E-04 | 6.0E-04 | | minimum LR | 3.0E-05 | 3.0E-05 | 3.0E-05 | 1.2E-05 | | weight decay | 0.1 | 0.1 | 0.1 | 0.1 | | beta1 | 0.9 | 0.9 | 0.9 | 0.99 | | beta2 | 0.95 | 0.95 | 0.95 | 0.999 | | epsilon | 1.0E-05 | 1.0E-05 | 1.0E-05 | 1.0E-05 | | LR schedule | linear | cosine | cosine | cosine | | gradient clipping | global 1.0 | global 1.0 | global 1.0 | global 1.0 | | gradient reduce dtype | FP32 | FP32 | FP32 | BF16 | | optimizer state dtype | FP32 | most likely FP32 | FP32 | FP32 | OLMo 7B variants were either trained on MI250X GPUs at the LUMI supercomputer, or A100-40GB GPUs provided by MosaicML. A summary of the environmental impact. Further details are available in the paper. | | GPU Type | Power Consumption From GPUs | Carbon Intensity (kg CO₂e/KWh) | Carbon Emissions (tCO₂eq) | |-----------|------------|-----------------------------|--------------------------------|---------------------------| | OLMo 7B Twin | MI250X (LUMI supercomputer) | 135 MWh | 0 | 0 | | OLMo 7B | A100-40GB (MosaicML) | 104 MWh | 0.656 | 75.05 | Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content. Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology. Otherwise, many facts from OLMo or any LLM will often not be true, so they should be checked. Groeneveld, D., Beltagy, I., Walsh, P., Bhagia, A., Kinney, R., Tafjord, O., Jha, A., Ivison, H., Magnusson, I., Wang, Y., Arora, S., Atkinson, D., Authur, R., Chandu, K., Cohan, A., Dumas, J., Elazar, Y., Gu, Y., Hessel, J., Khot, T., Merrill, W., Morrison, J., Muennighoff, N., Naik, A., Nam, C., Peters, M., Pyatkin, V., Ravichander, A., Schwenk, D., Shah, S., Smith, W., Subramani, N., Wortsman, M., Dasigi, P., Lambert, N., Richardson, K., Dodge, J., Lo, K., Soldaini, L., Smith, N., & Hajishirzi, H. (2024). OLMo: Accelerating the Science of Language Models. Preprint. For errors in this model card, contact Nathan, `{nathanl} at allenai dot org`.
biomed_roberta_base
BioMed-RoBERTa-base is a language model based on the RoBERTa-base (Liu et. al, 2019) architecture. We adapt RoBERTa-base to 2.68 million scientific papers from the Semantic Scholar corpus via continued pretraining. This amounts to 7.55B tokens and 47GB of data. We use the full text of the papers in training, not just abstracts. Specific details of the adaptive pretraining procedure can be found in Gururangan et. al, 2020. BioMed-RoBERTa achieves competitive performance to state of the art models on a number of NLP tasks in the biomedical domain (numbers are mean (standard deviation) over 3+ random seeds) | Task | Task Type | RoBERTa-base | BioMed-RoBERTa-base | |--------------|---------------------|--------------|---------------------| | RCT-180K | Text Classification | 86.4 (0.3) | 86.9 (0.2) | | ChemProt | Relation Extraction | 81.1 (1.1) | 83.0 (0.7) | | JNLPBA | NER | 74.3 (0.2) | 75.2 (0.1) | | BC5CDR | NER | 85.6 (0.1) | 87.8 (0.1) | | NCBI-Disease | NER | 86.6 (0.3) | 87.1 (0.8) | If using this model, please cite the following paper:
OLMo-2-1124-7B
We introduce OLMo 2, a new family of 7B and 13B models featuring a 9-point increase in MMLU, among other evaluation improvements, compared to the original OLMo 7B model. These gains come from training on OLMo-mix-1124 and Dolmino-mix-1124 datasets and staged training approach. OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details. | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length | |------|--------|---------|-------------|-----------------|----------------| | OLMo 2-7B | 4 Trillion | 32 | 4096 | 32 | 4096 | | OLMo 2-13B | 5 Trillion | 40 | 5120 | 40 | 4096 | The core models released in this batch include the following: | Stage | OLMo 2 7B | OLMo 2 13B | |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| | Base Model | allenai/OLMo-2-1124-7B | allenai/OLMo-2-1124-13B | | SFT | allenai/OLMo-2-1124-7B-SFT | allenai/OLMo-2-1124-13B-SFT | | DPO | allenai/OLMo-2-1124-7B-DPO | allenai/OLMo-2-1124-13B-DPO | | Final Models (RLVR) | allenai/OLMo-2-1124-7B-Instruct | allenai/OLMo-2-1124-13B-Instruct | | Reward Model (RM)| allenai/OLMo-2-1124-7B-RM | (Same as 7B) | OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using: You can use OLMo with the standard HuggingFace transformers library: For faster performance, you can quantize the model using the following method: The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using: We have released checkpoints for these models. For pretraining, the naming convention is `stepXXX-tokensYYYB`. For checkpoints with ingredients of the soup, the naming convention is `stage2-ingredientN-stepXXX-tokensYYYB` To load a specific model revision with HuggingFace, simply add the argument `revision`: Or, you can access all the revisions for the models via the following code snippet: Fine-tuning Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available. 1. Fine-tune with the OLMo repository: 2. Further fine-tuning support is being developing in AI2's Open Instruct repository. Details are here. - Developed by: Allen Institute for AI (Ai2) - Model type: a Transformer style autoregressive language model. - Language(s) (NLP): English - License: The code and model are released under Apache 2.0. - Contact: Technical inquiries: `[email protected]`. Press: `[email protected]` - Date cutoff: Dec. 2023. - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo - Evaluation code: https://github.com/allenai/OLMo-Eval - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: https://arxiv.org/abs/2501.00656 Evaluation Core model results for OLMo 2 7B and 13B models are found below. | Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMLUPro | TriviaQA | |-------------------|------------|---------|--------|--------|--------|-------|-------|-----|----------|--------|-----------|-----------| | Open weights models: | | Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 | | Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 | | Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 | | Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 | | Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 | | Gemma-2-9B | 4.4·10²³ | 67.8 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 70.1 | 42 | 81.8 | | Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 | | Partially open models: | | StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 | | Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 | | Fully open models: | | Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 | | OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 | | MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 | | OLMo-0424-7B | 0.9·10²³ | 50.7 | 66.9 | 80.1 | 73.6 | 54.3 | 50 | 29.6 | 43.9 | 27.7 | 22.1 | 58.8 | | DCLM-7B | 1.0·10²³ | 56.9 | 79.8 | 82.3 | 77.3 | 64.4 | 39.3 | 28.8 | 47.5 | 46.1 | 31.3 | 72.1 | | OLMo-2-1124-7B | 1.8·10²³ | 62.9 | 79.8 | 83.8 | 77.2 | 63.7 | 60.8 | 36.9 | 50.4 | 67.5 | 31 | 78 | | OLMo-2-1124-13B | 4.6·10²³ | 68.3 | 83.5 | 86.4 | 81.5 | 67.5 | 70.7 | 46.7 | 54.2 | 75.1 | 35.1 | 81.9 | Pretraining | | OLMo 2 7B | OLMo 2 13B | |-------------------|------------|------------| | Pretraining Stage 1 (OLMo-Mix-1124) | 4 trillion tokens (1 epoch) | 5 trillion tokens (1.2 epochs) | | Pretraining Stage 2 (Dolmino-Mix-1124) | 50B tokens (3 runs) merged | 100B tokens (3 runs) 300B tokens (1 run) merged | | Post-training (Tulu 3 SFT OLMo mix) | SFT + DPO + PPO (preference mix) | SFT + DPO + PPO (preference mix) | Stage 1: Initial Pretraining - Dataset: OLMo-Mix-1124 (3.9T tokens) - Coverage: 90%+ of total pretraining budget - 7B Model: ~1 epoch - 13B Model: 1.2 epochs (5T tokens) Stage 2: Fine-tuning - Dataset: Dolmino-Mix-1124 (843B tokens) - Three training mixes: - 50B tokens - 100B tokens - 300B tokens - Mix composition: 50% high-quality data + academic/Q&A/instruction/math content Model Merging - 7B Model: 3 versions trained on 50B mix, merged via model souping - 13B Model: 3 versions on 100B mix + 1 version on 300B mix, merged for final checkpoint Bias, Risks, and Limitations Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, many statements from OLMo or any LLM are often inaccurate, so facts should be verified. Model Card Contact For errors in this model card, contact `[email protected]`.
led-base-16384
As described in Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan, led-base-16384 was initialized from bart-base since both models share the exact same architecture. To be able to process 16K tokens, bart-base's position embedding matrix was simply copied 16 times. This model is especially interesting for long-range summarization and question answering. This notebook shows how led-base-16384 can effectively be fine-tuned on a downstream task.
OLMo-2-0425-1B-Instruct
OLMo 2 1B Instruct April 2025 is post-trained variant of the allenai/OLMo-2-0425-1B-RLVR1 model, which has undergone supervised finetuning on an OLMo-specific variant of the Tülu 3 dataset, further DPO training on this dataset, and final RLVR training on this dataset. Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. Check out the OLMo 2 paper or Tülu 3 paper for more details! OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs, and associated training details. - Model type: A model trained on a mix of publicly available, synthetic and human-created datasets. - Language(s) (NLP): Primarily English - License: Apache 2.0 - Finetuned from model: allenai/OLMo-2-0425-1B-RLVR1 - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo-core - Evaluation code: https://github.com/allenai/olmes - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: https://arxiv.org/abs/2501.00656 - Demo: https://playground.allenai.org/ OLMo 2 1B is supported in transformers v4.48 or higher: If using vLLM, you will need to install from the main branch until v0.7.4 is released. Please To load the model with HuggingFace, use the following snippet: NOTE: This is different than previous OLMo 2 and Tülu 3 models due to a minor change in configuration. It does NOT have the bos token before the rest. Our other models have at the beginning of the chat template. It is embedded within the tokenizer as well, for `tokenizer.applychattemplate`. To facilitate research on RL finetuning, we have released our intermediate checkpoints during the model's RLVR training. The model weights are saved every 20 training steps, and can be accessible in the revisions of the HuggingFace repository. For example, you can load with: The OLMo-2 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). | Model | Average | AlpacaEval 2 LC | BBH | DROP | GSM8K | IFEval | MATH | MMLU | Safety | PopQA | TruthQA | |-------|---------|-----------------|-----|------|-------|--------|------|------|--------|-------|---------| | OLMo 1B 0724 | 24.4 | 2.4 | 29.9 | 27.9 | 10.8 | 25.3 | 2.2 | 36.6 | 52.0 | 12.1 | 44.3 | | SmolLM2 1.7B | 34.2 | 5.8 | 39.8 | 30.9 | 45.3 | 51.6 | 20.3 | 34.3 | 52.4 | 16.4 | 45.3 | | Gemma 3 1B | 38.3 | 20.4 | 39.4 | 25.1 | 35.0 | 60.6 | 40.3 | 38.9 | 70.2 | 9.6 | 43.8 | | Llama 3.1 1B | 39.3 | 10.1 | 40.2 | 32.2 | 45.4 | 54.0 | 21.6 | 46.7 | 87.2 | 13.8 | 41.5 | | Qwen 2.5 1.5B | 41.7 | 7.4 | 45.8 | 13.4 | 66.2 | 44.2 | 40.6 | 59.7 | 77.6 | 15.5 | 46.5 | | --- | | | | | | | | | | | | | OLMo 2 1B SFT | 36.9 | 2.4 | 32.8 | 33.8 | 52.1 | 50.5 | 13.2 | 36.4 | 93.2 | 12.7 | 42.1 | | OLMo 2 1B DPO | 40.6 | 9.5 | 33.0 | 34.5 | 59.0 | 67.1 | 14.1 | 39.9 | 89.9 | 12.3 | 46.4 | | OLMo 2 1B | 42.7 | 9.1 | 35.0 | 34.6 | 68.3 | 70.1 | 20.7 | 40.0 | 87.6 | 12.9 | 48.7 | OLMo 2 is licensed under the Apache 2.0 license. OLMo 2 is intended for research and educational use. For more information, please see our Responsible Use Guidelines.
Llama-3.1-Tulu-3-8B-SFT
Tülu3 is a leading instruction following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques. Tülu3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. - Model type: A model trained on a mix of publicly available, synthetic and human-created datasets. - Language(s) (NLP): Primarily English - License: Llama 3.1 Community License Agreement - Finetuned from model: meta-llama/Llama-3.1-8B - Training Repository: https://github.com/allenai/open-instruct - Eval Repository: https://github.com/allenai/olmes - Paper: https://arxiv.org/abs/2411.15124 - Demo: https://playground.allenai.org/ | Stage | Llama 3.1 8B | Llama 3.1 70B | |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| | Base Model | meta-llama/Llama-3.1-8B | meta-llama/Llama-3.1-70B | | SFT | allenai/Llama-3.1-Tulu-3-8B-SFT | allenai/Llama-3.1-Tulu-3-70B-SFT | | DPO | allenai/Llama-3.1-Tulu-3-8B-DPO | allenai/Llama-3.1-Tulu-3-70B-DPO | | Final Models (RLVR) | allenai/Llama-3.1-Tulu-3-8B | allenai/Llama-3.1-Tulu-3-70B | | Reward Model (RM)| allenai/Llama-3.1-Tulu-3-8B-RM | (Same as 8B) | | Stage | Llama 3.1 405B | |-----------|-------------------| | Base Model | meta-llama/llama-3.1-405B | | SFT | allenai/llama-3.1-Tulu-3-405B-SFT | | DPO | allenai/llama-3.1-Tulu-3-405B-DPO | | Final Model (RLVR) | allenai/llama-3.1-Tulu-3-405B | | Reward Model (RM)| (Same as 8B) To load the model with HuggingFace, use the following snippet: As a Llama base model, the model can be easily served with: Note that given the long chat template of Llama, you may want to use `--maxmodellen=8192`. It is embedded within the tokenizer as well, for `tokenizer.applychattemplate`. In Ai2 demos, we use this system prompt by default: The model has not been trained with a specific system prompt in mind. The Tülu3 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus was used to train the base Llama 3.1 models, however it is likely to have included a mix of Web data and technical sources like books and code. See the Falcon 180B model card for an example of this. | Benchmark (eval) | Tülu 3 SFT 8B | Tülu 3 DPO 8B | Tülu 3 8B | Llama 3.1 8B Instruct | Qwen 2.5 7B Instruct | Magpie 8B | Gemma 2 9B Instruct | Ministral 8B Instruct | |---------------------------------|----------------|----------------|------------|------------------------|----------------------|-----------|---------------------|-----------------------| | Avg. | 60.4 | 64.4 | 64.8 | 62.2 | 57.8 | 44.7 | 55.2 | 58.3 | | MMLU (0 shot, CoT) | 65.9 | 68.7 | 68.2 | 71.2 | 76.6 | 62.0 | 74.6 | 68.5 | | PopQA (15 shot) | 29.3 | 29.3 | 29.1 | 20.2 | 18.1 | 22.5 | 28.3 | 20.2 | | TruthfulQA (6 shot) | 46.8 | 56.1 | 55.0 | 55.1 | 63.1 | 57.0 | 61.4 | 55.5 | | BigBenchHard (3 shot, CoT) | 67.9 | 65.8 | 66.0 | 62.8 | 21.7 | 0.9 | 2.5 | 56.2 | | DROP (3 shot) | 61.3 | 62.5 | 62.6 | 61.5 | 54.4 | 49.4 | 58.8 | 56.2 | | MATH (4 shot CoT, Flex) | 31.5 | 42.0 | 43.7 | 42.5 | 14.8 | 5.1 | 29.8 | 40.0 | | GSM8K (8 shot, CoT) | 76.2 | 84.3 | 87.6 | 83.4 | 83.8 | 61.2 | 79.7 | 80.0 | | HumanEval (pass@10) | 86.2 | 83.9 | 83.9 | 86.3 | 93.1 | 75.4 | 71.7 | 91.0 | | HumanEval+ (pass@10) | 81.4 | 78.6 | 79.2 | 82.9 | 89.7 | 69.1 | 67.0 | 88.5 | | IFEval (prompt loose) | 72.8 | 81.1 | 82.4 | 80.6 | 74.7 | 38.8 | 69.9 | 56.4 | | AlpacaEval 2 (LC % win) | 12.4 | 33.5 | 34.5 | 24.2 | 29.0 | 49.0 | 43.7 | 31.4 | | Safety (6 task avg.) | 93.1 | 87.2 | 85.5 | 75.2 | 75.0 | 46.4 | 75.5 | 56.2 | | Benchmark (eval) | Tülu 3 70B SFT | Tülu 3 DPO 70B | Tülu 3 70B | Llama 3.1 70B Instruct | Qwen 2.5 72B Instruct | Hermes 3 Llama 3.1 70B | Nemotron Llama 3.1 70B | |---------------------------------|-----------------|-----------------|-------------|-------------------------|-----------------------|------------------------|-------------------------| | Avg. | 72.6 | 75.9 | 76.0 | 73.4 | 71.5 | 68.3 | 65.5 | | MMLU (0 shot, CoT) | 78.9 | 83.3 | 83.1 | 85.3 | 85.5 | 80.4 | 83.8 | | PopQA (15 shot) | 48.6 | 46.3 | 46.5 | 46.4 | 30.6 | 48.1 | 36.4 | | TruthfulQA (6 shot) | 55.7 | 67.9 | 67.6 | 66.8 | 69.9 | 66.5 | 62.6 | | BigBenchHard (3 shot, CoT) | 82.7 | 81.8 | 82.0 | 73.8 | 67.2 | 82.1 | 0.7 | | DROP (3 shot) | 77.2 | 74.1 | 74.3 | 77.0 | 34.2 | 73.2 | 68.8 | | MATH (4 shot CoT, Flex) | 53.7 | 62.3 | 63.0 | 56.4 | 74.3 | 41.9 | 55.0 | | GSM8K (8 shot, CoT) | 91.1 | 93.5 | 93.5 | 93.7 | 89.5 | 90.0 | 84.7 | | HumanEval (pass@10) | 92.9 | 92.4 | 92.4 | 93.6 | 94.0 | 89.6 | 94.1 | | HumanEval+ (pass@10) | 87.3 | 88.4 | 88.0 | 89.5 | 90.8 | 85.9 | 85.5 | | IFEval (prompt loose) | 82.1 | 82.6 | 83.2 | 88.0 | 87.6 | 76.0 | 79.9 | | AlpacaEval 2 (LC % win) | 26.3 | 49.6 | 49.8 | 33.4 | 47.7 | 28.4 | 66.1 | | Safety (6 task avg.) | 94.4 | 89.0 | 88.3 | 76.5 | 87.0 | 57.9 | 69.0 | | Benchmark (eval) | Tülu 3 405B SFT | Tülu 3 405B DPO | Tülu 3 405B | Llama 3.1 405B Instruct | Nous Hermes 3 405B | Deepseek V3 | GPT 4o (11-24) | |-----------------|----------------|----------------|-------------|------------------------|-------------------|-------------|----------------| | Avg w/o Safety | 76.3 | 79.0 | 80.0 | 78.1 | 74.4 | 79.0 | 80.5 | | Avg w/ Safety | 77.5 | 79.6 | 80.7 | 79.0 | 73.5 | 75.9 | 81.6 | | MMLU (5 shot, CoT) | 84.4 | 86.6 | 87.0 | 88.0 | 84.9 | 82.1 | 87.9 | | PopQA (3 shot) | 55.7 | 55.4 | 55.5 | 52.9 | 54.2 | 44.9 | 53.6 | | BigBenchHard (0 shot, CoT) | 88.0 | 88.8 | 88.6 | 87.1 | 87.7 | 89.5 | 83.3 | | MATH (4 shot, Flex) | 63.4 | 59.9 | 67.3 | 66.6 | 58.4 | 72.5 | 68.8 | | GSM8K (8 shot, CoT) | 93.6 | 94.2 | 95.5 | 95.4 | 92.7 | 94.1 | 91.7 | | HumanEval (pass@10) | 95.7 | 97.2 | 95.9 | 95.9 | 92.3 | 94.6 | 97.0 | | HumanEval+ (pass@10) | 93.3 | 93.9 | 92.9 | 90.3 | 86.9 | 91.6 | 92.7 | | IFEval (prompt loose) | 82.4 | 85.0 | 86.0 | 88.4 | 81.9 | 88.0 | 84.8 | | AlpacaEval 2 (LC % win) | 30.4 | 49.8 | 51.4 | 38.5 | 30.2 | 53.5 | 65.0 | | Safety (6 task avg.) | 87.7 | 85.5 | 86.7 | 86.8 | 65.8 | 72.2 | 90.9 | SFT: - Learning Rate: 5E-6 (8B), 2E-6 (70B, 405B) - Effective Batch Size: 128 (8B, 70B), 256 (405B) - Max. Sequence Length: 4096 - Loss Accumulation: Sum (see https://unsloth.ai/blog/gradient) - Learning Rate Schedule: Linear - LR Warmup Ratio: 0.03 - Num. Epochs: 2 All Llama 3.1 Tülu3 models are released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. Tülu3 is intended for research and educational use. For more information, please see our Responsible Use Guidelines. If Tülu3 or any of the related materials were helpful to your work, please cite:
FlexOlmo-7x7B-1T
OLMoE-1B-7B-0924-Instruct
> OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in September 2024 (0924) that has been adapted via SFT and DPO from OLMoE-1B-7B. It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source. This information and more can also be found on the OLMoE GitHub repository. - Paper: https://arxiv.org/abs/2409.02060 - Pretraining Checkpoints, Code, Data and Logs. - SFT (Supervised Fine-Tuning) Checkpoints, Code, Data and Logs. - DPO/KTO (Direct Preference Optimization/Kahneman-Tversky Optimization), Checkpoints, Preference Data, DPO code, KTO code and Logs. Install `transformers` from source until a release after this PR & `torch` and run: Branches: - `main`: Preference tuned via DPO model of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT (`main` branch) - `load-balancing`: Ablation with load balancing loss during DPO starting from the `load-balancing` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT - `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/allenai/OLMoE-1B-7B-0924) - `kto`: Ablation using KTO instead of DPO. This branch is the checkpoint after 5,000 steps with the RMS optimizer. The other `kto` branches correspond to the other checkpoints mentioned in the paper. | Task (→) | MMLU | GSM8k | BBH | Human-Eval | Alpaca-Eval 1.0 | XSTest | IFEval | Avg | |---------------|------|-------|------|------------|-----------------|--------|--------|------| | Setup (→) | 0-shot | 8-shot CoT | 3-shot | 0-shot | 0-shot | 0-shot | 0-shot | | | Metric (→) | EM | EM | EM | Pass@10 | %win | F1 | Loose Acc | | | | | | | | | | | | | OLMo-1B (0724) | 25.0 | 7.0 | 22.5 | 16.0 | - | 67.6 | 20.5 | - | | +SFT | 36.0 | 12.5 | 27.2 | 21.2 | 41.5 | 81.9 | 26.1 | 35.9 | | +DPO | 36.7 | 12.5 | 30.6 | 22.0 | 50.9 | 79.8 | 24.2 | 37.4 | | OLMo-7B (0724) | 50.8 | 32.5 | 36.9 | 32.3 | - | 80.8 | 19.6 | - | | +SFT | 54.2 | 25.0 | 35.7 | 38.5 | 70.9 | 86.1 | 39.7 | 49.3 | | +DPO | 52.8 | 9.0 | 16.6 | 35.0 | 83.5 | 87.5 | 37.9 | 49.1 | | JetMoE-2B-9B | 45.6 | 43.0 | 37.2 | 54.6 | - | 68.2 | 20.0 | - | | +SFT | 46.1 | 53.5 | 35.6 | 64.8 | 69.3 | 55.6 | 30.5 | 50.4 | | DeepSeek-3B-16B | 37.7 | 18.5 | 39.4 | 48.3 | - | 65.9 | 13.5 | - | | +Chat | 48.5 | 46.5 | 40.8 | 70.1 | 74.8 | 85.6 | 32.3 | 57.0 | | Qwen1.5-3B-14B | 60.4 | 13.5 | 27.2 | 60.2 | - | 73.4 | 20.9 | - | | +Chat | 58.9 | 55.5 | 21.3 | 59.7 | 83.9 | 85.6 | 36.2 | 57.3 | | OLMoE (This Model) | 49.8 | 3.0 | 33.6 | 22.4 | - | 59.7 | 16.6 | - | | +SFT | 51.4 | 40.5 | 38.0 | 51.6 | 69.2 | 84.1 | 43.3 | 54.0 | | +DPO | 51.9 | 45.5 | 37.0 | 54.8 | 84.0 | 82.6 | 48.1 | 57.7 |
specter
SPECTER is a pre-trained language model to generate document-level embedding of documents. It is pre-trained on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, SPECTER can be easily applied to downstream applications without task-specific fine-tuning. If you're coming here because you want to embed papers, SPECTER has now been superceded by SPECTER2. Use that instead. Paper: SPECTER: Document-level Representation Learning using Citation-informed Transformers Authors: Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld
OLMo-2-1124-7B-Instruct
Upon the initial release of OLMo-2 models, we realized the post-trained models did not share the pre-tokenization logic that the base models use. As a result, we have trained new post-trained models. The new models are available under the same names as the original models, but we have made the old models available with a postfix "-preview". See OLMo 2 Preview Post-trained Models for the colleciton of the legacy models. OLMo 2 7B Instruct November 2024 is post-trained variant of the OLMo-2 7B November 2024 model, which has undergone supervised finetuning on an OLMo-specific variant of the Tülu 3 dataset and further DPO training on this dataset, and finally RLVR training using this data. Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. Check out the OLMo 2 paper or Tülu 3 paper for more details! OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details. The core models released in this batch include the following: | Stage | OLMo 2 7B | OLMo 2 13B | |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| | Base Model | allenai/OLMo2-7B-1124 | allenai/OLMo-2-13B-1124 | | SFT | allenai/OLMo-2-1124-7B-SFT | allenai/OLMo-2-1124-13B-SFT | | DPO | allenai/OLMo-2-1124-7B-DPO | allenai/OLMo-2-1124-13B-DPO | | Final Models (RLVR) | allenai/OLMo-2-1124-7B-Instruct | allenai/OLMo-2-1124-13B-Instruct | | Reward Model (RM)| allenai/OLMo-2-1124-7B-RM | allenai/OLMo-2-1124-13B-RM | - Model type: A model trained on a mix of publicly available, synthetic and human-created datasets. - Language(s) (NLP): Primarily English - License: Apache 2.0 - Finetuned from model: allenai/OLMo-2-7B-1124-DPO - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo - Evaluation code: https://github.com/allenai/olmes - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: https://arxiv.org/abs/2501.00656 - Demo: https://playground.allenai.org/ OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using: To load the model with HuggingFace, use the following snippet: It is embedded within the tokenizer as well, for `tokenizer.applychattemplate`. In Ai2 demos, we use this system prompt by default: The model has not been trained with a specific system prompt in mind. The OLMo-2 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). See the Falcon 180B model card for an example of this. | Model | Average | AlpacaEval | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA | |-------|---------|------------|-----|------|--------|---------|------|-------|---------|-------|---------| | Open weights models | | Gemma-2-9B-it | 51.9 | 43.7 | 2.5 | 58.8 | 79.7 | 69.9 | 29.8 | 69.1 | 75.5 | 28.3 | 61.4 | | Ministral-8B-Instruct | 52.1 | 31.4 | 56.2 | 56.2 | 80.0 | 56.4 | 40.0 | 68.5 | 56.2 | 20.2 | 55.5 | | Mistral-Nemo-Instruct-2407 | 50.9 | 45.8 | 54.6 | 23.6 | 81.4 | 64.5 | 31.9 | 70.0 | 52.7 | 26.9 | 57.7 | | Qwen-2.5-7B-Instruct | 57.1 | 29.7 | 25.3 | 54.4 | 83.8 | 74.7 | 69.9 | 76.6 | 75.0 | 18.1 | 63.1 | | Llama-3.1-8B-Instruct | 58.9 | 25.8 | 69.7 | 61.7 | 83.4 | 80.6 | 42.5 | 71.3 | 70.2 | 28.4 | 55.1 | | Tülu 3 8B | 60.4 | 34.0 | 66.0 | 62.6 | 87.6 | 82.4 | 43.7 | 68.2 | 75.4 | 29.1 | 55.0 | | Qwen-2.5-14B-Instruct | 60.8 | 34.6 | 34.0 | 50.5 | 83.9 | 82.4 | 70.6 | 81.1 | 79.3 | 21.1 | 70.8 | | Fully open models | | OLMo-7B-Instruct | 28.2 | 5.2 | 35.3 | 30.7 | 14.3 | 32.2 | 2.1 | 46.3 | 54.0 | 17.1 | 44.5 | | OLMo-7B-0424-Instruct | 33.1 | 8.5 | 34.4 | 47.9 | 23.2 | 39.2 | 5.2 | 48.9 | 49.3 | 18.9 | 55.2 | | OLMoE-1B-7B-0924-Instruct | 35.5 | 8.5 | 37.2 | 34.3 | 47.2 | 46.2 | 8.4 | 51.6 | 51.6 | 20.6 | 49.1 | | MAP-Neo-7B-Instruct | 42.9 | 17.6 | 26.4 | 48.2 | 69.4 | 35.9 | 31.5 | 56.5 | 73.7 | 18.4 | 51.6 | | OLMo-2-7B-SFT | 50.2 | 10.2 | 49.7 | 59.6 | 74.6 | 66.9 | 25.3 | 61.1 | 82.1 | 23.6 | 48.6 | | OLMo-2-7B-DPO | 54.2 | 27.9 | 46.7 | 60.2 | 82.6 | 73.0 | 30.3 | 60.8 | 81.0 | 23.5 | 56.0 | | OLMo-2-13B-SFT | 55.3 | 11.5 | 59.6 | 71.3 | 76.3 | 68.6 | 29.5 | 68.0 | 82.3 | 29.4 | 57.1 | | OLMo-2-13B-DPO | 60.6 | 38.3 | 57.9 | 71.5 | 82.3 | 80.2 | 35.2 | 67.9 | 79.7 | 29.0 | 63.9 | | OLMo-2-7B-1124–Instruct | 54.8 | 29.1 | 46.6 | 60.5 | 85.1 | 72.3 | 32.5 | 61.3 | 80.6 | 23.2 | 56.5 | | OLMo-2-13B-1124-Instruct | 62.0 | 39.5 | 58.8 | 71.5 | 87.4 | 82.6 | 39.2 | 68.5 | 79.1 | 28.8 | 64.3 | OLMo 2 is licensed under the Apache 2.0 license. OLMo 2 is intended for research and educational use. For more information, please see our Responsible Use Guidelines. This model has been fine-tuned using a dataset mix with outputs generated from third party models and are subject to additional terms: Gemma Terms of Use.
Molmo-7B-D-0924
OLMoE-1B-7B-0125
> OLMoE-1B-7B is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in January 2025 (0125) that is 100% open-source. It is an improved version of OLMoE-09-24, see the paper appendix for details. This information and more can also be found on the OLMoE GitHub repository. - Paper: arxiv.org/abs/2409.02060 - Pretraining Checkpoints, Code, Data and Logs. - SFT (Supervised Fine-Tuning) Checkpoints, Code, Data and Logs. - DPO/KTO (Direct Preference Optimization/Kahneman-Tversky Optimization), Checkpoints, Preference Data, DPO code, KTO code and Logs. Install `transformers` (version `4.45.0` or greater) & `torch` and run: You can list all revisions/branches by installing `huggingface-hub` & running: Important branches: - `step1200000-tokens5033B`: Pretraining checkpoint used for annealing. There are a few more checkpoints after this one but we did not use them. - `main`: Checkpoint annealed from `step1200000-tokens5033B` for an additional 100B tokens (23,842 steps). We use this checkpoint for our adaptation (https://huggingface.co/allenai/OLMoE-1B-7B-0125-SFT & https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct). - `fp32`: FP32 version of `main`. The model weights were stored in FP32 during training but we did not observe any performance drop from casting them to BF16 after training so we upload all weights in BF16. If you want the original FP32 checkpoint for `main` you can use this one. You will find that it yields slightly different results but should perform around the same on benchmarks. | Model | Active Params | Open Data | MMLU | HellaSwag | ARC-Chall. | ARC-Easy | PIQA | WinoGrande | |-----------------------------|---------------|-----------|------|-----------|------------|----------|------|------------| | LMs with ~1B active parameters | | | | | | | | | | OLMoE-1B-7B-0125 | 1.3B | ✅ | 56.3 | 81.7 | 67.5 | 84.4 | 78.7 | 70.6 | | OLMoE-1B-7B-0924 | 1.3B | ✅ | 54.1 | 80.0 | 62.1 | 84.2 | 79.8 | 70.2 | | DCLM-1B | 1.4B | ✅ | 48.5 | 75.1 | 57.6 | 79.5 | 76.6 | 68.1 | | TinyLlama-1B | 1.1B | ✅ | 33.6 | 60.8 | 38.1 | 69.5 | 71.7 | 60.1 | | OLMo-1B (0724) | 1.3B | ✅ | 32.1 | 67.5 | 36.4 | 53.5 | 74.0 | 62.9 | | Pythia-1B | 1.1B | ✅ | 31.1 | 48.0 | 31.4 | 63.4 | 68.9 | 52.7 | | LMs with ~2-3B active parameters | | | | | | | | | | Qwen1.5-3B-14B | 2.7B | ❌ | 62.4 | 80.0 | 77.4 | 91.6 | 81.0 | 72.3 | | Gemma2-3B | 2.6B | ❌ | 53.3 | 74.6 | 67.5 | 84.3 | 78.5 | 71.8 | | JetMoE-2B-9B | 2.2B | ❌ | 49.1 | 81.7 | 61.4 | 81.9 | 80.3 | 70.7 | | DeepSeek-3B-16B | 2.9B | ❌ | 45.5 | 80.4 | 53.4 | 82.7 | 80.1 | 73.2 | | StableLM-2B | 1.6B | ❌ | 40.4 | 70.3 | 50.6 | 75.3 | 75.6 | 65.8 | | OpenMoE-3B-9B | 2.9B | ✅ | 27.4 | 44.4 | 29.3 | 50.6 | 63.3 | 51.9 | | LMs with ~7-9B active parameters | | | | | | | | | | Gemma2-9B | 9.2B | ❌ | 70.6 | 87.3 | 89.5 | 95.5 | 86.1 | 78.8 | | Llama3.1-8B | 8.0B | ❌ | 66.9 | 81.6 | 79.5 | 91.7 | 81.1 | 76.6 | | DCLM-7B | 6.9B | ✅ | 64.4 | 82.3 | 79.8 | 92.3 | 80.1 | 77.3 | | Mistral-7B | 7.3B | ❌ | 64.0 | 83.0 | 78.6 | 90.8 | 82.8 | 77.9 | | OLMo-7B (0724) | 6.9B | ✅ | 54.9 | 80.5 | 68.0 | 85.7 | 79.3 | 73.2 | | Llama2-7B | 6.7B | ❌ | 46.2 | 78.9 | 54.2 | 84.0 | 77.5 | 71.7 |
OLMoE-1B-7B-0125-Instruct
OLMoE-1B-7B-0125-Instruct January 2025 is post-trained variant of the OLMoE-1B-7B January 2025 model, which has undergone supervised finetuning on an OLMo-specific variant of the Tülu 3 dataset and further DPO training on this dataset, and finally RLVR training using this data. Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. Check out the OLMoE paper or Tülu 3 paper for more details! OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details. The core models released in this batch include the following: | Stage | OLMoE 1B-7B | |----------------------|----------------------------------------------------------------------------------------------------------| | Base Model | allenai/OLMoE-1B-7B-0125 | | SFT | allenai/OLMoE-1B-7B-0125-SFT | | DPO | allenai/OLMoE-1B-7B-0125-DPO | | Final Models (RLVR) | allenai/OLMoE-1B-7B-0125-Instruct | | Reward Model (RM)| allenai/OLMoE-1B-7B-0125-RM | - Model type: A model trained on a mix of publicly available, synthetic and human-created datasets. - Language(s) (NLP): Primarily English - License: Apache 2.0 - Finetuned from model: allenai/OLMoE-1B-7B-0125-DPO - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo - Evaluation code: https://github.com/allenai/olmes - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: https://arxiv.org/abs/2409.02060 - Demo: https://playground.allenai.org/ OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using: To load the model with HuggingFace, use the following snippet: It is embedded within the tokenizer as well, for `tokenizer.applychattemplate`. In Ai2 demos, we use this system prompt by default: The model has not been trained with a specific system prompt in mind. The OLMo-2 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). See the Falcon 180B model card for an example of this. | Benchmark (eval) | OLMoE-1B-7B-0125-Instruct | OLMoE-1B-7B-0924-Instruct | OLMoE-1B-7B-0125-DPO | OLMoE-1B-7B-0125-SFT | OLMoE-1B-7B-0924-SFT | |--------------------------------|---------------------------|--------------------------|----------------------|---------------------|---------------------| | Avg. | 45.62 | 38.44 | 45.05 | 41.76 | 37.05 | | MMLU (CoT) | 55.08 | 54.57 | 54.93 | 55.26 | 54.32 | | PopQA | 19.75 | 20.56 | 19.65 | 20.12 | 21.01 | | TruthfulQA | 50.56 | 49.14 | 49.99 | 45.48 | 44.66 | | BigBenchHard (CoT) | 38.61 | 36.78 | 37.37 | 37.31 | 36.55 | | DROP | 47.87 | 34.48 | 48.38 | 48.57 | 34.71 | | MATH (Flex) | 21.41 | 8.16 | 20.36 | 21.38 | 8.15 | | GSM8K | 72.40 | 47.38 | 64.59 | 55.72 | 42.46 | | HumanEval | 62.30 | 63.04 | 61.92 | 62.58 | 63.72 | | HumanEval+ | 54.37 | 58.93 | 57.61 | 55.67 | 57.40 | | IFEval | 66.36 | 45.29 | 65.62 | 56.56 | 41.22 | | AlpacaEval | 17.99 | 7.54 | 19.50 | 5.83 | 6.38 | | Safety (average) | 90.40 | 51.40 | 91.40 | 94.50 | 65.80 | OLMoE is licensed under the Apache 2.0 license. OLMoE is intended for research and educational use. For more information, please see our Responsible Use Guidelines. This model has been fine-tuned using a dataset mix with outputs generated from third party models and are subject to additional terms: Gemma Terms of Use.
olmOCR-2-7B-1025
Full BF16 version of olmOCR-2-7B-1025-FP8. We recommend using the FP8 version for all practical purposes except further fine tuning. This is a release of the olmOCR model that's fine tuned from Qwen2.5-VL-7B-Instruct using the olmOCR-mix-1025 dataset. It has been additionally fine tuned using GRPO RL training to boost its performance at math equations, tables, and other tricky OCR cases. Quick links: - 📃 Paper - 🤗 SFT Dataset - 🤗 RL Dataset - 🛠️ Code - 🎮 Demo The best way to use this model is via the olmOCR toolkit. The toolkit comes with an efficient inference setup via VLLM that can handle millions of documents at scale. This model scores the following scores on olmOCR-bench when used with the olmOCR toolkit toolkit which automatically renders, rotates, and retries pages as needed. Model ArXiv Old Scans Math Tables Old Scans Headers and Footers Multi column Long tiny text Base Overall olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025 82.9 82.1 84.3 48.3 95.7 84.3 81.4 99.7 82.3 ± 1.1 olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025-FP8 83.0 82.3 84.9 47.7 96.1 83.7 81.9 99.7 82.4 ± 1.1 This model expects as input a single document image, rendered such that the longest dimension is 1288 pixels. The prompt must then contain the additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit. If you want to prompt this model manually instead of using the olmOCR toolkit, please see the code below. In normal usage, the olmOCR toolkit builds the prompt by rendering the PDF page, and extracting relevant text blocks and image metadata. To duplicate that you will need to This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
OLMo-1B-hf
This model is licensed under Apache 2.0 and is associated with the dataset allenai/dolma.
wildguard
Llama-3.1-Tulu-3-8B
ivila-row-layoutlm-finetuned-s2vl-v2
Molmo-7B-O-0924
OLMo-2-1124-7B-SFT
OLMo-2-0325-32B-Instruct
OLMo 2 32B Instruct March 2025 is post-trained variant of the OLMo-2 32B March 2025 model, which has undergone supervised finetuning on an OLMo-specific variant of the Tülu 3 dataset, further DPO training on this dataset, and final RLVR training on this dataset. Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. Check out the OLMo 2 paper or Tülu 3 paper for more details! OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs, and associated training details. - Model type: A model trained on a mix of publicly available, synthetic and human-created datasets. - Language(s) (NLP): Primarily English - License: Apache 2.0 - Finetuned from model: allenai/OLMo-2-0325-32B-DPO - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo-core - Evaluation code: https://github.com/allenai/olmes - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: https://arxiv.org/abs/2501.00656 - Demo: https://playground.allenai.org/ OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using: To load the model with HuggingFace, use the following snippet: NOTE: This is different than previous OLMo 2 and Tülu 3 models due to a minor change in configuration. It does NOT have the bos token before the rest. Our other models have at the beginning of the chat template. It is embedded within the tokenizer as well, for `tokenizer.applychattemplate`. In Ai2 demos, we use this system prompt by default: The model has not been trained with a specific system prompt in mind. To facilitate research on RL finetuning, we have released our intermediate checkpoints during the model's RLVR training. The model weights are saved every 20 training steps, and can be accessible in the revisions of the HuggingFace repository. For example, you can load with: The OLMo-2 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). See the Falcon 180B model card for an example of this. | Model | Average | AlpacaEval 2 LC | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA | |-------|---------|------|-----|------|-------|--------|------|------|--------|-------|---------| | Closed API models | | | | | | | | | | | | | GPT-3.5 Turbo 0125 | 59.6 | 38.7 | 66.6 | 70.2 | 74.3 | 66.9 | 41.2 | 70.2 | 69.1 | 45.0 | 62.9 | | GPT 4o Mini 2024-07-18 | 65.7 | 49.7 | 65.9 | 36.3 | 83.0 | 83.5 | 67.9 | 82.2 | 84.9 | 39.0 | 64.8 | | Open weights models | | | | | | | | | | | | | Mistral-Nemo-Instruct-2407 | 50.9 | 45.8 | 54.6 | 23.6 | 81.4 | 64.5 | 31.9 | 70.0 | 52.7 | 26.9 | 57.7 | | Ministral-8B-Instruct | 52.1 | 31.4 | 56.2 | 56.2 | 80.0 | 56.4 | 40.0 | 68.5 | 56.2 | 20.2 | 55.5 | | Gemma-2-27b-it | 61.3 | 49.0 | 72.7 | 67.5 | 80.7 | 63.2 | 35.1 | 70.7 | 75.9 | 33.9 | 64.6 | | Qwen2.5-32B | 66.5 | 39.1 | 82.3 | 48.3 | 87.5 | 82.4 | 77.9 | 84.7 | 82.4 | 26.1 | 70.6 | | Mistral-Small-24B | 67.6 | 43.2 | 80.1 | 78.5 | 87.2 | 77.3 | 65.9 | 83.7 | 66.5 | 24.4 | 68.1 | | Llama-3.1-70B | 70.0 | 32.9 | 83.0 | 77.0 | 94.5 | 88.0 | 56.2 | 85.2 | 76.4 | 46.5 | 66.8 | | Llama-3.3-70B | 73.0 | 36.5 | 85.8 | 78.0 | 93.6 | 90.8 | 71.8 | 85.9 | 70.4 | 48.2 | 66.1 | | Gemma-3-27b-it | - | 63.4 | 83.7 | 69.2 | 91.1 | - | - | 81.8 | - | 30.9 | - | | Fully open models | | | | | | | | | | | | | OLMo-2-7B-1124-Instruct | 55.7 | 31.0 | 48.5 | 58.9 | 85.2 | 75.6 | 31.3 | 63.9 | 81.2 | 24.6 | 56.3 | | OLMo-2-13B-1124-Instruct | 61.4 | 37.5 | 58.4 | 72.1 | 87.4 | 80.4 | 39.7 | 68.6 | 77.5 | 28.8 | 63.9 | | OLMo-2-32B-0325-SFT | 61.7 | 16.9 | 69.7 | 77.2 | 78.4 | 72.4 | 35.9 | 76.1 | 93.8 | 35.4 | 61.3 | | OLMo-2-32B-0325-DPO | 68.8 | 44.1 | 70.2 | 77.5 | 85.7 | 83.8 | 46.8 | 78.0 | 91.9 | 36.4 | 73.5 | | OLMo-2-32B-0325-Instruct | 68.8 | 42.8 | 70.6 | 78.0 | 87.6 | 85.6 | 49.7 | 77.3 | 85.9 | 37.5 | 73.2 | Below is the training curves for `allenai/OLMo-2-0325-32B-Instruct`. The model was trained using 5 8xH100 nodes. Below are the core eval scores over steps for `allenai/OLMo-2-0325-32B-Instruct` (note we took step `320` as the final checkpoint, corresponding to episode `573,440`): Below are the other eval scores over steps for `allenai/OLMo-2-0325-32B-Instruct`: The command below is copied directly from the tracked training job: OLMo 2 is licensed under the Apache 2.0 license. OLMo 2 is intended for research and educational use. For more information, please see our Responsible Use Guidelines. This model has been fine-tuned using a dataset mix with outputs generated from third party models and are subject to additional terms: Gemma Terms of Use.
OLMoE-1B-7B-0924
This model is licensed under the Apache 2.0 license and is designed for the English language.
olmOCR-7B-0725-FP8
scibert_scivocab_cased
olmOCR-7B-0225-preview
This is a preview release of the olmOCR model that's fine tuned from Qwen2-VL-7B-Instruct using the olmOCR-mix-0225 dataset. Quick links: - 📃 Paper - 🤗 Dataset - 🛠️ Code - 🎮 Demo The best way to use this model is via the olmOCR toolkit. The toolkit comes with an efficient inference setup via sglang that can handle millions of documents at scale. This model expects as input a single document image, rendered such that the longest dimension is 1024 pixels. The prompt must then contain the additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit. If you want to prompt this model manually instead of using the olmOCR toolkit, please see the code below. In normal usage, the olmOCR toolkit builds the prompt by rendering the PDF page, and extracting relevant text blocks and image metadata. To duplicate that you will need to olmOCR is licensed under the Apache 2.0 license. olmOCR is intended for research and educational use. For more information, please see our Responsible Use Guidelines.
tulu-2-dpo-7b
led-large-16384
specter2_aug2023refresh_base
tk-instruct-11b-def
tulu-2-7b
olmOCR-7B-0825
This is a release of the olmOCR model that's fine tuned from Qwen2.5-VL-7B-Instruct using the olmOCR-mix-0225 dataset. Quick links: - 📃 Paper - 🤗 Dataset - 🛠️ Code - 🎮 Demo The best way to use ...
OLMo-2-1124-13B
OLMo-2-0425-1B-DPO
Olmo-3-7B-Instruct
Olmo-3-7B-Think
MolmoAct-7B-D-LIBERO-Spatial-0812
MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and MolmoAct Dataset, a dataset with 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It has state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family here. Learn more about MolmoAct in our announcement blog post or the paper. MolmoAct 7B-D LIBERO-Spatial is based on Qwen2.5-7B and uses SigLip2 as the vision backbone, which is initialized using Molmo's pre-training approach. It is first pre-trained on MolmoAct's Pre-training Mixture, then mid-trained on MolmoAct Dataset, and finally post-trained on LIBERO-Spatial. This model is intended to be used for replicating our results on LIBERO-Spatial. This checkpoint is a preview of the MolmoAct release. All artifacts used in creating MolmoAct (data, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility. Quick links: - 📂 All Models - 📂 All Data - 📄 Paper - 💻 Code - 🎥 Blog Post - 🎥 Video This model is licensed under Apache 2.0. It is intended for research and educational use. For more information, please see our Responsible Use Guidelines. MolmoAct offers the ability to inspect a visual trace of its intended actions in space before they occur, allowing users to ensure safe behavior by proactively auditing and adjusting the actions of any hardware acting under the model’s instructions. MolmoAct’s action space is bounded within the data provided, and compliance is built into the model to prevent excessive force when resistance is detected. Please follow the hardware manufacturer’s guidelines when using this model with a robot and perform all operations in a safely configured environment.
OLMo-2-0325-32B
OLMo-2-0325-32B-Instruct-GGUF
Llama-3.1-Tulu-3-8B-DPO
License: llama3.1 Language: en
MolmoAct-7B-D-LIBERO-Object-0812
MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and MolmoAct Dataset, a dataset with 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It has state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family here. Learn more about MolmoAct in our announcement blog post or the paper. MolmoAct 7B-D LIBERO-Object is based on Qwen2.5-7B and uses SigLip2 as the vision backbone, which is initialized using Molmo's pre-training approach. It is first pre-trained on MolmoAct's Pre-training Mixture, then mid-trained on MolmoAct Dataset, and finally post-trained on LIBERO-Long. This model is intended to be used for replicating our results on LIBERO-Object. This checkpoint is a preview of the MolmoAct release. All artifacts used in creating MolmoAct (data, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility. Quick links: - 📂 All Models - 📂 All Data - 📄 Paper - 💻 Code - 🎥 Blog Post - 🎥 Video This model is licensed under Apache 2.0. It is intended for research and educational use. For more information, please see our Responsible Use Guidelines. MolmoAct offers the ability to inspect a visual trace of its intended actions in space before they occur, allowing users to ensure safe behavior by proactively auditing and adjusting the actions of any hardware acting under the model’s instructions. MolmoAct’s action space is bounded within the data provided, and compliance is built into the model to prevent excessive force when resistance is detected. Please follow the hardware manufacturer’s guidelines when using this model with a robot and perform all operations in a safely configured environment.
Olmo-3-1125-32B
longformer-large-4096
OLMo-7B
OLMo-2-0425-1B-SFT
Olmo-3-7B-Think-DPO
Olmo-3-1025-7B
MolmoAct-7B-D-Pretrain-RT-1-0812
OLMo-1B
Flex-reddit-2x7B-1T
specter2
OLMo-2-1124-13B-Instruct
olmOCR-7B-0725
Olmo-3-7B-Instruct-SFT
MolmoAct-7B-D-LIBERO-Goal-0812
MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and MolmoAct Dataset, a dataset with 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It has state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family here. Learn more about MolmoAct in our announcement blog post or the paper. MolmoAct 7B-D LIBERO-Goal is based on Qwen2.5-7B and uses SigLip2 as the vision backbone, which is initialized using Molmo's pre-training approach. It is first pre-trained on MolmoAct's Pre-training Mixture, then mid-trained on MolmoAct Dataset, and finally post-trained on LIBERO-Long. This model is intended to be used for replicating our results on LIBERO-Goal. This checkpoint is a preview of the MolmoAct release. All artifacts used in creating MolmoAct (data, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility. Quick links: - 📂 All Models - 📂 All Data - 📄 Paper - 💻 Code - 🎥 Blog Post - 🎥 Video This model is licensed under Apache 2.0. It is intended for research and educational use. For more information, please see our Responsible Use Guidelines. MolmoAct offers the ability to inspect a visual trace of its intended actions in space before they occur, allowing users to ensure safe behavior by proactively auditing and adjusting the actions of any hardware acting under the model’s instructions. MolmoAct’s action space is bounded within the data provided, and compliance is built into the model to prevent excessive force when resistance is detected. Please follow the hardware manufacturer’s guidelines when using this model with a robot and perform all operations in a safely configured environment.
MolmoAct-7B-D-Pretrain-0812
Olmo-3-32B-Think
OLMoE-1B-7B-0924-Instruct-GGUF
OLMo-7B-hf
Language model for English with Apache 2.0 license.
tulu-2-dpo-70b
tulu-2-dpo-13b
Olmo-3-7B-Think-SFT
OLMo-2-1124-7B-DPO
OLMo-7B-0724-Instruct-hf
MolmoE-1B-0924
truthfulqa-truth-judge-llama2-7B
This model is built based on LLaMa2 7B in replacement of the truthfulness/informativeness judge models that were originally introduced in the TruthfulQA paper. That model is based on OpenAI's Curie engine using their finetuning API. However, as of February 08, 2024, OpenAI has taken down its Curie engine, and thus, we cannot use it for TruthfulQA evaluation anymore. So, we decided to train the judge models using an open model (i.e., LLaMa), which can make the evaluation more accessible and reproducible. We released two models for the truthfulness and informativeness evaluation, respectively. The training code and validation results of these models can be found here These models are only intended for the TruthfulQA evaluation. They are intended to generalize to the evaluation of new models on the fixed set of prompts, but they may fail to generalize to new prompts. You can try the model using the following scripts:
Llama-3.1-8B-Instruct-RM-RB2
OLMo-2-0425-1B-RLVR1
OLMo-2-1124-13B-DPO
Olmo-3-7B-Instruct-DPO
OLMo-7B-0724-hf
led-large-16384-arxiv
Molmo-72B-0924
specter2_aug2023refresh
open-instruct-human-mix-65b
open-instruct-pythia-6.9b-tulu
This model is a 6.9B Pythia model finetuned on a mixture of instruction datasets (FLAN V2, CoT, Dolly, Open Assistant 1, GPT4-Alpaca, Code-Alpaca, and ShareGPT). This was trained as part of the paper How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources. The codebase used to train and evaluate this model can be found at https://github.com/allenai/open-instruct. This model is licensed under the AI model license given in LICENSE.txt, with the original model license at pythialicense.txt. Usage Simply download and use - this model is not a diff, unlike the other open-instruct models. The model is trained to use the following format (note the newlines): For best results, format all inputs in this manner. Make sure to include a newline after ` `, this can affect generation quality quite a bit. Here is the performance of this model across benchmarks explored in our paper How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources: | MMLU 0-shot | MMLU 5-shot | GSM Direct | GSM CoT | BBH Direct | BBH CoT | TydiQA Gold-Passage | TydiQA Closed-book | Codex-Eval Pass@1 | Codex-Eval Pass@10 | AlpacaFarm vs Davinci-003 | Average | |:-----------:|:-----------:|:----------:|:-------:|:----------:|:-------:|:-------------------:|:------------------:|:-----------------:|:------------------:|:-------------------------:|---------| | 34.1 | 34.6 | 3.5 | 15.5 | 31.3 | 27.8 | 33.4 | 3.8 | 14.3 | 21.4 | 9.2 | 19.8 | If you use this model, please cite our work, the Pythia paper, and the original datasets:
unifiedqa-t5-base
unifiedqa-t5-large
digital-socrates-7b
olmOCR-7B-0225-preview-GGUF
digital-socrates-13b
MolmoAct 7B D 0812
Llama-3.1-Tulu-3-70B
License: llama3.1 Language: en
Llama-3.1-Tulu-3.1-8B
OLMoE-1B-7B-0125-Instruct-GGUF
OLMo-2-1124-7B-Instruct-GGUF
unifiedqa-v2-t5-base-1363200
OLMo-2-0325-32B-SFT
OLMo-2-0425-1B-Instruct-GGUF
OLMo-2-0325-32B-DPO
Olmo-3-32B-Think-DPO
OLMo-7B-0424-hf
Olmo-3-32B-Think-SFT
OLMoE-1B-7B-0924-GGUF
OLMo-7B-Instruct-hf
This model is licensed under Apache 2.0 and is associated with the dataset allenai/dolma.
OLMo-7B-Twin-2T-hf
Llama-3.1-Tulu-3-8B-RM
License: llama3.1 Language: en
GraspMolmo
olmOCR-7B-0225-preview-FP8
DataDecide-c4-150M
Olmo-3-7B-RL-Zero-Math
Olmo-3-7B-RLZero-Math
OLMo-2-1124-13B-GGUF
OLMo-2-1124-13B-RM
OLMo-2-0425-1B-early-training
tulu-2-13b
OLMo-2-1124-13B-Instruct-GGUF
dsp_roberta_base_dapt_biomed_tapt_rct_500
hvila-block-layoutlm-finetuned-grotoap2
truthfulqa-info-judge-llama2-7B
DataDecide-fineweb-edu-20M
OLMo-7B-0424
PRIMERA
ACE2-ERA5
Ai2 Climate Emulator (ACE) is a family of models designed to simulate atmospheric variability from the time scale of days to centuries. Disclaimer: ACE models are research tools and should not be used for operational climate predictions. ACE2-ERA5 is trained on the ERA5 dataset and is described in ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses. As part of that paper, the repository containing training and evaluation scripts and configuration files used for this model is located here. 1. Download this repository. Optionally, you can just download a subset of the `forcingdata` and `initialconditions` for the period you are interested in. 2. Update paths in the `inferenceconfig.yaml`. Specifically, update `experimentdir`, `checkpointpath`, `initialcondition.path` and `forcingloader.dataset.path`. 3. Install code dependencies with `pip install fme`. 4. Run inference with `python -m fme.ace.inference inferenceconfig.yaml`. Briefly, the strengths of ACE2-ERA5 are: - accurate atmospheric warming response to combined increase of sea surface temperature and CO2 over last 80 years - highly accurate atmospheric response to El Niño sea surface temperature variability - good representation of the geographic distribution of tropical cyclones - accurate Madden Julian Oscillation variability - realistic stratospheric polar vortex strength and variability - exact conservation of global dry air mass and moisture Some known weaknesses are: - the individual sensitivities to changing sea surface temperature and CO2 are not entirely realistic - the medium-range (3-10 day) weather forecast skill is not state of the art - not expected to generalize accurately for large perturbations of certain inputs (e.g. doubling of CO2)
unifiedqa-v2-t5-3b-1363200
longformer-large-4096-finetuned-triviaqa
OLMo-2-1124-13B-Instruct-preview
OlmoEarth-v1-Base
Olmo-3-7B-RL-Zero-Mix
Olmo-3-7B-RLZero-Mix
unifiedqa-v2-t5-large-1363200
OLMo-7B-0724-SFT-hf
cs_roberta_base
MolmoAct-7B-D-LIBERO-Long-0812
MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and MolmoAct Dataset, a dataset with 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It has state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family here. Learn more about MolmoAct in our announcement blog post or the paper. MolmoAct 7B-D LIBERO-Long is based on Qwen2.5-7B and uses SigLip2 as the vision backbone, which is initialized using Molmo's pre-training approach. It is first pre-trained on MolmoAct's Pre-training Mixture, then mid-trained on MolmoAct Dataset, and finally post-trained on LIBERO-Long. This model is intended to be used for replicating our results on LIBERO-Long. This checkpoint is a preview of the MolmoAct release. All artifacts used in creating MolmoAct (data, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility. Quick links: - 📂 All Models - 📂 All Data - 📄 Paper - 💻 Code - 🎥 Blog Post - 🎥 Video This model is licensed under Apache 2.0. It is intended for research and educational use. For more information, please see our Responsible Use Guidelines. MolmoAct offers the ability to inspect a visual trace of its intended actions in space before they occur, allowing users to ensure safe behavior by proactively auditing and adjusting the actions of any hardware acting under the model’s instructions. MolmoAct’s action space is bounded within the data provided, and compliance is built into the model to prevent excessive force when resistance is detected. Please follow the hardware manufacturer’s guidelines when using this model with a robot and perform all operations in a safely configured environment.
MolmoAct-7B-O-0812
OLMo-2-1124-13B-SFT
OLMo-2-1124-7B-RM
wmt19-de-en-6-6-big
Llama 3.1 Tulu 3 405B
OLMoE-1B-7B-0125-GGUF
Llama-3.1-Tulu-3-70B-SFT
Language model with capabilities in English. License: llama3.1.
open-instruct-stanford-alpaca-7b
ACE2-ERA5-training-artifacts
OLMo-2-0425-1B-GGUF
FlexOlmo-7x7B-1T-RT
tk-instruct-base-def-pos
Llama-3.1-Tulu-3-70B-DPO
License: llama3.1 Language: en