allenai

✓ VerifiedResearch Lab

Allen Institute for AI, nonprofit AI research

500 models • 29 total models in database
Sort by:

longformer-base-4096

--- language: en license: apache-2.0 ---

license:apache-2.0
1,329,038
215

OLMo-2-0425-1B

--- license: apache-2.0 language: - en library_name: transformers ---

NaNK
license:apache-2.0
1,308,123
65

unifiedqa-t5-small

--- language: en ---

801,019
5

scibert_scivocab_uncased

--- language: en ---

273,876
161

specter2_base

--- license: apache-2.0 datasets: - allenai/scirepeval language: - en ---

license:apache-2.0
193,888
41

olmOCR-7B-0825-FP8

--- language: - en license: apache-2.0 datasets: - allenai/olmOCR-mix-0225 base_model: - Qwen/Qwen2.5-VL-7B-Instruct library_name: transformers new_version: allenai/olmOCR-2-7B-1025-FP8 ---

NaNK
license:apache-2.0
113,989
9

olmOCR-2-7B-1025-FP8

Quantized to FP8 Version of olmOCR-2-7B-1025, using llmcompressor. This is a release of the olmOCR model that's fine tuned from Qwen2.5-VL-7B-Instruct using the olmOCR-mix-1025 dataset. It has been additionally fine tuned using GRPO RL training to boost its performance at math equations, tables, and other tricky OCR cases. Quick links: - 📃 Paper - 🤗 SFT Dataset - 🤗 RL Dataset - 🛠️ Code - 🎮 Demo The best way to use this model is via the olmOCR toolkit. The toolkit comes with an efficient inference setup via VLLM that can handle millions of documents at scale. This model scores the following scores on olmOCR-bench when used with the olmOCR toolkit toolkit which automatically renders, rotates, and retries pages as needed. Model ArXiv Old Scans Math Tables Old Scans Headers and Footers Multi column Long tiny text Base Overall olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025 82.9 82.1 84.3 48.3 95.7 84.3 81.4 99.7 82.3 ± 1.1 olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025-FP8 83.0 82.3 84.9 47.7 96.1 83.7 81.9 99.7 82.4 ± 1.1 This model expects as input a single document image, rendered such that the longest dimension is 1288 pixels. The prompt must then contain the additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit. If you want to prompt this model manually instead of using the olmOCR toolkit, please see the code below. In normal usage, the olmOCR toolkit builds the prompt by rendering the PDF page, and extracting relevant text blocks and image metadata. To duplicate that you will need to This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.

NaNK
license:apache-2.0
113,417
135

OLMo-1B-0724-hf

OLMo 1B July 2024 is the latest version of the original OLMo 1B model rocking a 4.4 point increase in HellaSwag, among other evaluations improvements, from an improved version of the Dolma dataset and staged training. This version is for direct use with HuggingFace Transformers from v4.40 on. OLMo is a series of Open Language Models designed to enable the science of language models. The OLMo models are trained on the Dolma dataset. We release all code, checkpoints, logs, and details involved in training these models. The core models released in this batch are the following: | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length | |------|--------|---------|-------------|-----------------|----------------| | OLMo 1B July 2024 | 3.05 Trillion | 16 | 2048 | 16 | 4096 | | OLMo 7B July 2024 | 2.75 Trillion | 32 | 4096 | 32 | 4096 | [Coming soon] We are releasing many checkpoints for these models, for every 1000 training steps. The naming convention is `stepXXX-tokensYYYB`. To load a specific model revision with HuggingFace, simply add the argument `revision`: All revisions/branches are listed in the file `revisions.txt`. Or, you can access all the revisions for the models via the following code snippet: - Developed by: Allen Institute for AI (AI2) - Supported by: Databricks, Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University, AMD, CSC (Lumi Supercomputer), UW - Model type: a Transformer style autoregressive language model. - Language(s) (NLP): English - License: The code and model are released under Apache 2.0. - Contact: Technical inquiries: `olmo at allenai dot org`. Press: `press at allenai dot org` - Date cutoff: Oct. 2023, with most data from Feb./March 2023 based on Dolma dataset version. - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo - Evaluation code: https://github.com/allenai/OLMo-Eval - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: Link Install Transformers. Then proceed as usual with HuggingFace: Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.frompretrained("allenai/OLMo-1B-0724-hf", torchdtype=torch.float16, loadin8bit=True)` (requires `bitsandbytes`). The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as `inputs.inputids.to('cuda')` to avoid potential issues. Fine-tuning Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available. 1. Fine-tune with the OLMo repository: 2. Further fine-tuning support is being developing in AI2's Open Instruct repository. Details are here. Core model results for the new and original 7B model are found below. | Task | Llama-7b | Llama2-7b | Falcon-7b | Mpt-7b | OLMo-7B | Llama2-13b | OLMo 7B 0424 | |-------------------|----------|-----------|-----------|--------|---------|------------|-------------| | arcc | 44.5 | 48.5 | 47.5 | 46.5 | 48.5 | 52.8 | 42.5 | | arce | 67.9 | 69.5 | 70.4 | 70.5 | 65.4 | 73.7 | 67.2 | | boolq | 75.4 | 80.2 | 74.6 | 74.2 | 73.4 | 82.2 | 83.7 | | copa | 91.0 | 86.0 | 86.0 | 85.0 | 90.0 | 90.0 | 86.0 | | hellaswag | 76.2 | 76.8 | 75.9 | 77.6 | 76.4 | 78.6 | 75.5 | | openbookqa | 51.2 | 48.4 | 53.0 | 48.6 | 50.4 | 51.8 | 50.0 | | piqa | 77.2 | 76.7 | 78.5 | 77.3 | 78.4 | 79.0 | 77.5 | | sciq | 93.9 | 94.5 | 93.9 | 93.7 | 93.8 | 95.5 | 96.7 | | winogrande | 70.5 | 69.4 | 68.9 | 69.9 | 67.9 | 73.5 | 69.8 | | truthfulQA (MC2) | 33.9 | 38.5 | 34.0 | 33.0 | 36.0 | 36.8 | 35.8 | | MMLU (5 shot MC) | 31.5 | 45.0 | 24.0 | 30.8 | 28.3 | 55.5 | 52.0 | | GSM8k | 10.0 | 12.0 | 4.0 | 4.5 | 8.5 | 25.0 | 29.0 | | Full average | 60.3 | 62.1 | 59.2 | 59.3 | 59.8 | 66.2 | 63.8 | | task | random | StableLM 2 1.6b\ | Pythia 1B | TinyLlama 1.1B | OLMo 1B | OLMo 1B 0724 (ours) | | ------------- | ------ | ----------------- | --------- | -------------------------------------- | ------- | ---- | | arcchallenge | 25 | 43.8 | 33.1 | 34.8 | 34.5 | 36.5 | | arceasy | 25 | 63.7 | 50.2 | 53.2 | 58.1 | 55.3 | | boolq | 50 | 76.6 | 61.8 | 64.6 | 60.7 | 67.5 | | copa | 50 | 84.0 | 72.0 | 78.0 | 79.0 | 83.0 | | hellaswag | 25 | 68.2 | 44.7 | 58.7 | 62.5 | 66.9 | | openbookqa | 25 | 45.8 | 37.8 | 43.6 | 46.4 | 46.4 | | piqa | 50 | 74.0 | 69.1 | 71.1 | 73.7 | 74.9 | | sciq | 25 | 94.7 | 86.0 | 90.5 | 88.1 | 93.4 | | winogrande | 50 | 64.9 | 53.3 | 58.9 | 58.9 | 61.4 | | Average | 36.1 | 68.4 | 56.4 | 61.5 | 62.4 | 65.0 | \Unlike OLMo, Pythia, and TinyLlama, StabilityAI has not disclosed yet the data StableLM was trained on, making comparisons with other efforts challenging. Data For training data details, please see the Dolma documentation. This model uses the new 1.7 version with more data sources, better deduplication, and quality filtering. During the annealing phase we use a higher quality subset of Dolma with a linearly decaying learning rate to 0. In contrast to the first OLMo, we trained OLMo 7B 0424 with a two-stage curriculum: In the first stage, we trained the model from scratch on the Dolma 1.7 dataset. We set a cosine learning rate schedule with a warmup of 2500 steps, a peak learning rate of 3e-4, and a cosine decay to 3e-5 after 3T tokens. We cut off this stage after 2T tokens, when the learning rate is still high. At this point we switch to the second stage, in which we train on a higher-quality subset of Dolma 1.7 (see below) for another 50B tokens, while linearly decaying the learning rate to 0. Our high-quality subset includes (1) using all available Wikipedia, OpenWebMath and Flan data, (2) removing Dolma CC, CC News, and Megawika, and (3) rebalancing remaining sources to achieve approximately equal proportions of each. See exact token counts and relative proportions of this second stage mix below. Both stages contribute equally to the final performance of the OLMo model. After the first stage, OLMo 7B 0424 already outperforms the older OLMo. The second stage consistently adds 2 to 3 points of performance on top. OLMo 7B architecture with peer models for comparison. | | OLMo 7B | Llama 2 7B | OpenLM 7B | Falcon 7B | PaLM 8B | |------------------------|-------------------|---------------------|--------------------|--------------------|------------------| | dmodel | 4096 | 4096 | 4096 | 4544 | 4096 | | num heads | 32 | 32 | 32 | 71 | 16 | | num layers | 32 | 32 | 32 | 32 | 32 | | MLP ratio | ~8/3 | ~8/3 | ~8/3 | 4 | 4 | | LayerNorm type | non-parametric LN | RMSNorm | parametric LN | parametric LN | parametric LN | | pos embeddings | RoPE | RoPE | RoPE | RoPE | RoPE | | attention variant | full | GQA | full | MQA | MQA | | biases | none | none | in LN only | in LN only | none | | block type | sequential | sequential | sequential | parallel | parallel | | activation | SwiGLU | SwiGLU | SwiGLU | GeLU | SwiGLU | | sequence length | 2048 | 4096 | 2048 | 2048 | 2048 | | batch size (instances) | 2160 | 1024 | 2048 | 2304 | 512 | | batch size (tokens) | ~4M | ~4M | ~4M | ~4M | ~1M | | weight tying | no | no | no | no | yes | | Size | Peak LR | Betas | Epsilon | Weight Decay | |------|------------|-----------------|-------------|--------------| | 1B | 4.0E-4 | (0.9, 0.95) | 1.0E-5 | 0.1 | | 7B | 3.0E-4 | (0.9, 0.99) | 1.0E-5 | 0.1 | | | OLMo 7B | Llama 2 7B | OpenLM 7B | Falcon 7B | |-----------------------|------------------|---------------------|--------------------|--------------------| | warmup steps | 5000 | 2000 | 2000 | 1000 | | peak LR | 3.0E-04 | 3.0E-04 | 3.0E-04 | 6.0E-04 | | minimum LR | 3.0E-05 | 3.0E-05 | 3.0E-05 | 1.2E-05 | | weight decay | 0.1 | 0.1 | 0.1 | 0.1 | | beta1 | 0.9 | 0.9 | 0.9 | 0.99 | | beta2 | 0.95 | 0.95 | 0.95 | 0.999 | | epsilon | 1.0E-05 | 1.0E-05 | 1.0E-05 | 1.0E-05 | | LR schedule | linear | cosine | cosine | cosine | | gradient clipping | global 1.0 | global 1.0 | global 1.0 | global 1.0 | | gradient reduce dtype | FP32 | FP32 | FP32 | BF16 | | optimizer state dtype | FP32 | most likely FP32 | FP32 | FP32 | OLMo 7B variants were either trained on MI250X GPUs at the LUMI supercomputer, or A100-40GB GPUs provided by MosaicML. A summary of the environmental impact. Further details are available in the paper. | | GPU Type | Power Consumption From GPUs | Carbon Intensity (kg CO₂e/KWh) | Carbon Emissions (tCO₂eq) | |-----------|------------|-----------------------------|--------------------------------|---------------------------| | OLMo 7B Twin | MI250X (LUMI supercomputer) | 135 MWh | 0 | 0 | | OLMo 7B | A100-40GB (MosaicML) | 104 MWh | 0.656 | 75.05 | Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content. Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology. Otherwise, many facts from OLMo or any LLM will often not be true, so they should be checked. Groeneveld, D., Beltagy, I., Walsh, P., Bhagia, A., Kinney, R., Tafjord, O., Jha, A., Ivison, H., Magnusson, I., Wang, Y., Arora, S., Atkinson, D., Authur, R., Chandu, K., Cohan, A., Dumas, J., Elazar, Y., Gu, Y., Hessel, J., Khot, T., Merrill, W., Morrison, J., Muennighoff, N., Naik, A., Nam, C., Peters, M., Pyatkin, V., Ravichander, A., Schwenk, D., Shah, S., Smith, W., Subramani, N., Wortsman, M., Dasigi, P., Lambert, N., Richardson, K., Dodge, J., Lo, K., Soldaini, L., Smith, N., & Hajishirzi, H. (2024). OLMo: Accelerating the Science of Language Models. Preprint. For errors in this model card, contact Nathan, `{nathanl} at allenai dot org`.

NaNK
license:apache-2.0
91,121
22

biomed_roberta_base

BioMed-RoBERTa-base is a language model based on the RoBERTa-base (Liu et. al, 2019) architecture. We adapt RoBERTa-base to 2.68 million scientific papers from the Semantic Scholar corpus via continued pretraining. This amounts to 7.55B tokens and 47GB of data. We use the full text of the papers in training, not just abstracts. Specific details of the adaptive pretraining procedure can be found in Gururangan et. al, 2020. BioMed-RoBERTa achieves competitive performance to state of the art models on a number of NLP tasks in the biomedical domain (numbers are mean (standard deviation) over 3+ random seeds) | Task | Task Type | RoBERTa-base | BioMed-RoBERTa-base | |--------------|---------------------|--------------|---------------------| | RCT-180K | Text Classification | 86.4 (0.3) | 86.9 (0.2) | | ChemProt | Relation Extraction | 81.1 (1.1) | 83.0 (0.7) | | JNLPBA | NER | 74.3 (0.2) | 75.2 (0.1) | | BC5CDR | NER | 85.6 (0.1) | 87.8 (0.1) | | NCBI-Disease | NER | 86.6 (0.3) | 87.1 (0.8) | If using this model, please cite the following paper:

86,875
29

OLMo-2-1124-7B

We introduce OLMo 2, a new family of 7B and 13B models featuring a 9-point increase in MMLU, among other evaluation improvements, compared to the original OLMo 7B model. These gains come from training on OLMo-mix-1124 and Dolmino-mix-1124 datasets and staged training approach. OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details. | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length | |------|--------|---------|-------------|-----------------|----------------| | OLMo 2-7B | 4 Trillion | 32 | 4096 | 32 | 4096 | | OLMo 2-13B | 5 Trillion | 40 | 5120 | 40 | 4096 | The core models released in this batch include the following: | Stage | OLMo 2 7B | OLMo 2 13B | |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| | Base Model | allenai/OLMo-2-1124-7B | allenai/OLMo-2-1124-13B | | SFT | allenai/OLMo-2-1124-7B-SFT | allenai/OLMo-2-1124-13B-SFT | | DPO | allenai/OLMo-2-1124-7B-DPO | allenai/OLMo-2-1124-13B-DPO | | Final Models (RLVR) | allenai/OLMo-2-1124-7B-Instruct | allenai/OLMo-2-1124-13B-Instruct | | Reward Model (RM)| allenai/OLMo-2-1124-7B-RM | (Same as 7B) | OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using: You can use OLMo with the standard HuggingFace transformers library: For faster performance, you can quantize the model using the following method: The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using: We have released checkpoints for these models. For pretraining, the naming convention is `stepXXX-tokensYYYB`. For checkpoints with ingredients of the soup, the naming convention is `stage2-ingredientN-stepXXX-tokensYYYB` To load a specific model revision with HuggingFace, simply add the argument `revision`: Or, you can access all the revisions for the models via the following code snippet: Fine-tuning Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available. 1. Fine-tune with the OLMo repository: 2. Further fine-tuning support is being developing in AI2's Open Instruct repository. Details are here. - Developed by: Allen Institute for AI (Ai2) - Model type: a Transformer style autoregressive language model. - Language(s) (NLP): English - License: The code and model are released under Apache 2.0. - Contact: Technical inquiries: `[email protected]`. Press: `[email protected]` - Date cutoff: Dec. 2023. - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo - Evaluation code: https://github.com/allenai/OLMo-Eval - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: https://arxiv.org/abs/2501.00656 Evaluation Core model results for OLMo 2 7B and 13B models are found below. | Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMLUPro | TriviaQA | |-------------------|------------|---------|--------|--------|--------|-------|-------|-----|----------|--------|-----------|-----------| | Open weights models: | | Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 | | Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 | | Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 | | Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 | | Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 | | Gemma-2-9B | 4.4·10²³ | 67.8 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 70.1 | 42 | 81.8 | | Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 | | Partially open models: | | StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 | | Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 | | Fully open models: | | Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 | | OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 | | MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 | | OLMo-0424-7B | 0.9·10²³ | 50.7 | 66.9 | 80.1 | 73.6 | 54.3 | 50 | 29.6 | 43.9 | 27.7 | 22.1 | 58.8 | | DCLM-7B | 1.0·10²³ | 56.9 | 79.8 | 82.3 | 77.3 | 64.4 | 39.3 | 28.8 | 47.5 | 46.1 | 31.3 | 72.1 | | OLMo-2-1124-7B | 1.8·10²³ | 62.9 | 79.8 | 83.8 | 77.2 | 63.7 | 60.8 | 36.9 | 50.4 | 67.5 | 31 | 78 | | OLMo-2-1124-13B | 4.6·10²³ | 68.3 | 83.5 | 86.4 | 81.5 | 67.5 | 70.7 | 46.7 | 54.2 | 75.1 | 35.1 | 81.9 | Pretraining | | OLMo 2 7B | OLMo 2 13B | |-------------------|------------|------------| | Pretraining Stage 1 (OLMo-Mix-1124) | 4 trillion tokens (1 epoch) | 5 trillion tokens (1.2 epochs) | | Pretraining Stage 2 (Dolmino-Mix-1124) | 50B tokens (3 runs) merged | 100B tokens (3 runs) 300B tokens (1 run) merged | | Post-training (Tulu 3 SFT OLMo mix) | SFT + DPO + PPO (preference mix) | SFT + DPO + PPO (preference mix) | Stage 1: Initial Pretraining - Dataset: OLMo-Mix-1124 (3.9T tokens) - Coverage: 90%+ of total pretraining budget - 7B Model: ~1 epoch - 13B Model: 1.2 epochs (5T tokens) Stage 2: Fine-tuning - Dataset: Dolmino-Mix-1124 (843B tokens) - Three training mixes: - 50B tokens - 100B tokens - 300B tokens - Mix composition: 50% high-quality data + academic/Q&A/instruction/math content Model Merging - 7B Model: 3 versions trained on 50B mix, merged via model souping - 13B Model: 3 versions on 100B mix + 1 version on 300B mix, merged for final checkpoint Bias, Risks, and Limitations Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, many statements from OLMo or any LLM are often inaccurate, so facts should be verified. Model Card Contact For errors in this model card, contact `[email protected]`.

NaNK
license:apache-2.0
58,516
63

led-base-16384

As described in Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan, led-base-16384 was initialized from bart-base since both models share the exact same architecture. To be able to process 16K tokens, bart-base's position embedding matrix was simply copied 16 times. This model is especially interesting for long-range summarization and question answering. This notebook shows how led-base-16384 can effectively be fine-tuned on a downstream task.

license:apache-2.0
56,674
50

OLMo-2-0425-1B-Instruct

OLMo 2 1B Instruct April 2025 is post-trained variant of the allenai/OLMo-2-0425-1B-RLVR1 model, which has undergone supervised finetuning on an OLMo-specific variant of the Tülu 3 dataset, further DPO training on this dataset, and final RLVR training on this dataset. Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. Check out the OLMo 2 paper or Tülu 3 paper for more details! OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs, and associated training details. - Model type: A model trained on a mix of publicly available, synthetic and human-created datasets. - Language(s) (NLP): Primarily English - License: Apache 2.0 - Finetuned from model: allenai/OLMo-2-0425-1B-RLVR1 - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo-core - Evaluation code: https://github.com/allenai/olmes - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: https://arxiv.org/abs/2501.00656 - Demo: https://playground.allenai.org/ OLMo 2 1B is supported in transformers v4.48 or higher: If using vLLM, you will need to install from the main branch until v0.7.4 is released. Please To load the model with HuggingFace, use the following snippet: NOTE: This is different than previous OLMo 2 and Tülu 3 models due to a minor change in configuration. It does NOT have the bos token before the rest. Our other models have at the beginning of the chat template. It is embedded within the tokenizer as well, for `tokenizer.applychattemplate`. To facilitate research on RL finetuning, we have released our intermediate checkpoints during the model's RLVR training. The model weights are saved every 20 training steps, and can be accessible in the revisions of the HuggingFace repository. For example, you can load with: The OLMo-2 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). | Model | Average | AlpacaEval 2 LC | BBH | DROP | GSM8K | IFEval | MATH | MMLU | Safety | PopQA | TruthQA | |-------|---------|-----------------|-----|------|-------|--------|------|------|--------|-------|---------| | OLMo 1B 0724 | 24.4 | 2.4 | 29.9 | 27.9 | 10.8 | 25.3 | 2.2 | 36.6 | 52.0 | 12.1 | 44.3 | | SmolLM2 1.7B | 34.2 | 5.8 | 39.8 | 30.9 | 45.3 | 51.6 | 20.3 | 34.3 | 52.4 | 16.4 | 45.3 | | Gemma 3 1B | 38.3 | 20.4 | 39.4 | 25.1 | 35.0 | 60.6 | 40.3 | 38.9 | 70.2 | 9.6 | 43.8 | | Llama 3.1 1B | 39.3 | 10.1 | 40.2 | 32.2 | 45.4 | 54.0 | 21.6 | 46.7 | 87.2 | 13.8 | 41.5 | | Qwen 2.5 1.5B | 41.7 | 7.4 | 45.8 | 13.4 | 66.2 | 44.2 | 40.6 | 59.7 | 77.6 | 15.5 | 46.5 | | --- | | | | | | | | | | | | | OLMo 2 1B SFT | 36.9 | 2.4 | 32.8 | 33.8 | 52.1 | 50.5 | 13.2 | 36.4 | 93.2 | 12.7 | 42.1 | | OLMo 2 1B DPO | 40.6 | 9.5 | 33.0 | 34.5 | 59.0 | 67.1 | 14.1 | 39.9 | 89.9 | 12.3 | 46.4 | | OLMo 2 1B | 42.7 | 9.1 | 35.0 | 34.6 | 68.3 | 70.1 | 20.7 | 40.0 | 87.6 | 12.9 | 48.7 | OLMo 2 is licensed under the Apache 2.0 license. OLMo 2 is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

NaNK
license:apache-2.0
50,162
52

Llama-3.1-Tulu-3-8B-SFT

Tülu3 is a leading instruction following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques. Tülu3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. - Model type: A model trained on a mix of publicly available, synthetic and human-created datasets. - Language(s) (NLP): Primarily English - License: Llama 3.1 Community License Agreement - Finetuned from model: meta-llama/Llama-3.1-8B - Training Repository: https://github.com/allenai/open-instruct - Eval Repository: https://github.com/allenai/olmes - Paper: https://arxiv.org/abs/2411.15124 - Demo: https://playground.allenai.org/ | Stage | Llama 3.1 8B | Llama 3.1 70B | |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| | Base Model | meta-llama/Llama-3.1-8B | meta-llama/Llama-3.1-70B | | SFT | allenai/Llama-3.1-Tulu-3-8B-SFT | allenai/Llama-3.1-Tulu-3-70B-SFT | | DPO | allenai/Llama-3.1-Tulu-3-8B-DPO | allenai/Llama-3.1-Tulu-3-70B-DPO | | Final Models (RLVR) | allenai/Llama-3.1-Tulu-3-8B | allenai/Llama-3.1-Tulu-3-70B | | Reward Model (RM)| allenai/Llama-3.1-Tulu-3-8B-RM | (Same as 8B) | | Stage | Llama 3.1 405B | |-----------|-------------------| | Base Model | meta-llama/llama-3.1-405B | | SFT | allenai/llama-3.1-Tulu-3-405B-SFT | | DPO | allenai/llama-3.1-Tulu-3-405B-DPO | | Final Model (RLVR) | allenai/llama-3.1-Tulu-3-405B | | Reward Model (RM)| (Same as 8B) To load the model with HuggingFace, use the following snippet: As a Llama base model, the model can be easily served with: Note that given the long chat template of Llama, you may want to use `--maxmodellen=8192`. It is embedded within the tokenizer as well, for `tokenizer.applychattemplate`. In Ai2 demos, we use this system prompt by default: The model has not been trained with a specific system prompt in mind. The Tülu3 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus was used to train the base Llama 3.1 models, however it is likely to have included a mix of Web data and technical sources like books and code. See the Falcon 180B model card for an example of this. | Benchmark (eval) | Tülu 3 SFT 8B | Tülu 3 DPO 8B | Tülu 3 8B | Llama 3.1 8B Instruct | Qwen 2.5 7B Instruct | Magpie 8B | Gemma 2 9B Instruct | Ministral 8B Instruct | |---------------------------------|----------------|----------------|------------|------------------------|----------------------|-----------|---------------------|-----------------------| | Avg. | 60.4 | 64.4 | 64.8 | 62.2 | 57.8 | 44.7 | 55.2 | 58.3 | | MMLU (0 shot, CoT) | 65.9 | 68.7 | 68.2 | 71.2 | 76.6 | 62.0 | 74.6 | 68.5 | | PopQA (15 shot) | 29.3 | 29.3 | 29.1 | 20.2 | 18.1 | 22.5 | 28.3 | 20.2 | | TruthfulQA (6 shot) | 46.8 | 56.1 | 55.0 | 55.1 | 63.1 | 57.0 | 61.4 | 55.5 | | BigBenchHard (3 shot, CoT) | 67.9 | 65.8 | 66.0 | 62.8 | 21.7 | 0.9 | 2.5 | 56.2 | | DROP (3 shot) | 61.3 | 62.5 | 62.6 | 61.5 | 54.4 | 49.4 | 58.8 | 56.2 | | MATH (4 shot CoT, Flex) | 31.5 | 42.0 | 43.7 | 42.5 | 14.8 | 5.1 | 29.8 | 40.0 | | GSM8K (8 shot, CoT) | 76.2 | 84.3 | 87.6 | 83.4 | 83.8 | 61.2 | 79.7 | 80.0 | | HumanEval (pass@10) | 86.2 | 83.9 | 83.9 | 86.3 | 93.1 | 75.4 | 71.7 | 91.0 | | HumanEval+ (pass@10) | 81.4 | 78.6 | 79.2 | 82.9 | 89.7 | 69.1 | 67.0 | 88.5 | | IFEval (prompt loose) | 72.8 | 81.1 | 82.4 | 80.6 | 74.7 | 38.8 | 69.9 | 56.4 | | AlpacaEval 2 (LC % win) | 12.4 | 33.5 | 34.5 | 24.2 | 29.0 | 49.0 | 43.7 | 31.4 | | Safety (6 task avg.) | 93.1 | 87.2 | 85.5 | 75.2 | 75.0 | 46.4 | 75.5 | 56.2 | | Benchmark (eval) | Tülu 3 70B SFT | Tülu 3 DPO 70B | Tülu 3 70B | Llama 3.1 70B Instruct | Qwen 2.5 72B Instruct | Hermes 3 Llama 3.1 70B | Nemotron Llama 3.1 70B | |---------------------------------|-----------------|-----------------|-------------|-------------------------|-----------------------|------------------------|-------------------------| | Avg. | 72.6 | 75.9 | 76.0 | 73.4 | 71.5 | 68.3 | 65.5 | | MMLU (0 shot, CoT) | 78.9 | 83.3 | 83.1 | 85.3 | 85.5 | 80.4 | 83.8 | | PopQA (15 shot) | 48.6 | 46.3 | 46.5 | 46.4 | 30.6 | 48.1 | 36.4 | | TruthfulQA (6 shot) | 55.7 | 67.9 | 67.6 | 66.8 | 69.9 | 66.5 | 62.6 | | BigBenchHard (3 shot, CoT) | 82.7 | 81.8 | 82.0 | 73.8 | 67.2 | 82.1 | 0.7 | | DROP (3 shot) | 77.2 | 74.1 | 74.3 | 77.0 | 34.2 | 73.2 | 68.8 | | MATH (4 shot CoT, Flex) | 53.7 | 62.3 | 63.0 | 56.4 | 74.3 | 41.9 | 55.0 | | GSM8K (8 shot, CoT) | 91.1 | 93.5 | 93.5 | 93.7 | 89.5 | 90.0 | 84.7 | | HumanEval (pass@10) | 92.9 | 92.4 | 92.4 | 93.6 | 94.0 | 89.6 | 94.1 | | HumanEval+ (pass@10) | 87.3 | 88.4 | 88.0 | 89.5 | 90.8 | 85.9 | 85.5 | | IFEval (prompt loose) | 82.1 | 82.6 | 83.2 | 88.0 | 87.6 | 76.0 | 79.9 | | AlpacaEval 2 (LC % win) | 26.3 | 49.6 | 49.8 | 33.4 | 47.7 | 28.4 | 66.1 | | Safety (6 task avg.) | 94.4 | 89.0 | 88.3 | 76.5 | 87.0 | 57.9 | 69.0 | | Benchmark (eval) | Tülu 3 405B SFT | Tülu 3 405B DPO | Tülu 3 405B | Llama 3.1 405B Instruct | Nous Hermes 3 405B | Deepseek V3 | GPT 4o (11-24) | |-----------------|----------------|----------------|-------------|------------------------|-------------------|-------------|----------------| | Avg w/o Safety | 76.3 | 79.0 | 80.0 | 78.1 | 74.4 | 79.0 | 80.5 | | Avg w/ Safety | 77.5 | 79.6 | 80.7 | 79.0 | 73.5 | 75.9 | 81.6 | | MMLU (5 shot, CoT) | 84.4 | 86.6 | 87.0 | 88.0 | 84.9 | 82.1 | 87.9 | | PopQA (3 shot) | 55.7 | 55.4 | 55.5 | 52.9 | 54.2 | 44.9 | 53.6 | | BigBenchHard (0 shot, CoT) | 88.0 | 88.8 | 88.6 | 87.1 | 87.7 | 89.5 | 83.3 | | MATH (4 shot, Flex) | 63.4 | 59.9 | 67.3 | 66.6 | 58.4 | 72.5 | 68.8 | | GSM8K (8 shot, CoT) | 93.6 | 94.2 | 95.5 | 95.4 | 92.7 | 94.1 | 91.7 | | HumanEval (pass@10) | 95.7 | 97.2 | 95.9 | 95.9 | 92.3 | 94.6 | 97.0 | | HumanEval+ (pass@10) | 93.3 | 93.9 | 92.9 | 90.3 | 86.9 | 91.6 | 92.7 | | IFEval (prompt loose) | 82.4 | 85.0 | 86.0 | 88.4 | 81.9 | 88.0 | 84.8 | | AlpacaEval 2 (LC % win) | 30.4 | 49.8 | 51.4 | 38.5 | 30.2 | 53.5 | 65.0 | | Safety (6 task avg.) | 87.7 | 85.5 | 86.7 | 86.8 | 65.8 | 72.2 | 90.9 | SFT: - Learning Rate: 5E-6 (8B), 2E-6 (70B, 405B) - Effective Batch Size: 128 (8B, 70B), 256 (405B) - Max. Sequence Length: 4096 - Loss Accumulation: Sum (see https://unsloth.ai/blog/gradient) - Learning Rate Schedule: Linear - LR Warmup Ratio: 0.03 - Num. Epochs: 2 All Llama 3.1 Tülu3 models are released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. Tülu3 is intended for research and educational use. For more information, please see our Responsible Use Guidelines. If Tülu3 or any of the related materials were helpful to your work, please cite:

NaNK
llama
41,394
36

FlexOlmo-7x7B-1T

NaNK
license:apache-2.0
39,433
33

OLMoE-1B-7B-0924-Instruct

> OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in September 2024 (0924) that has been adapted via SFT and DPO from OLMoE-1B-7B. It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source. This information and more can also be found on the OLMoE GitHub repository. - Paper: https://arxiv.org/abs/2409.02060 - Pretraining Checkpoints, Code, Data and Logs. - SFT (Supervised Fine-Tuning) Checkpoints, Code, Data and Logs. - DPO/KTO (Direct Preference Optimization/Kahneman-Tversky Optimization), Checkpoints, Preference Data, DPO code, KTO code and Logs. Install `transformers` from source until a release after this PR & `torch` and run: Branches: - `main`: Preference tuned via DPO model of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT (`main` branch) - `load-balancing`: Ablation with load balancing loss during DPO starting from the `load-balancing` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT - `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/allenai/OLMoE-1B-7B-0924) - `kto`: Ablation using KTO instead of DPO. This branch is the checkpoint after 5,000 steps with the RMS optimizer. The other `kto` branches correspond to the other checkpoints mentioned in the paper. | Task (→) | MMLU | GSM8k | BBH | Human-Eval | Alpaca-Eval 1.0 | XSTest | IFEval | Avg | |---------------|------|-------|------|------------|-----------------|--------|--------|------| | Setup (→) | 0-shot | 8-shot CoT | 3-shot | 0-shot | 0-shot | 0-shot | 0-shot | | | Metric (→) | EM | EM | EM | Pass@10 | %win | F1 | Loose Acc | | | | | | | | | | | | | OLMo-1B (0724) | 25.0 | 7.0 | 22.5 | 16.0 | - | 67.6 | 20.5 | - | | +SFT | 36.0 | 12.5 | 27.2 | 21.2 | 41.5 | 81.9 | 26.1 | 35.9 | | +DPO | 36.7 | 12.5 | 30.6 | 22.0 | 50.9 | 79.8 | 24.2 | 37.4 | | OLMo-7B (0724) | 50.8 | 32.5 | 36.9 | 32.3 | - | 80.8 | 19.6 | - | | +SFT | 54.2 | 25.0 | 35.7 | 38.5 | 70.9 | 86.1 | 39.7 | 49.3 | | +DPO | 52.8 | 9.0 | 16.6 | 35.0 | 83.5 | 87.5 | 37.9 | 49.1 | | JetMoE-2B-9B | 45.6 | 43.0 | 37.2 | 54.6 | - | 68.2 | 20.0 | - | | +SFT | 46.1 | 53.5 | 35.6 | 64.8 | 69.3 | 55.6 | 30.5 | 50.4 | | DeepSeek-3B-16B | 37.7 | 18.5 | 39.4 | 48.3 | - | 65.9 | 13.5 | - | | +Chat | 48.5 | 46.5 | 40.8 | 70.1 | 74.8 | 85.6 | 32.3 | 57.0 | | Qwen1.5-3B-14B | 60.4 | 13.5 | 27.2 | 60.2 | - | 73.4 | 20.9 | - | | +Chat | 58.9 | 55.5 | 21.3 | 59.7 | 83.9 | 85.6 | 36.2 | 57.3 | | OLMoE (This Model) | 49.8 | 3.0 | 33.6 | 22.4 | - | 59.7 | 16.6 | - | | +SFT | 51.4 | 40.5 | 38.0 | 51.6 | 69.2 | 84.1 | 43.3 | 54.0 | | +DPO | 51.9 | 45.5 | 37.0 | 54.8 | 84.0 | 82.6 | 48.1 | 57.7 |

NaNK
license:apache-2.0
36,854
93

specter

SPECTER is a pre-trained language model to generate document-level embedding of documents. It is pre-trained on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, SPECTER can be easily applied to downstream applications without task-specific fine-tuning. If you're coming here because you want to embed papers, SPECTER has now been superceded by SPECTER2. Use that instead. Paper: SPECTER: Document-level Representation Learning using Citation-informed Transformers Authors: Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld

license:apache-2.0
34,898
62

OLMo-2-1124-7B-Instruct

Upon the initial release of OLMo-2 models, we realized the post-trained models did not share the pre-tokenization logic that the base models use. As a result, we have trained new post-trained models. The new models are available under the same names as the original models, but we have made the old models available with a postfix "-preview". See OLMo 2 Preview Post-trained Models for the colleciton of the legacy models. OLMo 2 7B Instruct November 2024 is post-trained variant of the OLMo-2 7B November 2024 model, which has undergone supervised finetuning on an OLMo-specific variant of the Tülu 3 dataset and further DPO training on this dataset, and finally RLVR training using this data. Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. Check out the OLMo 2 paper or Tülu 3 paper for more details! OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details. The core models released in this batch include the following: | Stage | OLMo 2 7B | OLMo 2 13B | |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| | Base Model | allenai/OLMo2-7B-1124 | allenai/OLMo-2-13B-1124 | | SFT | allenai/OLMo-2-1124-7B-SFT | allenai/OLMo-2-1124-13B-SFT | | DPO | allenai/OLMo-2-1124-7B-DPO | allenai/OLMo-2-1124-13B-DPO | | Final Models (RLVR) | allenai/OLMo-2-1124-7B-Instruct | allenai/OLMo-2-1124-13B-Instruct | | Reward Model (RM)| allenai/OLMo-2-1124-7B-RM | allenai/OLMo-2-1124-13B-RM | - Model type: A model trained on a mix of publicly available, synthetic and human-created datasets. - Language(s) (NLP): Primarily English - License: Apache 2.0 - Finetuned from model: allenai/OLMo-2-7B-1124-DPO - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo - Evaluation code: https://github.com/allenai/olmes - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: https://arxiv.org/abs/2501.00656 - Demo: https://playground.allenai.org/ OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using: To load the model with HuggingFace, use the following snippet: It is embedded within the tokenizer as well, for `tokenizer.applychattemplate`. In Ai2 demos, we use this system prompt by default: The model has not been trained with a specific system prompt in mind. The OLMo-2 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). See the Falcon 180B model card for an example of this. | Model | Average | AlpacaEval | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA | |-------|---------|------------|-----|------|--------|---------|------|-------|---------|-------|---------| | Open weights models | | Gemma-2-9B-it | 51.9 | 43.7 | 2.5 | 58.8 | 79.7 | 69.9 | 29.8 | 69.1 | 75.5 | 28.3 | 61.4 | | Ministral-8B-Instruct | 52.1 | 31.4 | 56.2 | 56.2 | 80.0 | 56.4 | 40.0 | 68.5 | 56.2 | 20.2 | 55.5 | | Mistral-Nemo-Instruct-2407 | 50.9 | 45.8 | 54.6 | 23.6 | 81.4 | 64.5 | 31.9 | 70.0 | 52.7 | 26.9 | 57.7 | | Qwen-2.5-7B-Instruct | 57.1 | 29.7 | 25.3 | 54.4 | 83.8 | 74.7 | 69.9 | 76.6 | 75.0 | 18.1 | 63.1 | | Llama-3.1-8B-Instruct | 58.9 | 25.8 | 69.7 | 61.7 | 83.4 | 80.6 | 42.5 | 71.3 | 70.2 | 28.4 | 55.1 | | Tülu 3 8B | 60.4 | 34.0 | 66.0 | 62.6 | 87.6 | 82.4 | 43.7 | 68.2 | 75.4 | 29.1 | 55.0 | | Qwen-2.5-14B-Instruct | 60.8 | 34.6 | 34.0 | 50.5 | 83.9 | 82.4 | 70.6 | 81.1 | 79.3 | 21.1 | 70.8 | | Fully open models | | OLMo-7B-Instruct | 28.2 | 5.2 | 35.3 | 30.7 | 14.3 | 32.2 | 2.1 | 46.3 | 54.0 | 17.1 | 44.5 | | OLMo-7B-0424-Instruct | 33.1 | 8.5 | 34.4 | 47.9 | 23.2 | 39.2 | 5.2 | 48.9 | 49.3 | 18.9 | 55.2 | | OLMoE-1B-7B-0924-Instruct | 35.5 | 8.5 | 37.2 | 34.3 | 47.2 | 46.2 | 8.4 | 51.6 | 51.6 | 20.6 | 49.1 | | MAP-Neo-7B-Instruct | 42.9 | 17.6 | 26.4 | 48.2 | 69.4 | 35.9 | 31.5 | 56.5 | 73.7 | 18.4 | 51.6 | | OLMo-2-7B-SFT | 50.2 | 10.2 | 49.7 | 59.6 | 74.6 | 66.9 | 25.3 | 61.1 | 82.1 | 23.6 | 48.6 | | OLMo-2-7B-DPO | 54.2 | 27.9 | 46.7 | 60.2 | 82.6 | 73.0 | 30.3 | 60.8 | 81.0 | 23.5 | 56.0 | | OLMo-2-13B-SFT | 55.3 | 11.5 | 59.6 | 71.3 | 76.3 | 68.6 | 29.5 | 68.0 | 82.3 | 29.4 | 57.1 | | OLMo-2-13B-DPO | 60.6 | 38.3 | 57.9 | 71.5 | 82.3 | 80.2 | 35.2 | 67.9 | 79.7 | 29.0 | 63.9 | | OLMo-2-7B-1124–Instruct | 54.8 | 29.1 | 46.6 | 60.5 | 85.1 | 72.3 | 32.5 | 61.3 | 80.6 | 23.2 | 56.5 | | OLMo-2-13B-1124-Instruct | 62.0 | 39.5 | 58.8 | 71.5 | 87.4 | 82.6 | 39.2 | 68.5 | 79.1 | 28.8 | 64.3 | OLMo 2 is licensed under the Apache 2.0 license. OLMo 2 is intended for research and educational use. For more information, please see our Responsible Use Guidelines. This model has been fine-tuned using a dataset mix with outputs generated from third party models and are subject to additional terms: Gemma Terms of Use.

NaNK
license:apache-2.0
31,764
43

Molmo-7B-D-0924

NaNK
license:apache-2.0
28,786
549

OLMoE-1B-7B-0125

> OLMoE-1B-7B is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in January 2025 (0125) that is 100% open-source. It is an improved version of OLMoE-09-24, see the paper appendix for details. This information and more can also be found on the OLMoE GitHub repository. - Paper: arxiv.org/abs/2409.02060 - Pretraining Checkpoints, Code, Data and Logs. - SFT (Supervised Fine-Tuning) Checkpoints, Code, Data and Logs. - DPO/KTO (Direct Preference Optimization/Kahneman-Tversky Optimization), Checkpoints, Preference Data, DPO code, KTO code and Logs. Install `transformers` (version `4.45.0` or greater) & `torch` and run: You can list all revisions/branches by installing `huggingface-hub` & running: Important branches: - `step1200000-tokens5033B`: Pretraining checkpoint used for annealing. There are a few more checkpoints after this one but we did not use them. - `main`: Checkpoint annealed from `step1200000-tokens5033B` for an additional 100B tokens (23,842 steps). We use this checkpoint for our adaptation (https://huggingface.co/allenai/OLMoE-1B-7B-0125-SFT & https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct). - `fp32`: FP32 version of `main`. The model weights were stored in FP32 during training but we did not observe any performance drop from casting them to BF16 after training so we upload all weights in BF16. If you want the original FP32 checkpoint for `main` you can use this one. You will find that it yields slightly different results but should perform around the same on benchmarks. | Model | Active Params | Open Data | MMLU | HellaSwag | ARC-Chall. | ARC-Easy | PIQA | WinoGrande | |-----------------------------|---------------|-----------|------|-----------|------------|----------|------|------------| | LMs with ~1B active parameters | | | | | | | | | | OLMoE-1B-7B-0125 | 1.3B | ✅ | 56.3 | 81.7 | 67.5 | 84.4 | 78.7 | 70.6 | | OLMoE-1B-7B-0924 | 1.3B | ✅ | 54.1 | 80.0 | 62.1 | 84.2 | 79.8 | 70.2 | | DCLM-1B | 1.4B | ✅ | 48.5 | 75.1 | 57.6 | 79.5 | 76.6 | 68.1 | | TinyLlama-1B | 1.1B | ✅ | 33.6 | 60.8 | 38.1 | 69.5 | 71.7 | 60.1 | | OLMo-1B (0724) | 1.3B | ✅ | 32.1 | 67.5 | 36.4 | 53.5 | 74.0 | 62.9 | | Pythia-1B | 1.1B | ✅ | 31.1 | 48.0 | 31.4 | 63.4 | 68.9 | 52.7 | | LMs with ~2-3B active parameters | | | | | | | | | | Qwen1.5-3B-14B | 2.7B | ❌ | 62.4 | 80.0 | 77.4 | 91.6 | 81.0 | 72.3 | | Gemma2-3B | 2.6B | ❌ | 53.3 | 74.6 | 67.5 | 84.3 | 78.5 | 71.8 | | JetMoE-2B-9B | 2.2B | ❌ | 49.1 | 81.7 | 61.4 | 81.9 | 80.3 | 70.7 | | DeepSeek-3B-16B | 2.9B | ❌ | 45.5 | 80.4 | 53.4 | 82.7 | 80.1 | 73.2 | | StableLM-2B | 1.6B | ❌ | 40.4 | 70.3 | 50.6 | 75.3 | 75.6 | 65.8 | | OpenMoE-3B-9B | 2.9B | ✅ | 27.4 | 44.4 | 29.3 | 50.6 | 63.3 | 51.9 | | LMs with ~7-9B active parameters | | | | | | | | | | Gemma2-9B | 9.2B | ❌ | 70.6 | 87.3 | 89.5 | 95.5 | 86.1 | 78.8 | | Llama3.1-8B | 8.0B | ❌ | 66.9 | 81.6 | 79.5 | 91.7 | 81.1 | 76.6 | | DCLM-7B | 6.9B | ✅ | 64.4 | 82.3 | 79.8 | 92.3 | 80.1 | 77.3 | | Mistral-7B | 7.3B | ❌ | 64.0 | 83.0 | 78.6 | 90.8 | 82.8 | 77.9 | | OLMo-7B (0724) | 6.9B | ✅ | 54.9 | 80.5 | 68.0 | 85.7 | 79.3 | 73.2 | | Llama2-7B | 6.7B | ❌ | 46.2 | 78.9 | 54.2 | 84.0 | 77.5 | 71.7 |

NaNK
license:apache-2.0
24,364
31

OLMoE-1B-7B-0125-Instruct

OLMoE-1B-7B-0125-Instruct January 2025 is post-trained variant of the OLMoE-1B-7B January 2025 model, which has undergone supervised finetuning on an OLMo-specific variant of the Tülu 3 dataset and further DPO training on this dataset, and finally RLVR training using this data. Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. Check out the OLMoE paper or Tülu 3 paper for more details! OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details. The core models released in this batch include the following: | Stage | OLMoE 1B-7B | |----------------------|----------------------------------------------------------------------------------------------------------| | Base Model | allenai/OLMoE-1B-7B-0125 | | SFT | allenai/OLMoE-1B-7B-0125-SFT | | DPO | allenai/OLMoE-1B-7B-0125-DPO | | Final Models (RLVR) | allenai/OLMoE-1B-7B-0125-Instruct | | Reward Model (RM)| allenai/OLMoE-1B-7B-0125-RM | - Model type: A model trained on a mix of publicly available, synthetic and human-created datasets. - Language(s) (NLP): Primarily English - License: Apache 2.0 - Finetuned from model: allenai/OLMoE-1B-7B-0125-DPO - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo - Evaluation code: https://github.com/allenai/olmes - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: https://arxiv.org/abs/2409.02060 - Demo: https://playground.allenai.org/ OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using: To load the model with HuggingFace, use the following snippet: It is embedded within the tokenizer as well, for `tokenizer.applychattemplate`. In Ai2 demos, we use this system prompt by default: The model has not been trained with a specific system prompt in mind. The OLMo-2 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). See the Falcon 180B model card for an example of this. | Benchmark (eval) | OLMoE-1B-7B-0125-Instruct | OLMoE-1B-7B-0924-Instruct | OLMoE-1B-7B-0125-DPO | OLMoE-1B-7B-0125-SFT | OLMoE-1B-7B-0924-SFT | |--------------------------------|---------------------------|--------------------------|----------------------|---------------------|---------------------| | Avg. | 45.62 | 38.44 | 45.05 | 41.76 | 37.05 | | MMLU (CoT) | 55.08 | 54.57 | 54.93 | 55.26 | 54.32 | | PopQA | 19.75 | 20.56 | 19.65 | 20.12 | 21.01 | | TruthfulQA | 50.56 | 49.14 | 49.99 | 45.48 | 44.66 | | BigBenchHard (CoT) | 38.61 | 36.78 | 37.37 | 37.31 | 36.55 | | DROP | 47.87 | 34.48 | 48.38 | 48.57 | 34.71 | | MATH (Flex) | 21.41 | 8.16 | 20.36 | 21.38 | 8.15 | | GSM8K | 72.40 | 47.38 | 64.59 | 55.72 | 42.46 | | HumanEval | 62.30 | 63.04 | 61.92 | 62.58 | 63.72 | | HumanEval+ | 54.37 | 58.93 | 57.61 | 55.67 | 57.40 | | IFEval | 66.36 | 45.29 | 65.62 | 56.56 | 41.22 | | AlpacaEval | 17.99 | 7.54 | 19.50 | 5.83 | 6.38 | | Safety (average) | 90.40 | 51.40 | 91.40 | 94.50 | 65.80 | OLMoE is licensed under the Apache 2.0 license. OLMoE is intended for research and educational use. For more information, please see our Responsible Use Guidelines. This model has been fine-tuned using a dataset mix with outputs generated from third party models and are subject to additional terms: Gemma Terms of Use.

NaNK
license:apache-2.0
24,158
56

olmOCR-2-7B-1025

Full BF16 version of olmOCR-2-7B-1025-FP8. We recommend using the FP8 version for all practical purposes except further fine tuning. This is a release of the olmOCR model that's fine tuned from Qwen2.5-VL-7B-Instruct using the olmOCR-mix-1025 dataset. It has been additionally fine tuned using GRPO RL training to boost its performance at math equations, tables, and other tricky OCR cases. Quick links: - 📃 Paper - 🤗 SFT Dataset - 🤗 RL Dataset - 🛠️ Code - 🎮 Demo The best way to use this model is via the olmOCR toolkit. The toolkit comes with an efficient inference setup via VLLM that can handle millions of documents at scale. This model scores the following scores on olmOCR-bench when used with the olmOCR toolkit toolkit which automatically renders, rotates, and retries pages as needed. Model ArXiv Old Scans Math Tables Old Scans Headers and Footers Multi column Long tiny text Base Overall olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025 82.9 82.1 84.3 48.3 95.7 84.3 81.4 99.7 82.3 ± 1.1 olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025-FP8 83.0 82.3 84.9 47.7 96.1 83.7 81.9 99.7 82.4 ± 1.1 This model expects as input a single document image, rendered such that the longest dimension is 1288 pixels. The prompt must then contain the additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit. If you want to prompt this model manually instead of using the olmOCR toolkit, please see the code below. In normal usage, the olmOCR toolkit builds the prompt by rendering the PDF page, and extracting relevant text blocks and image metadata. To duplicate that you will need to This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.

NaNK
license:apache-2.0
22,260
75

OLMo-1B-hf

This model is licensed under Apache 2.0 and is associated with the dataset allenai/dolma.

NaNK
license:apache-2.0
20,083
25

wildguard

license:apache-2.0
17,735
35

Llama-3.1-Tulu-3-8B

NaNK
llama
16,191
175

ivila-row-layoutlm-finetuned-s2vl-v2

16,140
2

Molmo-7B-O-0924

NaNK
license:apache-2.0
15,931
161

OLMo-2-1124-7B-SFT

NaNK
license:apache-2.0
14,832
1

OLMo-2-0325-32B-Instruct

OLMo 2 32B Instruct March 2025 is post-trained variant of the OLMo-2 32B March 2025 model, which has undergone supervised finetuning on an OLMo-specific variant of the Tülu 3 dataset, further DPO training on this dataset, and final RLVR training on this dataset. Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. Check out the OLMo 2 paper or Tülu 3 paper for more details! OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs, and associated training details. - Model type: A model trained on a mix of publicly available, synthetic and human-created datasets. - Language(s) (NLP): Primarily English - License: Apache 2.0 - Finetuned from model: allenai/OLMo-2-0325-32B-DPO - Project Page: https://allenai.org/olmo - Repositories: - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo-core - Evaluation code: https://github.com/allenai/olmes - Further fine-tuning code: https://github.com/allenai/open-instruct - Paper: https://arxiv.org/abs/2501.00656 - Demo: https://playground.allenai.org/ OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using: To load the model with HuggingFace, use the following snippet: NOTE: This is different than previous OLMo 2 and Tülu 3 models due to a minor change in configuration. It does NOT have the bos token before the rest. Our other models have at the beginning of the chat template. It is embedded within the tokenizer as well, for `tokenizer.applychattemplate`. In Ai2 demos, we use this system prompt by default: The model has not been trained with a specific system prompt in mind. To facilitate research on RL finetuning, we have released our intermediate checkpoints during the model's RLVR training. The model weights are saved every 20 training steps, and can be accessible in the revisions of the HuggingFace repository. For example, you can load with: The OLMo-2 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). See the Falcon 180B model card for an example of this. | Model | Average | AlpacaEval 2 LC | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA | |-------|---------|------|-----|------|-------|--------|------|------|--------|-------|---------| | Closed API models | | | | | | | | | | | | | GPT-3.5 Turbo 0125 | 59.6 | 38.7 | 66.6 | 70.2 | 74.3 | 66.9 | 41.2 | 70.2 | 69.1 | 45.0 | 62.9 | | GPT 4o Mini 2024-07-18 | 65.7 | 49.7 | 65.9 | 36.3 | 83.0 | 83.5 | 67.9 | 82.2 | 84.9 | 39.0 | 64.8 | | Open weights models | | | | | | | | | | | | | Mistral-Nemo-Instruct-2407 | 50.9 | 45.8 | 54.6 | 23.6 | 81.4 | 64.5 | 31.9 | 70.0 | 52.7 | 26.9 | 57.7 | | Ministral-8B-Instruct | 52.1 | 31.4 | 56.2 | 56.2 | 80.0 | 56.4 | 40.0 | 68.5 | 56.2 | 20.2 | 55.5 | | Gemma-2-27b-it | 61.3 | 49.0 | 72.7 | 67.5 | 80.7 | 63.2 | 35.1 | 70.7 | 75.9 | 33.9 | 64.6 | | Qwen2.5-32B | 66.5 | 39.1 | 82.3 | 48.3 | 87.5 | 82.4 | 77.9 | 84.7 | 82.4 | 26.1 | 70.6 | | Mistral-Small-24B | 67.6 | 43.2 | 80.1 | 78.5 | 87.2 | 77.3 | 65.9 | 83.7 | 66.5 | 24.4 | 68.1 | | Llama-3.1-70B | 70.0 | 32.9 | 83.0 | 77.0 | 94.5 | 88.0 | 56.2 | 85.2 | 76.4 | 46.5 | 66.8 | | Llama-3.3-70B | 73.0 | 36.5 | 85.8 | 78.0 | 93.6 | 90.8 | 71.8 | 85.9 | 70.4 | 48.2 | 66.1 | | Gemma-3-27b-it | - | 63.4 | 83.7 | 69.2 | 91.1 | - | - | 81.8 | - | 30.9 | - | | Fully open models | | | | | | | | | | | | | OLMo-2-7B-1124-Instruct | 55.7 | 31.0 | 48.5 | 58.9 | 85.2 | 75.6 | 31.3 | 63.9 | 81.2 | 24.6 | 56.3 | | OLMo-2-13B-1124-Instruct | 61.4 | 37.5 | 58.4 | 72.1 | 87.4 | 80.4 | 39.7 | 68.6 | 77.5 | 28.8 | 63.9 | | OLMo-2-32B-0325-SFT | 61.7 | 16.9 | 69.7 | 77.2 | 78.4 | 72.4 | 35.9 | 76.1 | 93.8 | 35.4 | 61.3 | | OLMo-2-32B-0325-DPO | 68.8 | 44.1 | 70.2 | 77.5 | 85.7 | 83.8 | 46.8 | 78.0 | 91.9 | 36.4 | 73.5 | | OLMo-2-32B-0325-Instruct | 68.8 | 42.8 | 70.6 | 78.0 | 87.6 | 85.6 | 49.7 | 77.3 | 85.9 | 37.5 | 73.2 | Below is the training curves for `allenai/OLMo-2-0325-32B-Instruct`. The model was trained using 5 8xH100 nodes. Below are the core eval scores over steps for `allenai/OLMo-2-0325-32B-Instruct` (note we took step `320` as the final checkpoint, corresponding to episode `573,440`): Below are the other eval scores over steps for `allenai/OLMo-2-0325-32B-Instruct`: The command below is copied directly from the tracked training job: OLMo 2 is licensed under the Apache 2.0 license. OLMo 2 is intended for research and educational use. For more information, please see our Responsible Use Guidelines. This model has been fine-tuned using a dataset mix with outputs generated from third party models and are subject to additional terms: Gemma Terms of Use.

NaNK
license:apache-2.0
13,857
148

OLMoE-1B-7B-0924

This model is licensed under the Apache 2.0 license and is designed for the English language.

NaNK
license:apache-2.0
13,699
136

olmOCR-7B-0725-FP8

NaNK
license:apache-2.0
11,123
18

scibert_scivocab_cased

9,801
15

olmOCR-7B-0225-preview

This is a preview release of the olmOCR model that's fine tuned from Qwen2-VL-7B-Instruct using the olmOCR-mix-0225 dataset. Quick links: - 📃 Paper - 🤗 Dataset - 🛠️ Code - 🎮 Demo The best way to use this model is via the olmOCR toolkit. The toolkit comes with an efficient inference setup via sglang that can handle millions of documents at scale. This model expects as input a single document image, rendered such that the longest dimension is 1024 pixels. The prompt must then contain the additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit. If you want to prompt this model manually instead of using the olmOCR toolkit, please see the code below. In normal usage, the olmOCR toolkit builds the prompt by rendering the PDF page, and extracting relevant text blocks and image metadata. To duplicate that you will need to olmOCR is licensed under the Apache 2.0 license. olmOCR is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

NaNK
license:apache-2.0
9,631
703

tulu-2-dpo-7b

NaNK
llama
8,411
20

led-large-16384

license:apache-2.0
8,070
31

specter2_aug2023refresh_base

license:apache-2.0
7,978
3

tk-instruct-11b-def

NaNK
license:apache-2.0
7,701
17

tulu-2-7b

NaNK
llama
7,439
10

olmOCR-7B-0825

This is a release of the olmOCR model that's fine tuned from Qwen2.5-VL-7B-Instruct using the olmOCR-mix-0225 dataset. Quick links: - 📃 Paper - 🤗 Dataset - 🛠️ Code - 🎮 Demo The best way to use ...

NaNK
license:apache-2.0
6,530
59

OLMo-2-1124-13B

NaNK
license:apache-2.0
5,776
66

OLMo-2-0425-1B-DPO

NaNK
license:apache-2.0
5,598
3

Olmo-3-7B-Instruct

NaNK
license:apache-2.0
5,309
57

Olmo-3-7B-Think

NaNK
license:apache-2.0
5,253
40

MolmoAct-7B-D-LIBERO-Spatial-0812

MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and MolmoAct Dataset, a dataset with 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It has state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family here. Learn more about MolmoAct in our announcement blog post or the paper. MolmoAct 7B-D LIBERO-Spatial is based on Qwen2.5-7B and uses SigLip2 as the vision backbone, which is initialized using Molmo's pre-training approach. It is first pre-trained on MolmoAct's Pre-training Mixture, then mid-trained on MolmoAct Dataset, and finally post-trained on LIBERO-Spatial. This model is intended to be used for replicating our results on LIBERO-Spatial. This checkpoint is a preview of the MolmoAct release. All artifacts used in creating MolmoAct (data, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility. Quick links: - 📂 All Models - 📂 All Data - 📄 Paper - 💻 Code - 🎥 Blog Post - 🎥 Video This model is licensed under Apache 2.0. It is intended for research and educational use. For more information, please see our Responsible Use Guidelines. MolmoAct offers the ability to inspect a visual trace of its intended actions in space before they occur, allowing users to ensure safe behavior by proactively auditing and adjusting the actions of any hardware acting under the model’s instructions. MolmoAct’s action space is bounded within the data provided, and compliance is built into the model to prevent excessive force when resistance is detected. Please follow the hardware manufacturer’s guidelines when using this model with a robot and perform all operations in a safely configured environment.

NaNK
license:apache-2.0
5,190
0

OLMo-2-0325-32B

NaNK
license:apache-2.0
4,369
63

OLMo-2-0325-32B-Instruct-GGUF

NaNK
4,367
18

Llama-3.1-Tulu-3-8B-DPO

License: llama3.1 Language: en

NaNK
llama
4,295
28

MolmoAct-7B-D-LIBERO-Object-0812

MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and MolmoAct Dataset, a dataset with 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It has state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family here. Learn more about MolmoAct in our announcement blog post or the paper. MolmoAct 7B-D LIBERO-Object is based on Qwen2.5-7B and uses SigLip2 as the vision backbone, which is initialized using Molmo's pre-training approach. It is first pre-trained on MolmoAct's Pre-training Mixture, then mid-trained on MolmoAct Dataset, and finally post-trained on LIBERO-Long. This model is intended to be used for replicating our results on LIBERO-Object. This checkpoint is a preview of the MolmoAct release. All artifacts used in creating MolmoAct (data, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility. Quick links: - 📂 All Models - 📂 All Data - 📄 Paper - 💻 Code - 🎥 Blog Post - 🎥 Video This model is licensed under Apache 2.0. It is intended for research and educational use. For more information, please see our Responsible Use Guidelines. MolmoAct offers the ability to inspect a visual trace of its intended actions in space before they occur, allowing users to ensure safe behavior by proactively auditing and adjusting the actions of any hardware acting under the model’s instructions. MolmoAct’s action space is bounded within the data provided, and compliance is built into the model to prevent excessive force when resistance is detected. Please follow the hardware manufacturer’s guidelines when using this model with a robot and perform all operations in a safely configured environment.

NaNK
license:apache-2.0
4,260
0

Olmo-3-1125-32B

NaNK
license:apache-2.0
4,229
70

longformer-large-4096

4,052
15

OLMo-7B

NaNK
license:apache-2.0
3,827
648

OLMo-2-0425-1B-SFT

NaNK
license:apache-2.0
3,648
4

Olmo-3-7B-Think-DPO

NaNK
license:apache-2.0
3,160
2

Olmo-3-1025-7B

NaNK
license:apache-2.0
3,028
26

MolmoAct-7B-D-Pretrain-RT-1-0812

NaNK
license:apache-2.0
2,954
5

OLMo-1B

NaNK
license:apache-2.0
2,942
107

Flex-reddit-2x7B-1T

NaNK
license:apache-2.0
2,661
5

specter2

2,649
69

OLMo-2-1124-13B-Instruct

NaNK
license:apache-2.0
2,631
46

olmOCR-7B-0725

NaNK
license:apache-2.0
2,522
61

Olmo-3-7B-Instruct-SFT

NaNK
license:apache-2.0
2,511
1

MolmoAct-7B-D-LIBERO-Goal-0812

MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and MolmoAct Dataset, a dataset with 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It has state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family here. Learn more about MolmoAct in our announcement blog post or the paper. MolmoAct 7B-D LIBERO-Goal is based on Qwen2.5-7B and uses SigLip2 as the vision backbone, which is initialized using Molmo's pre-training approach. It is first pre-trained on MolmoAct's Pre-training Mixture, then mid-trained on MolmoAct Dataset, and finally post-trained on LIBERO-Long. This model is intended to be used for replicating our results on LIBERO-Goal. This checkpoint is a preview of the MolmoAct release. All artifacts used in creating MolmoAct (data, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility. Quick links: - 📂 All Models - 📂 All Data - 📄 Paper - 💻 Code - 🎥 Blog Post - 🎥 Video This model is licensed under Apache 2.0. It is intended for research and educational use. For more information, please see our Responsible Use Guidelines. MolmoAct offers the ability to inspect a visual trace of its intended actions in space before they occur, allowing users to ensure safe behavior by proactively auditing and adjusting the actions of any hardware acting under the model’s instructions. MolmoAct’s action space is bounded within the data provided, and compliance is built into the model to prevent excessive force when resistance is detected. Please follow the hardware manufacturer’s guidelines when using this model with a robot and perform all operations in a safely configured environment.

NaNK
license:apache-2.0
2,471
0

MolmoAct-7B-D-Pretrain-0812

NaNK
license:apache-2.0
2,440
8

Olmo-3-32B-Think

NaNK
license:apache-2.0
2,128
103

OLMoE-1B-7B-0924-Instruct-GGUF

NaNK
license:apache-2.0
2,102
10

OLMo-7B-hf

Language model for English with Apache 2.0 license.

NaNK
license:apache-2.0
2,069
15

tulu-2-dpo-70b

NaNK
llama
2,065
157

tulu-2-dpo-13b

NaNK
llama
1,925
20

Olmo-3-7B-Think-SFT

NaNK
license:apache-2.0
1,749
5

OLMo-2-1124-7B-DPO

NaNK
license:apache-2.0
1,749
1

OLMo-7B-0724-Instruct-hf

NaNK
license:apache-2.0
1,604
6

MolmoE-1B-0924

NaNK
license:apache-2.0
1,509
153

truthfulqa-truth-judge-llama2-7B

This model is built based on LLaMa2 7B in replacement of the truthfulness/informativeness judge models that were originally introduced in the TruthfulQA paper. That model is based on OpenAI's Curie engine using their finetuning API. However, as of February 08, 2024, OpenAI has taken down its Curie engine, and thus, we cannot use it for TruthfulQA evaluation anymore. So, we decided to train the judge models using an open model (i.e., LLaMa), which can make the evaluation more accessible and reproducible. We released two models for the truthfulness and informativeness evaluation, respectively. The training code and validation results of these models can be found here These models are only intended for the TruthfulQA evaluation. They are intended to generalize to the evaluation of new models on the fixed set of prompts, but they may fail to generalize to new prompts. You can try the model using the following scripts:

NaNK
llama
1,497
6

Llama-3.1-8B-Instruct-RM-RB2

NaNK
llama
1,449
1

OLMo-2-0425-1B-RLVR1

NaNK
license:apache-2.0
1,438
2

OLMo-2-1124-13B-DPO

NaNK
license:apache-2.0
1,413
0

Olmo-3-7B-Instruct-DPO

NaNK
license:apache-2.0
1,382
2

OLMo-7B-0724-hf

NaNK
license:apache-2.0
1,360
16

led-large-16384-arxiv

license:apache-2.0
1,138
33

Molmo-72B-0924

NaNK
license:apache-2.0
1,040
294

specter2_aug2023refresh

900
4

open-instruct-human-mix-65b

NaNK
llama
857
4

open-instruct-pythia-6.9b-tulu

This model is a 6.9B Pythia model finetuned on a mixture of instruction datasets (FLAN V2, CoT, Dolly, Open Assistant 1, GPT4-Alpaca, Code-Alpaca, and ShareGPT). This was trained as part of the paper How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources. The codebase used to train and evaluate this model can be found at https://github.com/allenai/open-instruct. This model is licensed under the AI model license given in LICENSE.txt, with the original model license at pythialicense.txt. Usage Simply download and use - this model is not a diff, unlike the other open-instruct models. The model is trained to use the following format (note the newlines): For best results, format all inputs in this manner. Make sure to include a newline after ` `, this can affect generation quality quite a bit. Here is the performance of this model across benchmarks explored in our paper How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources: | MMLU 0-shot | MMLU 5-shot | GSM Direct | GSM CoT | BBH Direct | BBH CoT | TydiQA Gold-Passage | TydiQA Closed-book | Codex-Eval Pass@1 | Codex-Eval Pass@10 | AlpacaFarm vs Davinci-003 | Average | |:-----------:|:-----------:|:----------:|:-------:|:----------:|:-------:|:-------------------:|:------------------:|:-----------------:|:------------------:|:-------------------------:|---------| | 34.1 | 34.6 | 3.5 | 15.5 | 31.3 | 27.8 | 33.4 | 3.8 | 14.3 | 21.4 | 9.2 | 19.8 | If you use this model, please cite our work, the Pythia paper, and the original datasets:

NaNK
854
6

unifiedqa-t5-base

753
11

unifiedqa-t5-large

749
3

digital-socrates-7b

NaNK
llama
735
6

olmOCR-7B-0225-preview-GGUF

NaNK
729
27

digital-socrates-13b

NaNK
llama
722
10

MolmoAct 7B D 0812

NaNK
license:apache-2.0
714
41

Llama-3.1-Tulu-3-70B

License: llama3.1 Language: en

NaNK
llama
708
60

Llama-3.1-Tulu-3.1-8B

NaNK
llama
697
38

OLMoE-1B-7B-0125-Instruct-GGUF

NaNK
697
18

OLMo-2-1124-7B-Instruct-GGUF

NaNK
license:apache-2.0
582
7

unifiedqa-v2-t5-base-1363200

574
2

OLMo-2-0325-32B-SFT

NaNK
license:apache-2.0
561
4

OLMo-2-0425-1B-Instruct-GGUF

NaNK
494
14

OLMo-2-0325-32B-DPO

NaNK
license:apache-2.0
493
3

Olmo-3-32B-Think-DPO

NaNK
license:apache-2.0
492
1

OLMo-7B-0424-hf

NaNK
license:apache-2.0
485
14

Olmo-3-32B-Think-SFT

NaNK
license:apache-2.0
481
2

OLMoE-1B-7B-0924-GGUF

NaNK
license:apache-2.0
435
10

OLMo-7B-Instruct-hf

This model is licensed under Apache 2.0 and is associated with the dataset allenai/dolma.

NaNK
license:apache-2.0
407
6

OLMo-7B-Twin-2T-hf

NaNK
license:apache-2.0
397
1

Llama-3.1-Tulu-3-8B-RM

License: llama3.1 Language: en

NaNK
llama
378
19

GraspMolmo

NaNK
license:mit
369
8

olmOCR-7B-0225-preview-FP8

NaNK
license:apache-2.0
351
9

DataDecide-c4-150M

license:apache-2.0
351
0

Olmo-3-7B-RL-Zero-Math

NaNK
license:apache-2.0
327
5

Olmo-3-7B-RLZero-Math

NaNK
license:apache-2.0
327
5

OLMo-2-1124-13B-GGUF

NaNK
license:apache-2.0
284
2

OLMo-2-1124-13B-RM

NaNK
license:apache-2.0
277
2

OLMo-2-0425-1B-early-training

NaNK
license:apache-2.0
265
6

tulu-2-13b

NaNK
llama
264
5

OLMo-2-1124-13B-Instruct-GGUF

NaNK
license:apache-2.0
232
9

dsp_roberta_base_dapt_biomed_tapt_rct_500

226
1

hvila-block-layoutlm-finetuned-grotoap2

223
0

truthfulqa-info-judge-llama2-7B

NaNK
llama
222
1

DataDecide-fineweb-edu-20M

license:apache-2.0
214
0

OLMo-7B-0424

NaNK
license:apache-2.0
209
49

PRIMERA

209
24

ACE2-ERA5

Ai2 Climate Emulator (ACE) is a family of models designed to simulate atmospheric variability from the time scale of days to centuries. Disclaimer: ACE models are research tools and should not be used for operational climate predictions. ACE2-ERA5 is trained on the ERA5 dataset and is described in ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses. As part of that paper, the repository containing training and evaluation scripts and configuration files used for this model is located here. 1. Download this repository. Optionally, you can just download a subset of the `forcingdata` and `initialconditions` for the period you are interested in. 2. Update paths in the `inferenceconfig.yaml`. Specifically, update `experimentdir`, `checkpointpath`, `initialcondition.path` and `forcingloader.dataset.path`. 3. Install code dependencies with `pip install fme`. 4. Run inference with `python -m fme.ace.inference inferenceconfig.yaml`. Briefly, the strengths of ACE2-ERA5 are: - accurate atmospheric warming response to combined increase of sea surface temperature and CO2 over last 80 years - highly accurate atmospheric response to El Niño sea surface temperature variability - good representation of the geographic distribution of tropical cyclones - accurate Madden Julian Oscillation variability - realistic stratospheric polar vortex strength and variability - exact conservation of global dry air mass and moisture Some known weaknesses are: - the individual sensitivities to changing sea surface temperature and CO2 are not entirely realistic - the medium-range (3-10 day) weather forecast skill is not state of the art - not expected to generalize accurately for large perturbations of certain inputs (e.g. doubling of CO2)

license:apache-2.0
208
12

unifiedqa-v2-t5-3b-1363200

NaNK
192
2

longformer-large-4096-finetuned-triviaqa

184
7

OLMo-2-1124-13B-Instruct-preview

NaNK
license:apache-2.0
182
57

OlmoEarth-v1-Base

182
4

Olmo-3-7B-RL-Zero-Mix

NaNK
license:apache-2.0
182
3

Olmo-3-7B-RLZero-Mix

NaNK
license:apache-2.0
182
3

unifiedqa-v2-t5-large-1363200

177
4

OLMo-7B-0724-SFT-hf

NaNK
license:apache-2.0
168
4

cs_roberta_base

168
1

MolmoAct-7B-D-LIBERO-Long-0812

MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and MolmoAct Dataset, a dataset with 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It has state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family here. Learn more about MolmoAct in our announcement blog post or the paper. MolmoAct 7B-D LIBERO-Long is based on Qwen2.5-7B and uses SigLip2 as the vision backbone, which is initialized using Molmo's pre-training approach. It is first pre-trained on MolmoAct's Pre-training Mixture, then mid-trained on MolmoAct Dataset, and finally post-trained on LIBERO-Long. This model is intended to be used for replicating our results on LIBERO-Long. This checkpoint is a preview of the MolmoAct release. All artifacts used in creating MolmoAct (data, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility. Quick links: - 📂 All Models - 📂 All Data - 📄 Paper - 💻 Code - 🎥 Blog Post - 🎥 Video This model is licensed under Apache 2.0. It is intended for research and educational use. For more information, please see our Responsible Use Guidelines. MolmoAct offers the ability to inspect a visual trace of its intended actions in space before they occur, allowing users to ensure safe behavior by proactively auditing and adjusting the actions of any hardware acting under the model’s instructions. MolmoAct’s action space is bounded within the data provided, and compliance is built into the model to prevent excessive force when resistance is detected. Please follow the hardware manufacturer’s guidelines when using this model with a robot and perform all operations in a safely configured environment.

NaNK
license:apache-2.0
166
0

MolmoAct-7B-O-0812

NaNK
license:apache-2.0
157
5

OLMo-2-1124-13B-SFT

NaNK
license:apache-2.0
150
0

OLMo-2-1124-7B-RM

NaNK
license:apache-2.0
146
3

wmt19-de-en-6-6-big

license:apache-2.0
143
7

Llama 3.1 Tulu 3 405B

NaNK
llama
139
109

OLMoE-1B-7B-0125-GGUF

NaNK
139
5

Llama-3.1-Tulu-3-70B-SFT

Language model with capabilities in English. License: llama3.1.

NaNK
llama
137
6

open-instruct-stanford-alpaca-7b

NaNK
llama
128
12

ACE2-ERA5-training-artifacts

license:apache-2.0
124
0

OLMo-2-0425-1B-GGUF

NaNK
120
0

FlexOlmo-7x7B-1T-RT

NaNK
license:apache-2.0
118
6

tk-instruct-base-def-pos

license:apache-2.0
115
9

Llama-3.1-Tulu-3-70B-DPO

License: llama3.1 Language: en

NaNK
llama
114
9

tulu-2-70b

NaNK
llama
114
8

DataDecide-dolma1_7-no-code-750M

license:apache-2.0
114
0

unifiedqa-v2-t5-base-1251000

112
0

DataDecide-dolma1_7-750M

license:apache-2.0
112
0

OLMoE-1B-7B-0125-SFT

NaNK
license:apache-2.0
111
2

hvila-block-layoutlm-finetuned-docbank

111
1

DataDecide-dolma1_7-no-math-code-750M

license:apache-2.0
110
1

DataDecide-dolma1_6plus-530M

license:apache-2.0
109
0

OLMo-7B-Instruct

NaNK
license:apache-2.0
108
53

PRIMERA-multinews

108
9

DataDecide-dolma1_7-no-flan-750M

license:apache-2.0
108
0

DataDecide-dolma1_6plus-750M

license:apache-2.0
108
0

DataDecide-dolma1_7-no-code-530M

license:apache-2.0
108
0

OLMo-2-1124-7B-DPO-Preview

NaNK
license:apache-2.0
106
2

Flex-math-2x7B-1T

NaNK
license:apache-2.0
105
2

DataDecide-dolma1_7-no-reddit-750M

license:apache-2.0
105
0

DataDecide-dolma1_7-530M

license:apache-2.0
105
0

DataDecide-dolma1_7-no-flan-530M

license:apache-2.0
105
0

Flex-pes2o-2x7B-1T

NaNK
license:apache-2.0
105
0

DataDecide-dolma1_7-no-math-code-530M

license:apache-2.0
104
0

Olmo-3-7B-RL-Zero-IF

NaNK
license:apache-2.0
103
3

DataDecide-dolma1_7-no-reddit-530M

license:apache-2.0
102
0

OLMo-2-1124-13B-SFT-Preview

NaNK
license:apache-2.0
101
3

OLMo-2-1124-13B-DPO-Preview

NaNK
license:apache-2.0
101
3

Flex-news-2x7B-1T

NaNK
license:apache-2.0
101
1

OLMoE-1B-7B-0924-SFT

NaNK
license:apache-2.0
100
19

multicite-multilabel-scibert

license:mit
100
2

Flex-code-2x7B-1T

NaNK
license:apache-2.0
98
1

t5-small-squad2-question-generation

97
45

DataDecide-dolma1_7-150M

license:apache-2.0
97
0

DataDecide-dolma1_6plus-150M

license:apache-2.0
97
0

DataDecide-dolma1_7-no-reddit-300M

license:apache-2.0
97
0

DataDecide-dolma1_7-300M

license:apache-2.0
97
0

Olmo-3-7B-RLZero-IF

NaNK
license:apache-2.0
96
3

Flex-public-7B-1T

NaNK
96
3

DataDecide-dolma1_7-no-flan-300M

license:apache-2.0
96
0

DataDecide-dolma1_6plus-300M

license:apache-2.0
96
0

Flex-creative-2x7B-1T

NaNK
license:apache-2.0
95
4

DataDecide-dolma1_7-no-math-code-300M

license:apache-2.0
95
0

DataDecide-dolma1_7-no-math-code-150M

license:apache-2.0
94
0

DataDecide-dolma1_7-no-code-300M

license:apache-2.0
94
0

OLMo-7B-1024-preview

NaNK
license:apache-2.0
93
1

OLMo-7B-SFT-hf

NaNK
license:apache-2.0
92
1

DataDecide-dolma1_7-no-flan-150M

license:apache-2.0
92
0

DataDecide-dolma1_7-1B

NaNK
license:apache-2.0
92
0

specter2_adhoc_query

91
6

Llama-3.1-Tulu-3-405B-SFT

NaNK
llama
90
11

Llama-3.1-Tulu-3-405B-DPO

NaNK
llama
90
6

DataDecide-dolma1_6plus-1B

NaNK
license:apache-2.0
89
0

DataDecide-dolma1_7-no-flan-1B

NaNK
license:apache-2.0
89
0

tulu-v2.5-ppo-13b-hh-rlhf-60k

NaNK
llama
88
0

llama-3-tulu-2-8b

NaNK
llama
88
0

DataDecide-dolma1_7-no-code-150M

license:apache-2.0
88
0

DataDecide-dolma1_7-no-reddit-150M

license:apache-2.0
88
0

DataDecide-dolma1_7-no-math-code-1B

NaNK
license:apache-2.0
88
0

DataDecide-dclm-baseline-qc-20p-750M

license:apache-2.0
88
0

DataDecide-dclm-baseline-qc-7p-fw2-750M

license:apache-2.0
87
1

OLMo-2-1124-13B-Instruct-RLVR2

NaNK
license:apache-2.0
87
0

DataDecide-fineweb-edu-750M

license:apache-2.0
87
0

DataDecide-falcon-and-cc-qc-10p-750M

license:apache-2.0
87
0

DataDecide-dclm-baseline-750M

license:apache-2.0
87
0

DataDecide-falcon-and-cc-qc-orig-10p-750M

license:apache-2.0
87
0

DataDecide-dclm-baseline-qc-7p-fw3-750M

license:apache-2.0
87
0

tulu-13b

NaNK
llama
86
8

scitulu-7b

NaNK
llama
86
3

open-instruct-opt-6.7b-tulu

NaNK
86
2

DataDecide-dolma1_7-no-reddit-1B

NaNK
license:apache-2.0
86
1

DataDecide-c4-750M

license:apache-2.0
86
1

tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts

NaNK
llama
86
0

DataDecide-dolma1_7-no-code-1B

NaNK
license:apache-2.0
86
0

DataDecide-falcon-and-cc-qc-tulu-10p-750M

license:apache-2.0
86
0

DataDecide-falcon-and-cc-750M

license:apache-2.0
86
0

dsp_roberta_base_dapt_biomed_tapt_rct_180K

85
0

DataDecide-dclm-baseline-50p-dolma1.7-50p-750M

license:apache-2.0
85
0

DataDecide-dclm-baseline-qc-fw-3p-750M

license:apache-2.0
85
0

DataDecide-falcon-and-cc-qc-20p-750M

license:apache-2.0
85
0

DataDecide-falcon-750M

license:apache-2.0
85
0

tulu-v2.5-dpo-13b-shp2

NaNK
llama
84
0

tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm

NaNK
llama
84
0

DataDecide-c4-90M

license:apache-2.0
84
0

DataDecide-dclm-baseline-25p-dolma1.7-75p-750M

license:apache-2.0
84
0

DataDecide-dclm-baseline-qc-10p-750M

license:apache-2.0
84
0

DataDecide-dclm-baseline-qc-7p-fw2-530M

license:apache-2.0
84
0

scitulu-70b

NaNK
llama
83
6

llama-3.1-tulu-2-8b

NaNK
llama
83
5

OLMo-2-1124-13B-Instruct-RLVR1

NaNK
license:apache-2.0
83
2

tulu-v2.5-dpo-13b-hh-rlhf

NaNK
llama
83
1

tulu-v2.5-dpo-13b-nectar-60k

NaNK
llama
83
1

OLMoE-1B-7B-0125-DPO

NaNK
license:apache-2.0
83
1

tulu-v1-llama2-13b

NaNK
llama
83
0

tulu-v2.5-ppo-13b-stackexchange-60k

NaNK
llama
83
0

DataDecide-fineweb-pro-750M

license:apache-2.0
83
0

DataDecide-dclm-baseline-75p-dolma1.7-25p-750M

license:apache-2.0
83
0

DataDecide-dclm-baseline-qc-10p-530M

license:apache-2.0
83
0

DataDecide-falcon-and-cc-qc-10p-530M

license:apache-2.0
83
0

Llama-3.1-Tulu-3-70B-broken

NaNK
llama
82
4

open-instruct-code-alpaca-7b

NaNK
llama
82
2

open-instruct-llama2-sharegpt-7b

NaNK
llama
82
1

codetulu-2-7b

NaNK
llama
82
1

llama-3-tulu-2-dpo-8b

NaNK
llama
82
1

dsp_roberta_base_tapt_sciie_3219

82
0

open-instruct-code-alpaca-13b

NaNK
llama
82
0

DataDecide-falcon-and-cc-qc-20p-530M

license:apache-2.0
82
0

DataDecide-dclm-baseline-530M

license:apache-2.0
82
0

DataDecide-c4-530M

license:apache-2.0
82
0

OLMo-2-1124-7B-SFT-Preview

NaNK
license:apache-2.0
81
3

Llama-3.1-Tulu-3-8B-SFT-no-safety-data

NaNK
llama
81
1

Llama-3.1-Tulu-3-8B-SFT-no-persona-data

NaNK
llama
81
1

open-instruct-llama2-sharegpt-dpo-7b

NaNK
llama
81
0

tulu-v2.5-dpo-13b-argilla-orca-pairs

NaNK
llama
81
0

tulu-v2.5-dpo-13b-nectar

NaNK
llama
81
0

tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm

NaNK
llama
81
0

DataDecide-dclm-baseline-qc-20p-530M

license:apache-2.0
81
0

DataDecide-dclm-baseline-75p-dolma1.7-25p-530M

license:apache-2.0
81
0

DataDecide-falcon-and-cc-530M

license:apache-2.0
81
0

DataDecide-falcon-530M

license:apache-2.0
81
0

DataDecide-falcon-and-cc-qc-orig-10p-530M

license:apache-2.0
81
0

tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm

NaNK
llama
80
6

codetulu-2-13b

NaNK
llama
80
3

llama-3-tulu-v2.5-8b-uf-mean-8b-uf-rm

NaNK
llama
80
3

codetulu-2-34b

NaNK
llama
80
2

llama-3.1-tulu-2-dpo-8b

NaNK
llama
80
2

llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm

NaNK
llama
80
1

open-instruct-sni-7b

NaNK
llama
80
0

tulu-v1-llama2-7b

NaNK
llama
80
0

tulu-v2.5-dpo-13b-stackexchange

NaNK
llama
80
0

tulu-v2.5-dpo-13b-capybara

NaNK
llama
80
0

tulu-v2.5-dpo-13b-hh-rlhf-60k

NaNK
llama
80
0

llama-3-tulu-2-dpo-70b

NaNK
llama
80
0

OLMo-7B-0424-SFT-hf

NaNK
license:apache-2.0
80
0

DataDecide-dclm-baseline-qc-fw-10p-750M

license:apache-2.0
80
0

DataDecide-falcon-and-cc-qc-tulu-10p-530M

license:apache-2.0
80
0

DataDecide-dclm-baseline-25p-dolma1.7-75p-530M

license:apache-2.0
80
0

DataDecide-dclm-baseline-qc-fw-10p-530M

license:apache-2.0
80
0

bhaskara

license:apache-2.0
79
14

Llama-3.1-Tulu-3-8B-SFT-no-wildchat-data

NaNK
llama
79
1

tulu-v2.5-dpo-13b-chatbot-arena-2024

NaNK
llama
79
0

tulu-v2.5-dpo-13b-alpacafarm-human-pref

NaNK
llama
79
0

tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref

NaNK
llama
79
0

tulu-v2.5-ppo-13b-nectar-60k

NaNK
llama
79
0

tulu-v2.5-ppo-13b-chatbot-arena-2023

NaNK
llama
79
0

llama-3.1-tulu-2-dpo-70b

NaNK
llama
79
0

DataDecide-c4-1B

NaNK
license:apache-2.0
79
0

DataDecide-fineweb-edu-530M

license:apache-2.0
79
0

DataDecide-dclm-baseline-qc-7p-fw3-530M

license:apache-2.0
79
0

DataDecide-dclm-baseline-50p-dolma1.7-50p-530M

license:apache-2.0
79
0

DataDecide-dclm-baseline-qc-fw-3p-530M

license:apache-2.0
79
0

OLMo-2-1124-7B-Instruct-preview

NaNK
license:apache-2.0
78
47

OLMo-2-1124-7B-GGUF

NaNK
license:apache-2.0
78
4

open-instruct-human-mix-13b

NaNK
llama
78
1

Llama-3.1-Tulu-3-8B-SFT-no-math-data

NaNK
llama
78
1

tulu-v2.5-dpo-13b-uf-overall

NaNK
llama
78
0

OLMo-Ladder-760M-0.5xC

78
0

tulu-7b

NaNK
llama
77
9

llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts

NaNK
llama
77
2

open-instruct-cot-7b

NaNK
llama
77
1

open-instruct-gpt4-alpaca-7b

NaNK
llama
77
1

open-instruct-human-mix-30b

NaNK
llama
77
1

OLMo-7B-0424-Instruct-hf

NaNK
license:apache-2.0
77
1

dsp_roberta_base_tapt_imdb_70000

77
0

open-instruct-baize-7b

NaNK
llama
77
0

open-instruct-self-instruct-13b

NaNK
llama
77
0

open-instruct-oasst1-13b

NaNK
llama
77
0

open-instruct-sharegpt-13b

NaNK
llama
77
0

open-instruct-sharegpt-30b

NaNK
llama
77
0

tulu-v2.5-dpo-13b-uf-mean

NaNK
llama
77
0

tulu-v2.5-dpo-13b-helpsteer

NaNK
llama
77
0

tulu-v2.5-ppo-13b-uf-mean

NaNK
llama
77
0

DataDecide-fineweb-pro-530M

license:apache-2.0
77
0

open-instruct-sharegpt-7b

NaNK
llama
76
0

open-instruct-human-mix-7b

NaNK
llama
76
0

DataDecide-falcon-and-cc-qc-10p-300M

license:apache-2.0
76
0

open-instruct-stanford-alpaca-13b

NaNK
llama
75
2

open-instruct-dolly-7b

NaNK
llama
75
0

open-instruct-self-instruct-7b

NaNK
llama
75
0

llama-3-tulu-2-70b

NaNK
llama
75
0

tulu-65b

NaNK
llama
74
21

tulu-30b

NaNK
llama
74
18

open-instruct-sharegpt-65b

NaNK
llama
74
2

dsp_roberta_base_dapt_biomed_tapt_chemprot_4169

74
0

dsp_roberta_base_dapt_news_tapt_ag_115K

74
0

open-instruct-flan-v2-13b

NaNK
llama
74
0

llama-3.1-tulu-2-70b

NaNK
llama
74
0

DataDecide-dclm-baseline-300M

license:apache-2.0
74
0

tulu-v1-llama2-70b

NaNK
llama
73
1

dsp_roberta_base_dapt_reviews_tapt_imdb_70000

73
0

dsp_roberta_base_tapt_hyperpartisan_news_5015

73
0

dsp_roberta_base_tapt_hyperpartisan_news_515

73
0

open-instruct-sni-13b

NaNK
llama
73
0

tulu-v2.5-dpo-13b-prm-phase-2

NaNK
llama
73
0

tulu-v2.5-dpo-13b-chatbot-arena-2023

NaNK
llama
73
0

Llama-3-8B-Instruct-Analyzer

NaNK
llama
72
3

dsp_roberta_base_dapt_reviews_tapt_imdb_20000

72
0

dsp_roberta_base_tapt_rct_180K

72
0

open-instruct-baize-13b

NaNK
llama
72
0

DataDecide-c4-300M

license:apache-2.0
72
0

DataDecide-dclm-baseline-qc-7p-fw3-300M

license:apache-2.0
72
0

open-instruct-flan-v2-7b

NaNK
llama
71
1

open-instruct-unnatural-instructions-7b

NaNK
llama
71
1

dsp_roberta_base_tapt_imdb_20000

71
0

reviews_roberta_base

71
0

open-instruct-oasst1-7b

NaNK
llama
71
0

open-instruct-cot-13b

NaNK
llama
71
0

DataDecide-dclm-baseline-qc-20p-300M

license:apache-2.0
71
0

DataDecide-fineweb-edu-300M

license:apache-2.0
71
0

DataDecide-dclm-baseline-75p-dolma1.7-25p-300M

license:apache-2.0
71
0

DataDecide-dclm-baseline-25p-dolma1.7-75p-300M

license:apache-2.0
71
0

DataDecide-dclm-baseline-qc-fw-3p-300M

license:apache-2.0
71
0

DataDecide-falcon-300M

license:apache-2.0
71
0

dsp_roberta_base_dapt_reviews_tapt_amazon_helpfulness_115K

70
0

dsp_roberta_base_tapt_amazon_helpfulness_115K

70
0

DataDecide-dclm-baseline-75p-dolma1.7-25p-150M

license:apache-2.0
70
0

DataDecide-dclm-baseline-qc-7p-fw2-300M

license:apache-2.0
70
0

DataDecide-dclm-baseline-50p-dolma1.7-50p-300M

license:apache-2.0
70
0

DataDecide-falcon-and-cc-qc-orig-10p-300M

license:apache-2.0
70
0

DataDecide-falcon-and-cc-qc-tulu-10p-300M

license:apache-2.0
70
0

DataDecide-falcon-and-cc-300M

license:apache-2.0
70
0

wmt16-en-de-12-1

license:apache-2.0
69
1

open-instruct-gpt4-alpaca-13b

NaNK
llama
69
1

dsp_roberta_base_tapt_chemprot_4169

69
0

news_roberta_base

69
0

open-instruct-dolly-13b

NaNK
llama
69
0

DataDecide-dclm-baseline-qc-10p-300M

license:apache-2.0
69
0

DataDecide-falcon-and-cc-qc-20p-300M

license:apache-2.0
69
0

MolmoAct-7B-D-Captioner-0812

NaNK
license:apache-2.0
69
0

dsp_roberta_base_tapt_citation_intent_1688

68
0

dsp_roberta_base_tapt_rct_500

68
0

DataDecide-falcon-and-cc-qc-10p-150M

license:apache-2.0
68
0

DataDecide-falcon-and-cc-qc-tulu-10p-150M

license:apache-2.0
68
0

DataDecide-fineweb-pro-300M

license:apache-2.0
68
0

open-instruct-unnatural-instructions-13b

NaNK
llama
67
1

DataDecide-falcon-and-cc-qc-20p-150M

license:apache-2.0
67
1

dsp_roberta_base_dapt_cs_tapt_sciie_3219

67
0

dsp_roberta_base_dapt_news_tapt_hyperpartisan_news_515

67
0

DataDecide-dclm-baseline-150M

license:apache-2.0
67
0

DataDecide-falcon-and-cc-150M

license:apache-2.0
67
0

DataDecide-falcon-and-cc-qc-orig-10p-150M

license:apache-2.0
67
0

DataDecide-falcon-and-cc-qc-tulu-10p-1B

NaNK
license:apache-2.0
67
0

DataDecide-dclm-baseline-qc-10p-150M

license:apache-2.0
67
0

DataDecide-dclm-baseline-qc-fw-10p-300M

license:apache-2.0
67
0

DataDecide-dclm-baseline-qc-7p-fw3-150M

license:apache-2.0
66
0

DataDecide-fineweb-pro-150M

license:apache-2.0
66
0

DataDecide-dclm-baseline-25p-dolma1.7-75p-150M

license:apache-2.0
66
0

DataDecide-dclm-baseline-50p-dolma1.7-50p-150M

license:apache-2.0
66
0

DataDecide-dclm-baseline-qc-fw-10p-150M

license:apache-2.0
66
0

DataDecide-dclm-baseline-60M

license:apache-2.0
65
0

DataDecide-dclm-baseline-qc-fw-3p-150M

license:apache-2.0
65
0

DataDecide-falcon-150M

license:apache-2.0
65
0

DataDecide-dclm-baseline-25p-dolma1.7-75p-1B

NaNK
license:apache-2.0
65
0

DataDecide-dclm-baseline-qc-7p-fw3-1B

NaNK
license:apache-2.0
65
0

DataDecide-dclm-baseline-qc-7p-fw2-1B

NaNK
license:apache-2.0
65
0

DataDecide-dclm-baseline-75p-dolma1.7-25p-1B

NaNK
license:apache-2.0
65
0

DataDecide-dclm-baseline-qc-20p-1B

NaNK
license:apache-2.0
65
0

DataDecide-dclm-baseline-qc-fw-3p-1B

NaNK
license:apache-2.0
65
0

tulu-v2.5-dpo-13b-stackexchange-60k

NaNK
llama
64
1

dsp_roberta_base_tapt_ag_115K

64
0

DataDecide-fineweb-edu-150M

license:apache-2.0
64
0

DataDecide-dclm-baseline-qc-7p-fw2-150M

license:apache-2.0
64
0

DataDecide-falcon-and-cc-qc-10p-1B

NaNK
license:apache-2.0
64
0

DataDecide-fineweb-edu-1B

NaNK
license:apache-2.0
64
0

DataDecide-dclm-baseline-qc-20p-150M

license:apache-2.0
64
0

DataDecide-dclm-baseline-qc-fw-10p-1B

NaNK
license:apache-2.0
64
0

DataDecide-falcon-and-cc-qc-20p-90M

license:apache-2.0
63
0

DataDecide-dclm-baseline-1B

NaNK
license:apache-2.0
63
0

DataDecide-falcon-and-cc-1B

NaNK
license:apache-2.0
63
0

DataDecide-dolma1_7-no-flan-90M

license:apache-2.0
62
1

dsp_roberta_base_dapt_cs_tapt_citation_intent_1688

62
0

dsp_roberta_base_dapt_news_tapt_hyperpartisan_news_5015

62
0

DataDecide-falcon-and-cc-qc-tulu-10p-90M

license:apache-2.0
62
0

DataDecide-dclm-baseline-75p-dolma1.7-25p-60M

license:apache-2.0
62
0

DataDecide-dolma1_7-60M

license:apache-2.0
62
0

DataDecide-falcon-and-cc-qc-10p-60M

license:apache-2.0
62
0

DataDecide-dolma1_7-no-code-60M

license:apache-2.0
62
0

DataDecide-c4-60M

license:apache-2.0
62
0

DataDecide-fineweb-pro-90M

license:apache-2.0
62
0

DataDecide-falcon-and-cc-qc-orig-10p-1B

NaNK
license:apache-2.0
62
0

DataDecide-dclm-baseline-qc-10p-1B

NaNK
license:apache-2.0
62
0

DataDecide-dolma1_7-no-math-code-90M

license:apache-2.0
61
0

DataDecide-dolma1_7-no-math-code-60M

license:apache-2.0
61
0

DataDecide-falcon-and-cc-60M

license:apache-2.0
61
0

DataDecide-dclm-baseline-qc-7p-fw2-90M

license:apache-2.0
61
0

DataDecide-falcon-1B

NaNK
license:apache-2.0
61
0

DataDecide-dclm-baseline-50p-dolma1.7-50p-1B

NaNK
license:apache-2.0
61
0

DataDecide-dclm-baseline-50p-dolma1.7-50p-60M

license:apache-2.0
60
0

DataDecide-dclm-baseline-50p-dolma1.7-50p-90M

license:apache-2.0
60
0

DataDecide-dclm-baseline-qc-20p-90M

license:apache-2.0
60
0

DataDecide-dclm-baseline-qc-20p-60M

license:apache-2.0
60
0

DataDecide-dolma1_7-no-flan-60M

license:apache-2.0
60
0

DataDecide-falcon-and-cc-qc-20p-60M

license:apache-2.0
60
0

DataDecide-dclm-baseline-qc-10p-60M

license:apache-2.0
60
0

DataDecide-dolma1_7-no-reddit-60M

license:apache-2.0
60
0

DataDecide-dolma1_7-no-reddit-90M

license:apache-2.0
60
0

DataDecide-fineweb-edu-60M

license:apache-2.0
60
0

DataDecide-falcon-and-cc-qc-orig-10p-60M

license:apache-2.0
60
0

DataDecide-fineweb-pro-1B

NaNK
license:apache-2.0
60
0

DataDecide-dolma1_7-90M

license:apache-2.0
59
1

DataDecide-falcon-60M

license:apache-2.0
59
0

DataDecide-dclm-baseline-25p-dolma1.7-75p-60M

license:apache-2.0
59
0

DataDecide-falcon-and-cc-qc-10p-90M

license:apache-2.0
59
0

DataDecide-dclm-baseline-qc-7p-fw3-60M

license:apache-2.0
59
0

DataDecide-dclm-baseline-qc-fw-10p-60M

license:apache-2.0
59
0

DataDecide-dolma1_6plus-60M

license:apache-2.0
59
0

DataDecide-dolma1_6plus-90M

license:apache-2.0
59
0

DataDecide-fineweb-edu-90M

license:apache-2.0
59
0

DataDecide-falcon-and-cc-qc-orig-10p-90M

license:apache-2.0
59
0

DataDecide-dolma1_7-no-code-90M

license:apache-2.0
59
0

DataDecide-dclm-baseline-qc-7p-fw2-60M

license:apache-2.0
59
0

DataDecide-fineweb-pro-60M

license:apache-2.0
59
0

DataDecide-falcon-and-cc-qc-20p-1B

NaNK
license:apache-2.0
59
0

llama-3-tulu-2-8b-uf-mean-rm

NaNK
llama
58
0

DataDecide-dclm-baseline-qc-fw-3p-60M

license:apache-2.0
58
0

DataDecide-dclm-baseline-75p-dolma1.7-25p-90M

license:apache-2.0
58
0

DataDecide-falcon-and-cc-90M

license:apache-2.0
58
0

DataDecide-dclm-baseline-qc-10p-90M

license:apache-2.0
58
0

DataDecide-dclm-baseline-qc-fw-10p-90M

license:apache-2.0
58
0

DataDecide-dclm-baseline-90M

license:apache-2.0
58
0

Olmo-3-7B-RL-Zero-Code

NaNK
license:apache-2.0
57
6

Olmo-3-7B-RLZero-Code

NaNK
license:apache-2.0
57
6

DataDecide-falcon-and-cc-qc-tulu-10p-60M

license:apache-2.0
57
1

DataDecide-dclm-baseline-qc-fw-3p-90M

license:apache-2.0
57
0

DataDecide-falcon-90M

license:apache-2.0
57
0

DataDecide-dclm-baseline-25p-dolma1.7-75p-90M

license:apache-2.0
57
0

DataDecide-dclm-baseline-qc-7p-fw3-90M

license:apache-2.0
57
0

macaw-large

license:apache-2.0
55
14

aspire-biencoder-biomed-spec

license:apache-2.0
53
0

Llama-3.1-Tulu-3-8B-RL-RM-RB2

NaNK
llama
53
0

System4_explain_FigLang2022

license:cc-by-4.0
51
2

PRIMERA-multixscience

49
5

led-base-16384-ms2

NaNK
49
4

aspire-sentence-embedder

48
3

t5-small-next-word-generator-qoogle

48
2

unifiedqa-t5-3b

NaNK
47
0

Llama-3.1-Tulu-3-8B-DPO-RM-RB2

NaNK
llama
45
0

macaw-11b

NaNK
license:apache-2.0
44
7

entailer-large

license:apache-2.0
44
2

longformer-base-4096-extra.pos.embd.only

44
1

Llama-3.1-Tulu-3-8B-SFT-RM-RB2

NaNK
llama
44
0

uio2-large

43
2

longformer-scico

license:apache-2.0
42
3

Llama-3.1-70B-Instruct-RM-RB2

NaNK
llama
41
1

System2_FigLang2022

license:cc-by-4.0
40
0

tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value

NaNK
llama
40
0

uio2-xxl

39
2

DataDecide-dclm-baseline-qc-7p-fw3-20M

license:apache-2.0
38
0

DataDecide-fineweb-pro-20M

license:apache-2.0
38
0

led-base-16384-cochrane

37
3

DataDecide-fineweb-pro-8M

license:apache-2.0
37
1

tulu-v2.5-13b-preference-mix-rm

NaNK
llama
37
0

tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value

NaNK
llama
37
0

DataDecide-falcon-and-cc-14M

license:apache-2.0
37
0

DataDecide-dclm-baseline-25p-dolma1.7-75p-6M

license:apache-2.0
37
0

OLMo-7B-Twin-2T

NaNK
license:apache-2.0
36
21

PRIMERA-wcep

36
2

wmt16-en-de-dist-12-1

license:apache-2.0
36
1

primera-multi_lexsum-source-long

36
1

DataDecide-falcon-and-cc-qc-orig-10p-4M

license:apache-2.0
36
1

longformer-large-4096-extra.pos.embd.only

36
0