allenai

✓ Verified • Research Lab

Allen Institute for AI, nonprofit AI research

500 models • 29 total models in database

longformer-base-4096

License: apache-2.0 Language: en

FP8-quantized version of olmOCR-2-7B-1025, produced with llmcompressor. This is a release of the olmOCR model that's fine-tuned from Qwen2.5-VL-7B-Instruct using the olmOCR-mix-1025 dataset. It has additionally been fine-tuned with GRPO reinforcement learning to boost its performance on math equations, tables, and other tricky OCR cases.

Quick links:
- 📃 Paper
- 🤗 SFT Dataset
- 🤗 RL Dataset
- 🛠️ Code
- 🎮 Demo

The best way to use this model is via the olmOCR toolkit. The toolkit comes with an efficient inference setup via vLLM that can handle millions of documents at scale.

This model achieves the following scores on olmOCR-bench when used with the olmOCR toolkit, which automatically renders, rotates, and retries pages as needed.

| Model | ArXiv | Old Scans Math | Tables | Old Scans | Headers and Footers | Multi column | Long tiny text | Base | Overall |
|---|---|---|---|---|---|---|---|---|---|
| olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025 | 82.9 | 82.1 | 84.3 | 48.3 | 95.7 | 84.3 | 81.4 | 99.7 | 82.3 ± 1.1 |
| olmOCR pipeline v0.4.0 with olmOCR-2-7B-1025-FP8 | 83.0 | 82.3 | 84.9 | 47.7 | 96.1 | 83.7 | 81.9 | 99.7 | 82.4 ± 1.1 |

This model expects as input a single document image, rendered such that the longest dimension is 1288 pixels. The prompt must also contain additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit.

If you want to prompt this model manually instead of using the olmOCR toolkit, please see the code below. In normal usage, the olmOCR toolkit builds the prompt by rendering the PDF page and extracting relevant text blocks and image metadata. To duplicate that you will need to

This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
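The card asks for a page image whose longest side is 1288 pixels but leaves the resizing to the toolkit. The helper below is a minimal sketch. It assumes you already have a page image at some native resolution. The name `target_size` is hypothetical and is not part of olmOCR. The helper computes dimensions that scale the longer side to 1288 px while preserving the aspect ratio. It does not reproduce the toolkit's rendering or prompt construction.

```python
def target_size(width: int, height: int, longest: int = 1288) -> tuple[int, int]:
    """Return (new_width, new_height) so the longer side equals `longest`.

    Hypothetical helper: it only computes dimensions and preserves the
    aspect ratio; it does not render or resize anything itself.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    scale = longest / max(width, height)
    # Round both sides to the nearest pixel (Python's round, ties to even);
    # the longer side comes out as `longest` after rounding.
    return round(width * scale), round(height * scale)


print(target_size(612, 792))  # a US Letter page at 72 dpi
```

For example, a US Letter page rendered at 612 × 792 points maps to `(995, 1288)`. With Pillow, you could pass that tuple to `Image.resize`. Rounding the shorter side to the nearest pixel keeps the aspect-ratio error under half a pixel.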

Tülu3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques. Tülu3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval.

- Model type: A model trained on a mix of publicly available, synthetic and human-created datasets.
- Language(s) (NLP): Primarily English
- License: Llama 3.1 Community License Agreement
- Finetuned from model: meta-llama/Llama-3.1-8B
- Training Repository: https://github.com/allenai/open-instruct
- Eval Repository: https://github.com/allenai/olmes
- Paper: https://arxiv.org/abs/2411.15124
- Demo: https://playground.allenai.org/

| Stage | Llama 3.1 8B | Llama 3.1 70B |
|---|---|---|
| Base Model | meta-llama/Llama-3.1-8B | meta-llama/Llama-3.1-70B |
| SFT | allenai/Llama-3.1-Tulu-3-8B-SFT | allenai/Llama-3.1-Tulu-3-70B-SFT |
| DPO | allenai/Llama-3.1-Tulu-3-8B-DPO | allenai/Llama-3.1-Tulu-3-70B-DPO |
| Final Models (RLVR) | allenai/Llama-3.1-Tulu-3-8B | allenai/Llama-3.1-Tulu-3-70B |
| Reward Model (RM) | allenai/Llama-3.1-Tulu-3-8B-RM | (Same as 8B) |

| Stage | Llama 3.1 405B |
|---|---|
| Base Model | meta-llama/llama-3.1-405B |
| SFT | allenai/llama-3.1-Tulu-3-405B-SFT |
| DPO | allenai/llama-3.1-Tulu-3-405B-DPO |
| Final Model (RLVR) | allenai/llama-3.1-Tulu-3-405B |
| Reward Model (RM) | (Same as 8B) |

To load the model with HuggingFace, use the following snippet:

As a Llama base model, the model can be easily served with:

Note that given the long chat template of Llama, you may want to use `--max_model_len=8192`. The chat template is embedded within the tokenizer as well, for use with `tokenizer.apply_chat_template`.
In Ai2 demos, we use this system prompt by default:

The model has not been trained with a specific system prompt in mind.

The Tülu3 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base Llama 3.1 models are also unknown; however, it likely included a mix of Web data and technical sources like books and code. See the Falcon 180B model card for an example of this.

| Benchmark (eval) | Tülu 3 SFT 8B | Tülu 3 DPO 8B | Tülu 3 8B | Llama 3.1 8B Instruct | Qwen 2.5 7B Instruct | Magpie 8B | Gemma 2 9B Instruct | Ministral 8B Instruct |
|---|---|---|---|---|---|---|---|---|
| Avg. | 60.4 | 64.4 | 64.8 | 62.2 | 57.8 | 44.7 | 55.2 | 58.3 |
| MMLU (0 shot, CoT) | 65.9 | 68.7 | 68.2 | 71.2 | 76.6 | 62.0 | 74.6 | 68.5 |
| PopQA (15 shot) | 29.3 | 29.3 | 29.1 | 20.2 | 18.1 | 22.5 | 28.3 | 20.2 |
| TruthfulQA (6 shot) | 46.8 | 56.1 | 55.0 | 55.1 | 63.1 | 57.0 | 61.4 | 55.5 |
| BigBenchHard (3 shot, CoT) | 67.9 | 65.8 | 66.0 | 62.8 | 21.7 | 0.9 | 2.5 | 56.2 |
| DROP (3 shot) | 61.3 | 62.5 | 62.6 | 61.5 | 54.4 | 49.4 | 58.8 | 56.2 |
| MATH (4 shot CoT, Flex) | 31.5 | 42.0 | 43.7 | 42.5 | 14.8 | 5.1 | 29.8 | 40.0 |
| GSM8K (8 shot, CoT) | 76.2 | 84.3 | 87.6 | 83.4 | 83.8 | 61.2 | 79.7 | 80.0 |
| HumanEval (pass@10) | 86.2 | 83.9 | 83.9 | 86.3 | 93.1 | 75.4 | 71.7 | 91.0 |
| HumanEval+ (pass@10) | 81.4 | 78.6 | 79.2 | 82.9 | 89.7 | 69.1 | 67.0 | 88.5 |
| IFEval (prompt loose) | 72.8 | 81.1 | 82.4 | 80.6 | 74.7 | 38.8 | 69.9 | 56.4 |
| AlpacaEval 2 (LC % win) | 12.4 | 33.5 | 34.5 | 24.2 | 29.0 | 49.0 | 43.7 | 31.4 |
| Safety (6 task avg.) | 93.1 | 87.2 | 85.5 | 75.2 | 75.0 | 46.4 | 75.5 | 56.2 |

| Benchmark (eval) | Tülu 3 SFT 70B | Tülu 3 DPO 70B | Tülu 3 70B | Llama 3.1 70B Instruct | Qwen 2.5 72B Instruct | Hermes 3 Llama 3.1 70B | Nemotron Llama 3.1 70B |
|---|---|---|---|---|---|---|---|
| Avg. | 72.6 | 75.9 | 76.0 | 73.4 | 71.5 | 68.3 | 65.5 |
| MMLU (0 shot, CoT) | 78.9 | 83.3 | 83.1 | 85.3 | 85.5 | 80.4 | 83.8 |
| PopQA (15 shot) | 48.6 | 46.3 | 46.5 | 46.4 | 30.6 | 48.1 | 36.4 |
| TruthfulQA (6 shot) | 55.7 | 67.9 | 67.6 | 66.8 | 69.9 | 66.5 | 62.6 |
| BigBenchHard (3 shot, CoT) | 82.7 | 81.8 | 82.0 | 73.8 | 67.2 | 82.1 | 0.7 |
| DROP (3 shot) | 77.2 | 74.1 | 74.3 | 77.0 | 34.2 | 73.2 | 68.8 |
| MATH (4 shot CoT, Flex) | 53.7 | 62.3 | 63.0 | 56.4 | 74.3 | 41.9 | 55.0 |
| GSM8K (8 shot, CoT) | 91.1 | 93.5 | 93.5 | 93.7 | 89.5 | 90.0 | 84.7 |
| HumanEval (pass@10) | 92.9 | 92.4 | 92.4 | 93.6 | 94.0 | 89.6 | 94.1 |
| HumanEval+ (pass@10) | 87.3 | 88.4 | 88.0 | 89.5 | 90.8 | 85.9 | 85.5 |
| IFEval (prompt loose) | 82.1 | 82.6 | 83.2 | 88.0 | 87.6 | 76.0 | 79.9 |
| AlpacaEval 2 (LC % win) | 26.3 | 49.6 | 49.8 | 33.4 | 47.7 | 28.4 | 66.1 |
| Safety (6 task avg.) | 94.4 | 89.0 | 88.3 | 76.5 | 87.0 | 57.9 | 69.0 |

| Benchmark (eval) | Tülu 3 405B SFT | Tülu 3 405B DPO | Tülu 3 405B | Llama 3.1 405B Instruct | Nous Hermes 3 405B | Deepseek V3 | GPT 4o (11-24) |
|---|---|---|---|---|---|---|---|
| Avg w/o Safety | 76.3 | 79.0 | 80.0 | 78.1 | 74.4 | 79.0 | 80.5 |
| Avg w/ Safety | 77.5 | 79.6 | 80.7 | 79.0 | 73.5 | 75.9 | 81.6 |
| MMLU (5 shot, CoT) | 84.4 | 86.6 | 87.0 | 88.0 | 84.9 | 82.1 | 87.9 |
| PopQA (3 shot) | 55.7 | 55.4 | 55.5 | 52.9 | 54.2 | 44.9 | 53.6 |
| BigBenchHard (0 shot, CoT) | 88.0 | 88.8 | 88.6 | 87.1 | 87.7 | 89.5 | 83.3 |
| MATH (4 shot, Flex) | 63.4 | 59.9 | 67.3 | 66.6 | 58.4 | 72.5 | 68.8 |
| GSM8K (8 shot, CoT) | 93.6 | 94.2 | 95.5 | 95.4 | 92.7 | 94.1 | 91.7 |
| HumanEval (pass@10) | 95.7 | 97.2 | 95.9 | 95.9 | 92.3 | 94.6 | 97.0 |
| HumanEval+ (pass@10) | 93.3 | 93.9 | 92.9 | 90.3 | 86.9 | 91.6 | 92.7 |
| IFEval (prompt loose) | 82.4 | 85.0 | 86.0 | 88.4 | 81.9 | 88.0 | 84.8 |
| AlpacaEval 2 (LC % win) | 30.4 | 49.8 | 51.4 | 38.5 | 30.2 | 53.5 | 65.0 |
| Safety (6 task avg.) | 87.7 | 85.5 | 86.7 | 86.8 | 65.8 | 72.2 | 90.9 |

SFT hyperparameters:
- Learning Rate: 5E-6 (8B), 2E-6 (70B, 405B)
- Effective Batch Size: 128 (8B, 70B), 256 (405B)
- Max. Sequence Length: 4096
- Loss Accumulation: Sum (see https://unsloth.ai/blog/gradient)
- Learning Rate Schedule: Linear
- LR Warmup Ratio: 0.03
- Num. Epochs: 2

All Llama 3.1 Tülu3 models are released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. Tülu3 is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

If Tülu3 or any of the related materials were helpful to your work, please cite:

llama

41,394

FlexOlmo-7x7B-1T

license:apache-2.0

39,433

OLMoE-1B-7B-0924-Instruct

> OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters, released in September 2024 (0924) and adapted via SFT and DPO from OLMoE-1B-7B. It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.

This information and more can also be found on the OLMoE GitHub repository.

- Paper: https://arxiv.org/abs/2409.02060
- Pretraining: Checkpoints, Code, Data and Logs.
- SFT (Supervised Fine-Tuning): Checkpoints, Code, Data and Logs.
- DPO/KTO (Direct Preference Optimization/Kahneman-Tversky Optimization): Checkpoints, Preference Data, DPO code, KTO code and Logs.

Install `transformers` from source (until a release after this PR is available) and `torch`, then run:

Branches:
- `main`: DPO preference-tuned version of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT (`main` branch)
- `load-balancing`: Ablation with load balancing loss during DPO, starting from the `load-balancing` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT
- `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT, which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/allenai/OLMoE-1B-7B-0924)
- `kto`: Ablation using KTO instead of DPO. This branch is the checkpoint after 5,000 steps with the RMS optimizer. The other `kto` branches correspond to the other checkpoints mentioned in the paper.
| Task (→) | MMLU | GSM8k | BBH | Human-Eval | Alpaca-Eval 1.0 | XSTest | IFEval | Avg |
|---|---|---|---|---|---|---|---|---|
| Setup (→) | 0-shot | 8-shot CoT | 3-shot | 0-shot | 0-shot | 0-shot | 0-shot | |
| Metric (→) | EM | EM | EM | Pass@10 | %win | F1 | Loose Acc | |
| OLMo-1B (0724) | 25.0 | 7.0 | 22.5 | 16.0 | - | 67.6 | 20.5 | - |
| +SFT | 36.0 | 12.5 | 27.2 | 21.2 | 41.5 | 81.9 | 26.1 | 35.9 |
| +DPO | 36.7 | 12.5 | 30.6 | 22.0 | 50.9 | 79.8 | 24.2 | 37.4 |
| OLMo-7B (0724) | 50.8 | 32.5 | 36.9 | 32.3 | - | 80.8 | 19.6 | - |
| +SFT | 54.2 | 25.0 | 35.7 | 38.5 | 70.9 | 86.1 | 39.7 | 49.3 |
| +DPO | 52.8 | 9.0 | 16.6 | 35.0 | 83.5 | 87.5 | 37.9 | 49.1 |
| JetMoE-2B-9B | 45.6 | 43.0 | 37.2 | 54.6 | - | 68.2 | 20.0 | - |
| +SFT | 46.1 | 53.5 | 35.6 | 64.8 | 69.3 | 55.6 | 30.5 | 50.4 |
| DeepSeek-3B-16B | 37.7 | 18.5 | 39.4 | 48.3 | - | 65.9 | 13.5 | - |
| +Chat | 48.5 | 46.5 | 40.8 | 70.1 | 74.8 | 85.6 | 32.3 | 57.0 |
| Qwen1.5-3B-14B | 60.4 | 13.5 | 27.2 | 60.2 | - | 73.4 | 20.9 | - |
| +Chat | 58.9 | 55.5 | 21.3 | 59.7 | 83.9 | 85.6 | 36.2 | 57.3 |
| OLMoE (This Model) | 49.8 | 3.0 | 33.6 | 22.4 | - | 59.7 | 16.6 | - |
| +SFT | 51.4 | 40.5 | 38.0 | 51.6 | 69.2 | 84.1 | 43.3 | 54.0 |
| +DPO | 51.9 | 45.5 | 37.0 | 54.8 | 84.0 | 82.6 | 48.1 | 57.7 |
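The "1B active / 7B total" figure comes from Mixture-of-Experts routing. Each token is processed by only a few experts, so only part of the parameters are used per token. The sketch below illustrates generic top-k gating in plain Python. Several details are illustrative assumptions, not OLMoE's actual configuration or code:

- the expert count and `k`
- the scalar toy "experts"
- the choice to renormalize gate weights over the selected experts

```python
import math
from typing import Callable, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of router scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(
    x: float,
    experts: List[Callable[[float], float]],
    router_logits: List[float],
    k: int = 2,
) -> Tuple[float, List[int]]:
    """Route a single token to its top-k experts and mix their outputs.

    Only the k selected experts are evaluated, which is why an MoE model's
    "active" parameter count is smaller than its total parameter count.
    """
    if len(router_logits) != len(experts):
        raise ValueError("need one router logit per expert")
    probs = softmax(router_logits)
    # Pick the k experts with the highest gate probability (stable on ties).
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: renormalize gate weights over the chosen experts only.
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen


# Four toy "experts" that just scale their input (hypothetical stand-ins for FFNs).
experts = [lambda v, s=s: s * v for s in (1.0, 2.0, 3.0, 4.0)]
out, chosen = top_k_moe(2.0, experts, router_logits=[0.0, 0.0, 5.0, 5.0], k=2)
print(chosen, out)  # prints: [2, 3] 7.0
```

With k of n experts selected per token, roughly k/n of the expert-layer parameters are used for that token. Shared components such as attention and embeddings always run.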

license:apache-2.0

36,854

OLMo 2 32B Instruct March 2025 is a post-trained variant of the OLMo-2 32B March 2025 model. It has undergone supervised fine-tuning on an OLMo-specific variant of the Tülu 3 dataset, followed by DPO training and then final RLVR training on the same dataset. Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval. Check out the OLMo 2 paper or Tülu 3 paper for more details!

OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs, and associated training details.

- Model type: A model trained on a mix of publicly available, synthetic and human-created datasets.
- Language(s) (NLP): Primarily English
- License: Apache 2.0
- Finetuned from model: allenai/OLMo-2-0325-32B-DPO
- Project Page: https://allenai.org/olmo
- Repositories:
  - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo-core
  - Evaluation code: https://github.com/allenai/olmes
  - Further fine-tuning code: https://github.com/allenai/open-instruct
- Paper: https://arxiv.org/abs/2501.00656
- Demo: https://playground.allenai.org/

OLMo 2 will be supported in the next version of Transformers; until then, you need to install it from the main branch using:

To load the model with HuggingFace, use the following snippet:

NOTE: This model differs from previous OLMo 2 and Tülu 3 models due to a minor change in configuration: it does NOT have the BOS token before the rest of the prompt. Our other models have at the beginning of the chat template. The chat template is embedded within the tokenizer as well, for use with `tokenizer.apply_chat_template`.

In Ai2 demos, we use this system prompt by default:

The model has not been trained with a specific system prompt in mind.

To facilitate research on RL finetuning, we have released our intermediate checkpoints from the model's RLVR training.
The model weights are saved every 20 training steps and are accessible as revisions of the HuggingFace repository. For example, you can load with:

The OLMo-2 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). See the Falcon 180B model card for an example of this.

| Model | Average | AlpacaEval 2 LC | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Closed API models | | | | | | | | | | | |
| GPT-3.5 Turbo 0125 | 59.6 | 38.7 | 66.6 | 70.2 | 74.3 | 66.9 | 41.2 | 70.2 | 69.1 | 45.0 | 62.9 |
| GPT 4o Mini 2024-07-18 | 65.7 | 49.7 | 65.9 | 36.3 | 83.0 | 83.5 | 67.9 | 82.2 | 84.9 | 39.0 | 64.8 |
| Open weights models | | | | | | | | | | | |
| Mistral-Nemo-Instruct-2407 | 50.9 | 45.8 | 54.6 | 23.6 | 81.4 | 64.5 | 31.9 | 70.0 | 52.7 | 26.9 | 57.7 |
| Ministral-8B-Instruct | 52.1 | 31.4 | 56.2 | 56.2 | 80.0 | 56.4 | 40.0 | 68.5 | 56.2 | 20.2 | 55.5 |
| Gemma-2-27b-it | 61.3 | 49.0 | 72.7 | 67.5 | 80.7 | 63.2 | 35.1 | 70.7 | 75.9 | 33.9 | 64.6 |
| Qwen2.5-32B | 66.5 | 39.1 | 82.3 | 48.3 | 87.5 | 82.4 | 77.9 | 84.7 | 82.4 | 26.1 | 70.6 |
| Mistral-Small-24B | 67.6 | 43.2 | 80.1 | 78.5 | 87.2 | 77.3 | 65.9 | 83.7 | 66.5 | 24.4 | 68.1 |
| Llama-3.1-70B | 70.0 | 32.9 | 83.0 | 77.0 | 94.5 | 88.0 | 56.2 | 85.2 | 76.4 | 46.5 | 66.8 |
| Llama-3.3-70B | 73.0 | 36.5 | 85.8 | 78.0 | 93.6 | 90.8 | 71.8 | 85.9 | 70.4 | 48.2 | 66.1 |
| Gemma-3-27b-it | - | 63.4 | 83.7 | 69.2 | 91.1 | - | - | 81.8 | - | 30.9 | - |
| Fully open models | | | | | | | | | | | |
| OLMo-2-7B-1124-Instruct | 55.7 | 31.0 | 48.5 | 58.9 | 85.2 | 75.6 | 31.3 | 63.9 | 81.2 | 24.6 | 56.3 |
| OLMo-2-13B-1124-Instruct | 61.4 | 37.5 | 58.4 | 72.1 | 87.4 | 80.4 | 39.7 | 68.6 | 77.5 | 28.8 | 63.9 |
| OLMo-2-32B-0325-SFT | 61.7 | 16.9 | 69.7 | 77.2 | 78.4 | 72.4 | 35.9 | 76.1 | 93.8 | 35.4 | 61.3 |
| OLMo-2-32B-0325-DPO | 68.8 | 44.1 | 70.2 | 77.5 | 85.7 | 83.8 | 46.8 | 78.0 | 91.9 | 36.4 | 73.5 |
| OLMo-2-32B-0325-Instruct | 68.8 | 42.8 | 70.6 | 78.0 | 87.6 | 85.6 | 49.7 | 77.3 | 85.9 | 37.5 | 73.2 |

Below are the training curves for `allenai/OLMo-2-0325-32B-Instruct`. The model was trained using five 8xH100 nodes.

Below are the core eval scores over steps for `allenai/OLMo-2-0325-32B-Instruct` (note that we took step `320` as the final checkpoint, corresponding to episode `573,440`):

Below are the other eval scores over steps for `allenai/OLMo-2-0325-32B-Instruct`:

The command below is copied directly from the tracked training job:

OLMo 2 is licensed under the Apache 2.0 license. OLMo 2 is intended for research and educational use. For more information, please see our Responsible Use Guidelines. This model has been fine-tuned using a dataset mix with outputs generated from third-party models, which are subject to additional terms: Gemma Terms of Use.
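The OLMo 2 32B Instruct card gives two facts about its RLVR checkpoints:

- checkpoints are saved every 20 training steps;
- step 320 (episode 573,440) was taken as the final checkpoint.

The short sketch below works through what these figures imply. It assumes a constant number of episodes per step and that saving starts at step 20. The actual revision names in the HuggingFace repository are not given here, so the helpers (hypothetical names) only list step numbers.

```python
SAVE_EVERY = 20          # checkpoints saved every 20 training steps (from the card)
FINAL_STEP = 320         # step taken as the final checkpoint (from the card)
FINAL_EPISODE = 573_440  # episode count at that step (from the card)

# Assumption: every training step consumes the same number of episodes.
EPISODES_PER_STEP = FINAL_EPISODE // FINAL_STEP


def checkpoint_steps(final_step: int = FINAL_STEP, every: int = SAVE_EVERY) -> list[int]:
    """Step numbers of saved checkpoints, assuming the first save is at `every`."""
    return list(range(every, final_step + 1, every))


def episode_at(step: int) -> int:
    """Episode count reached after `step` steps under the constant-rate assumption."""
    return step * EPISODES_PER_STEP


steps = checkpoint_steps()
print(len(steps), EPISODES_PER_STEP, episode_at(steps[-1]))  # prints: 16 1792 573440
```

Under these assumptions, consecutive saved checkpoints are 20 × 1,792 = 35,840 episodes apart.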

OLMoE-1B-7B-0924

This model is licensed under the Apache 2.0 license and is designed for the English language.

olmOCR-7B-0725-FP8

license:apache-2.0

11,123

scibert_scivocab_cased

—

9,801

olmOCR-7B-0225-preview

This is a preview release of the olmOCR model that's fine-tuned from Qwen2-VL-7B-Instruct using the olmOCR-mix-0225 dataset.

Quick links:
- 📃 Paper
- 🤗 Dataset
- 🛠️ Code
- 🎮 Demo

The best way to use this model is via the olmOCR toolkit. The toolkit comes with an efficient inference setup via sglang that can handle millions of documents at scale.

This model expects as input a single document image, rendered such that the longest dimension is 1024 pixels. The prompt must also contain additional metadata from the document, and the easiest way to generate this is to use the methods provided by the olmOCR toolkit.

If you want to prompt this model manually instead of using the olmOCR toolkit, please see the code below. In normal usage, the olmOCR toolkit builds the prompt by rendering the PDF page and extracting relevant text blocks and image metadata. To duplicate that you will need to

olmOCR is licensed under the Apache 2.0 license. olmOCR is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

tulu-2-dpo-7b

llama

8,411

led-large-16384

license:apache-2.0

8,070

specter2_aug2023refresh_base

license:apache-2.0

7,978

tk-instruct-11b-def

license:apache-2.0

7,701

tulu-2-7b

llama

7,439

olmOCR-7B-0825

This is a release of the olmOCR model that's fine tuned from Qwen2.5-VL-7B-Instruct using the olmOCR-mix-0225 dataset. Quick links: - 📃 Paper - 🤗 Dataset - 🛠️ Code - 🎮 Demo The best way to use ...

OLMo-2-0325-32B

longformer-large-4096

—

4,052

OLMo-7B

Flex-reddit-2x7B-1T

license:apache-2.0

2,661

specter2

—

2,649

OLMo-2-1124-13B-Instruct

Olmo-3-7B-Think-SFT

license:apache-2.0

1,749

OLMo-2-1124-7B-DPO

OLMo-2-0425-1B-RLVR1

license:apache-2.0

1,438

OLMo-2-1124-13B-DPO

license:apache-2.0

1,413

Olmo-3-7B-Instruct-DPO

license:apache-2.0

1,382

OLMo-7B-0724-hf

license:apache-2.0

1,360

led-large-16384-arxiv

license:apache-2.0

1,138

Molmo-72B-0924

specter2_aug2023refresh

—

900

open-instruct-human-mix-65b

llama

857

open-instruct-pythia-6.9b-tulu

This model is a 6.9B Pythia model finetuned on a mixture of instruction datasets (FLAN V2, CoT, Dolly, Open Assistant 1, GPT4-Alpaca, Code-Alpaca, and ShareGPT). It was trained as part of the paper How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources. The codebase used to train and evaluate this model can be found at https://github.com/allenai/open-instruct.

This model is licensed under the AI model license given in LICENSE.txt, with the original model license at pythialicense.txt.

Usage: Simply download and use; unlike the other open-instruct models, this model is not a diff.

The model is trained to use the following format (note the newlines):

For best results, format all inputs in this manner. Make sure to include a newline after ` `; this can affect generation quality quite a bit.

Here is the performance of this model across benchmarks explored in our paper How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources:

| MMLU 0-shot | MMLU 5-shot | GSM Direct | GSM CoT | BBH Direct | BBH CoT | TydiQA Gold-Passage | TydiQA Closed-book | Codex-Eval Pass@1 | Codex-Eval Pass@10 | AlpacaFarm vs Davinci-003 | Average |
|:-----------:|:-----------:|:----------:|:-------:|:----------:|:-------:|:-------------------:|:------------------:|:-----------------:|:------------------:|:-------------------------:|---------|
| 34.1 | 34.6 | 3.5 | 15.5 | 31.3 | 27.8 | 33.4 | 3.8 | 14.3 | 21.4 | 9.2 | 19.8 |

If you use this model, please cite our work, the Pythia paper, and the original datasets:

—

854

unifiedqa-t5-base

—

753

unifiedqa-t5-large

—

749

digital-socrates-7b

llama

735

olmOCR-7B-0225-preview-GGUF

—

729

digital-socrates-13b

llama

722

MolmoAct 7B D 0812

license:apache-2.0

714

Llama-3.1-Tulu-3-70B

License: llama3.1 Language: en

llama

708

Llama-3.1-Tulu-3.1-8B

llama

697

OLMoE-1B-7B-0125-Instruct-GGUF

—

697

OLMo-2-1124-7B-Instruct-GGUF

license:apache-2.0

582

This model is licensed under Apache 2.0 and is associated with the dataset allenai/dolma.

license:apache-2.0

407

OLMo-7B-Twin-2T-hf

license:apache-2.0

397

Llama-3.1-Tulu-3-8B-RM

License: llama3.1 Language: en

llama

378

GraspMolmo

license:mit

369

olmOCR-7B-0225-preview-FP8

unifiedqa-v2-t5-3b-1363200

—

192

longformer-large-4096-finetuned-triviaqa

—

184

OLMo-2-1124-13B-Instruct-preview

OLMo-7B-0724-SFT-hf

license:apache-2.0

168

cs_roberta_base

—

168

MolmoAct-7B-D-LIBERO-Long-0812

MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and the MolmoAct Dataset, which contains 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It achieves state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family here. Learn more about MolmoAct in our announcement blog post or the paper.

MolmoAct 7B-D LIBERO-Long is based on Qwen2.5-7B and uses SigLip2 as the vision backbone, which is initialized using Molmo's pre-training approach. It is first pre-trained on MolmoAct's Pre-training Mixture, then mid-trained on the MolmoAct Dataset, and finally post-trained on LIBERO-Long. This model is intended to be used for replicating our results on LIBERO-Long.

This checkpoint is a preview of the MolmoAct release. All artifacts used in creating MolmoAct (data, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility.

Quick links:
- 📂 All Models
- 📂 All Data
- 📄 Paper
- 💻 Code
- 🎥 Blog Post
- 🎥 Video

This model is licensed under Apache 2.0. It is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

MolmoAct offers the ability to inspect a visual trace of its intended actions in space before they occur, allowing users to ensure safe behavior by proactively auditing and adjusting the actions of any hardware acting under the model’s instructions. MolmoAct’s action space is bounded within the data provided, and compliance is built into the model to prevent excessive force when resistance is detected. Please follow the hardware manufacturer’s guidelines when using this model with a robot and perform all operations in a safely configured environment.

license:apache-2.0

166