princeton-nlp
Princeton Natural Language Processing Group (verified research lab)
sup-simcse-bert-base-uncased
unsup-simcse-bert-base-uncased
sup-simcse-roberta-large
sup-simcse-roberta-base
Llama-3-8B-ProLong-64k-Instruct
ProLong (Princeton long-context language models) is a family of long-context models continually trained and supervised fine-tuned from Llama-3-8B, with a maximum context window of 512K tokens. Our main ProLong model is one of the best-performing long-context models at the 10B scale (evaluated by HELMET). To train this strong long-context model, we conduct thorough ablations on the long-context pre-training data, the SFT data, and numerous other design choices. We present our findings in our paper, How to Train Long-Context Language Models (Effectively).

Authors: Tianyu Gao*, Alexander Wettig*, Howard Yen, Danqi Chen (* equal contribution)

- princeton-nlp/Llama-3-8B-ProLong-64k-Base
- princeton-nlp/Llama-3-8B-ProLong-64k-Instruct ← you are here!
- princeton-nlp/Llama-3-8B-ProLong-512k-Base
- ⭐ princeton-nlp/Llama-3-8B-ProLong-512k-Instruct

Here are some quick facts about our main ProLong model, princeton-nlp/Llama-3-8B-ProLong-512k-Instruct:

- Base model: meta-llama/Meta-Llama-3-8B-Instruct
- Long-context continued training: 20B tokens on 64K training data (princeton-nlp/prolong-data-64K), then 20B tokens on 512K training data (princeton-nlp/prolong-data-512K)
- Supervised fine-tuning (SFT): UltraChat
- Maximum context window: 512K tokens

ProLong performance on HELMET, averaged over 32K, 64K, and 128K lengths. All models are instruct models.
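As a usage illustration, the ProLong instruct checkpoints follow the standard Llama-3 chat format. The following is a minimal sketch with Hugging Face Transformers; the model ID comes from this card, while the prompt and generation settings are illustrative:

```python
# Minimal sketch: chat with a ProLong instruct checkpoint via transformers.
# The heavy model download only happens inside main().
MODEL_ID = "princeton-nlp/Llama-3-8B-ProLong-64k-Instruct"

def make_messages(user_prompt: str) -> list:
    """Build a Llama-3-style chat message list for apply_chat_template."""
    return [{"role": "user", "content": user_prompt}]

def main() -> None:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy imports

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        make_messages("Summarize the document above in three sentences."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

Long inputs (up to the 64K or 512K window, depending on the checkpoint) are passed the same way; only the prompt length changes.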
Llama-3-8B-ProLong-512k-Instruct
Same ProLong model card as Llama-3-8B-ProLong-64k-Instruct above; this entry is the ⭐ princeton-nlp/Llama-3-8B-ProLong-512k-Instruct checkpoint, the main ProLong model.
Llama-3-8B-ProLong-64k-Base
Same ProLong model card as Llama-3-8B-ProLong-64k-Instruct above; this entry is the princeton-nlp/Llama-3-8B-ProLong-64k-Base checkpoint.
Llama-3-8B-ProLong-512k-Base
Same ProLong model card as Llama-3-8B-ProLong-64k-Instruct above; this entry is the princeton-nlp/Llama-3-8B-ProLong-512k-Base checkpoint.
Sheared-LLaMA-1.3B
Paper: https://arxiv.org/pdf/2310.06694.pdf
Code: https://github.com/princeton-nlp/LLM-Shearing
Models: Sheared-LLaMA-1.3B, Sheared-LLaMA-2.7B
Pruned models without continued pre-training: Sheared-LLaMA-1.3B-Pruned, Sheared-LLaMA-2.7B-Pruned
Instruction-tuned models: Sheared-LLaMA-1.3B-ShareGPT, Sheared-LLaMA-2.7B-ShareGPT

Sheared-LLaMA-1.3B is a model pruned and further pre-trained from meta-llama/Llama-2-7b-hf. We dynamically load data from different domains of the RedPajama dataset to prune and continue pre-training the model. We use 0.4B tokens for pruning and 50B tokens for continued pre-training of the pruned model. This model can be loaded with Hugging Face Transformers via AutoModelForCausalLM. It is:

- Smaller-scale
- Same vocabulary as LLaMA1 and LLaMA2
- Derived with a budget of 50B tokens by utilizing existing strong LLMs

We evaluate on an extensive set of downstream tasks including reasoning, reading comprehension, language modeling, and knowledge-intensive tasks. Our Sheared-LLaMA models outperform existing large language models of comparable size.

| Model | # Pre-training Tokens | Average Performance |
| ------------------- | --------------------- | ------------------- |
| LLaMA2-7B | 2T | 64.6 |

| Model | # Pre-training Tokens | Average Performance |
| ------------------- | --------------------- | ------------------- |
| OPT-1.3B | 300B | 48.2 |
| Pythia-1.4B | 300B | 48.9 |
| Sheared-LLaMA-1.3B | 50B | 51.0 |

| Model | # Pre-training Tokens | Average Performance |
| ------------------- | --------------------- | ------------------- |
| OPT-2.7B | 300B | 51.4 |
| Pythia-2.8B | 300B | 52.5 |
| INCITE-Base-3B | 800B | 54.7 |
| Open-LLaMA-3B-v1 | 1T | 55.1 |
| Open-LLaMA-3B-v2 | 1T | 55.7 |
| Sheared-LLaMA-2.7B | 50B | 56.7 |

Open LLM Leaderboard Evaluation Results (detailed results can be found here):

| Metric | Value |
| --------------------- | ------ |
| Avg. | 31.47 |
| ARC (25-shot) | 32.85 |
| HellaSwag (10-shot) | 60.91 |
| MMLU (5-shot) | 25.71 |
| TruthfulQA (0-shot) | 37.14 |
| Winogrande (5-shot) | 58.64 |
| GSM8K (5-shot) | 0.45 |
| DROP (3-shot) | 4.56 |
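The loading step mentioned in the card can be sketched as follows. This is a minimal sketch using Hugging Face Transformers; the prompt and generation settings are illustrative:

```python
# Minimal sketch: load Sheared-LLaMA-1.3B with Hugging Face Transformers.
# The model download is deferred to main() so the helper stays cheap to import.
MODEL_ID = "princeton-nlp/Sheared-LLaMA-1.3B"

def build_prompt(text: str) -> str:
    """Sheared-LLaMA base models are plain language models: no chat template."""
    return text

def main() -> None:
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy imports

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt("The capital of France is"), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=16)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

The same pattern applies to the other Sheared-LLaMA checkpoints by swapping the model ID.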
unsup-simcse-roberta-base
warm-start__sft__nothink__Llama-3.1-8B-Instruct
gemma-2-9b-it-SimPO
SimPO (Simple Preference Optimization) is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) with preference optimization datasets. SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance. Please refer to our preprint and GitHub repo for more details. We fine-tuned google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm with the SimPO objective.

- Developed by: Yu Meng, Mengzhou Xia, Danqi Chen
- Model type: Causal Language Model
- License: gemma
- Finetuned from model: google/gemma-2-9b-it
- Repository: https://github.com/princeton-nlp/SimPO
- Paper: https://arxiv.org/pdf/2405.14734

We use princeton-nlp/gemma2-ultrafeedback-armorm as the preference optimization dataset. The hyperparameters used can be found in the training script. Fine-tuning google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm takes around 100 minutes on 8xH100 GPUs.

| models | AE2 LC | AE2 WR | AE2 Length | AH | AH Length | GSM | GSM Length | MMLU | MMLU Length |
|-----------------------------------|:------:|:------:|:----------:|:----:|:---------:|:----:|:----------:|:----:|:-----------:|
| google/gemma-2-9b-it | 51.1 | 38.1 | 1571 | 40.8 | 545 | 87.4 | 395 | 72.7 | 515 |
| princeton-nlp/gemma-2-9b-it-DPO | 67.8 | 65.4 | 2016 | 58.9 | 717 | 88.5 | 392 | 72.2 | 624 |
| princeton-nlp/gemma-2-9b-it-SimPO | 72.4 | 65.9 | 1833 | 59.1 | 693 | 88.0 | 341 | 72.2 | 441 |

The model architecture is based on google/gemma-2-9b-it. We use the SimPO training objective proposed in our preprint. Training was done using the alignment-handbook library.
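As a usage illustration, the SimPO checkpoint can be run with the standard Transformers text-generation pipeline in chat mode. This is a minimal sketch; the prompt and generation settings are illustrative:

```python
# Minimal sketch: generate with gemma-2-9b-it-SimPO via the transformers pipeline.
# The heavy model download only happens inside main().
MODEL_ID = "princeton-nlp/gemma-2-9b-it-SimPO"

def make_chat(user_prompt: str) -> list:
    """Gemma-2 chat input is a list of role/content messages."""
    return [{"role": "user", "content": user_prompt}]

def main() -> None:
    import torch
    from transformers import pipeline  # heavy import

    generator = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    result = generator(
        make_chat("Explain preference optimization in two sentences."),
        max_new_tokens=128,
    )
    # For chat-format input, the pipeline returns the full message list;
    # the last message is the assistant reply.
    print(result[0]["generated_text"][-1]["content"])

if __name__ == "__main__":
    main()
```

The DPO variant below is a drop-in replacement: swap in princeton-nlp/gemma-2-9b-it-DPO.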
QuRater-1.3B
Llama-3-Base-8B-SFT
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Sheared-LLaMA-2.7B-ShareGPT
Sheared-LLaMA-2.7B
Paper: https://arxiv.org/pdf/2310.06694.pdf
Code: https://github.com/princeton-nlp/LLM-Shearing
Models: Sheared-LLaMA-1.3B, Sheared-LLaMA-2.7B
Pruned models without continued pre-training: Sheared-LLaMA-1.3B-Pruned, Sheared-LLaMA-2.7B-Pruned
Instruction-tuned models: Sheared-LLaMA-1.3B-ShareGPT, Sheared-LLaMA-2.7B-ShareGPT

Sheared-LLaMA-2.7B is a model pruned and further pre-trained from meta-llama/Llama-2-7b-hf. We dynamically load data from different domains of the RedPajama dataset. We use 0.4B tokens for pruning and 50B tokens for continued pre-training of the pruned model. It can be loaded with Hugging Face Transformers via AutoModelForCausalLM and shares the properties of the other Sheared-LLaMA models:

- Smaller-scale
- Same vocabulary as LLaMA1 and LLaMA2
- Derived with a budget of 50B tokens by utilizing existing strong LLMs

The evaluation setup and comparison tables are the same as in the Sheared-LLaMA-1.3B card above. With only 50B pre-training tokens, Sheared-LLaMA-2.7B reaches an average performance of 56.7, outperforming OPT-2.7B (51.4), Pythia-2.8B (52.5), INCITE-Base-3B (54.7), Open-LLaMA-3B-v1 (55.1), and Open-LLaMA-3B-v2 (55.7).
Sheared-LLaMA-1.3B-ShareGPT
unsup-simcse-roberta-large
SWE-Llama-7b
Sheared-LLaMA-1.3B-Pruned
warm-start__sft__nothink__Qwen2.5-7B-Instruct
AutoCompressor-Llama-2-7b-6k
warm-start__grpo__nothink__Qwen2.5-7B-Instruct
warm-start__dpo__nothink__Qwen2.5-7B-Instruct
SWE-Llama-13b
sup-simcse-bert-large-uncased
Mistral-7B-Base-SFT-RRHF
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
unsup-simcse-bert-large-uncased
Llama-3-Instruct-8B-SimPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Base-8B-SFT-DPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Base-SFT-IPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Base-SFT-CPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Base-SFT-DPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Base-SFT-RDPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Base-SFT-SLiC-HF
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Base-SFT-KTO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-DPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Base-SFT-SimPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-DPO-v0.2
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-ORPO-v0.2
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
warm-start__dpo__think__Qwen2.5-7B
Llama-3-Instruct-8B-SimPO-v0.2
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Base-8B-SFT-SimPO
(Automatically generated 🤗 Transformers model card; no further details provided.)
gemma-2-9b-it-DPO
This model was trained under the same setup as gemma-2-9b-it-SimPO, but with the DPO objective. SimPO (Simple Preference Optimization) is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) with preference optimization datasets. SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance. Please refer to our preprint and GitHub repo for more details. We fine-tuned google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm with the DPO objective.

- Developed by: Yu Meng, Mengzhou Xia, Danqi Chen
- Model type: Causal Language Model
- License: gemma
- Finetuned from model: google/gemma-2-9b-it
- Repository: https://github.com/princeton-nlp/SimPO
- Paper: https://arxiv.org/pdf/2405.14734

We use princeton-nlp/gemma2-ultrafeedback-armorm as the preference optimization dataset. We used the following hyperparameters:

- learning rate: 5e-7
- batch size: 128
- beta: 0.01

The other hyperparameters are kept the same as in our SimPO recipe. Fine-tuning google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm takes around 150 minutes on 8xH100 GPUs.

| models | AE2 LC | AE2 WR | AE2 Length | AH | AH Length | GSM | GSM Length | MMLU | MMLU Length |
|-----------------------------------|:------:|:------:|:----------:|:----:|:---------:|:----:|:----------:|:----:|:-----------:|
| google/gemma-2-9b-it | 51.1 | 38.1 | 1571 | 40.8 | 545 | 87.4 | 395 | 72.7 | 515 |
| princeton-nlp/gemma-2-9b-it-DPO | 67.8 | 65.4 | 2016 | 58.9 | 717 | 88.5 | 392 | 72.2 | 624 |
| princeton-nlp/gemma-2-9b-it-SimPO | 72.4 | 65.9 | 1833 | 59.1 | 693 | 88.0 | 341 | 72.2 | 441 |

The model architecture is based on google/gemma-2-9b-it. We use the DPO training objective. Training was done using the alignment-handbook library.
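The reported DPO hyperparameters can be collected into a small config for reproduction scripts. A minimal sketch; the values come from this card, but the key names are illustrative, not the exact alignment-handbook YAML schema:

```python
# DPO fine-tuning hyperparameters reported on the gemma-2-9b-it-DPO card.
# Key names are illustrative; alignment-handbook defines its own config schema.
dpo_config = {
    "learning_rate": 5e-7,   # lower than typical SFT learning rates
    "batch_size": 128,       # effective batch size across 8xH100 GPUs
    "beta": 0.01,            # DPO KL-regularization strength
}

for key, value in dpo_config.items():
    print(f"{key}: {value}")
```

All other settings follow the SimPO recipe in the repository.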
Llama-3-Instruct-8B-IPO
Mistral-7B-Instruct-SimPO
(Automatically generated 🤗 Transformers model card; no further details provided.)
mabel-bert-base-uncased
Llama-3-Instruct-8B-CPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-KTO-v0.2
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-KTO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Instruct-RDPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Base-8B-SFT-SLiC-HF
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-CPO-v0.2
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Base-8B-SFT-KTO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Base-8B-SFT-RRHF
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-RRHF
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-SLiC-HF
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-SLiC-HF-v0.2
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-RDPO-v0.2
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
warm-start__sft__think__Qwen2.5-7B
Llama-3-Base-8B-SFT-IPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-ORPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-RDPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Base-8B-SFT-ORPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Base-8B-SFT-RDPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Instruct-DPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llemma-7B-32K-MathMix
Mistral-7B-Instruct-IPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Base-8B-SFT-CPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-IPO-v0.2
Mistral-7B-Instruct-KTO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Instruct-ORPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Instruct-CPO
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Instruct-RRHF
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Mistral-7B-Instruct-SLiC-HF
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.
Llama-3-Instruct-8B-RRHF-v0.2
This is a model released from the preprint: SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.