princeton-nlp

Verified Research Lab

Princeton Natural Language Processing Group

165 models

sup-simcse-bert-base-uncased

54,429 downloads · 26 likes

unsup-simcse-bert-base-uncased

19,947 downloads · 5 likes

sup-simcse-roberta-large

17,279 downloads · 28 likes

sup-simcse-roberta-base

10,573 downloads · 9 likes

Llama-3-8B-ProLong-64k-Instruct

ProLong (Princeton long-context language models) is a family of long-context models continually trained and supervised fine-tuned from Llama-3-8B, with a maximum context window of 512K tokens. Our main ProLong model is one of the best-performing long-context models at the 10B scale (evaluated by HELMET). To train this strong long-context model, we conducted thorough ablations on the long-context pre-training data, the SFT data, and numerous other design choices. We present our findings in our paper, How to Train Long-Context Language Models (Effectively).

Authors: Tianyu Gao*, Alexander Wettig*, Howard Yen, Danqi Chen (* equal contribution)

- princeton-nlp/Llama-3-8B-ProLong-64k-Base
- princeton-nlp/Llama-3-8B-ProLong-64k-Instruct ← you are here!
- princeton-nlp/Llama-3-8B-ProLong-512k-Base
- ⭐ princeton-nlp/Llama-3-8B-ProLong-512k-Instruct

Quick facts about our main ProLong model, princeton-nlp/Llama-3-8B-ProLong-512k-Instruct:

- Base model: meta-llama/Meta-Llama-3-8B-Instruct
- Long-context continued training: 20B tokens on 64K-length training data (princeton-nlp/prolong-data-64K), then 20B tokens on 512K-length training data (princeton-nlp/prolong-data-512K)
- Supervised fine-tuning (SFT): UltraChat
- Maximum context window: 512K tokens

ProLong performance on HELMET, averaged over 32K, 64K, and 128K lengths. All models are instruct models.
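A back-of-the-envelope reading of the continued-training budget above (our arithmetic, assuming "64K" and "512K" mean 64 × 1024 = 65,536 and 512 × 1024 = 524,288 tokens per sequence; these sequence counts are illustrative, not official figures):

```python
# Approximate number of full-length training sequences per stage,
# given the 20B-token budget quoted for each continued-training stage.
BUDGET = 20_000_000_000  # 20B tokens per stage (from the model card)

for name, seq_len in [("64K stage", 64 * 1024), ("512K stage", 512 * 1024)]:
    n_seqs = BUDGET // seq_len
    # ~305,175 sequences in the 64K stage; ~38,146 in the 512K stage
    print(f"{name}: ~{n_seqs:,} sequences of {seq_len:,} tokens")
```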

llama · 8,170 downloads · 12 likes

Llama-3-8B-ProLong-512k-Instruct

ProLong (Princeton long-context language models) is a family of long-context models continually trained and supervised fine-tuned from Llama-3-8B, with a maximum context window of 512K tokens. This is the ⭐ main ProLong model, princeton-nlp/Llama-3-8B-ProLong-512k-Instruct (base model: meta-llama/Meta-Llama-3-8B-Instruct; SFT data: UltraChat). See the Llama-3-8B-ProLong-64k-Instruct card above for the full family description and training details.

llama · 8,085 downloads · 23 likes

Llama-3-8B-ProLong-64k-Base

ProLong (Princeton long-context language models) is a family of long-context models continually trained and supervised fine-tuned from Llama-3-8B. This is princeton-nlp/Llama-3-8B-ProLong-64k-Base, the base (non-instruct) model with a 64K context window. See the Llama-3-8B-ProLong-64k-Instruct card above for the full family description and training details.

llama · 7,912 downloads · 5 likes

Llama-3-8B-ProLong-512k-Base

ProLong (Princeton long-context language models) is a family of long-context models continually trained and supervised fine-tuned from Llama-3-8B. This is princeton-nlp/Llama-3-8B-ProLong-512k-Base, the base (non-instruct) model with a 512K context window. See the Llama-3-8B-ProLong-64k-Instruct card above for the full family description and training details.

llama · 7,902 downloads · 9 likes

Sheared-LLaMA-1.3B

Paper: https://arxiv.org/pdf/2310.06694.pdf
Code: https://github.com/princeton-nlp/LLM-Shearing

Models: Sheared-LLaMA-1.3B, Sheared-LLaMA-2.7B
Pruned models without continued pre-training: Sheared-LLaMA-1.3B-Pruned, Sheared-LLaMA-2.7B-Pruned
Instruction-tuned models: Sheared-LLaMA-1.3B-ShareGPT, Sheared-LLaMA-2.7B-ShareGPT

Sheared-LLaMA-1.3B is a model pruned and further pre-trained from meta-llama/Llama-2-7b-hf. We dynamically load data from different domains of the RedPajama dataset to prune and continue pre-training the model. We use 0.4B tokens for pruning and 50B tokens for continued pre-training of the pruned model. This model can be loaded with Hugging Face transformers.

- Smaller-scale
- Same vocabulary as LLaMA1 and LLaMA2
- Derived with a budget of 50B tokens by utilizing existing strong LLMs

We evaluate on an extensive set of downstream tasks including reasoning, reading comprehension, language modeling, and knowledge-intensive tasks. Our Sheared-LLaMA models outperform existing large language models of comparable size.

| Model | # Pre-training Tokens | Average Performance |
| ------------------- | ---- | ---- |
| LLaMA2-7B           | 2T   | 64.6 |

| Model | # Pre-training Tokens | Average Performance |
| ------------------- | ---- | ---- |
| OPT-1.3B            | 300B | 48.2 |
| Pythia-1.4B         | 300B | 48.9 |
| Sheared-LLaMA-1.3B  | 50B  | 51.0 |

| Model | # Pre-training Tokens | Average Performance |
| ------------------- | ---- | ---- |
| OPT-2.7B            | 300B | 51.4 |
| Pythia-2.8B         | 300B | 52.5 |
| INCITE-Base-3B      | 800B | 54.7 |
| Open-LLaMA-3B-v1    | 1T   | 55.1 |
| Open-LLaMA-3B-v2    | 1T   | 55.7 |
| Sheared-LLaMA-2.7B  | 50B  | 56.7 |

Open LLM Leaderboard Evaluation Results (detailed results can be found here)

| Metric              | Value |
| ------------------- | ----- |
| Avg.                | 31.47 |
| ARC (25-shot)       | 32.85 |
| HellaSwag (10-shot) | 60.91 |
| MMLU (5-shot)       | 25.71 |
| TruthfulQA (0-shot) | 37.14 |
| Winogrande (5-shot) | 58.64 |
| GSM8K (5-shot)      | 0.45  |
| DROP (3-shot)       | 4.56  |
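As a sanity check, the reported Avg. is the arithmetic mean of the seven per-task scores (values copied from the leaderboard table above):

```python
# Sheared-LLaMA-1.3B Open LLM Leaderboard scores, from the table above.
scores = {
    "ARC (25-shot)": 32.85,
    "HellaSwag (10-shot)": 60.91,
    "MMLU (5-shot)": 25.71,
    "TruthfulQA (0-shot)": 37.14,
    "Winogrande (5-shot)": 58.64,
    "GSM8K (5-shot)": 0.45,
    "DROP (3-shot)": 4.56,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 31.47, matching the reported Avg.
```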

llama · 4,639 downloads · 98 likes

unsup-simcse-roberta-base

3,404 downloads · 9 likes

warm-start__sft__nothink__Llama-3.1-8B-Instruct

llama · 2,558 downloads · 0 likes

gemma-2-9b-it-SimPO

SimPO (Simple Preference Optimization) is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) with preference optimization datasets. SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance. Please refer to our preprint and GitHub repo for more details.

We fine-tuned google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm with the SimPO objective.

- Developed by: Yu Meng, Mengzhou Xia, Danqi Chen
- Model type: Causal Language Model
- License: gemma
- Finetuned from model: google/gemma-2-9b-it
- Repository: https://github.com/princeton-nlp/SimPO
- Paper: https://arxiv.org/pdf/2405.14734

We use princeton-nlp/gemma2-ultrafeedback-armorm as the preference optimization dataset. The hyperparameters used can be found in the training script. Fine-tuning google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm takes around 100 minutes on 8x H100 GPUs.

| Models | AE2 LC | AE2 WR | AE2 Length | AH | AH Length | GSM | GSM Length | MMLU | MMLU Length |
|-----------------------------------|:----:|:----:|:----:|:----:|:---:|:----:|:---:|:----:|:---:|
| google/gemma-2-9b-it              | 51.1 | 38.1 | 1571 | 40.8 | 545 | 87.4 | 395 | 72.7 | 515 |
| princeton-nlp/gemma-2-9b-it-DPO   | 67.8 | 65.4 | 2016 | 58.9 | 717 | 88.5 | 392 | 72.2 | 624 |
| princeton-nlp/gemma-2-9b-it-SimPO | 72.4 | 65.9 | 1833 | 59.1 | 693 | 88.0 | 341 | 72.2 | 441 |

(AE2 = AlpacaEval 2 length-controlled / raw win rate; AH = Arena-Hard.)

The model architecture is based on google/gemma-2-9b-it. We use the SimPO training objective proposed in our preprint. Training was done using the alignment-handbook library.
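For reference, the SimPO objective sketched above can be written as follows (our transcription of the preprint's loss; σ is the logistic function, β the reward scale, γ the target reward margin, y_w and y_l the preferred and dispreferred responses):

$$
\mathcal{L}_{\mathrm{SimPO}}(\pi_\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\frac{\beta}{|y_w|}\log\pi_\theta(y_w\mid x)\;-\;\frac{\beta}{|y_l|}\log\pi_\theta(y_l\mid x)\;-\;\gamma\right)\right]
$$

The implicit reward is the length-normalized log-likelihood of a response under the policy itself, which is why no reference model is needed, matching the description above.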

license:mit · 2,182 downloads · 170 likes

QuRater-1.3B

llama · 2,037 downloads · 19 likes

Llama-3-Base-8B-SFT

This is a model released from the preprint SimPO: Simple Preference Optimization with a Reference-Free Reward. Please refer to our repository for more details.

llama · 835 downloads · 4 likes

Sheared-LLaMA-2.7B-ShareGPT

llama · 795 downloads · 8 likes

Sheared-LLaMA-2.7B

Paper: https://arxiv.org/pdf/2310.06694.pdf
Code: https://github.com/princeton-nlp/LLM-Shearing

Models: Sheared-LLaMA-1.3B, Sheared-LLaMA-2.7B
Pruned models without continued pre-training: Sheared-LLaMA-1.3B-Pruned, Sheared-LLaMA-2.7B-Pruned
Instruction-tuned models: Sheared-LLaMA-1.3B-ShareGPT, Sheared-LLaMA-2.7B-ShareGPT

Sheared-LLaMA-2.7B is a model pruned and further pre-trained from meta-llama/Llama-2-7b-hf. We dynamically load data from different domains of the RedPajama dataset. We use 0.4B tokens for pruning and 50B tokens for continued pre-training of the pruned model. This model can be loaded with Hugging Face transformers.

- Smaller-scale
- Same vocabulary as LLaMA1 and LLaMA2
- Derived with a budget of 50B tokens by utilizing existing strong LLMs

We evaluate on an extensive set of downstream tasks including reasoning, reading comprehension, language modeling, and knowledge-intensive tasks. Our Sheared-LLaMA models outperform existing large language models of comparable size.

| Model | # Pre-training Tokens | Average Performance |
| ------------------- | ---- | ---- |
| LLaMA2-7B           | 2T   | 64.6 |

| Model | # Pre-training Tokens | Average Performance |
| ------------------- | ---- | ---- |
| OPT-1.3B            | 300B | 48.2 |
| Pythia-1.4B         | 300B | 48.9 |
| Sheared-LLaMA-1.3B  | 50B  | 51.0 |

| Model | # Pre-training Tokens | Average Performance |
| ------------------- | ---- | ---- |
| OPT-2.7B            | 300B | 51.4 |
| Pythia-2.8B         | 300B | 52.5 |
| INCITE-Base-3B      | 800B | 54.7 |
| Open-LLaMA-3B-v1    | 1T   | 55.1 |
| Open-LLaMA-3B-v2    | 1T   | 55.7 |
| Sheared-LLaMA-2.7B  | 50B  | 56.7 |

llama · 787 downloads · 60 likes

Sheared-LLaMA-1.3B-ShareGPT

llama · 693 downloads · 10 likes

unsup-simcse-roberta-large · 364 downloads · 3 likes
SWE-Llama-7b (llama) · 251 downloads · 13 likes
Sheared-LLaMA-1.3B-Pruned (llama) · 208 downloads · 3 likes
warm-start__sft__nothink__Qwen2.5-7B-Instruct · 172 downloads · 0 likes
AutoCompressor-Llama-2-7b-6k (llama) · 155 downloads · 2 likes
warm-start__grpo__nothink__Qwen2.5-7B-Instruct · 126 downloads · 0 likes
warm-start__dpo__nothink__Qwen2.5-7B-Instruct · 125 downloads · 0 likes
SWE-Llama-13b (llama) · 69 downloads · 25 likes
sup-simcse-bert-large-uncased · 69 downloads · 0 likes

Mistral-7B-Base-SFT-RRHF (SimPO preprint release) · 51 downloads · 0 likes
unsup-simcse-bert-large-uncased · 32 downloads · 1 like
Llama-3-Instruct-8B-SimPO (llama; SimPO preprint release) · 31 downloads · 60 likes
Llama-3-Base-8B-SFT-DPO (llama; SimPO preprint release) · 29 downloads · 0 likes
Mistral-7B-Base-SFT-IPO (SimPO preprint release) · 27 downloads · 0 likes
Mistral-7B-Base-SFT-CPO (SimPO preprint release) · 25 downloads · 1 like
Mistral-7B-Base-SFT-DPO (SimPO preprint release) · 25 downloads · 0 likes
Mistral-7B-Base-SFT-RDPO (SimPO preprint release) · 25 downloads · 0 likes
Mistral-7B-Base-SFT-SLiC-HF (SimPO preprint release) · 25 downloads · 0 likes
Mistral-7B-Base-SFT-KTO (SimPO preprint release) · 24 downloads · 0 likes
Llama-3-Instruct-8B-DPO (llama; SimPO preprint release) · 22 downloads · 0 likes
Mistral-7B-Base-SFT-SimPO (SimPO preprint release) · 19 downloads · 0 likes
Llama-3-Instruct-8B-DPO-v0.2 (llama; SimPO preprint release) · 18 downloads · 0 likes
Llama-3-Instruct-8B-ORPO-v0.2 (llama; SimPO preprint release) · 17 downloads · 0 likes
warm-start__dpo__think__Qwen2.5-7B · 15 downloads · 0 likes
Llama-3-Instruct-8B-SimPO-v0.2 (llama; SimPO preprint release) · 14 downloads · 8 likes
Llama-3-Base-8B-SFT-SimPO (llama) · 14 downloads · 1 like

gemma-2-9b-it-DPO

This model was trained under the same setup as gemma-2-9b-it-SimPO, but with the DPO objective. SimPO (Simple Preference Optimization) is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) with preference optimization datasets. SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance. Please refer to our preprint and GitHub repo for more details.

We fine-tuned google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm with the DPO objective.

- Developed by: Yu Meng, Mengzhou Xia, Danqi Chen
- Model type: Causal Language Model
- License: gemma
- Finetuned from model: google/gemma-2-9b-it
- Repository: https://github.com/princeton-nlp/SimPO
- Paper: https://arxiv.org/pdf/2405.14734

We use princeton-nlp/gemma2-ultrafeedback-armorm as the preference optimization dataset. We used the following hyperparameters:

- learning rate: 5e-7
- batch size: 128
- beta: 0.01

The other hyperparameters are kept the same as in our SimPO recipe. Fine-tuning google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm takes around 150 minutes on 8x H100 GPUs.

| Models | AE2 LC | AE2 WR | AE2 Length | AH | AH Length | GSM | GSM Length | MMLU | MMLU Length |
|-----------------------------------|:----:|:----:|:----:|:----:|:---:|:----:|:---:|:----:|:---:|
| google/gemma-2-9b-it              | 51.1 | 38.1 | 1571 | 40.8 | 545 | 87.4 | 395 | 72.7 | 515 |
| princeton-nlp/gemma-2-9b-it-DPO   | 67.8 | 65.4 | 2016 | 58.9 | 717 | 88.5 | 392 | 72.2 | 624 |
| princeton-nlp/gemma-2-9b-it-SimPO | 72.4 | 65.9 | 1833 | 59.1 | 693 | 88.0 | 341 | 72.2 | 441 |

(AE2 = AlpacaEval 2 length-controlled / raw win rate; AH = Arena-Hard.)

The model architecture is based on google/gemma-2-9b-it. We use the DPO training objective. Training was done using the alignment-handbook library.
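For comparison, the standard DPO objective used for this model can be written as follows (our transcription; π_ref is the frozen reference policy, here google/gemma-2-9b-it):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}\;-\;\beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
$$

with β = 0.01 per the hyperparameters listed above. Unlike SimPO, DPO scores each response relative to the reference model rather than by its length-normalized likelihood alone.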

13 downloads · 9 likes

Llama-3-Instruct-8B-IPO (llama) · 13 downloads · 0 likes
Mistral-7B-Instruct-SimPO · 12 downloads · 2 likes
mabel-bert-base-uncased · 12 downloads · 0 likes
Llama-3-Instruct-8B-CPO (llama; SimPO preprint release) · 12 downloads · 0 likes
Llama-3-Instruct-8B-KTO-v0.2 (llama; SimPO preprint release) · 12 downloads · 0 likes
Llama-3-Instruct-8B-KTO (llama; SimPO preprint release) · 11 downloads · 0 likes
Mistral-7B-Instruct-RDPO (SimPO preprint release) · 11 downloads · 0 likes
Llama-3-Base-8B-SFT-SLiC-HF (llama; SimPO preprint release) · 11 downloads · 0 likes
Llama-3-Instruct-8B-CPO-v0.2 (llama; SimPO preprint release) · 11 downloads · 0 likes
Llama-3-Base-8B-SFT-KTO (llama; SimPO preprint release) · 10 downloads · 0 likes
Llama-3-Base-8B-SFT-RRHF (llama; SimPO preprint release) · 10 downloads · 0 likes
Llama-3-Instruct-8B-RRHF (llama; SimPO preprint release) · 10 downloads · 0 likes
Llama-3-Instruct-8B-SLiC-HF (llama; SimPO preprint release) · 10 downloads · 0 likes
Llama-3-Instruct-8B-SLiC-HF-v0.2 (llama; SimPO preprint release) · 10 downloads · 0 likes
Llama-3-Instruct-8B-RDPO-v0.2 (llama; SimPO preprint release) · 10 downloads · 0 likes
warm-start__sft__think__Qwen2.5-7B · 10 downloads · 0 likes
Llama-3-Base-8B-SFT-IPO (llama; SimPO preprint release) · 9 downloads · 1 like
Llama-3-Instruct-8B-ORPO (llama; SimPO preprint release) · 9 downloads · 0 likes
Llama-3-Instruct-8B-RDPO (llama; SimPO preprint release) · 9 downloads · 0 likes
Llama-3-Base-8B-SFT-ORPO (llama; SimPO preprint release) · 9 downloads · 0 likes
Llama-3-Base-8B-SFT-RDPO (llama; SimPO preprint release) · 9 downloads · 0 likes
Mistral-7B-Instruct-DPO (SimPO preprint release) · 9 downloads · 0 likes
Llemma-7B-32K-MathMix (llama) · 8 downloads · 0 likes
Mistral-7B-Instruct-IPO (SimPO preprint release) · 8 downloads · 0 likes
Llama-3-Base-8B-SFT-CPO (llama; SimPO preprint release) · 8 downloads · 0 likes
Llama-3-Instruct-8B-IPO-v0.2 (llama) · 8 downloads · 0 likes
Mistral-7B-Instruct-KTO (SimPO preprint release) · 7 downloads · 0 likes
Mistral-7B-Instruct-ORPO (SimPO preprint release) · 7 downloads · 0 likes
Mistral-7B-Instruct-CPO (SimPO preprint release) · 7 downloads · 0 likes
Mistral-7B-Instruct-RRHF (SimPO preprint release) · 7 downloads · 0 likes
Mistral-7B-Instruct-SLiC-HF (SimPO preprint release) · 7 downloads · 0 likes
Llama-3-Instruct-8B-RRHF-v0.2 (llama; SimPO preprint release) · 7 downloads · 0 likes

AutoCompressor-2.7b-6k (license:apache-2.0) · 5 downloads · 2 likes
RMT-1.3b-30k (license:apache-2.0) · 5 downloads · 1 like
warm-start__sft__think__Llama-3.1-8B-Instruct (llama) · 5 downloads · 0 likes
warm-start__grpo__think__Qwen2.5-7B-Instruct · 5 downloads · 0 likes
Sheared-Pythia-160m (license:apache-2.0) · 4 downloads · 4 likes
datamux_base_qqp_20_best · 4 downloads · 0 likes
mabel-roberta-large (license:apache-2.0) · 4 downloads · 0 likes
warm-start__sft__think__Llama-3.1-8B (llama) · 4 downloads · 0 likes
warm-start__sft__nothink__Qwen2.5-7B · 4 downloads · 0 likes
warm-start__ppo__nothink__Qwen2.5-7B · 4 downloads · 0 likes
warm-start__grpo__think__Qwen2.5-7B · 4 downloads · 0 likes
Sheared-LLaMA-2.7B-Pruned (llama) · 3 downloads · 3 likes
AutoCompressor-1.3b-30k (license:apache-2.0) · 3 downloads · 1 like
datamux-qqp-2 · 3 downloads · 0 likes
muxbert_base_mnli_gaussian_hadamard_index_pos_2 · 3 downloads · 0 likes
muxbert_base_qnli_gaussian_hadamard_index_pos_10 · 3 downloads · 0 likes
muxbert_base_qqp_gaussian_hadamard_index_2 · 3 downloads · 0 likes
muxbert_base_sst2_gaussian_hadamard_index_10 · 3 downloads · 0 likes
warm-start__sft__think__Qwen2.5-7B-Instruct · 3 downloads · 0 likes
warm-start__sft__nothink__Llama-3.1-8B (llama) · 3 downloads · 0 likes
warm-start__dpo__think__Llama-3.1-8B-Instruct (llama) · 3 downloads · 0 likes
warm-start__ppo__nothink__Llama-3.1-8B (llama) · 3 downloads · 0 likes
zero__base__nothink__Qwen2.5-7B · 3 downloads · 0 likes
zero__dpo__think__Llama-3.1-8B (llama) · 3 downloads · 0 likes
zero__dpo__nothink__Llama-3.1-8B (llama) · 3 downloads · 0 likes
zero__ppo__nothink__Qwen2.5-7B · 3 downloads · 0 likes
zero__grpo__think__Qwen2.5-7B · 3 downloads · 0 likes
efficient_mlm_m0.15 · 2 downloads · 1 like
ptp · 2 downloads · 1 like
datamux-retrieval-20 · 2 downloads · 0 likes
datamux-sst2-20 · 2 downloads · 0 likes
densephrases-multi-query-sqd · 2 downloads · 0 likes
CoFi-MNLI-s95 · 2 downloads · 0 likes
efficient_mlm_m0.15-801010 · 2 downloads · 0 likes
CoFi-MRPC-s95 · 2 downloads · 0 likes
bert_base_1 · 2 downloads · 0 likes
muxbert_base_gaussian_attention_v2_index_pos_2 · 2 downloads · 0 likes
muxbert_base_mnli_gaussian_hadamard_index_5 · 2 downloads · 0 likes
muxbert_base_qnli_gaussian_hadamard_index_10 · 2 downloads · 0 likes
muxbert_base_sst2_gaussian_hadamard_index_5 · 2 downloads · 0 likes
FullAttention-2.7b-4k (license:apache-2.0) · 2 downloads · 0 likes
SWE-Llama-7b-peft · 2 downloads · 0 likes
zero__base__think__Qwen2.5-7B · 2 downloads · 0 likes
zero__ppo__think__Qwen2.5-7B · 2 downloads · 0 likes
zero__grpo__nothink__Qwen2.5-7B · 2 downloads · 0 likes
RMT-2.7b-8k (license:apache-2.0) · 1 download · 5 likes
AutoCompressor-2.7b-30k (license:apache-2.0) · 1 download · 2 likes
Llemma-34B-MathMix (llama) · 1 download · 1 like
CoFi-QNLI-s60 · 1 download · 0 likes
CoFi-SST2-s95 · 1 download · 0 likes
CoFi-SQuAD-s60 · 1 download · 0 likes
efficient_mlm_m0.40-801010 · 1 download · 0 likes
efficient_mlm_m0.30 · 1 download · 0 likes
efficient_mlm_m0.60 · 1 download · 0 likes
CoFi-MRPC-s60 · 1 download · 0 likes
datamux_base_mnli_20_best · 1 download · 0 likes
FullAttention-Llama-2-7b-6k (llama) · 1 download · 0 likes
lm-1.3B-select_30B_tokens_by-inverse_required_expertise-top_k (llama) · 1 download · 0 likes
lm-1.3B-select_30B_tokens_by-uniform-sampling-curriculum-low_to_high-required_expertise (llama) · 1 download · 0 likes
lm-1.3B-select_30B_tokens_by-writing_style-sample_with_temperature1.0 (llama) · 1 download · 0 likes
lm-1.3B-select_30B_tokens_by-perplexity-bottom_k (llama) · 1 download · 0 likes
warm-start__dpo__think__Llama-3.1-8B (llama) · 1 download · 0 likes
warm-start__dpo__think__Qwen2.5-7B-Instruct · 1 download · 0 likes
warm-start__dpo__nothink__Llama-3.1-8B (llama) · 1 download · 0 likes
warm-start__dpo__nothink__Qwen2.5-7B · 1 download · 0 likes
warm-start__dpo__nothink__Llama-3.1-8B-Instruct (llama) · 1 download · 0 likes
warm-start__ppo__think__Llama-3.1-8B (llama) · 1 download · 0 likes
warm-start__ppo__think__Qwen2.5-7B · 1 download · 0 likes
warm-start__ppo__think__Llama-3.1-8B-Instruct (llama) · 1 download · 0 likes
warm-start__ppo__think__Qwen2.5-7B-Instruct · 1 download · 0 likes
warm-start__ppo__nothink__Llama-3.1-8B-Instruct (llama) · 1 download · 0 likes
warm-start__ppo__nothink__Qwen2.5-7B-Instruct · 1 download · 0 likes
zero__base__think__Llama-3.1-8B (llama) · 1 download · 0 likes
zero__base__nothink__Llama-3.1-8B (llama) · 1 download · 0 likes
zero__dpo__think__Qwen2.5-7B · 1 download · 0 likes
zero__dpo__nothink__Qwen2.5-7B · 1 download · 0 likes
zero__ppo__think__Llama-3.1-8B (llama) · 1 download · 0 likes
zero__grpo__think__Llama-3.1-8B (llama) · 1 download · 0 likes
zero__grpo__nothink__Llama-3.1-8B (llama) · 1 download · 0 likes
warm-start__grpo__think__Llama-3.1-8B (llama) · 1 download · 0 likes
warm-start__grpo__think__Llama-3.1-8B-Instruct (llama) · 1 download · 0 likes
warm-start__grpo__nothink__Llama-3.1-8B (llama) · 1 download · 0 likes
warm-start__grpo__nothink__Qwen2.5-7B · 1 download · 0 likes
warm-start__grpo__nothink__Llama-3.1-8B-Instruct (llama) · 1 download · 0 likes
densephrases-multi · 0 downloads · 3 likes
efficient_mlm_fairseq_ckpt · 0 downloads · 1 like
bert_base_sst2_1 · 0 downloads · 1 like
screenshot-llama-1.3b-from-sheared-llama (screenshot-llama) · 0 downloads · 1 like
screenshot-llama-380m (screenshot-llama) · 0 downloads · 1 like