McGill-NLP

82 models

LLM2Vec-Mistral-7B-Instruct-v2-mntp

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of three steps: 1) enabling bidirectional attention, 2) masked next-token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.

- Repository: https://github.com/McGill-NLP/llm2vec
- Paper: https://arxiv.org/abs/2404.05961

Questions
- If you have any questions about the code, feel free to email Parishad (`[email protected]`) and Vaibhav (`[email protected]`).
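As a toy illustration of step 1 of the recipe (not the `llm2vec` library API), enabling bidirectional attention amounts to replacing the decoder's lower-triangular causal attention mask with an all-ones mask, so every token can attend to every other token:

```python
import numpy as np

def causal_mask(n):
    # Lower-triangular mask: token i may attend only to positions <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def bidirectional_mask(n):
    # Step 1 removes the causal constraint entirely: all pairs are allowed.
    return np.ones((n, n), dtype=bool)

print(causal_mask(4).sum())         # 10 allowed attention pairs
print(bidirectional_mask(4).sum())  # 16 allowed attention pairs
```

With bidirectional attention in place, steps 2 and 3 (masked next-token prediction, then contrastive learning) adapt the model to produce useful text embeddings.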

license:mit
89,428
10

LLM2Vec-Mistral-7B-Instruct-v2-mntp-unsup-simcse

license:mit
27,669
7

LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised

license:mit
8,295
50

LLM2Vec-Meta-Llama-3-8B-Instruct-mntp

llama
3,251
17

LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse

license:mit
658
4

delethink-24k-1.5b

TL;DR
- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where the state is the prompt plus all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL trains a model to “think” in fixed-size chunks with a bounded state.
- This 1.5B model uses an effective thinking budget of about 24K tokens while requiring only an 8K active context at any time, via chunked rollouts and short carryovers.
- Initialized from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`, trained with the Delethink RL paradigm. See the paper for full details.

Links
- Repo: https://github.com/McGill-NLP/the-markovian-thinker
- Paper: https://arxiv.org/abs/2510.06557v1
- Collection: The Markovian Thinker

Model Summary
- Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- Objective: Reinforcement Learning for long-form reasoning with bounded, chunked thinking (Delethink), trained for 1000 steps.
- Delethink 24K budget: uses 8K “context size” chunks, short “Markovian” carryovers, and up to 5 chunk iterations for ~24K total thinking tokens.
- Intended use: math/logic reasoning with step-by-step derivations; the final answer is typically formatted inside LaTeX `\boxed{}`.
- Library compatibility: works well with SGLang for chunked inference; also usable with Transformers for standard generation (chunking requires manual orchestration; see the paper for an example).

Intended Uses and Limitations
- Intended uses:
  - Long-form reasoning on math and related tasks.
  - Bounded-context rollouts with repeated chunking and short carryovers.
- Not intended for:
  - Safety-sensitive applications without human oversight.
  - Use cases requiring faithful, verifiable citations to external sources.
- Limitations:
  - May hallucinate, make arithmetic/algebraic mistakes, or produce inconsistent plans.
  - The chunked rollout procedure is needed to realize Delethink’s efficiency advantages.
How Delethink Works (Concept)

Let:
- C = context size per chunk (active KV memory)
- m = markovian size = number of tokens carried over to the next chunk
- I = iteration cap = maximum number of chunks

The effective thinking budget is:
- C + (I − 1) × (C − m)

For this checkpoint, we recommend:
- C = 8192
- m = 4096
- I ≤ 5

This yields an effective budget of 8192 + 4 × (8192 − 4096) = 24576 tokens of thinking.

Prompting
- Use the model’s chat template and request a step-by-step solution with a final boxed answer:
  - “Please reason step by step, and put your final answer within \boxed{}.”

Suggested generation settings
- temperature: 0.6
- top_p: 1.0
- top_k: -1

Safety and Use
- This model can produce incorrect or misleading reasoning steps and answers. Always verify results.
- Do not deploy in high-stakes domains without human oversight.
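The budget formula and the chunked-rollout bookkeeping can be sketched in a few lines of Python. This illustrates the arithmetic and control flow only; `generate_chunk` is a hypothetical stand-in for the real decoder call, not part of any released API:

```python
def effective_budget(C, m, I):
    # First chunk contributes C thinking tokens; every later chunk restarts
    # from an m-token carryover, so it adds only C - m new tokens.
    return C + (I - 1) * (C - m)

print(effective_budget(C=8192, m=4096, I=5))  # 24576

def delethink_rollout(generate_chunk, prompt, C=8192, m=4096, I=5):
    # Schematic bookkeeping: the active state (prompt + carryover + current
    # chunk) stays bounded by ~C, so memory is constant in the total budget.
    thinking, carry = [], []
    for i in range(I):
        budget = C if i == 0 else C - m
        new = generate_chunk(prompt, carry, budget)
        thinking += new
        carry = thinking[-m:]  # short carryover into the next chunk
    return thinking

# With a dummy generator that always fills its budget, the total number of
# thinking tokens matches the effective-budget formula.
dummy = lambda prompt, carry, budget: ["tok"] * budget
print(len(delethink_rollout(dummy, [])))  # 24576
```

Because each chunk attends only over a bounded window, total attention compute grows linearly with the number of chunks rather than quadratically with the total thinking length.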

license:apache-2.0
614
5

LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-unsup-simcse

license:mit
376
1

LLM2Vec-Sheared-LLaMA-mntp

llama
344
5

LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised

license:mit
268
13

LLM2Vec-Meta-Llama-31-8B-Instruct-mntp

llama
187
1

LLM2Vec-Llama-2-7b-chat-hf-mntp

llama
143
0

longcot-24k-1.5b

TL;DR
- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where the state is the prompt plus all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL trains a model to “think” in fixed-size chunks with a bounded state.
- This 1.5B model uses an effective thinking budget of about 24K tokens while requiring only an 8K active context at any time, via chunked rollouts and short carryovers.
- Initialized from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`, trained with the Delethink RL paradigm. See the paper for full details.

Links
- Repo: https://github.com/McGill-NLP/the-markovian-thinker
- Paper: https://arxiv.org/abs/2510.06557v1
- Collection: The Markovian Thinker

Model Summary
- Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- Objective: Reinforcement Learning using standard LongCoT, trained for 1000 steps.
- Thinking 24K budget; uses the entire context.
- Intended use: math/logic reasoning with step-by-step derivations; the final answer is typically formatted inside LaTeX `\boxed{}`.
- Library compatibility: works well with SGLang for chunked inference; also usable with Transformers for standard generation.

Intended Uses and Limitations
- Intended uses:
  - Long-form reasoning on math and related tasks.
  - Bounded-context rollouts with repeated chunking and short carryovers.
- Not intended for:
  - Safety-sensitive applications without human oversight.
  - Use cases requiring faithful, verifiable citations to external sources.
- Limitations:
  - May hallucinate, make arithmetic/algebraic mistakes, or produce inconsistent plans.
  - The chunked rollout procedure is needed to realize Delethink’s efficiency advantages.
Prompting
- Use the model’s chat template and request a step-by-step solution with a final boxed answer:
  - “Please reason step by step, and put your final answer within \boxed{}.”

Suggested generation settings
- temperature: 0.6
- top_p: 1.0
- top_k: -1

Safety and Use
- This model can produce incorrect or misleading reasoning steps and answers. Always verify results.
- Do not deploy in high-stakes domains without human oversight.
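Since the final answer lands inside `\boxed{}`, a small parser can recover it from a completion. This helper is illustrative, not part of the release; it tracks brace depth so nested LaTeX such as `\frac{1}{2}` is handled:

```python
def extract_boxed(text):
    # Return the contents of the last \boxed{...}, or None if absent.
    start = text.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth, out = 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(ch)
        i += 1
    return "".join(out)

print(extract_boxed(r"... so the answer is \boxed{\frac{1}{2}}."))  # \frac{1}{2}
```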

license:apache-2.0
114
1

LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised

license:mit
112
3

LLM2Vec-Sheared-LLaMA-mntp-unsup-simcse

license:mit
108
1

LLM2Vec-Meta-Llama-32-3B-Instruct-mntp

llama
72
0

LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-supervised

license:mit
64
4

AfriqueQwen-14B

llamafactory
50
2

Llama-3-8B-Web

llama
43
214

A3-Qwen3.5-9B

35
0

Delethink 96k 1.5b

TL;DR
- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where the state is the prompt plus all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL trains a model to “think” in fixed-size chunks with a bounded state.
- This 1.5B model uses an effective thinking budget of about 96K tokens while requiring only an 8K active context at any time, via chunked rollouts and short carryovers...

license:apache-2.0
34
3

A3-Qwen3.5-4B

33
0

A3-Qwen3.5-2B

33
0

LLM2Vec-Qwen3-4B-mntp

license:mit
24
0

nano-aha-moment-3b

20
2

LLM2Vec-Qwen25-15B-Instruct-mntp

license:mit
20
0

roberta-large-faithcritic

license:mit
19
1

AfriqueGemma-4B

llamafactory
19
0

LLM2Vec-Qwen3-8B-mntp

license:mit
19
0

LLM2Vec-Qwen3-17B-mntp

license:mit
19
0

longcot-8k-1.5b

TL;DR
- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where the state is the prompt plus all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL trains a model to “think” in fixed-size chunks with a bounded state.
- This 1.5B model uses an effective thinking budget of about 24K tokens while requiring only an 8K active context at any time, via chunked rollouts and short carryovers.
- Initialized from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`, trained with the Delethink RL paradigm. See the paper for full details.

Links
- Repo: https://github.com/McGill-NLP/the-markovian-thinker
- Paper: https://arxiv.org/abs/2510.06557v1
- Collection: The Markovian Thinker

Model Summary
- Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- Objective: Reinforcement Learning using standard LongCoT, trained for 1000 steps.
- Thinking 8K budget; uses the entire context.
- Intended use: math/logic reasoning with step-by-step derivations; the final answer is typically formatted inside LaTeX `\boxed{}`.
- Library compatibility: works well with SGLang for chunked inference; also usable with Transformers for standard generation.

Intended Uses and Limitations
- Intended uses:
  - Long-form reasoning on math and related tasks.
  - Bounded-context rollouts with repeated chunking and short carryovers.
- Not intended for:
  - Safety-sensitive applications without human oversight.
  - Use cases requiring faithful, verifiable citations to external sources.
- Limitations:
  - May hallucinate, make arithmetic/algebraic mistakes, or produce inconsistent plans.
  - The chunked rollout procedure is needed to realize Delethink’s efficiency advantages.
Prompting
- Use the model’s chat template and request a step-by-step solution with a final boxed answer:
  - “Please reason step by step, and put your final answer within \boxed{}.”

Suggested generation settings
- temperature: 0.6
- top_p: 1.0
- top_k: -1

Safety and Use
- This model can produce incorrect or misleading reasoning steps and answers. Always verify results.
- Do not deploy in high-stakes domains without human oversight.
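The prompting advice amounts to appending the fixed instruction to the user turn before applying the chat template. A minimal sketch; the message format follows common chat conventions and is not tied to a specific tokenizer:

```python
INSTRUCTION = r"Please reason step by step, and put your final answer within \boxed{}."

def build_messages(problem):
    # Single user turn carrying the problem plus the boxed-answer instruction;
    # the model's own chat template turns this list into the prompt string.
    return [{"role": "user", "content": f"{problem}\n{INSTRUCTION}"}]

msgs = build_messages("Compute 2 + 2.")
print(msgs[0]["content"].endswith(r"\boxed{}."))  # True
```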

license:apache-2.0
19
0

LLM2Vec-Sheared-LLaMA-mntp-supervised

license:mit
18
5

Sheared-LLaMA-2.7B-weblinx

license:llama2
17
2

pix2act-large-weblinx

license:apache-2.0
17
1

AfriqueQwen-8B

llamafactory
16
1

AfriqueGemma-12B

llamafactory
16
0

AfriqueLlama-8B

llama
15
0

MindAct-base-weblinx

license:apache-2.0
13
0

Delethink 96k Base 1.5b

12
1

LLM2Vec-Qwen3-06B-mntp

license:mit
11
0

LLM2Vec-Qwen25-7B-Instruct-mntp

license:mit
9
0

LLM2Vec-Qwen25-3B-Instruct-mntp

license:mit
9
0

LLM2Vec-Qwen25-05B-Instruct-mntp

license:mit
5
0

bart-qg-nq-checkpoint

license:cc-by-4.0
5
0

bart-qg-mlquestions-backtraining

license:cc-by-4.0
5
0

LLM2Vec-Llama-2-7b-chat-hf-mntp-unsup-simcse

license:mit
5
0

electra-medal

4
3

Llama-2-13b-chat-weblinx

license:llama2
4
3

flan-t5-base-weblinx

license:apache-2.0
4
1

dpr-statcan-conversation_encoder-basic_info_fr

4
0

AURORA

license:mit
3
4

fuyu-8b-weblinx

license:cc-by-nc-4.0
3
1

dpr-conversation_encoder-basic_info

3
0

dpr-statcan-conversation_encoder-title_fr

3
0

AfroXLMR-large-76L-Injongo-slot

license:cc-by-4.0
3
0

MiniLM-L6-dmr

2
5

Llama-2-7b-chat-weblinx

license:llama2
2
2

tapas-statcan-large-metadata_encoder-title

2
0

tapas-statcan-large-conversation_encoder-title_and_member

2
0

AfroXLMR-large-76L-Injongo-intent

license:cc-by-4.0
2
0

gemma-2-9b-it-Injongo-intent

llama-factory
2
0

gemma-2-9b-it-Injongo-slot

llama-factory
2
0

gte-base-dmr

1
2

pix2act-base-weblinx

license:apache-2.0
1
1

dpr-statcan-metadata_encoder-basic_info

1
0

dpr-statcan-metadata_encoder-title

1
0

dpr-statcan-metadata_encoder-basic_info_fr

1
0

dpr-statcan-metadata_encoder-title_fr

1
0

tapas-statcan-large-metadata_encoder-title_and_member

1
0

tapas-statcan-large-conversation_encoder-cell_tokens

1
0

MindAct-large-weblinx

license:apache-2.0
1
0

MindAct-xl-weblinx

license:apache-2.0
1
0

ssa-comet-mtl

license:apache-2.0
0
3

LLM2Vec-Gen-Qwen3-4B

license:mit
0
1

LLM2Vec-Gen-Llama32-1B

base_model:meta-llama/Llama-3.2-1B
0
1

LLM2Vec-Gen-Qwen25-7B

license:mit
0
1

LLM2Vec-Gen-Qwen3-06B

license:mit
0
1

Sheared-LLaMA-1.3B-weblinx

license:llama2
0
1

flan-t5-large-weblinx

0
1

flan-t5-xl-weblinx

license:apache-2.0
0
1

bge-small-dmr

0
1

codellm_1b_alibi

license:apache-2.0
0
1

ssa-comet-qe

license:apache-2.0
0
1