McGill-NLP
LLM2Vec-Mistral-7B-Instruct-v2-mntp
# LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

> LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of 3 simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.

- Repository: https://github.com/McGill-NLP/llm2vec
- Paper: https://arxiv.org/abs/2404.05961

## Questions

If you have any questions about the code, feel free to email Parishad (`[email protected]`) and Vaibhav (`[email protected]`).
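A minimal sketch of how these checkpoints might be used for retrieval with the `llm2vec` package from the linked repository. The model names come from this collection; the `(instruction, text)` query format and `LLM2Vec.from_pretrained` call follow the repo README, but treat the exact signatures as assumptions and check the README before relying on them. The heavy loading code is wrapped in a function and not called here, since it downloads a 7B model.

```python
def cosine(u, v):
    """Cosine similarity on plain Python lists, to keep the sketch dependency-free."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def embed_and_score(query: str, document: str) -> float:
    """Load a supervised LLM2Vec checkpoint and score one query/document pair.
    Defined but not executed here: it downloads a 7B model."""
    import torch
    from llm2vec import LLM2Vec  # pip install llm2vec

    l2v = LLM2Vec.from_pretrained(
        "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp",
        peft_model_name_or_path="McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised",
        device_map="cuda" if torch.cuda.is_available() else None,
        torch_dtype=torch.bfloat16,
    )
    instruction = "Given a web search query, retrieve relevant passages that answer the query:"
    q_rep = l2v.encode([[instruction, query]])[0]  # queries: (instruction, text) pairs
    d_rep = l2v.encode([document])[0]              # documents: plain strings
    return cosine(q_rep.tolist(), d_rep.tolist())
```

Higher cosine scores indicate a better query/document match; supervised checkpoints (`*-supervised`) are the ones fine-tuned for retrieval-style use.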
LLM2Vec-Mistral-7B-Instruct-v2-mntp-unsup-simcse
LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised
LLM2Vec-Meta-Llama-3-8B-Instruct-mntp
LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse
delethink-24k-1.5b
## TL;DR

- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where state = prompt + all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL trains a model to “think” in fixed-size chunks with a bounded state.
- This 1.5B model uses an effective thinking budget of about 24K tokens while only requiring an 8K active context at any time, via chunked rollouts and short carryovers.
- Initialized from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` and trained with the Delethink RL paradigm. See the paper for full details.

## Links

- Repo: https://github.com/McGill-NLP/the-markovian-thinker
- Paper: https://arxiv.org/abs/2510.06557v1
- Collection: The Markovian Thinker

## Model Summary

- Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- Objective: reinforcement learning for long-form reasoning with bounded, chunked thinking (Delethink), trained for 1000 steps.
- Delethink 24K budget: uses 8K “context size” chunks, short “markovian” carryovers, and up to 5 chunk iterations for ~24K total thinking tokens.
- Intended use: math/logic reasoning with step-by-step derivations; the final answer is typically formatted inside LaTeX `\boxed{}`.
- Library compatibility: works well with SGLang for chunked inference; also usable with Transformers for standard generation (chunking requires manual orchestration; see the paper for an example).

## Intended Uses and Limitations

- Intended uses:
  - Long-form reasoning on math and related tasks.
  - Bounded-context rollouts with repeated chunking and short carryovers.
- Not intended for:
  - Safety-sensitive applications without human oversight.
  - Use cases requiring faithful, verifiable citations to external sources.
- Limitations:
  - May hallucinate, make arithmetic/algebraic mistakes, or produce inconsistent plans.
  - The chunked rollout procedure is needed to realize Delethink’s efficiency advantages.
## How Delethink Works (Concept)

Let:

- C = context size per chunk (active KV memory)
- m = markovian size = number of tokens carried over to the next chunk
- I = iteration cap = maximum number of chunks

The effective thinking budget is:

- C + (I − 1) × (C − m)

For this checkpoint, we recommend:

- C = 8192
- m = 4096
- I ≤ 5

This yields an effective budget of 8192 + 4 × (8192 − 4096) = 24576 tokens of thinking.

## Prompting

- Use the model’s chat template and request a step-by-step solution with a final boxed answer:
  - “Please reason step by step, and put your final answer within \boxed{}.”

## Suggested generation settings

- temperature: 0.6
- top_p: 1.0
- top_k: -1

## Safety and Use

- This model can produce incorrect or misleading reasoning steps and answers. Always verify results.
- Do not deploy in high-stakes domains without human oversight.
LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-unsup-simcse
LLM2Vec-Sheared-LLaMA-mntp
LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised
LLM2Vec-Meta-Llama-31-8B-Instruct-mntp
LLM2Vec-Llama-2-7b-chat-hf-mntp
longcot-24k-1.5b
## TL;DR

- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where state = prompt + all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL, the paper’s main method, trains a model to “think” in fixed-size chunks with a bounded state; this checkpoint is the standard LongCoT baseline it is compared against.
- This 1.5B model is trained with standard LongCoT RL and a 24K thinking budget, using the entire context.
- Initialized from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`. See the paper for full details.

## Links

- Repo: https://github.com/McGill-NLP/the-markovian-thinker
- Paper: https://arxiv.org/abs/2510.06557v1
- Collection: The Markovian Thinker

## Model Summary

- Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- Objective: reinforcement learning using standard LongCoT, trained for 1000 steps.
- Thinking budget: 24K tokens; uses the entire context.
- Intended use: math/logic reasoning with step-by-step derivations; the final answer is typically formatted inside LaTeX `\boxed{}`.
- Library compatibility: works well with SGLang; also usable with Transformers for standard generation.

## Intended Uses and Limitations

- Intended uses:
  - Long-form reasoning on math and related tasks.
- Not intended for:
  - Safety-sensitive applications without human oversight.
  - Use cases requiring faithful, verifiable citations to external sources.
- Limitations:
  - May hallucinate, make arithmetic/algebraic mistakes, or produce inconsistent plans.
## Prompting

- Use the model’s chat template and request a step-by-step solution with a final boxed answer:
  - “Please reason step by step, and put your final answer within \boxed{}.”

## Suggested generation settings

- temperature: 0.6
- top_p: 1.0
- top_k: -1

## Safety and Use

- This model can produce incorrect or misleading reasoning steps and answers. Always verify results.
- Do not deploy in high-stakes domains without human oversight.
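A small sketch of the prompting and sampling setup above. The message format targets `tokenizer.apply_chat_template` from Transformers; the settings dict simply mirrors the values listed in these cards (in vLLM/SGLang-style samplers, `top_k = -1` disables top-k filtering).

```python
BOXED_INSTRUCTION = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)


def build_messages(problem: str):
    """Attach the boxed-answer instruction to a math problem, in chat format."""
    return [{"role": "user", "content": f"{problem}\n{BOXED_INSTRUCTION}"}]


# Suggested sampling settings from the model cards.
GEN_SETTINGS = {"temperature": 0.6, "top_p": 1.0, "top_k": -1}

msgs = build_messages("What is the sum of the first 10 positive integers?")
```

The `msgs` list can then be rendered with the model's own chat template (e.g. `tokenizer.apply_chat_template(msgs, add_generation_prompt=True)`) and passed to any backend alongside `GEN_SETTINGS`.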
LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised
LLM2Vec-Sheared-LLaMA-mntp-unsup-simcse
LLM2Vec-Meta-Llama-32-3B-Instruct-mntp
LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-supervised
AfriqueQwen-14B
Llama-3-8B-Web
A3-Qwen3.5-9B
Delethink 96k 1.5b
## TL;DR

- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where state = prompt + all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL trains a model to “think” in fixed-size chunks with a bounded state.
- This 1.5B model uses an effective thinking budget of about 96K tokens while only requiring an 8K active context at any time via chunked rollouts and short carr...
A3-Qwen3.5-4B
A3-Qwen3.5-2B
LLM2Vec-Qwen3-4B-mntp
nano-aha-moment-3b
LLM2Vec-Qwen25-15B-Instruct-mntp
roberta-large-faithcritic
AfriqueGemma-4B
LLM2Vec-Qwen3-8B-mntp
LLM2Vec-Qwen3-17B-mntp
longcot-8k-1.5b
## TL;DR

- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where state = prompt + all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL, the paper’s main method, trains a model to “think” in fixed-size chunks with a bounded state; this checkpoint is the standard LongCoT baseline it is compared against.
- This 1.5B model is trained with standard LongCoT RL and an 8K thinking budget, using the entire context.
- Initialized from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`. See the paper for full details.

## Links

- Repo: https://github.com/McGill-NLP/the-markovian-thinker
- Paper: https://arxiv.org/abs/2510.06557v1
- Collection: The Markovian Thinker

## Model Summary

- Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- Objective: reinforcement learning using standard LongCoT, trained for 1000 steps.
- Thinking budget: 8K tokens; uses the entire context.
- Intended use: math/logic reasoning with step-by-step derivations; the final answer is typically formatted inside LaTeX `\boxed{}`.
- Library compatibility: works well with SGLang; also usable with Transformers for standard generation.

## Intended Uses and Limitations

- Intended uses:
  - Long-form reasoning on math and related tasks.
- Not intended for:
  - Safety-sensitive applications without human oversight.
  - Use cases requiring faithful, verifiable citations to external sources.
- Limitations:
  - May hallucinate, make arithmetic/algebraic mistakes, or produce inconsistent plans.
## Prompting

- Use the model’s chat template and request a step-by-step solution with a final boxed answer:
  - “Please reason step by step, and put your final answer within \boxed{}.”

## Suggested generation settings

- temperature: 0.6
- top_p: 1.0
- top_k: -1

## Safety and Use

- This model can produce incorrect or misleading reasoning steps and answers. Always verify results.
- Do not deploy in high-stakes domains without human oversight.