McGill-NLP
LLM2Vec-Mistral-7B-Instruct-v2-mntp
# LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

> LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of 3 simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.

- Repository: https://github.com/McGill-NLP/llm2vec
- Paper: https://arxiv.org/abs/2404.05961

## Questions

If you have any questions about the code, feel free to email Parishad (`[email protected]`) and Vaibhav (`[email protected]`).
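A minimal sketch of how these checkpoints might be used for retrieval with the `llm2vec` package from the linked repository. The model names come from this collection; the `(instruction, text)` query format and `LLM2Vec.from_pretrained` call follow the repo README, but treat the exact signatures as assumptions and check the README before relying on them. The heavy loading code is wrapped in a function and not called here, since it downloads a 7B model.

```python
def cosine(u, v):
    """Cosine similarity on plain Python lists, to keep the sketch dependency-free."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def embed_and_score(query: str, document: str) -> float:
    """Load a supervised LLM2Vec checkpoint and score one query/document pair.
    Defined but not executed here: it downloads a 7B model."""
    import torch
    from llm2vec import LLM2Vec  # pip install llm2vec

    l2v = LLM2Vec.from_pretrained(
        "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp",
        peft_model_name_or_path="McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised",
        device_map="cuda" if torch.cuda.is_available() else None,
        torch_dtype=torch.bfloat16,
    )
    instruction = "Given a web search query, retrieve relevant passages that answer the query:"
    q_rep = l2v.encode([[instruction, query]])[0]  # queries: (instruction, text) pairs
    d_rep = l2v.encode([document])[0]              # documents: plain strings
    return cosine(q_rep.tolist(), d_rep.tolist())
```

Higher cosine scores indicate a better query/document match; supervised checkpoints (`*-supervised`) are the ones fine-tuned for retrieval-style use.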
LLM2Vec-Mistral-7B-Instruct-v2-mntp-unsup-simcse
LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised
LLM2Vec-Meta-Llama-3-8B-Instruct-mntp
LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse
delethink-24k-1.5b
## TL;DR

- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where state = prompt + all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL trains a model to “think” in fixed-size chunks with a bounded state.
- This 1.5B model uses an effective thinking budget of about 24K tokens while only requiring an 8K active context at any time, via chunked rollouts and short carryovers.
- Initialized from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` and trained with the Delethink RL paradigm. See the paper for full details.

## Links

- Repo: https://github.com/McGill-NLP/the-markovian-thinker
- Paper: https://arxiv.org/abs/2510.06557v1
- Collection: The Markovian Thinker

## Model Summary

- Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- Objective: reinforcement learning for long-form reasoning with bounded, chunked thinking (Delethink), trained for 1000 steps.
- Delethink 24K budget: uses 8K “context size” chunks, short “markovian” carryovers, and up to 5 chunk iterations for ~24K total thinking tokens.
- Intended use: math/logic reasoning with step-by-step derivations; the final answer is typically formatted inside LaTeX `\boxed{}`.
- Library compatibility: works well with SGLang for chunked inference; also usable with Transformers for standard generation (chunking requires manual orchestration; see the paper for an example).

## Intended Uses and Limitations

- Intended uses:
  - Long-form reasoning on math and related tasks.
  - Bounded-context rollouts with repeated chunking and short carryovers.
- Not intended for:
  - Safety-sensitive applications without human oversight.
  - Use cases requiring faithful, verifiable citations to external sources.
- Limitations:
  - May hallucinate, make arithmetic/algebraic mistakes, or produce inconsistent plans.
  - The chunked rollout procedure is needed to realize Delethink’s efficiency advantages.
## How Delethink Works (Concept)

Let:

- C = context size per chunk (active KV memory)
- m = markovian size = number of tokens carried over to the next chunk
- I = iteration cap = maximum number of chunks

The effective thinking budget is:

- C + (I − 1) × (C − m)

For this checkpoint, we recommend:

- C = 8192
- m = 4096
- I ≤ 5

This yields an effective budget of 8192 + 4 × (8192 − 4096) = 24576 tokens of thinking.

## Prompting

- Use the model’s chat template and request a step-by-step solution with a final boxed answer:
  - “Please reason step by step, and put your final answer within \boxed{}.”

## Suggested generation settings

- temperature: 0.6
- top_p: 1.0
- top_k: -1

## Safety and Use

- This model can produce incorrect or misleading reasoning steps and answers. Always verify results.
- Do not deploy in high-stakes domains without human oversight.
LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-unsup-simcse
LLM2Vec-Sheared-LLaMA-mntp
LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised
LLM2Vec-Meta-Llama-31-8B-Instruct-mntp
LLM2Vec-Llama-2-7b-chat-hf-mntp
longcot-24k-1.5b
## TL;DR

- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where state = prompt + all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL, the paper’s main method, trains a model to “think” in fixed-size chunks with a bounded state; this checkpoint is the standard LongCoT baseline it is compared against.
- This 1.5B model is trained with standard LongCoT RL and a 24K thinking budget, using the entire context.
- Initialized from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`. See the paper for full details.

## Links

- Repo: https://github.com/McGill-NLP/the-markovian-thinker
- Paper: https://arxiv.org/abs/2510.06557v1
- Collection: The Markovian Thinker

## Model Summary

- Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- Objective: reinforcement learning using standard LongCoT, trained for 1000 steps.
- Thinking budget: 24K tokens; uses the entire context.
- Intended use: math/logic reasoning with step-by-step derivations; the final answer is typically formatted inside LaTeX `\boxed{}`.
- Library compatibility: works well with SGLang; also usable with Transformers for standard generation.

## Intended Uses and Limitations

- Intended uses:
  - Long-form reasoning on math and related tasks.
- Not intended for:
  - Safety-sensitive applications without human oversight.
  - Use cases requiring faithful, verifiable citations to external sources.
- Limitations:
  - May hallucinate, make arithmetic/algebraic mistakes, or produce inconsistent plans.
## Prompting

- Use the model’s chat template and request a step-by-step solution with a final boxed answer:
  - “Please reason step by step, and put your final answer within \boxed{}.”

## Suggested generation settings

- temperature: 0.6
- top_p: 1.0
- top_k: -1

## Safety and Use

- This model can produce incorrect or misleading reasoning steps and answers. Always verify results.
- Do not deploy in high-stakes domains without human oversight.
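A small sketch of the prompting and sampling setup above. The message format targets `tokenizer.apply_chat_template` from Transformers; the settings dict simply mirrors the values listed in these cards (in vLLM/SGLang-style samplers, `top_k = -1` disables top-k filtering).

```python
BOXED_INSTRUCTION = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)


def build_messages(problem: str):
    """Attach the boxed-answer instruction to a math problem, in chat format."""
    return [{"role": "user", "content": f"{problem}\n{BOXED_INSTRUCTION}"}]


# Suggested sampling settings from the model cards.
GEN_SETTINGS = {"temperature": 0.6, "top_p": 1.0, "top_k": -1}

msgs = build_messages("What is the sum of the first 10 positive integers?")
```

The `msgs` list can then be rendered with the model's own chat template (e.g. `tokenizer.apply_chat_template(msgs, add_generation_prompt=True)`) and passed to any backend alongside `GEN_SETTINGS`.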
LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised
LLM2Vec-Sheared-LLaMA-mntp-unsup-simcse
LLM2Vec-Meta-Llama-32-3B-Instruct-mntp
LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-supervised
AfriqueQwen-14B
Llama-3-8B-Web
A3-Qwen3.5-9B
Delethink 96k 1.5b
## TL;DR

- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where state = prompt + all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL trains a model to “think” in fixed-size chunks with a bounded state.
- This 1.5B model uses an effective thinking budget of about 96K tokens while only requiring an 8K active context at any time via chunked rollouts and short carr...
A3-Qwen3.5-4B
A3-Qwen3.5-2B
LLM2Vec-Qwen3-4B-mntp
nano-aha-moment-3b
LLM2Vec-Qwen25-15B-Instruct-mntp
roberta-large-faithcritic
AfriqueGemma-4B
LLM2Vec-Qwen3-8B-mntp
LLM2Vec-Qwen3-17B-mntp
longcot-8k-1.5b
## TL;DR

- Markovian Thinking for RL in reasoning LLMs: replace the trivial MDP where state = prompt + all past thinking tokens (quadratic compute) with a bounded, fixed-size state, yielding linear compute in thinking tokens and constant memory by design.
- Delethink RL, the paper’s main method, trains a model to “think” in fixed-size chunks with a bounded state; this checkpoint is the standard LongCoT baseline it is compared against.
- This 1.5B model is trained with standard LongCoT RL and an 8K thinking budget, using the entire context.
- Initialized from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`. See the paper for full details.

## Links

- Repo: https://github.com/McGill-NLP/the-markovian-thinker
- Paper: https://arxiv.org/abs/2510.06557v1
- Collection: The Markovian Thinker

## Model Summary

- Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- Objective: reinforcement learning using standard LongCoT, trained for 1000 steps.
- Thinking budget: 8K tokens; uses the entire context.
- Intended use: math/logic reasoning with step-by-step derivations; the final answer is typically formatted inside LaTeX `\boxed{}`.
- Library compatibility: works well with SGLang; also usable with Transformers for standard generation.

## Intended Uses and Limitations

- Intended uses:
  - Long-form reasoning on math and related tasks.
- Not intended for:
  - Safety-sensitive applications without human oversight.
  - Use cases requiring faithful, verifiable citations to external sources.
- Limitations:
  - May hallucinate, make arithmetic/algebraic mistakes, or produce inconsistent plans.
## Prompting

- Use the model’s chat template and request a step-by-step solution with a final boxed answer:
  - “Please reason step by step, and put your final answer within \boxed{}.”

## Suggested generation settings

- temperature: 0.6
- top_p: 1.0
- top_k: -1

## Safety and Use

- This model can produce incorrect or misleading reasoning steps and answers. Always verify results.
- Do not deploy in high-stakes domains without human oversight.