MiniMaxAI

17 models

MiniMax-M2.5

448,370
1,144

MiniMax-M2

license:modified-mit (https://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE)

131,385
1,416

MiniMax-M2.1

85,931
1,267

MiniMax-VL-01

82,469
280

MiniMax-M2.7

43,645
694

MiniMax-M1-40k

We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed from our previous MiniMax-Text-01 model, which contains 456 billion total parameters with 45.9 billion parameters activated per token. Consistent with MiniMax-Text-01, the M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute: for example, compared to DeepSeek R1, M1 consumes 25% of the FLOPs at a generation length of 100K tokens. These properties make M1 particularly suitable for complex tasks that require processing long inputs and thinking extensively.

MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems ranging from traditional mathematical reasoning to sandbox-based, real-world software engineering environments. We developed an efficient RL scaling framework for M1 with two highlights: (1) we propose CISPO, a novel algorithm that clips importance-sampling weights instead of token updates and outperforms other competitive RL variants; (2) our hybrid-attention design naturally enhances RL efficiency, and we address the unique challenges of scaling RL with this hybrid architecture. We train two versions of MiniMax-M1 with 40K and 80K thinking budgets, respectively. Experiments on standard benchmarks show that our models outperform other strong open-weight models such as the original DeepSeek-R1 and Qwen3-235B, particularly on complex software engineering, tool-use, and long-context tasks. With efficient scaling of test-time compute, MiniMax-M1 serves as a strong foundation for next-generation language-model agents that reason and tackle real-world challenges.

Benchmark performance comparison of leading commercial and open-weight models across competition-level mathematics, coding, software engineering, agentic tool use, and long-context understanding tasks. MiniMax-M1 results below use the MiniMax-M1-80k model.
| Category | Task | MiniMax-M1-80K | MiniMax-M1-40K | Qwen3-235B-A22B | DeepSeek-R1-0528 | DeepSeek-R1 | Seed-Thinking-v1.5 | Claude 4 Opus | Gemini 2.5 Pro (06-05) | OpenAI-o3 |
|:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| | Extended Thinking | 80K | 40K | 32k | 64k | 32k | 32k | 64k | 64k | 100k |
| Mathematics | AIME 2024 | 86.0 | 83.3 | 85.7 | 91.4 | 79.8 | 86.7 | 76.0 | 92.0 | 91.6 |
| | AIME 2025 | 76.9 | 74.6 | 81.5 | 87.5 | 70.0 | 74.0 | 75.5 | 88.0 | 88.9 |
| | MATH-500 | 96.8 | 96.0 | 96.2 | 98.0 | 97.3 | 96.7 | 98.2 | 98.8 | 98.1 |
| General Coding | LiveCodeBench (24/8~25/5) | 65.0 | 62.3 | 65.9 | 73.1 | 55.9 | 67.5 | 56.6 | 77.1 | 75.8 |
| | FullStackBench | 68.3 | 67.6 | 62.9 | 69.4 | 70.1 | 69.9 | 70.3 | -- | 69.3 |
| Reasoning & Knowledge | GPQA Diamond | 70.0 | 69.2 | 71.1 | 81.0 | 71.5 | 77.3 | 79.6 | 86.4 | 83.3 |
| | HLE (no tools) | 8.4* | 7.2* | 7.6* | 17.7* | 8.6* | 8.2 | 10.7 | 21.6 | 20.3 |
| | ZebraLogic | 86.8 | 80.1 | 80.3 | 95.1 | 78.7 | 84.4 | 95.1 | 91.6 | 95.8 |
| | MMLU-Pro | 81.1 | 80.6 | 83.0 | 85.0 | 84.0 | 87.0 | 85.0 | 86.0 | 85.0 |
| Software Engineering | SWE-bench Verified | 56.0 | 55.6 | 34.4 | 57.6 | 49.2 | 47.0 | 72.5 | 67.2 | 69.1 |
| Long Context | OpenAI-MRCR (128k) | 73.4 | 76.1 | 27.7 | 51.5 | 35.8 | 54.3 | 48.9 | 76.8 | 56.5 |
| | OpenAI-MRCR (1M) | 56.2 | 58.6 | -- | -- | -- | -- | -- | 58.8 | -- |
| | LongBench-v2 | 61.5 | 61.0 | 50.1 | 52.1 | 58.3 | 52.5 | 55.6 | 65.0 | 58.8 |
| Agentic Tool Use | TAU-bench (airline) | 62.0 | 60.0 | 34.7 | 53.5 | -- | 44.0 | 59.6 | 50.0 | 52.0 |
| | TAU-bench (retail) | 63.5 | 67.8 | 58.6 | 63.9 | -- | 55.7 | 81.4 | 67.0 | 73.9 |
| Factuality | SimpleQA | 18.5 | 17.9 | 11.0 | 27.8 | 30.1 | 12.9 | -- | 54.0 | 49.4 |
| General Assistant | MultiChallenge | 44.7 | 44.7 | 40.0 | 45.0 | 40.7 | 43.0 | 45.8 | 51.8 | 56.5 |

Our models are evaluated with `temperature=1.0`, `top_p=0.95`.

SWE-bench methodology. We report results derived from the Agentless scaffold. Departing from the original pipeline, our methodology employs a two-stage localization process (without any embedding-based retrieval mechanisms): initial coarse-grained file localization followed by fine-grained localization to specific files and code elements. The values for our models are calculated on the subset of n=486 verified tasks that run on our infrastructure. The 14 excluded test cases that were incompatible with our internal infrastructure are: `"astropy__astropy-7606"`, `"astropy__astropy-8707"`, `"astropy__astropy-8872"`, `"django__django-10097"`, `"matplotlib__matplotlib-20488"`, `"psf__requests-2317"`, `"psf__requests-2931"`, `"psf__requests-5414"`, `"pylint-dev__pylint-6528"`, `"pylint-dev__pylint-7277"`, `"sphinx-doc__sphinx-10435"`, `"sphinx-doc__sphinx-7985"`, `"sphinx-doc__sphinx-8269"`, `"sphinx-doc__sphinx-8475"`.

TAU-bench methodology. We evaluate TAU-bench with GPT-4.1 as the user model and without any custom tools. The maximum number of interaction steps is 40. Our general system prompt is:

To achieve the best results with the MiniMax-M1 model, we suggest focusing on two key points: inference parameters and the system prompt.

3.1. Inference Parameters
- Temperature: `1.0`
- Top-p: `0.95`

This setting is optimal for encouraging creativity and diversity in the model's responses. It allows the model to explore a wider range of linguistic possibilities, preventing outputs that are too rigid or repetitive, while still maintaining strong logical coherence. A hedged example of applying these settings is shown below.
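To make the recommended decoding settings concrete, here is a minimal sketch that applies them through the OpenAI-compatible API exposed by a locally running vLLM server (see the deployment notes below). The base URL, API key, and prompt are illustrative assumptions, not part of the official card.

```python
# A minimal sketch, not an official example: assumes MiniMax-M1 is already
# served behind vLLM's OpenAI-compatible endpoint on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M1-40k",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Plan a three-step refactor for a slow SQL query."},
    ],
    temperature=1.0,  # recommended inference setting
    top_p=0.95,       # recommended inference setting
)
print(response.choices[0].message.content)
```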
3.2. System Prompt

Tailoring your system prompt to the specific task is crucial for guiding the model effectively. Below are suggested settings for different scenarios.
- A. General-Purpose Scenarios: for common tasks like summarization, translation, Q&A, or creative writing.
- B. Web Development Scenarios: for complex tasks like generating code for web pages.
- C. Mathematical Scenarios: for problems that require calculation or logical deduction.

Download the model from the HuggingFace repository:
- MiniMax-M1-40k
- MiniMax-M1-80k

For production deployment, we recommend using vLLM to serve MiniMax-M1. vLLM provides excellent performance for serving large language models, with the following features:
- 🔥 Outstanding serving throughput performance
- ⚡ Efficient and intelligent memory management
- 📦 Powerful batch-request processing capability
- ⚙️ Deeply optimized underlying performance

For detailed vLLM deployment instructions, please refer to our vLLM Deployment Guide. Special note: using vLLM versions below 0.9.2 may result in incompatibility or incorrect precision for the model. A hedged offline-inference sketch follows this card.

Alternatively, you can deploy using Transformers directly; for detailed instructions, see our MiniMax-M1 Transformers Deployment Guide.

The MiniMax-M1 model supports function calling, enabling it to identify when external functions need to be called and to output the call parameters in a structured format. The MiniMax-M1 Function Call Guide provides detailed instructions on using this feature.

6. Chatbot & API

For general use and evaluation, we provide a Chatbot with online search capabilities and an online API for developers. We also provide the MiniMax MCP Server with video generation, image generation, speech synthesis, and voice cloning for developers.
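As a companion to the deployment pointers above, the following is a hedged offline-inference sketch using vLLM's Python API. The parallelism and context-length values are placeholders; take the real configuration from the official vLLM Deployment Guide.

```python
# A minimal sketch, assuming vLLM >= 0.9.2 (older versions may be
# incompatible, per the note above) and a multi-GPU node.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M1-40k",
    trust_remote_code=True,
    tensor_parallel_size=8,  # placeholder: match your GPU count
    max_model_len=131072,    # placeholder: M1 supports up to 1M tokens given memory
)

params = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=4096)
outputs = llm.generate(["Explain lightning attention in two paragraphs."], params)
print(outputs[0].outputs[0].text)
```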

license:apache-2.0
14,436
181

MiniMax-Text-01-hf

10,333
8

SynLogic-Mix-3-32B

license:mit
7,725
14

MiniMax-Text-01

MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock the model's long-context capabilities, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, softmax attention, and Mixture-of-Experts (MoE). Leveraging advanced parallelism strategies and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), MiniMax-Text-01's training context length is extended to 1 million tokens, and it can handle a context of up to 4 million tokens during inference. On various academic benchmarks, MiniMax-Text-01 demonstrates top-tier performance.

The architecture of MiniMax-Text-01 is briefly described as follows:
- Total Parameters: 456B
- Activated Parameters per Token: 45.9B
- Number of Layers: 80
- Hybrid Attention: a softmax attention layer is positioned after every 7 lightning attention layers
- Number of attention heads: 64
- Attention head dimension: 128
- Mixture of Experts:
  - Number of experts: 32
  - Expert hidden dimension: 9216
  - Top-2 routing strategy
- Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension, with a base frequency of 10,000,000
- Hidden Size: 6144
- Vocab Size: 200,064

| Tasks | GPT-4o (11-20) | Claude-3.5-Sonnet (10-22) | Gemini-1.5-Pro (002) | Gemini-2.0-Flash (exp) | Qwen2.5-72B-Inst. | DeepSeek-V3 | Llama-3.1-405B-Inst. | MiniMax-Text-01 |
|---|---|---|---|---|---|---|---|---|
| General | | | | | | | | |
| MMLU | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | 88.6 | 88.5 |
| MMLU-Pro | 74.4 | 78.0 | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
| SimpleQA | 39.0 | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | 67.4 |
| IFEval (avg) | 84.1 | 90.1 | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
| Arena-Hard | 92.4 | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
| Reasoning | | | | | | | | |
| GPQA (diamond) | 46.0 | 65.0 | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
| DROP (F1) | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | 92.5 | 87.8 |
| Mathematics | | | | | | | | |
| GSM8k | 95.6 | 96.9 | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
| MATH | 76.6 | 74.1 | 84.6 | 83.9 | 81.8 | 84.6 | 73.8 | 77.4 |
| Coding | | | | | | | | |
| MBPP+ | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | 78.8 | 73.0 | 71.7 |
| HumanEval | 90.2 | 93.7 | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |

Ruler

| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o (11-20) | 0.970 | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
| Claude-3.5-Sonnet (10-22) | 0.965 | 0.960 | 0.957 | 0.950 | 0.952 | 0.938 | - | - | - |
| Gemini-1.5-Pro (002) | 0.962 | 0.960 | 0.960 | 0.958 | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
| Gemini-2.0-Flash (exp) | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
| MiniMax-Text-01 | 0.963 | 0.961 | 0.953 | 0.954 | 0.943 | 0.947 | 0.945 | 0.928 | 0.910 |

LongBench v2

| Model | overall | easy | hard | short | medium | long |
|---|---|---|---|---|---|---|
| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
| w/ CoT | | | | | | |
| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
| DeepSeek-V3 | - | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
| MiniMax-Text-01 | 56.5 | 66.1 | 50.5 | 61.7 | 56.7 | 47.2 |
| w/o CoT | | | | | | |
| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
| DeepSeek-V3 | 48.7 | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | 44.4 |
| MiniMax-Text-01 | 52.9 | 60.9 | 47.9 | 58.9 | 52.6 | 43.5 |

MTOB

| Context Type | no context | half book | full book | Δ half book | Δ full book |
|---|---|---|---|---|---|
| eng → kalam (ChrF) | | | | | |
| GPT-4o (11-20) | 9.90 | 54.30 | - | 44.40 | - |
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | 57.90 | 36.89 | 41.11 |
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
| MiniMax-Text-01 | 6.0 | 51.74 | 51.60 | 45.7 | 45.6 |
| kalam → eng (BLEURT) | | | | | |
| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
| Gemini-1.5-Pro (002) | 32.02 | 61.52 | 63.09 | 29.50 | 31.07 |
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
| MiniMax-Text-01 | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |

4. Quickstart

Here we provide a simple example of loading the tokenizer and model to generate content.
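The sketch below illustrates what such a Quickstart typically looks like with the Transformers library. It is an assumption-laden illustration, not the repository's official snippet: in practice the 456B model requires a large multi-GPU setup or quantization, and the exact recommended loading code lives in the model repository.

```python
# A minimal sketch, assuming the Hub repo id "MiniMaxAI/MiniMax-Text-01"
# and its custom modeling code (hence trust_remote_code=True).
# device_map="auto" shards the weights across available GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype="auto",
)

messages = [{"role": "user", "content": "Give one-sentence summaries of RoPE and MoE."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```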
5. Deployment Guide

For production deployment, we recommend using vLLM to serve MiniMax-Text-01. vLLM provides excellent performance for serving large language models, with the following features:
- 🔥 Outstanding serving throughput performance
- ⚡ Efficient and intelligent memory management
- 📦 Powerful batch-request processing capability
- ⚙️ Deeply optimized underlying performance

For detailed deployment instructions, please refer to our vLLM Deployment Guide.

6. Function Calling

MiniMax-Text-01 supports function calling, enabling the model to intelligently identify when external functions need to be called and to output parameters in structured JSON format. With function calling, you can:
- Let the model recognize implicit function-call needs in user requests
- Receive structured parameter outputs for seamless application integration
- Support various complex parameter types, including nested objects and arrays

Function calling supports standard OpenAI-compatible format definitions and integrates seamlessly with the Transformers library (see the sketch after this card). For detailed usage instructions, please refer to our Function Call Guide or the Chinese Guide.

8. Chatbot & API

For general use and evaluation, we provide a Chatbot with online search capabilities and an online API for developers. We also provide the MiniMax MCP Server with video generation, image generation, speech synthesis, and voice cloning for developers.
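To illustrate the OpenAI-compatible tool format the card refers to, here is a hedged sketch. The function name, description, and schema are hypothetical, and the exact chat-template integration may differ from what the official Function Call Guide prescribes.

```python
# A hypothetical tool definition in the standard OpenAI-compatible format;
# every field value here is illustrative only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

# With Transformers, such schemas can usually be passed straight to the
# chat template; consult the official Function Call Guide for specifics.
# prompt = tokenizer.apply_chat_template(
#     messages, tools=tools, add_generation_prompt=True, tokenize=False
# )
```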

2,963
649

MiniMax-M1-80k

The model card content is identical to the MiniMax-M1-40k card above; this variant is trained with an 80K thinking budget and is the model reported as MiniMax-M1-80K in that card's benchmark table.

license:apache-2.0
256
685

SynLogic-7B

license:mit
105
22

MiniMax-M1-80k-hf

This repository is primarily for the Transformers framework. If you're using other open-source frameworks, please use the alternative repository: MiniMax-M1-80k. The remaining model card content is identical to the MiniMax-M1-40k card above.

license:apache-2.0
88
6

SynLogic-32B

license:mit
87
12

MiniMax-M1-40k-hf

This repository is primarily for the Transformers framework. If you're using other open-source frameworks, please use the alternative repository: MiniMax-M1-40k. The remaining model card content is identical to the MiniMax-M1-40k card above.

license:apache-2.0
87
10

VTP-Base-f16d64

0
4

VTP-Small-f16d64

0
3

VTP-Large-f16d64

0
3