mistralai
Verified AI startup. European AI champion, open and commercial models.
Mistral-7B-Instruct-v0.2
---
library_name: transformers
license: apache-2.0
tags:
- finetuned
- mistral-common
new_version: mistralai/Mistral-7B-Instruct-v0.3
inference: false
widget:
- messages:
  - role: user
    content: What is your favorite condiment?
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
---
Mistral-7B-Instruct-v0.3
---
library_name: vllm
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.3
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
- vllm
- mistral-common
---
Voxtral-Mini-4B-Realtime-2602
Mistral-7B-Instruct-v0.1
---
library_name: transformers
license: apache-2.0
tags:
- finetuned
- mistral-common
base_model: mistralai/Mistral-7B-v0.1
inference: false
widget:
- messages:
  - role: user
    content: What is your favorite condiment?
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
---
Mixtral-8x7B-Instruct-v0.1
---
library_name: vllm
language:
- fr
- it
- de
- es
- en
license: apache-2.0
base_model: mistralai/Mixtral-8x7B-v0.1
inference: false
widget:
- messages:
  - role: user
    content: What is your favorite condiment?
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
- vllm
---
Mistral-7B-v0.1
---
library_name: transformers
language:
- en
license: apache-2.0
tags:
- pretrained
- mistral-common
inference: false
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
---
Voxtral-Mini-3B-2507
---
library_name: mistral-common
language:
- en
- fr
- de
- es
- it
- pt
- nl
- hi
license: apache-2.0
inference: false
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
- vllm
---
Devstral-Small-2-24B-Instruct-2512
Ministral-3-3B-Instruct-2512
Ministral-3-14B-Reasoning-2512
Mathstral-7B-v0.1
---
library_name: mistral-common
license: apache-2.0
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
- vllm
---
Mistral-Small-3.2-24B-Instruct-2506
Mistral-Small-24B-Instruct-2501
---
library_name: vllm
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: apache-2.0
inference: false
base_model:
- mistralai/Mistral-Small-24B-Base-2501
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
- vllm
---
Ministral-8B-Instruct-2410
---
library_name: vllm
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: other
license_name: mrl
inference: false
license_link: https://mistral.ai/licenses/MRL-0.1.md
extra_gated_prompt: >-
Mistral-Nemo-Instruct-2407
---
library_name: vllm
language:
- en
- fr
- de
- es
- it
- pt
- ru
- zh
- ja
license: apache-2.0
base_model: mistralai/Mistral-Nemo-Base-2407
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
- mistral-common
---
Ministral-3-14B-Instruct-2512
Mistral-Small-3.1-24B-Instruct-2503
---
library_name: vllm
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
inference: false
base_model:
- mistralai/Mistral-Small-3.1-24B-Base-2503
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
- mistral-common
---
Mixtral-8x7B-v0.1
---
library_name: vllm
license: apache-2.0
language:
- fr
- it
- de
- es
- en
tags:
- moe
- mistral-common
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
---
Ministral-3-8B-Reasoning-2512
Mistral-7B-v0.3
---
library_name: vllm
license: apache-2.0
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
- mistral-common
---
Devstral-Small-2507
Devstral is an agentic LLM for software engineering tasks, built in collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this benchmark.

It is finetuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed before fine-tuning from `Mistral-Small-3.1`. For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Updates compared to `Devstral Small 1.0`:
- Improved performance; please refer to the benchmark results.
- `Devstral Small 1.1` is still great when paired with OpenHands. This new version also generalizes better to other prompts and coding environments.
- Supports Mistral's function calling format.

Key Features:
- Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
- Lightweight: with a compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
- Apache 2.0 License: open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: a 128k context window.
- Tokenizer: utilizes a Tekken tokenizer with a 131k vocabulary size.

Devstral Small 1.1 achieves a score of 53.6% on SWE-Bench Verified, outperforming Devstral Small 1.0 by +6.8% and the second-best state-of-the-art model by +11.4%.
| Model              | Agentic Scaffold   | SWE-Bench Verified (%) |
|--------------------|--------------------|------------------------|
| Devstral Small 1.1 | OpenHands Scaffold | 53.6                   |
| Devstral Small 1.0 | OpenHands Scaffold | 46.8                   |
| GPT-4.1-mini       | OpenAI Scaffold    | 23.6                   |
| Claude 3.5 Haiku   | Anthropic Scaffold | 40.6                   |
| SWE-smith-LM 32B   | SWE-agent Scaffold | 40.2                   |
| Skywork SWE        | OpenHands Scaffold | 38.0                   |
| DeepSWE            | R2E-Gym Scaffold   | 42.2                   |

When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as Deepseek-V3-0324 and Qwen3 232B-A22B. We recommend using Devstral with the OpenHands scaffold. You can use it either through our API or by running it locally.

API: Follow these instructions to create a Mistral account and get an API key. Then run these commands to start the OpenHands docker container.

The model can also be deployed with the following libraries:
- `vllm` (recommended): see here
- `mistral-inference`: see here
- `transformers`: see here
- `LMStudio`: see here
- `llama.cpp`: see here
- `ollama`: see here

Make sure to install [`vLLM >= 0.9.1`](https://github.com/vllm-project/vllm/releases/tag/v0.9.1), and make sure to have installed `mistral-common >= 1.7.0`. You can also make use of a ready-to-go Docker image on Docker Hub. We recommend that you use Devstral in a server/client setting. To ping the client you can use a simple Python snippet. Then load our tokenizer along with the model and generate.

Make sure you launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then you can use OpenHands to interact with `Devstral Small 1.1`. For this tutorial we spun up a vLLM server by running the command above. The server address should be in the following format: `http://<your-server-url>:8000/v1`. The easiest way to launch OpenHands is to use the Docker image. Then, you can access the OpenHands UI at `http://localhost:3000`.
When accessing the OpenHands UI, you will be prompted to connect to a server. Use the advanced mode to connect to the server you launched earlier, filling the following fields:
- Custom Model: `openai/mistralai/Devstral-Small-2507`
- Base URL: `http://<your-server-url>:8000/v1`
- API Key: `token` (or any other token you used to launch the server, if any)

You can follow the installation of Cline here, then configure the server address in the settings.

OpenHands: Understanding Test Coverage of Mistral Common
We can start the OpenHands scaffold and link it to a repo to analyze test coverage and identify badly covered files. Here we start with our public `mistral-common` repo. After the repo is mounted in the workspace, we give the following instruction. The agent will first browse the code base to check test configuration and structure. Then it sets up the testing dependencies and launches the coverage test. Finally, the agent writes the necessary code to visualize the coverage, export the results, and save the plots to a PNG. At the end of the run, the plots are produced.

First initialize Cline inside VSCode and connect it to the server you launched earlier. We give the following instruction to build the video game. Don't hesitate to iterate or give more information to Devstral to improve the game!
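The server/client setup above can be pinged with a few lines of Python. This is a minimal sketch: the helper name `build_chat_request` is ours, and the model name and server URL assume the vLLM deployment described above.

```python
import json

def build_chat_request(model: str, user_message: str, temperature: float = 0.15) -> dict:
    """Build an OpenAI-compatible chat-completions payload (hypothetical helper)."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_request(
    "mistralai/Devstral-Small-2507",
    "List the files in the repository and summarize the test layout.",
)
# POST this body to <server>/v1/chat/completions with any HTTP client;
# vLLM's OpenAI-compatible endpoint expects exactly this shape.
print(json.dumps(payload, indent=2))
```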
Mistral-Small-4-119B-2603
Ministral-3-8B-Instruct-2512
Magistral-Small-2506
Ministral-3-14B-Base-2512
Ministral-3-3B-Reasoning-2512
Mistral-Nemo-Base-2407
The Mistral-Nemo-Base-2407 Large Language Model (LLM) is a pretrained generative text model of 12B parameters, trained jointly by Mistral AI and NVIDIA. It significantly outperforms existing models of similar or smaller size. For more details about this model please refer to our release blog post.

Key features
- Released under the Apache 2 License
- Pre-trained and instructed versions
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data
- Drop-in replacement for Mistral 7B

Model Architecture
Mistral Nemo is a transformer model with the following architecture choices:
- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,336
- Activation function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Vocabulary size: 2^17 ≈ 128k
- Rotary embeddings (theta = 1M)

| Benchmark | Score |
| --- | --- |
| HellaSwag (0-shot) | 83.5% |
| Winogrande (0-shot) | 76.8% |
| OpenBookQA (0-shot) | 60.6% |
| CommonSenseQA (0-shot) | 70.4% |
| TruthfulQA (0-shot) | 50.3% |
| MMLU (5-shot) | 68.0% |
| TriviaQA (5-shot) | 73.8% |
| NaturalQuestions (5-shot) | 31.2% |

| Language | Score |
| --- | --- |
| French | 62.3% |
| German | 62.7% |
| Spanish | 64.6% |
| Italian | 61.3% |
| Portuguese | 63.3% |
| Russian | 59.2% |
| Chinese | 59.0% |
| Japanese | 59.0% |

The model can be used with three different frameworks:
- `mistral-inference`: see here
- `transformers`: see here
- `NeMo`: see nvidia/Mistral-NeMo-12B-Base

It is recommended to use `mistralai/Mistral-Nemo-Base-2407` with mistral-inference. For HF transformers code snippets, please keep scrolling. After installing `mistral-inference`, a `mistral-demo` CLI command should be available in your environment.

> [!IMPORTANT]
> NOTE: Until a new release has been made, you need to install transformers from source.

If you want to use Hugging Face `transformers` to generate text, you can do something like this.
> [!TIP]
> Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. We recommend using a temperature of 0.3.

`Mistral-Nemo-Base-2407` is a pretrained base model and therefore does not have any moderation mechanisms.

The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall
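The architecture numbers above can be sanity-checked with a little arithmetic; this sketch just re-derives the vocabulary size and the GQA key/value-cache saving implied by the head counts, plus the recommended sampling setting as a plain dict.

```python
# Vocabulary size: the card lists 2^17, i.e. the "~128k" figure.
vocab_size = 2 ** 17  # 131072

# GQA: 32 query heads share 8 kv-heads, so the KV cache shrinks by 4x
# compared with full multi-head attention.
n_heads, n_kv_heads = 32, 8
kv_cache_reduction = n_heads // n_kv_heads

# Recommended sampling settings (lower temperature than earlier Mistral models):
sampling = {"temperature": 0.3}

print(vocab_size, kv_cache_reduction, sampling)
```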
Mistral-Small-24B-Base-2501
Mistral-Small-Instruct-2409
Mistral-Small-Instruct-2409 is an instruct fine-tuned version with the following characteristics:
- 22B parameters
- Vocabulary size of 32,768
- Supports function calling
- 32k sequence length

We recommend using this model with the vLLM library to implement production-ready inference pipelines. Also make sure you have `mistral-common >= 1.4.1` installed. You can also make use of a ready-to-go Docker image. You can also use Mistral Small in a server/client setting.

Note: Running Mistral-Small on a single GPU requires at least 44 GB of GPU RAM. If you want to divide the GPU requirement over multiple devices, please add e.g. `--tensor-parallel-size 2`.

We recommend using mistral-inference to quickly try out / "vibe-check" the model. Make sure to have `mistral-inference >= 1.4.1` installed. After installing `mistral-inference`, a `mistral-chat` CLI command should be available in your environment, and you can chat with the model using it. You can also use the Hugging Face `transformers` library to run inference using various chat templates, or to fine-tune the model.
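The "at least 44 GB of GPU RAM" note above follows from simple arithmetic: 22B parameters at 2 bytes each in bf16/fp16, before any activations or KV cache. A quick back-of-the-envelope check:

```python
# Weight memory for Mistral-Small-Instruct-2409 in half precision.
params = 22e9          # 22B parameters
bytes_per_param = 2    # bf16 / fp16
weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.0f} GB of weights")  # 44 GB, matching the note above

# Splitting across 2 GPUs with tensor parallelism halves the per-device
# weight footprint (activations and KV cache come on top of this).
per_gpu_gb = weights_gb / 2
print(f"{per_gpu_gb:.0f} GB per GPU")
```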
Example for inference: Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall
Devstral-2-123B-Instruct-2512
Ministral-3-3B-Instruct-2512-GGUF
Mixtral-8x22B-Instruct-v0.1
> [!TIP]
> PRs to correct the `transformers` tokenizer so that it gives results identical to the `mistral-common` reference implementation are very welcome!

---

The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1. To use this example, you'll need `transformers` version 4.42.0 or higher. Please see the function calling guide in the `transformers` docs for more information.

Note that, for reasons of space, this example does not show a complete cycle of calling a tool and adding the tool call and tool results to the chat history so that the model can use them in its next generation. For a full tool calling example, please see the function calling guide, and note that Mixtral does use tool call IDs, so these must be included in your tool calls and tool results. They should be exactly 9 alphanumeric characters.

Instruct tokenizer
The Hugging Face tokenizer included in this release should match our own. To compare: `pip install mistral-common`

Function calling and special tokens
This tokenizer includes additional special tokens related to function calling:
- [TOOL_CALLS]
- [AVAILABLE_TOOLS]
- [/AVAILABLE_TOOLS]
- [TOOL_RESULTS]
- [/TOOL_RESULTS]

If you want to use this model with function calling, please be sure to apply it similarly to what is done in our SentencePieceTokenizerV3.
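The "exactly 9 alphanumeric characters" requirement for tool call IDs is easy to get wrong, so here is a small sketch that validates and generates IDs in that format; the helper names are ours, not part of any Mistral library.

```python
import re
import secrets
import string

# Mixtral tool call IDs must be exactly 9 alphanumeric characters.
TOOL_CALL_ID_RE = re.compile(r"[A-Za-z0-9]{9}")

def is_valid_tool_call_id(tool_call_id: str) -> bool:
    """Check that an ID matches the required 9-alphanumeric-character format."""
    return TOOL_CALL_ID_RE.fullmatch(tool_call_id) is not None

def make_tool_call_id() -> str:
    """Generate a random ID in the required format."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(9))

assert is_valid_tool_call_id(make_tool_call_id())
assert not is_valid_tool_call_id("too_long_id")  # underscore and wrong length
```

Include such an ID in every tool call and in the matching tool result so the model can associate them across turns.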
The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall
Mistral-Large-Instruct-2407
Codestral-22B-v0.1
Devstral-Small-2505
Devstral is an agentic LLM for software engineering tasks, built in collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this benchmark.

It is finetuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed before fine-tuning from `Mistral-Small-3.1`. For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Key Features:
- Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
- Lightweight: with a compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
- Apache 2.0 License: open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: a 128k context window.
- Tokenizer: utilizes a Tekken tokenizer with a 131k vocabulary size.

Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming the prior open-source SoTA by 6%.

| Model            | Scaffold           | SWE-Bench Verified (%) |
|------------------|--------------------|------------------------|
| Devstral         | OpenHands Scaffold | 46.8                   |
| GPT-4.1-mini     | OpenAI Scaffold    | 23.6                   |
| Claude 3.5 Haiku | Anthropic Scaffold | 40.6                   |
| SWE-smith-LM 32B | SWE-agent Scaffold | 40.2                   |

When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as Deepseek-V3-0324 and Qwen3 232B-A22B. We recommend using Devstral with the OpenHands scaffold.
You can use it either through our API or by running it locally.

API: Follow these instructions to create a Mistral account and get an API key. Then run these commands to start the OpenHands docker container.

The model can also be deployed with the following libraries:
- `vllm` (recommended): see here
- `mistral-inference`: see here
- `transformers`: see here
- `LMStudio`: see here
- `llama.cpp`: see here
- `ollama`: see here

Make sure you launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then you can use OpenHands to interact with `Devstral Small 1.0`. For this tutorial we spun up a vLLM server by running the command above. The server address should be in the following format: `http://<your-server-url>:8000/v1`. The easiest way to launch OpenHands is to use the Docker image. Then, you can access the OpenHands UI at `http://localhost:3000`.

When accessing the OpenHands UI, you will be prompted to connect to a server. Use the advanced mode to connect to the server you launched earlier, filling the following fields:
- Custom Model: `openai/mistralai/Devstral-Small-2505`
- Base URL: `http://<your-server-url>:8000/v1`
- API Key: `token` (or any other token you used to launch the server, if any)

Now you're ready to use Devstral Small inside OpenHands by starting a new conversation. Let's build a To-Do list app.

We recommend using this model with the vLLM library to implement production-ready inference pipelines. Doing so should automatically install `mistral-common >= 1.5.5`. You can also make use of a ready-to-go Docker image on Docker Hub. We recommend that you use Devstral in a server/client setting. To ping the client you can use a simple Python snippet.

We recommend using mistral-inference to quickly try out / "vibe-check" Devstral. Make sure to have `mistral-inference >= 1.6.0` installed. To make the best use of our model with transformers, make sure to have `mistral-common >= 1.5.5` installed to use our tokenizer.
Then load our tokenizer along with the model and generate.

You can serve the model locally with LMStudio:
1. Download LM Studio and install it.
2. Install the `lms` CLI: `~/.lmstudio/bin/lms bootstrap`.
3. In a bash terminal, run `lms import devstralQ4_K_M.gguf` in the directory where you've downloaded the model checkpoint (e.g. `mistralai/Devstral-Small-2505_gguf`).
4. Open the LM Studio application and click the terminal icon to get into the developer tab. Click "select a model to load" and select Devstral Q4 K M. Toggle the status button to start the model, and in settings toggle "Serve on Local Network" on.
5. On the right tab, you will see an API identifier, which should be devstralq4km, and an API address under API Usage. Keep note of this address; we will use it in the next step.

Launch OpenHands
You can now interact with the model served from LM Studio through OpenHands. Start the OpenHands server with Docker. Click "see advanced setting" on the second line. In the new tab, toggle "advanced" on. Set the custom model to mistral/devstralq4km and the Base URL to the API address we got from the last step in LM Studio. Set the API Key to dummy. Click "save changes".

Example: Understanding Test Coverage of Mistral Common
We can start the OpenHands scaffold and link it to a repo to analyze test coverage and identify badly covered files. Here we start with our public `mistral-common` repo. After the repo is mounted in the workspace, we give the following instruction. The agent will first browse the code base to check test configuration and structure. Then it sets up the testing dependencies and launches the coverage test. Finally, the agent writes the necessary code to visualize the coverage. At the end of the run, the plots are produced.
Mistral-Large-Instruct-2411
Mistral-Large-Instruct-2411 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities, extending Mistral-Large-Instruct-2407 with better long context, function calling and system prompt support.

Key features
- Multi-lingual by design: dozens of languages supported, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch and Polish.
- Proficient in coding: trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
- Agent-centric: best-in-class agentic capabilities with native function calling and JSON output.
- Advanced Reasoning: state-of-the-art mathematical and reasoning capabilities.
- Mistral Research License: allows usage and modification for non-commercial use.
- Large Context: a large 128k context window.
- Robust Context Adherence: ensures strong adherence for RAG and large-context applications.
- System Prompt: maintains strong adherence and support for more reliable system prompts.

System Prompt
We appreciate the feedback received from our community regarding our system prompt handling. In response, we have implemented stronger support for system prompts. To achieve optimal results, we recommend always including a system prompt that clearly outlines the bot's purpose, even if it is minimal. Be careful with subtle missing or trailing white spaces! Please make sure to use mistral-common as the source of truth.

The model can be used with the following frameworks. We recommend using this model with the vLLM library to implement production-ready inference pipelines. Also make sure you have `mistral-common >= 1.5.0` installed. You can also make use of a ready-to-go Docker image on Docker Hub. We recommend that you use Mistral-Large-Instruct-2411 in a server/client setting.
Note: Running Mistral-Large-Instruct-2411 on GPU requires over 300 GB of GPU RAM. To ping the client you can use a simple Python snippet.

Mistral-Large-2411 has much improved function calling capabilities that are fully supported using `mistral-common >= 1.5.0` and `vLLM >= v0.6.4.post1`. Make sure to serve the model with the following flags in vLLM:

The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall
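The exact serve command referenced above did not survive extraction; the following is a sketch of a vLLM launch with Mistral formats and automatic tool choice enabled. The flag names assume a recent vLLM release and the tensor-parallel degree is an assumption for a multi-GPU node, so check `vllm serve --help` for your version.

```shell
# Sketch: serve Mistral-Large-Instruct-2411 with function calling enabled.
# --tokenizer-mode/--config-format/--load-format select the native Mistral
# formats; the last two flags enable vLLM's Mistral tool-call parsing.
vllm serve mistralai/Mistral-Large-Instruct-2411 \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral \
  --tool-call-parser mistral \
  --enable-auto-tool-choice \
  --tensor-parallel-size 8
```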
Ministral-3-3B-Base-2512
Voxtral-Small-24B-2507
Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Learn more about Voxtral in our blog post here and our research paper.

Voxtral builds upon Mistral Small 3 with powerful audio understanding capabilities:
- Dedicated transcription mode: Voxtral can operate in a pure speech transcription mode to maximize performance. By default, Voxtral automatically predicts the source audio language and transcribes the text accordingly.
- Long-form context: with a 32k token context length, Voxtral handles audios up to 30 minutes for transcription, or 40 minutes for understanding.
- Built-in Q&A and summarization: supports asking questions directly through audio. Analyze audio and generate structured summaries without the need for separate ASR and language models.
- Natively multilingual: automatic language detection and state-of-the-art performance in the world's most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian).
- Function-calling straight from voice: enables direct triggering of backend functions, workflows, or API calls based on spoken user intents.
- Highly capable at text: retains the text understanding capabilities of its language model backbone, Mistral Small 3.1.

Average word error rate (WER) over the FLEURS, Mozilla Common Voice and Multilingual LibriSpeech benchmarks.

The model can be used with the following frameworks:
- `vllm` (recommended): see here
- `Transformers` 🤗: see here

Recommended usage:
- `temperature=0.2` and `top_p=0.95` for chat completion (e.g. audio understanding) and `temperature=0.0` for transcription
- Multiple audios per message and multiple user turns with audio are supported
- Function calling is supported
- System prompts are not yet supported

Make sure to install `vLLM >= 0.10.0`; we recommend using uv. Doing so should automatically install `mistral-common >= 1.8.1`. You can test that your vLLM setup works as expected by cloning the vLLM repo. We recommend that you use Voxtral-Small-24B-2507 in a server/client setting.

Note: Running Voxtral-Small-24B-2507 on GPU requires ~55 GB of GPU RAM in bf16 or fp16. To ping the client you can use a simple Python snippet. See the following examples.

Leverage the audio capabilities of Voxtral-Small-24B-2507 to chat. Make sure that your client has `mistral-common` with audio installed:

L'orateur le plus inspirant est le président. Il est plus inspirant parce qu'il parle de ses expériences personnelles et de son optimisme pour l'avenir du pays. Il est différent de l'autre orateur car il ne parle pas de la météo, mais plutôt de ses interactions avec les gens et de son rôle en tant que président.

Voxtral-Small-24B-2507 has powerful transcription capabilities! Make sure that your client has `mistral-common` with audio installed.

Voxtral has some experimental function calling support. You can try it as shown below. Make sure that your client has `mistral-common` with audio installed.

Starting with `transformers >= 4.54.0`, you can run Voxtral natively! Make sure to have `mistral-common >= 1.8.1` installed with audio dependencies.
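The per-mode sampling recommendations above (chat vs. pure transcription) can be captured in a small config helper; the function name and mode strings are ours, not part of `mistral-common` or vLLM.

```python
# Recommended sampling settings for Voxtral-Small-24B-2507, per the notes above.
CHAT_SETTINGS = {"temperature": 0.2, "top_p": 0.95}      # chat / audio understanding
TRANSCRIPTION_SETTINGS = {"temperature": 0.0}             # deterministic transcription

def settings_for(mode: str) -> dict:
    """Hypothetical helper: pick sampling settings by request mode."""
    if mode == "transcription":
        return TRANSCRIPTION_SETTINGS
    return CHAT_SETTINGS

# Merge into whatever request payload your client builds:
request = {"model": "mistralai/Voxtral-Small-24B-2507", **settings_for("chat")}
print(request)
```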
Mamba-Codestral-7B-v0.1
Ministral-3-3B-Instruct-2512-BF16
Magistral-Small-2509
Magistral Small 1.2 builds upon Mistral Small 3.2 (2506) with added reasoning capabilities, via SFT from Magistral Medium traces and RL on top; it is a small, efficient reasoning model with 24B parameters. Magistral Small can be deployed locally, fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized.

- Multimodality: The model now has a vision encoder and can take multimodal inputs, extending its reasoning capabilities to vision.
- Performance upgrade: Magistral Small 1.2 should give you significantly better performance than Magistral Small 1.1, as seen in the benchmark results.
- Better tone and persona: You should experience better LaTeX and Markdown formatting, and shorter answers on easy general prompts.
- Finite generation: The model is less likely to enter infinite generation loops.
- Special think tokens: [THINK] and [/THINK] special tokens encapsulate the reasoning content in a thinking chunk. This makes it easier to parse the reasoning trace and prevents confusion when the '[THINK]' token is given as a string in the prompt.
- Reasoning prompt: The reasoning prompt is given in the system prompt.
- Reasoning: Capable of long chains of reasoning traces before providing an answer.
- Multilingual: Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- Vision: Vision capabilities enable the model to analyze images and reason based on visual content in addition to text.
- Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: A 128k context window. Performance might degrade past 40k, but Magistral should still give good results. We therefore recommend leaving the maximum model length at 128k and lowering it only if you encounter low performance.
| Model | AIME24 pass@1 | AIME25 pass@1 | GPQA Diamond | LiveCodeBench (v5) |
|--------------------------|---------------|---------------|--------------|--------------------|
| Magistral Medium 1.2 | 91.82% | 83.48% | 76.26% | 75.00% |
| Magistral Medium 1.1 | 72.03% | 60.99% | 71.46% | 59.35% |
| Magistral Medium 1.0 | 73.59% | 64.95% | 70.83% | 59.36% |
| Magistral Small 1.2 | 86.14% | 77.34% | 70.07% | 70.88% |
| Magistral Small 1.1 | 70.52% | 62.03% | 65.78% | 59.17% |
| Magistral Small 1.0 | 70.68% | 62.76% | 68.18% | 55.84% |

Please make sure to use:
- `top_p`: 0.95
- `temperature`: 0.7
- `max_tokens`: 131072

We highly recommend including the following system prompt for the best results; you can edit and customise it if needed for your specific use case. The `[THINK]` and `[/THINK]` markers are special tokens that must be encoded as such. Please make sure to use `mistral-common` as the source of truth. Find below examples from libraries supporting `mistral-common`. We invite you to choose, depending on your use case and requirements, between keeping reasoning traces during multi-turn interactions or keeping only the final assistant response.

The model can be used with the following frameworks:
- `vllm` (recommended): See below
- `transformers`: See below
- `llama.cpp`: See https://huggingface.co/mistralai/Magistral-Small-2509-GGUF
- `Unsloth GGUFs`: See https://huggingface.co/unsloth/Magistral-Small-2509-GGUF
- `Kaggle`: See https://www.kaggle.com/models/mistral-ai/magistral-small-2509
- `LM Studio`: See https://lmstudio.ai/models/mistralai/magistral-small-2509
- `Axolotl`: See https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/magistral
- `Unsloth`: See https://docs.unsloth.ai/models/tutorials-how-to-fine-tune-and-run-llms/magistral-how-to-run-and-fine-tune

We recommend using this model with the vLLM library to implement production-ready inference pipelines. Doing so should automatically install `mistral_common >= 1.8.5`.
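As an illustration, the recommended sampling settings can be wired into an OpenAI-compatible chat-completions payload as follows. This is a sketch only: the helper name is hypothetical, and the model identifier stands in for whatever your deployment serves.

```python
# Recommended sampling settings for Magistral Small 1.2 (from this card).
SAMPLING = {"temperature": 0.7, "top_p": 0.95, "max_tokens": 131072}

def build_request(messages: list[dict], model: str = "mistralai/Magistral-Small-2509") -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {"model": model, "messages": messages, **SAMPLING}

payload = build_request([{"role": "user", "content": "What is 9 * 7?"}])
```

The resulting dict can be sent to any OpenAI-compatible endpoint (e.g. a vLLM server) by your HTTP client of choice.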
You can also make use of a ready-to-go Docker image available on Docker Hub. Make sure you install the latest `transformers` version:
Ministral-3-14B-Instruct-2512-GGUF
Magistral-Small-2509-GGUF
> [!Note]
> To make our models more accessible to everyone, this repo provides a basic GGUF checkpoint compatible with llama.cpp
> and mistral-common.
>
> In addition to using this GGUF checkpoint, we encourage the community to use other GGUF variants, e.g.
> from Unsloth, LM Studio, ...
>
> If you encounter any problems with the provided checkpoints here, please open a discussion or pull request.

Building upon Mistral Small 3.2 (2506), with reasoning capabilities added through SFT on Magistral Medium traces followed by RL, Magistral Small is a small, efficient reasoning model with 24B parameters. It can be deployed locally, fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized.

This is the GGUF version of the Magistral-Small-2509 model. We released the BF16 weights as well as the following quantized formats:
- Q8_0
- Q5_K_M
- Q4_K_M

We do not release alongside our GGUF files:
- An official chat template. Instead, we recommend using `mistral-common`, which serves as our source of truth for tokenization and detokenization. llama.cpp automatically loads a chat template, but it is most likely incorrect for Magistral.
- The vision encoder, since our recommended usage does not involve multimodality.

Updates compared to Magistral Small 1.1:
- Multimodality: The model now has a vision encoder and can take multimodal inputs, extending its reasoning capabilities to vision.
- Performance upgrade: Magistral Small 1.2 should give you significantly better performance than Magistral Small 1.1, as seen in the benchmark results.
- Better tone and persona: You should experience better LaTeX and Markdown formatting, and shorter answers on easy general prompts.
- Finite generation: The model is less likely to enter infinite generation loops.
- Special think tokens: [THINK] and [/THINK] special tokens encapsulate the reasoning content in a thinking chunk. This makes it easier to parse the reasoning trace and prevents confusion when the '[THINK]' token is given as a string in the prompt.
- Reasoning prompt: The reasoning prompt is given in the system prompt.

Key Features:
- Reasoning: Capable of long chains of reasoning traces before providing an answer.
- Multilingual: Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- Vision: Vision capabilities enable the model to analyze images and reason based on visual content in addition to text; these are available with our main model Magistral-Small-2509.
- Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: A 128k context window. Performance might degrade past 40k, but Magistral should still give good results. We therefore recommend leaving the maximum model length at 128k and lowering it only if you encounter low performance.

We recommend using Magistral Small 1.2 GGUF with llama.cpp along with the `mistral-common >= 1.8.5` server. See here for the documentation of the `mistral-common` server.

> [!Note]
> We do not believe we can guarantee correct behavior using the integrated, stringified chat template, hence
> mistral-common should be used as a reference. However, we strongly encourage community members to use this GGUF checkpoint
> and mistral-common as a reference implementation to build a correct integrated, stringified chat template.

1. Install `mistral-common >= 1.8.5` with its dependencies.
2. Launch the `mistral-common` server and pass it the URL of the `llama.cpp` server. This is the server that will handle tokenization and detokenization and call the `llama.cpp` server for generations.
3. Generate: call `mistral-common`, which will tokenize the input, call the `llama.cpp` server to generate new tokens, and detokenize the output into an `AssistantMessage` with the think chunk and tool calls parsed.
Ministral-3-8B-Instruct-2512-GGUF
Ministral-3-8B-Instruct-2512-BF16
Magistral-Small-2507
Ministral-3-8B-Base-2512
Mistral-Small-3.1-24B-Base-2503
Ministral-3-14B-Reasoning-2512-GGUF
Ministral-3-14B-Instruct-2512-BF16
Voxtral-4B-TTS-2603
Pixtral-12B-2409
The Pixtral-12B-2409 is a multimodal model of 12B parameters plus a 400M parameter vision encoder. For more details about this model please refer to our release blog post.

Key features
- Natively multimodal, trained with interleaved image and text data
- 12B parameter multimodal decoder + 400M parameter vision encoder
- Supports variable image sizes
- Leading performance in its weight class on multimodal tasks
- Maintains state-of-the-art performance on text-only benchmarks
- Sequence length: 128k
- License: Apache 2.0

Benchmarks

The performance of Pixtral-12B-2409 compared to multimodal models. All models were re-evaluated and benchmarked through the same evaluation pipeline.

| | Pixtral 12B | Qwen2 7B VL | LLaVA-OV 7B | Phi-3 Vision | Phi-3.5 Vision |
|:-------------------:|:-------------:|:----------:|:-------------:|:--------------:|:--------------:|
| MMMU (CoT) | 52.5 | 47.6 | 45.1 | 40.3 | 38.3 |
| Mathvista (CoT) | 58.0 | 54.4 | 36.1 | 36.4 | 39.3 |
| ChartQA (CoT) | 81.8 | 38.6 | 67.1 | 72.0 | 67.7 |
| DocVQA (ANLS) | 90.7 | 94.5 | 90.5 | 84.9 | 74.4 |
| VQAv2 (VQA Match) | 78.6 | 75.9 | 78.3 | 42.4 | 56.1 |

| | Pixtral 12B | Qwen2 7B VL | LLaVA-OV 7B | Phi-3 Vision | Phi-3.5 Vision |
|:-------------------:|:-------------:|:----------:|:-------------:|:--------------:|:--------------:|
| MM MT-Bench | 6.05 | 5.43 | 4.12 | 3.70 | 4.46 |
| Text MT-Bench | 7.68 | 6.41 | 6.94 | 6.27 | 6.31 |
| MM IF-Eval | 52.7 | 38.9 | 42.5 | 41.2 | 31.4 |
| Text IF-Eval | 61.3 | 50.1 | 51.4 | 50.9 | 47.4 |

| | Pixtral 12B | Qwen2 7B VL | LLaVA-OV 7B | Phi-3 Vision | Phi-3.5 Vision |
|:-------------------:|:-------------:|:----------:|:-------------:|:--------------:|:--------------:|
| MMLU (5-shot) | 69.2 | 68.5 | 67.9 | 63.5 | 63.6 |
| Math (Pass@1) | 48.1 | 27.8 | 38.6 | 29.2 | 28.4 |
| Human Eval (Pass@1) | 72.0 | 64.6 | 65.9 | 48.8 | 49.4 |

Comparison with Closed Source and Larger Models

| | Pixtral 12B | Claude-3 Haiku | Gemini-1.5 Flash 8B (0827) | LLaVA-OV 72B | GPT-4o | Claude-3.5 Sonnet |
|:-------------------:|:-------------:|:----------------:|:----------------------:|:------------:|:------:|:-----------------:|
| MMMU (CoT) | 52.5 | 50.4 | 50.7 | 54.4 | 68.6 | 68.0 |
| Mathvista (CoT) | 58.0 | 44.8 | 56.9 | 57.2 | 64.6 | 64.4 |
| ChartQA (CoT) | 81.8 | 69.6 | 78.0 | 66.9 | 85.1 | 87.6 |
| DocVQA (ANLS) | 90.7 | 74.6 | 79.5 | 91.6 | 88.9 | 90.3 |
| VQAv2 (VQA Match) | 78.6 | 68.4 | 65.5 | 83.8 | 77.8 | 70.7 |

We recommend using Pixtral with the vLLM library to implement production-ready inference pipelines. Also make sure you have `mistral_common >= 1.4.4` installed:

You can also make use of a ready-to-go Docker image. You can also pass multiple images per message and/or pass multi-turn conversations.

You can find more examples and tests directly in vLLM.
- Examples
- Tests

You can also use Pixtral in a server/client setting.

We recommend using mistral-inference to quickly try out / "vibe-check" Pixtral. Make sure to have `mistral_inference >= 1.4.1` installed. After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can pass text and images or image URLs to the model in instruction-following mode as follows:

You can also run the model in a Python shell as follows.

The Pixtral model does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
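The multi-image, multi-turn conversations mentioned above are expressed in the OpenAI-compatible message format, where a user turn's `content` is a list of text and image parts. A sketch of how such a conversation might be assembled (URLs are placeholders, and the helper names are illustrative):

```python
def image_part(url: str) -> dict:
    """One image entry in a multimodal user message."""
    return {"type": "image_url", "image_url": {"url": url}}

def text_part(text: str) -> dict:
    """One text entry in a multimodal user message."""
    return {"type": "text", "text": text}

# Multi-turn conversation with two images in the first user turn.
messages = [
    {"role": "user", "content": [
        text_part("Describe these two images."),
        image_part("https://example.com/a.png"),
        image_part("https://example.com/b.png"),
    ]},
    {"role": "assistant", "content": "The first shows ..."},
    {"role": "user", "content": [text_part("Which one is brighter?")]},
]
```

This `messages` list can then be submitted to a vLLM server's chat-completions endpoint.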
Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall
Mixtral-8x22B-v0.1
Model Card for Mixtral-8x22B

The Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. For full details of this model please read our release blog post.

Warning

This repo contains weights that are compatible with vLLM serving of the model as well as the Hugging Face transformers library. It is based on the original Mixtral torrent release, but the file format and parameter names are different.

By default, transformers will load the model in full precision. Therefore you might be interested to further reduce down the memory requirements to run the model through the optimizations we offer in the HF ecosystem:

Notice

Mixtral-8x22B is a pretrained base model and therefore does not have any moderation mechanisms.

The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall
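To give a flavour of the memory-reducing options meant here, the load kwargs below are an assumption, not the card's official recipe; the exact arguments depend on your `transformers` version:

```python
# Illustrative from_pretrained kwargs for reduced-memory loading.
# Half precision roughly halves memory relative to full precision,
# and device_map shards layers across available devices.
LOAD_KWARGS = {
    "torch_dtype": "bfloat16",  # half precision instead of float32
    "device_map": "auto",       # automatically place layers on GPUs/CPU
}
# model = AutoModelForCausalLM.from_pretrained(
#     "mistralai/Mixtral-8x22B-v0.1", **LOAD_KWARGS
# )
```

Further reductions (e.g. 8-bit or 4-bit quantization) are available through the HF ecosystem's quantization integrations.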
Ministral-3-3B-Reasoning-2512-GGUF
Devstral-Small-2507_gguf
> [!Note]
> At Mistral, we don't yet have much experience with providing GGUF-quantized checkpoints
> to the community, but we want to help improve the ecosystem going forward.
> If you encounter any problems with the provided checkpoints here, please open a discussion or pull request.

Devstral is an agentic LLM for software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open source model on this benchmark.

This is the GGUF version of the Devstral-Small-2507 model. We released the BF16 weights as well as the following quantized formats:
- Q8_0
- Q5_K_M
- Q4_K_M

It is finetuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only; the vision encoder was removed from `Mistral-Small-3.1` before fine-tuning.

For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Updates compared to `Devstral Small 1.0`:
- The performance has been increased; please refer to the benchmark results.
- `Devstral Small 1.1` is still great when paired with OpenHands. This new version also generalizes better to other prompts and coding environments.
- Supports Mistral's function calling format.

Key Features:
- Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
- Lightweight: with its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
- Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: A 128k context window.
- Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.

We recommend using Devstral with the OpenHands scaffold as explained here. To use it locally with a GGUF-quantized checkpoint, see the following section.

Download the weights from Hugging Face and then run Devstral using the llama.cpp CLI or llama.cpp server:

You can serve the model locally with LM Studio:
1. Download LM Studio and install it.
2. Install the lms CLI: `~/.lmstudio/bin/lms bootstrap`.
3. In a bash terminal, run `lms import Devstral-Small-2507-Q4_K_M.gguf` in the directory where you've downloaded the model checkpoint (e.g. `mistralai/Devstral-Small-2507_gguf`).
4. Open the LM Studio application and click the terminal icon to get into the developer tab. Click "Select a model to load" and select `Devstral Small 2507`. Toggle the status button to start the model, and in settings toggle "Serve on Local Network" on.
5. On the right tab, you will see an API identifier, which should be `devstral-small-2507`, and an API address under API Usage. Keep note of this address; we will use it in the next step.

You can now interact with the model served from LM Studio with OpenHands. Start the OpenHands server with Docker. The server will start at http://0.0.0.0:3000. Follow the instructions here to configure the LLM calls from OpenHands.
Mistral-Large-3-675B-Instruct-2512-NVFP4
Ministral-3-8B-Reasoning-2512-GGUF
Devstral-Small-2505_gguf
Ministral-3-3B-Instruct-2512-ONNX
Mistral-Large-3-675B-Instruct-2512
Magistral-Small-2506_gguf
> [!Note]
> At Mistral, we don't yet have much experience with providing GGUF-quantized checkpoints
> to the community, but we want to help improve the ecosystem going forward.
> If you encounter any problems with the provided checkpoints here, please open a discussion or pull request.

Building upon Mistral Small 3.1 (2503), with reasoning capabilities added through SFT on Magistral Medium traces followed by RL, Magistral Small is a small, efficient reasoning model with 24B parameters. Magistral-Small-2506_gguf is a GGUF (quantized) version of Magistral-Small-2506.

Key Features:
- Reasoning: Capable of long chains of reasoning traces before providing an answer.
- Lightweight: with its compact size of just 24 billion parameters, Magistral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM once quantized, making it an appropriate model for local deployment and on-device use.
- Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: We recommend setting the context window to 40k. We converted the original weights to GGUF with this requirement. Theoretically, an even larger 128k context window is supported but untested.
- Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.

Please see the original model card for benchmark results.

We recommend using Magistral-Small-2506_gguf with llama.cpp. Follow the install or build instructions to get started. `Magistral-Small-2506_gguf` does not support function calling.

In the rest of this usage guide, we assume you have the `llama-cli` and `llama-server` binaries available. Download the weights from the Hugging Face Hub using the `huggingface-cli`:

You can interact with the model using llama.cpp's `llama-cli` tool. Make sure to add `--jinja` to use our tokenizer. It uses the default system prompt for Magistral.
The default system prompt is in English, but you can customize it by passing the `-sys "your_system_prompt"` argument to `llama-cli`. By default, the context size in llama.cpp is 4096, but you can increase it to 40,960. We also recommend setting the temperature to `0.7` and `top_p` to `0.95`. Now you can pass the model the example prompts or your own prompts!

You can also use `llama-server` to run the model as a server. Make sure to add `--jinja` to use our tokenizer, and increase the context size to 40,960. We also recommend setting the temperature to `0.7` and `top_p` to `0.95`. Now you can interact with the model directly where it is served, in the browser, via curl, or by using the OpenAI client.

Here is a list of questions to help you test the model.

2. `John is one of 4 children. The first sister is 4 years old. Next year, the second sister will be twice as old as the first sister. The third sister is two years older than the second sister. The third sister is half the age of her older brother. How old is John?`
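For reference, the sample question above can be checked with a few lines of arithmetic, assuming "her older brother" refers to John, the only brother among the 4 children:

```python
first = 4                     # first sister's age now
second = 2 * (first + 1) - 1  # next year she is twice the first sister's age then
third = second + 2            # two years older than the second sister
john = 2 * third              # third sister is half her older brother's age
print(john)                   # → 22
```

So a correct answer should conclude that John is 22.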
Magistral-Small-2507-GGUF
Mistral-Small-4-119B-2603-eagle
Mistral-Small-4-119B-2603-NVFP4
Mistral Nemo Instruct FP8 2407
The Mistral-Nemo-Instruct-FP8-2407 Large Language Model (LLM) is a quantized, instruction fine-tuned version of Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size. For more details about this model please refer to our release blog post.

Key features
- Released under the Apache 2 License
- Pre-trained and instructed versions
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data
- Drop-in replacement for Mistral 7B

Model Architecture

Mistral Nemo is a transformer model, with the following architecture choices:
- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,336
- Activation function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Vocabulary size: 2^17 ≈ 128k
- Rotary embeddings (theta = 1M)

| Benchmark | Score |
| --- | --- |
| HellaSwag (0-shot) | 83.5% |
| Winogrande (0-shot) | 76.8% |
| OpenBookQA (0-shot) | 60.6% |
| CommonSenseQA (0-shot) | 70.4% |
| TruthfulQA (0-shot) | 50.3% |
| MMLU (5-shot) | 68.0% |
| TriviaQA (5-shot) | 73.8% |
| NaturalQuestions (5-shot) | 31.2% |

| Language | Score |
| --- | --- |
| French | 62.3% |
| German | 62.7% |
| Spanish | 64.6% |
| Italian | 61.3% |
| Portuguese | 63.3% |
| Russian | 59.2% |
| Chinese | 59.0% |
| Japanese | 59.0% |

You can use Mistral-Nemo-Instruct-FP8-2407 in a server/client setting.

2. To ping the client, you can use a simple Python snippet:

The Mistral Nemo Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
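A minimal stand-in for such a ping snippet, using only the Python standard library; the server address and endpoint are assumptions for a locally running OpenAI-compatible server, not values stated in this card:

```python
import json
from urllib import request

def build_payload(prompt: str) -> dict:
    """Chat-completions payload for the served model."""
    return {
        "model": "mistralai/Mistral-Nemo-Instruct-FP8-2407",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def ping(base_url: str = "http://localhost:8000") -> str:
    """POST a single prompt to the server and return the completion text."""
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload("Hello!")).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling `ping()` once the server is up should return a short greeting from the model.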
Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall