mistralai

✓ Verified · AI Startup

European AI champion, open and commercial models

71 models

Mistral-7B-Instruct-v0.2

```yaml
---
library_name: transformers
license: apache-2.0
tags:
  - finetuned
  - mistral-common
new_version: mistralai/Mistral-7B-Instruct-v0.3
inference: false
widget:
  - messages:
      - role: user
        content: What is your favorite condiment?
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
---
```

3,313,585 downloads · 3,014 likes

Mistral-7B-Instruct-v0.3

```yaml
---
library_name: vllm
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.3
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
  - vllm
  - mistral-common
---
```

license:apache-2.0
1,087,926 downloads · 2,282 likes

Voxtral-Mini-4B-Realtime-2602

license:apache-2.0
580,273 downloads · 695 likes

Mistral-7B-Instruct-v0.1

```yaml
---
library_name: transformers
license: apache-2.0
tags:
  - finetuned
  - mistral-common
base_model: mistralai/Mistral-7B-v0.1
inference: false
widget:
  - messages:
      - role: user
        content: What is your favorite condiment?
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
---
```

license:apache-2.0
560,072 downloads · 1,807 likes

Mixtral-8x7B-Instruct-v0.1

```yaml
---
library_name: vllm
language:
  - fr
  - it
  - de
  - es
  - en
license: apache-2.0
base_model: mistralai/Mixtral-8x7B-v0.1
inference: false
widget:
  - messages:
      - role: user
        content: What is your favorite condiment?
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
  - vllm
---
```

license:apache-2.0
554,027 downloads · 4,594 likes

Mistral-7B-v0.1

```yaml
---
library_name: transformers
language:
  - en
license: apache-2.0
tags:
  - pretrained
  - mistral-common
inference: false
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
---
```

license:apache-2.0
553,286 downloads · 4,002 likes

Voxtral-Mini-3B-2507

```yaml
---
library_name: mistral-common
language:
  - en
  - fr
  - de
  - es
  - it
  - pt
  - nl
  - hi
license: apache-2.0
inference: false
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
  - vllm
---
```

license:apache-2.0
449,097 downloads · 585 likes

Devstral-Small-2-24B-Instruct-2512

license:apache-2.0
416,361 downloads · 537 likes

Ministral-3-3B-Instruct-2512

license:apache-2.0
263,047 downloads · 174 likes

Ministral-3-14B-Reasoning-2512

license:apache-2.0
258,806 downloads · 109 likes

Mathstral-7B-v0.1

```yaml
---
library_name: mistral-common
license: apache-2.0
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
  - vllm
---
```

license:apache-2.0
236,871 downloads · 236 likes

Mistral-Small-3.2-24B-Instruct-2506

license:apache-2.0
190,812 downloads · 531 likes

Mistral-Small-24B-Instruct-2501

```yaml
---
library_name: vllm
language:
  - en
  - fr
  - de
  - es
  - it
  - pt
  - zh
  - ja
  - ru
  - ko
license: apache-2.0
inference: false
base_model:
  - mistralai/Mistral-Small-24B-Base-2501
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
  - vllm
---
```

license:apache-2.0
182,310 downloads · 946 likes

Ministral-8B-Instruct-2410

```yaml
---
library_name: vllm
language:
  - en
  - fr
  - de
  - es
  - it
  - pt
  - zh
  - ja
  - ru
  - ko
license: other
license_name: mrl
inference: false
license_link: https://mistral.ai/licenses/MRL-0.1.md
extra_gated_prompt: >-
```

153,327 downloads · 551 likes

Mistral-Nemo-Instruct-2407

```yaml
---
library_name: vllm
language:
  - en
  - fr
  - de
  - es
  - it
  - pt
  - ru
  - zh
  - ja
license: apache-2.0
base_model: mistralai/Mistral-Nemo-Base-2407
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
  - mistral-common
---
```

license:apache-2.0
132,612 downloads · 1,618 likes

Ministral-3-14B-Instruct-2512

license:apache-2.0
91,668 downloads · 225 likes

Mistral-Small-3.1-24B-Instruct-2503

```yaml
---
library_name: vllm
language:
  - en
  - fr
  - de
  - es
  - pt
  - it
  - ja
  - ko
  - ru
  - zh
  - ar
  - fa
  - id
  - ms
  - ne
  - pl
  - ro
  - sr
  - sv
  - tr
  - uk
  - vi
  - hi
  - bn
license: apache-2.0
inference: false
base_model:
  - mistralai/Mistral-Small-3.1-24B-Base-2503
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
  - mistral-common
---
```

license:apache-2.0
83,276 downloads · 1,333 likes

Mixtral-8x7B-v0.1

```yaml
---
library_name: vllm
license: apache-2.0
language:
  - fr
  - it
  - de
  - es
  - en
tags:
  - moe
  - mistral-common
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
---
```

license:apache-2.0
80,709 downloads · 1,768 likes

Ministral-3-8B-Reasoning-2512

license:apache-2.0
76,969 downloads · 62 likes

Mistral-7B-v0.3

```yaml
---
library_name: vllm
license: apache-2.0
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please
  read our Privacy Policy.
tags:
  - mistral-common
---
```

license:apache-2.0
50,136 downloads · 544 likes

Devstral-Small-2507

Devstral is an agentic LLM for software engineering tasks, built through a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-Bench, which positions it as the #1 open-source model on this benchmark. It is fine-tuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed from `Mistral-Small-3.1` before fine-tuning. For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Updates compared to `Devstral Small 1.0`:
- Improved performance; please refer to the benchmark results.
- `Devstral Small 1.1` is still great when paired with OpenHands. This new version also generalizes better to other prompts and coding environments.
- Supports Mistral's function calling format.

Key features:
- Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
- Lightweight: at a compact 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32 GB RAM, making it an appropriate model for local deployment and on-device use.
- Apache 2.0 License: open license allowing usage and modification for both commercial and non-commercial purposes.
- Context window: a 128k context window.
- Tokenizer: uses a Tekken tokenizer with a 131k vocabulary size.

Devstral Small 1.1 achieves a score of 53.6% on SWE-Bench Verified, outperforming Devstral Small 1.0 by 6.8 percentage points and the second-best state-of-the-art model by 11.4 points.
| Model | Agentic Scaffold | SWE-Bench Verified (%) |
|--------------------|--------------------|------------------------|
| Devstral Small 1.1 | OpenHands Scaffold | 53.6 |
| Devstral Small 1.0 | OpenHands Scaffold | 46.8 |
| GPT-4.1-mini | OpenAI Scaffold | 23.6 |
| Claude 3.5 Haiku | Anthropic Scaffold | 40.6 |
| SWE-smith-LM 32B | SWE-agent Scaffold | 40.2 |
| Skywork SWE | OpenHands Scaffold | 38.0 |
| DeepSWE | R2E-Gym Scaffold | 42.2 |

When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as DeepSeek-V3-0324 and Qwen3 235B-A22B. We recommend using Devstral with the OpenHands scaffold. You can use it either through our API or by running it locally.

API: follow these instructions to create a Mistral account and get an API key, then run these commands to start the OpenHands Docker container.

The model can also be deployed with the following libraries:
- `vllm` (recommended): see here
- `mistral-inference`: see here
- `transformers`: see here
- `LMStudio`: see here
- `llama.cpp`: see here
- `ollama`: see here

Make sure to install [`vLLM >= 0.9.1`](https://github.com/vllm-project/vllm/releases/tag/v0.9.1) and `mistral-common >= 1.7.0`. You can also make use of a ready-to-go Docker image on Docker Hub. We recommend using Devstral in a server/client setting. To ping the client you can use a simple Python snippet, then load our tokenizer along with the model and generate.

Make sure you launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then you can use OpenHands to interact with `Devstral Small 1.1`. For this tutorial we spun up a vLLM server with the command shown above. The server address should be in the following format: `http:// :8000/v1`

The easiest way to launch OpenHands is to use the Docker image. Then you can access the OpenHands UI at `http://localhost:3000`.
When accessing the OpenHands UI, you will be prompted to connect to a server. Use the advanced mode to connect to the server you launched earlier, and fill in the following fields:
- Custom Model: `openai/mistralai/Devstral-Small-2507`
- Base URL: `http:// :8000/v1`
- API Key: `token` (or any other token you used when launching the server)

You can follow the installation of Cline here, then configure the server address in the settings.

OpenHands example: understanding test coverage of Mistral Common. We can start the OpenHands scaffold and link it to a repo to analyze test coverage and identify poorly covered files. Here we start with our public `mistral-common` repo. After the repo is mounted in the workspace, we give the following instruction. The agent will first browse the code base to check test configuration and structure. Then it sets up the testing dependencies and launches the coverage test. Finally, the agent writes the code necessary to visualize the coverage, export the results, and save the plots to a PNG. At the end of the run, the following plots are produced.

For Cline, first initialize Cline inside VSCode and connect it to the server you launched earlier. We give the following instruction to build the video game. Don't hesitate to iterate or give more information to Devstral to improve the game!
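The "ping the client with a simple Python snippet" step above can be sketched with stdlib-only code. This is a hedged sketch, not the card's own snippet: the payload shape follows the OpenAI chat-completions convention that vLLM serves, and the base URL, model id, and `token` placeholder must be adapted to your own deployment.

```python
import json
import urllib.request

# Placeholders -- adapt to your own vLLM/Ollama server.
BASE_URL = "http://localhost:8000/v1"
MODEL = "mistralai/Devstral-Small-2507"


def build_payload(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }


def ping(prompt: str) -> str:
    """POST the payload to the server and return the assistant reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer token",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ping("List the files in the current repo."))
```

The same payload shape works for any of the OpenAI-compatible servers mentioned above (vLLM, Ollama, LM Studio).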

license:apache-2.0
39,465 downloads · 351 likes

Mistral-Small-4-119B-2603

license:apache-2.0
38,985 downloads · 326 likes

Ministral-3-8B-Instruct-2512

license:apache-2.0
28,940 downloads · 123 likes

Magistral-Small-2506

license:apache-2.0
27,402 downloads · 603 likes

Ministral-3-14B-Base-2512

license:apache-2.0
23,993 downloads · 48 likes

Ministral-3-3B-Reasoning-2512

license:apache-2.0
22,168 downloads · 86 likes

Mistral-Nemo-Base-2407

The Mistral-Nemo-Base-2407 Large Language Model (LLM) is a pretrained generative text model of 12B parameters trained jointly by Mistral AI and NVIDIA; it significantly outperforms existing models of smaller or similar size. For more details about this model please refer to our release blog post.

Key features:
- Released under the Apache 2 License
- Pre-trained and instructed versions
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data
- Drop-in replacement of Mistral 7B

Model architecture: Mistral Nemo is a transformer model with the following architecture choices:
- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,436
- Activation function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Vocabulary size: 2^17 ≈ 128k
- Rotary embeddings (theta = 1M)

| Benchmark | Score |
| --- | --- |
| HellaSwag (0-shot) | 83.5% |
| Winogrande (0-shot) | 76.8% |
| OpenBookQA (0-shot) | 60.6% |
| CommonSenseQA (0-shot) | 70.4% |
| TruthfulQA (0-shot) | 50.3% |
| MMLU (5-shot) | 68.0% |
| TriviaQA (5-shot) | 73.8% |
| NaturalQuestions (5-shot) | 31.2% |

| Language | Score |
| --- | --- |
| French | 62.3% |
| German | 62.7% |
| Spanish | 64.6% |
| Italian | 61.3% |
| Portuguese | 63.3% |
| Russian | 59.2% |
| Chinese | 59.0% |
| Japanese | 59.0% |

The model can be used with three different frameworks:
- `mistral-inference`: see here
- `transformers`: see here
- `NeMo`: see nvidia/Mistral-NeMo-12B-Base

It is recommended to use `mistralai/Mistral-Nemo-Base-2407` with mistral-inference. For HF transformers code snippets, please keep scrolling. After installing `mistral-inference`, a `mistral-demo` CLI command should be available in your environment.

> [!IMPORTANT]
> NOTE: Until a new release has been made, you need to install transformers from source. If you want to use Hugging Face `transformers` to generate text, you can do something like this.
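The quoted 12B parameter count can be roughly sanity-checked from the architecture list above. This is our own back-of-the-envelope sketch (it ignores norm weights and biases, and assumes untied input/output embeddings), using only the numbers on the card:

```python
# Rough parameter-count estimate from the architecture table above.
layers = 40
dim = 5120
head_dim = 128
hidden = 14436          # MLP hidden dim, as listed on the card
n_heads = 32
n_kv_heads = 8          # grouped-query attention (GQA)
vocab = 2 ** 17         # ~128k tokens

embed = 2 * vocab * dim                     # input + output embeddings
attn = dim * (n_heads * head_dim)           # Q projection
attn += 2 * dim * (n_kv_heads * head_dim)   # K and V projections (GQA)
attn += (n_heads * head_dim) * dim          # output projection
mlp = 3 * dim * hidden                      # SwiGLU: gate, up, down
total = embed + layers * (attn + mlp)

# Lands near the quoted 12B figure.
print(f"~{total / 1e9:.1f}B parameters")
```

Note how GQA shrinks the K/V projections: with 8 kv-heads instead of 32, they are a quarter the size of the Q projection.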
> [!TIP]
> Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. We recommend using a temperature of 0.3.

`Mistral-Nemo-Base-2407` is a pretrained base model and therefore does not have any moderation mechanisms.

The Mistral AI Team: Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall

license:apache-2.0
19,957 downloads · 330 likes

Mistral-Small-24B-Base-2501

license:apache-2.0
17,481 downloads · 257 likes

Mistral-Small-Instruct-2409

Mistral-Small-Instruct-2409 is an instruct fine-tuned version with the following characteristics:
- 22B parameters
- Vocabulary size of 32,768
- Supports function calling
- 32k sequence length

We recommend using this model with the vLLM library to implement production-ready inference pipelines. Also make sure you have `mistral-common >= 1.4.1` installed. You can also make use of a ready-to-go Docker image. You can also use Mistral Small in a server/client setting.

Note: running Mistral-Small on a single GPU requires at least 44 GB of GPU RAM. If you want to divide the GPU requirement over multiple devices, please add e.g. `--tensor-parallel-size 2`.

We recommend using mistral-inference to quickly try out / "vibe-check" the model. Make sure to have `mistral-inference >= 1.4.1` installed. After installing `mistral-inference`, a `mistral-chat` CLI command should be available in your environment, which you can use to chat with the model. You can also use the Hugging Face `transformers` library to run inference using various chat templates, or to fine-tune the model.
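The "at least 44 GB" figure follows directly from the parameter count: in bf16/fp16, each of the 22B parameters occupies 2 bytes. A quick sketch of that arithmetic (our own estimate; activations and KV cache need extra headroom on top):

```python
# Weight-memory estimate behind the 44 GB requirement noted above.
params = 22e9                # 22B parameters
bytes_per_param = 2          # bfloat16 / float16
weight_gb = params * bytes_per_param / 1e9

# With `--tensor-parallel-size 2`, vLLM shards the weights across two GPUs.
per_gpu_gb = weight_gb / 2

print(f"{weight_gb:.0f} GB of weights, {per_gpu_gb:.0f} GB per GPU with TP=2")
```

This is why the card suggests tensor parallelism when a single GPU cannot hold the full 44 GB of weights plus runtime overhead.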
Example for inference:

The Mistral AI Team: Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall

17,141 downloads · 393 likes

Devstral-2-123B-Instruct-2512

14,797 downloads · 290 likes

Ministral-3-3B-Instruct-2512-GGUF

license:apache-2.0
14,635 downloads · 26 likes

Mixtral-8x22B-Instruct-v0.1

> [!TIP]
> PRs to correct the `transformers` tokenizer so that it gives 1-to-1 the same results as the `mistral-common` reference implementation are very welcome!

The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of Mixtral-8x22B-v0.1. To use this example, you'll need `transformers` version 4.42.0 or higher. Please see the function calling guide in the `transformers` docs for more information. Note that, for reasons of space, this example does not show a complete cycle of calling a tool and adding the tool call and tool results to the chat history so that the model can use them in its next generation. For a full tool-calling example, please see the function calling guide, and note that Mixtral does use tool call IDs, so these must be included in your tool calls and tool results. They should be exactly 9 alphanumeric characters.

Instruct tokenizer: the Hugging Face tokenizer included in this release should match our own. To compare: `pip install mistral-common`.

Function calling and special tokens: this tokenizer includes more special tokens related to function calling:
- `[TOOL_CALLS]`
- `[AVAILABLE_TOOLS]`
- `[/AVAILABLE_TOOLS]`
- `[TOOL_RESULTS]`
- `[/TOOL_RESULTS]`

If you want to use this model with function calling, please be sure to apply it similarly to what is done in our SentencePieceTokenizerV3.
The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall

license:apache-2.0
13,078 downloads · 737 likes

Mistral-Large-Instruct-2407

11,551 downloads · 851 likes

Codestral-22B-v0.1

10,629 downloads · 1,312 likes

Devstral-Small-2505

Devstral is an agentic LLM for software engineering tasks, built through a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-Bench, which positions it as the #1 open-source model on this benchmark. It is fine-tuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed from `Mistral-Small-3.1` before fine-tuning. For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Key features:
- Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
- Lightweight: at a compact 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32 GB RAM, making it an appropriate model for local deployment and on-device use.
- Apache 2.0 License: open license allowing usage and modification for both commercial and non-commercial purposes.
- Context window: a 128k context window.
- Tokenizer: uses a Tekken tokenizer with a 131k vocabulary size.

Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming prior open-source SoTA by 6%.

| Model | Scaffold | SWE-Bench Verified (%) |
|------------------|--------------------|------------------------|
| Devstral | OpenHands Scaffold | 46.8 |
| GPT-4.1-mini | OpenAI Scaffold | 23.6 |
| Claude 3.5 Haiku | Anthropic Scaffold | 40.6 |
| SWE-smith-LM 32B | SWE-agent Scaffold | 40.2 |

When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as DeepSeek-V3-0324 and Qwen3 235B-A22B. We recommend using Devstral with the OpenHands scaffold.
You can use it either through our API or by running it locally.

API: follow these instructions to create a Mistral account and get an API key, then run these commands to start the OpenHands Docker container.

The model can also be deployed with the following libraries:
- `vllm` (recommended): see here
- `mistral-inference`: see here
- `transformers`: see here
- `LMStudio`: see here
- `llama.cpp`: see here
- `ollama`: see here

Make sure you launched an OpenAI-compatible server such as vLLM or Ollama as described above. Then you can use OpenHands to interact with `Devstral Small 1.0`. For this tutorial we spun up a vLLM server with the command shown above. The server address should be in the following format: `http:// :8000/v1`

The easiest way to launch OpenHands is to use the Docker image; you can then access the OpenHands UI at `http://localhost:3000`. When accessing the OpenHands UI, you will be prompted to connect to a server. Use the advanced mode to connect to the server you launched earlier, and fill in the following fields:
- Custom Model: `openai/mistralai/Devstral-Small-2505`
- Base URL: `http:// :8000/v1`
- API Key: `token` (or any other token you used when launching the server)

Now you're ready to use Devstral Small inside OpenHands by starting a new conversation. Let's build a To-Do list app.

We recommend using this model with the vLLM library to implement production-ready inference pipelines. Doing so should automatically install `mistral-common >= 1.5.5`. You can also make use of a ready-to-go Docker image on Docker Hub. We recommend using Devstral in a server/client setting. To ping the client you can use a simple Python snippet.

We recommend using mistral-inference to quickly try out / "vibe-check" Devstral. Make sure to have `mistral-inference >= 1.6.0` installed. To make the best use of our model with transformers, make sure to have `mistral-common >= 1.5.5` installed to use our tokenizer.
Then load our tokenizer along with the model and generate.

You can serve the model locally with LM Studio:
1. Download LM Studio and install it.
2. Install the `lms` CLI: `~/.lmstudio/bin/lms bootstrap`.
3. In a bash terminal, run `lms import devstralQ4KM.gguf` in the directory where you've downloaded the model checkpoint (e.g. `mistralai/Devstral-Small-2505gguf`).
4. Open the LM Studio application and click the terminal icon to get into the developer tab. Click "select a model to load" and select Devstral Q4 K M. Toggle the status button to start the model, and in settings toggle "Serve on Local Network" on.
5. On the right tab, you will see an API identifier, which should be `devstralq4km`, and an API address under API Usage. Keep note of this address; we will use it in the next step.

Launch OpenHands. You can now interact with the model served from LM Studio through OpenHands. Start the OpenHands server with Docker, then click "see advanced setting" on the second line. In the new tab, toggle "Advanced" on. Set the custom model to `mistral/devstralq4km` and the Base URL to the API address we got from the last step in LM Studio. Set the API Key to `dummy` and click "save changes".

Example: understanding test coverage of Mistral Common. We can start the OpenHands scaffold and link it to a repo to analyze test coverage and identify poorly covered files. Here we start with our public `mistral-common` repo. After the repo is mounted in the workspace, we give the following instruction. The agent will first browse the code base to check test configuration and structure. Then it sets up the testing dependencies and launches the coverage test. Finally, the agent writes the code necessary to visualize the coverage. At the end of the run, the following plots are produced.
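Both OpenHands and the LM Studio flow above expect a base URL in the form `http:// :8000/v1`. A small helper (our own, purely illustrative) for sanity-checking that format before pasting it into the UI:

```python
from urllib.parse import urlparse


def check_base_url(url: str) -> bool:
    """Check that a URL looks like an OpenAI-compatible base URL
    (scheme, host, and a path ending in /v1), as expected by OpenHands."""
    p = urlparse(url)
    return (
        p.scheme in ("http", "https")
        and bool(p.netloc)
        and p.path.rstrip("/").endswith("/v1")
    )
```

A bare `host:port` without the scheme or the `/v1` suffix is a common source of connection errors in the advanced-settings dialog.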

license:apache-2.0
9,504 downloads · 853 likes

Mistral-Large-Instruct-2411

Mistral-Large-Instruct-2411 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities, extending Mistral-Large-Instruct-2407 with better long context, function calling and system prompt support.

Key features:
- Multi-lingual by design: dozens of languages supported, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch and Polish.
- Proficient in coding: trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
- Agent-centric: best-in-class agentic capabilities with native function calling and JSON outputting.
- Advanced reasoning: state-of-the-art mathematical and reasoning capabilities.
- Mistral Research License: allows usage and modification for non-commercial usages.
- Large context: a large 128k context window.
- Robust context adherence: ensures strong adherence for RAG and large-context applications.
- System prompt: maintains strong adherence and support for more reliable system prompts.

System prompt: we appreciate the feedback received from our community regarding our system prompt handling. In response, we have implemented stronger support for system prompts. To achieve optimal results, we recommend always including a system prompt that clearly outlines the bot's purpose, even if it is minimal. Be careful with subtle missing or trailing white spaces! Please make sure to use mistral-common as the source of truth.

The model can be used with the following frameworks. We recommend using this model with the vLLM library to implement production-ready inference pipelines. Also make sure you have `mistral-common >= 1.5.0` installed. You can also make use of a ready-to-go Docker image on Docker Hub. We recommend using Mistral-Large-Instruct-2411 in a server/client setting.
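The warning about subtle missing or trailing whitespace in system prompts can be guarded against with a trivial helper. This is our own sketch, not part of mistral-common; it normalizes the system prompt before building the message list:

```python
# Sketch: build a chat message list with a whitespace-normalized system
# prompt, since stray leading/trailing spaces can subtly change tokenization.
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    cleaned = system_prompt.strip()  # drop stray leading/trailing whitespace
    if not cleaned:
        raise ValueError("system prompt should not be empty")
    return [
        {"role": "system", "content": cleaned},
        {"role": "user", "content": user_prompt},
    ]
```

Normalizing in one place keeps every request consistent with the prompt you actually validated.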
Note: running Mistral-Large-Instruct-2411 on GPU requires over 300 GB of GPU RAM. To ping the client you can use a simple Python snippet. Mistral-Large-2411 has much improved function calling capabilities that are fully supported using `mistral-common >= 1.5.0` and `vLLM >= v0.6.4.post1`. Make sure to serve the model with the following flags in vLLM.

The Mistral AI Team: Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall

9,164 downloads · 237 likes

Ministral-3-3B-Base-2512

license:apache-2.0
8,841 downloads · 50 likes

Voxtral-Small-24B-2507

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Learn more about Voxtral in our blog post here and in our research paper.

Voxtral builds upon Mistral Small 3 with powerful audio understanding capabilities:
- Dedicated transcription mode: Voxtral can operate in a pure speech-transcription mode to maximize performance. By default, Voxtral automatically predicts the source audio language and transcribes the text accordingly
- Long-form context: with a 32k token context length, Voxtral handles audio up to 30 minutes for transcription, or 40 minutes for understanding
- Built-in Q&A and summarization: supports asking questions directly through audio. Analyze audio and generate structured summaries without the need for separate ASR and language models
- Natively multilingual: automatic language detection and state-of-the-art performance in the world's most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian)
- Function calling straight from voice: enables direct triggering of backend functions, workflows, or API calls based on spoken user intents
- Highly capable at text: retains the text understanding capabilities of its language model backbone, Mistral Small 3.1

Average word error rate (WER) over the FLEURS, Mozilla Common Voice and Multilingual LibriSpeech benchmarks.

The model can be used with the following frameworks:
- `vllm` (recommended): see here
- `Transformers` 🤗: see here

Recommendations: use `temperature=0.2` and `top_p=0.95` for chat completion (e.g. audio understanding) and `temperature=0.0` for transcription. Multiple audios per message and multiple user turns with audio are supported. Function calling is supported. System prompts are not yet supported.

Make sure to install `vllm >= 0.10.0`; we recommend using uv. Doing so should automatically install `mistral-common >= 1.8.1`. You can test that your vLLM setup works as expected by cloning the vLLM repo. We recommend that you use Voxtral-Small-24B-2507 in a server/client setting.

Note: running Voxtral-Small-24B-2507 on GPU requires ~55 GB of GPU RAM in bf16 or fp16. To ping the client you can use a simple Python snippet; see the following examples.

Leverage the audio capabilities of Voxtral-Small-24B-2507 to chat. Make sure that your client has `mistral-common` with audio installed. Example answer (in French): "L'orateur le plus inspirant est le président. Il est plus inspirant parce qu'il parle de ses expériences personnelles et de son optimisme pour l'avenir du pays. Il est différent de l'autre orateur car il ne parle pas de la météo, mais plutôt de ses interactions avec les gens et de son rôle en tant que président." ("The most inspiring speaker is the president. He is more inspiring because he talks about his personal experiences and his optimism for the country's future. He differs from the other speaker because he does not talk about the weather, but rather about his interactions with people and his role as president.")

Voxtral-Small-24B-2507 has powerful transcription capabilities! Make sure that your client has `mistral-common` with audio installed.

Voxtral has some experimental function calling support; you can try it as shown below. Make sure that your client has `mistral-common` with audio installed.

Starting with `transformers >= 4.54.0`, you can run Voxtral natively! Make sure to have `mistral-common >= 1.8.1` installed with audio dependencies.
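The per-mode decoding settings and audio-length limits quoted above can be captured in a small lookup. This is our own sketch (the constants and helper name are not part of any Voxtral client library), just encoding the card's recommendations:

```python
# Recommended decoding settings per mode, per the model card:
# chat/understanding uses temperature=0.2 and top_p=0.95;
# pure transcription uses temperature=0.0.
CHAT_SETTINGS = {"temperature": 0.2, "top_p": 0.95}
TRANSCRIBE_SETTINGS = {"temperature": 0.0}

# Maximum audio length per mode (minutes), per the card.
MAX_MINUTES = {"transcription": 30, "understanding": 40}


def sampling_params(mode: str) -> dict:
    """Return a copy of the recommended sampling settings for a mode."""
    if mode == "transcription":
        return dict(TRANSCRIBE_SETTINGS)
    return dict(CHAT_SETTINGS)


def fits_in_context(mode: str, minutes: float) -> bool:
    """Check an audio clip against the per-mode length limit."""
    return minutes <= MAX_MINUTES[mode]
```

Keeping transcription greedy (`temperature=0.0`) while sampling for understanding mirrors the card's advice: transcripts should be deterministic, answers can be more varied.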

license:apache-2.0
8,377 downloads · 440 likes

Mamba-Codestral-7B-v0.1

license:apache-2.0
8,255 downloads · 607 likes

Ministral-3-3B-Instruct-2512-BF16

license:apache-2.0
7,994 downloads · 13 likes

Magistral-Small-2509

Building upon Mistral Small 3.2 (2506) with added reasoning capabilities (SFT from Magistral Medium traces plus RL on top), Magistral Small is a small, efficient reasoning model with 24B parameters. Magistral Small can be deployed locally, fitting within a single RTX 4090 or a 32 GB RAM MacBook once quantized.

Updates:
- Multimodality: the model now has a vision encoder and can take multimodal inputs, extending its reasoning capabilities to vision.
- Performance upgrade: Magistral Small 1.2 should give you significantly better performance than Magistral Small 1.1, as seen in the benchmark results.
- Better tone and persona: you should experience better LaTeX and Markdown formatting, and shorter answers on easy general prompts.
- Finite generation: the model is less likely to enter infinite generation loops.
- Special think tokens: [THINK] and [/THINK] special tokens encapsulate the reasoning content in a thinking chunk. This makes it easier to parse the reasoning trace and prevents confusion when the '[THINK]' token is given as a string in the prompt.
- Reasoning prompt: the reasoning prompt is given in the system prompt.

Key features:
- Reasoning: capable of long chains of reasoning traces before providing an answer.
- Multilingual: supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- Vision: vision capabilities enable the model to analyze images and reason based on visual content in addition to text.
- Apache 2.0 License: open license allowing usage and modification for both commercial and non-commercial purposes.
- Context window: a 128k context window. Performance might degrade past 40k, but Magistral should still give good results; hence we recommend leaving the maximum model length at 128k and only lowering it if you encounter low performance.
| Model | AIME24 pass@1 | AIME25 pass@1 | GPQA Diamond | LiveCodeBench (v5) |
|--------------------------|---------------|---------------|--------------|--------------------|
| Magistral Medium 1.2 | 91.82% | 83.48% | 76.26% | 75.00% |
| Magistral Medium 1.1 | 72.03% | 60.99% | 71.46% | 59.35% |
| Magistral Medium 1.0 | 73.59% | 64.95% | 70.83% | 59.36% |
| Magistral Small 1.2 | 86.14% | 77.34% | 70.07% | 70.88% |
| Magistral Small 1.1 | 70.52% | 62.03% | 65.78% | 59.17% |
| Magistral Small 1.0 | 70.68% | 62.76% | 68.18% | 55.84% |

Please make sure to use:
- `top_p`: 0.95
- `temperature`: 0.7
- `max_tokens`: 131072

We highly recommend including the following system prompt for the best results; you can edit and customise it if needed for your specific use case. The `[THINK]` and `[/THINK]` markers are special tokens that must be encoded as such. Please make sure to use mistral-common as the source of truth; find below examples from libraries supporting `mistral-common`. Depending on your use case and requirements, you can choose between keeping reasoning traces during multi-turn interactions or keeping only the final assistant response.

The model can be used with the following frameworks:
- `vllm` (recommended): see below
- `transformers`: see below
- `llama.cpp`: see https://huggingface.co/mistralai/Magistral-Small-2509-GGUF
- Unsloth GGUFs: see https://huggingface.co/unsloth/Magistral-Small-2509-GGUF
- Kaggle: see https://www.kaggle.com/models/mistral-ai/magistral-small-2509
- LM Studio: see https://lmstudio.ai/models/mistralai/magistral-small-2509
- Axolotl: see https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/magistral
- Unsloth: see https://docs.unsloth.ai/models/tutorials-how-to-fine-tune-and-run-llms/magistral-how-to-run-and-fine-tune

We recommend using this model with the vLLM library to implement production-ready inference pipelines. Installing vLLM should automatically install `mistral-common >= 1.8.5`.
You can also make use of a ready-to-go Docker image from the Docker Hub. Make sure you install the latest `transformers` version:
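As a sketch of the recommended sampling settings, the request body for an OpenAI-compatible endpoint (such as a vLLM server) might look like the following. The endpoint model name is an assumption, and the system prompt placeholder stands in for the one shipped with the model:

```python
# Sketch: build a chat-completion request with the recommended
# Magistral sampling settings (temperature 0.7, top_p 0.95,
# max_tokens 131072). Model name is an assumption; replace
# SYSTEM_PROMPT with the system prompt shipped with the model.
SYSTEM_PROMPT = "<system prompt from the model repo>"

def build_request(user_prompt: str) -> dict:
    return {
        "model": "mistralai/Magistral-Small-2509",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
        "top_p": 0.95,
        "max_tokens": 131072,
    }

payload = build_request("How many r's are in 'strawberry'?")
```

The payload can then be POSTed to the server's `/v1/chat/completions` route with any HTTP client.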

license:apache-2.0
7,950
284

Ministral-3-14B-Instruct-2512-GGUF

NaNK
license:apache-2.0
7,796
34

Magistral-Small-2509-GGUF

> [!Note]
> To make our models more accessible to everyone, this repo provides a basic GGUF checkpoint compatible with llama.cpp
> and mistral-common.
>
> In addition to using this GGUF checkpoint, we encourage the community to use other GGUF variants, e.g.
> from Unsloth, LM Studio, ...
>
> If you encounter any problems with the provided checkpoints here, please open a discussion or pull request.

Building upon Mistral Small 3.2 (2506), with added reasoning capabilities from SFT on Magistral Medium traces followed by RL, Magistral Small is a small, efficient reasoning model with 24B parameters. It can be deployed locally, fitting within a single RTX 4090 or a 32GB-RAM MacBook once quantized.

This is the GGUF version of the Magistral-Small-2509 model. We released the BF16 weights as well as the following quantized formats:
- Q8_0
- Q5_K_M
- Q4_K_M

We do not release alongside our GGUF files:
- An official chat template. Instead, we recommend using `mistral-common`, which serves as our source of truth for tokenization and detokenization. llama.cpp automatically loads a chat template, but it is most likely incorrect for Magistral.
- The vision encoder, since our recommended usage does not involve multimodality.

- Multimodality: The model now has a vision encoder and can take multimodal inputs, extending its reasoning capabilities to vision.
- Performance upgrade: Magistral Small 1.2 should give you significantly better performance than Magistral Small 1.1, as seen in the benchmark results.
- Better tone and persona: You should experience better LaTeX and Markdown formatting, and shorter answers on easy general prompts.
- Finite generation: The model is less likely to enter infinite generation loops.
- Special think tokens: `[THINK]` and `[/THINK]` special tokens encapsulate the reasoning content in a thinking chunk. This makes it easier to parse the reasoning trace and prevents confusion when the `[THINK]` token is given as a string in the prompt.
- Reasoning prompt: The reasoning prompt is given in the system prompt.

Key features:
- Reasoning: Capable of long chains of reasoning traces before providing an answer.
- Multilingual: Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- Vision: Vision capabilities enable the model to analyze images and reason based on visual content in addition to text, available with our main model Magistral-Small-2509.
- Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
- Context window: A 128k context window. Performance might degrade past 40k, but Magistral should still give good results. We therefore recommend leaving the maximum model length at 128k and only lowering it if you encounter poor performance.

We recommend using Magistral Small 1.2 GGUF with llama.cpp together with a `mistral-common >= 1.8.5` server. See here for the documentation of the `mistral-common` server.

> [!Note]
> We do not believe we can guarantee correct behavior using the integrated, stringified chat template, hence
> mistral-common should be used as a reference. However, we strongly encourage community members to use this GGUF checkpoint
> and mistral-common as a reference implementation to build a correct integrated, stringified chat template.

1. Install `mistral-common >= 1.8.5` with its dependencies.
2. Launch the `mistral-common` server and pass it the URL of the `llama.cpp` server. This is the server that will handle tokenization and detokenization and call the `llama.cpp` server for generations.

To generate: call `mistral-common`, which will tokenize the input, call the `llama.cpp` server to generate new tokens, and detokenize the output into an `AssistantMessage` with think chunks and tool calls parsed.
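Because the `[THINK]`/`[/THINK]` special tokens delimit the reasoning chunk, a decoded completion can be split into its trace and final answer with a few lines of string handling. This is only a sketch; production code should rely on `mistral-common`, which parses the special tokens at the token level rather than as strings:

```python
# Minimal sketch: split a decoded completion into the reasoning
# trace and the final answer using the [THINK]...[/THINK] markers.
# Not part of mistral-common; real deployments should use it instead.
def split_reasoning(text: str) -> tuple[str, str]:
    start, end = "[THINK]", "[/THINK]"
    if start in text and end in text:
        head, _, rest = text.partition(start)
        trace, _, answer = rest.partition(end)
        return trace.strip(), (head + answer).strip()
    # No thinking chunk found: everything is the answer.
    return "", text.strip()

trace, answer = split_reasoning("[THINK]4 + 4 = 8[/THINK]The answer is 8.")
```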

llama.cpp
7,778
58

Ministral-3-8B-Instruct-2512-GGUF

NaNK
license:apache-2.0
7,085
15

Ministral-3-8B-Instruct-2512-BF16

NaNK
license:apache-2.0
5,653
8

Magistral-Small-2507

license:apache-2.0
5,073
99

Ministral-3-8B-Base-2512

NaNK
license:apache-2.0
4,675
35

Mistral-Small-3.1-24B-Base-2503

NaNK
license:apache-2.0
4,285
254

Ministral-3-14B-Reasoning-2512-GGUF

NaNK
license:apache-2.0
4,257
20

Ministral-3-14B-Instruct-2512-BF16

NaNK
license:apache-2.0
3,940
10

Voxtral-4B-TTS-2603

NaNK
license:cc-by-nc-4.0
3,721
548

Pixtral-12B-2409

The Pixtral-12B-2409 is a multimodal model of 12B parameters plus a 400M-parameter vision encoder. For more details about this model please refer to our release blog post.

Key features:
- Natively multimodal, trained with interleaved image and text data
- 12B-parameter multimodal decoder + 400M-parameter vision encoder
- Supports variable image sizes
- Leading performance in its weight class on multimodal tasks
- Maintains state-of-the-art performance on text-only benchmarks
- Sequence length: 128k
- License: Apache 2.0

Benchmarks

The performance of Pixtral-12B-2409 compared to multimodal models. All models were re-evaluated and benchmarked through the same evaluation pipeline.

| | Pixtral 12B | Qwen2 7B VL | LLaVA-OV 7B | Phi-3 Vision | Phi-3.5 Vision |
|:-------------------:|:-----------:|:-----------:|:-----------:|:------------:|:--------------:|
| MMMU (CoT) | 52.5 | 47.6 | 45.1 | 40.3 | 38.3 |
| Mathvista (CoT) | 58.0 | 54.4 | 36.1 | 36.4 | 39.3 |
| ChartQA (CoT) | 81.8 | 38.6 | 67.1 | 72.0 | 67.7 |
| DocVQA (ANLS) | 90.7 | 94.5 | 90.5 | 84.9 | 74.4 |
| VQAv2 (VQA Match) | 78.6 | 75.9 | 78.3 | 42.4 | 56.1 |

| | Pixtral 12B | Qwen2 7B VL | LLaVA-OV 7B | Phi-3 Vision | Phi-3.5 Vision |
|:-------------------:|:-----------:|:-----------:|:-----------:|:------------:|:--------------:|
| MM MT-Bench | 6.05 | 5.43 | 4.12 | 3.70 | 4.46 |
| Text MT-Bench | 7.68 | 6.41 | 6.94 | 6.27 | 6.31 |
| MM IF-Eval | 52.7 | 38.9 | 42.5 | 41.2 | 31.4 |
| Text IF-Eval | 61.3 | 50.1 | 51.4 | 50.9 | 47.4 |

| | Pixtral 12B | Qwen2 7B VL | LLaVA-OV 7B | Phi-3 Vision | Phi-3.5 Vision |
|:-------------------:|:-----------:|:-----------:|:-----------:|:------------:|:--------------:|
| MMLU (5-shot) | 69.2 | 68.5 | 67.9 | 63.5 | 63.6 |
| Math (Pass@1) | 48.1 | 27.8 | 38.6 | 29.2 | 28.4 |
| HumanEval (Pass@1) | 72.0 | 64.6 | 65.9 | 48.8 | 49.4 |

Comparison with closed-source and larger models

| | Pixtral 12B | Claude-3 Haiku | Gemini-1.5 Flash 8B (0827) | LLaVA-OV 72B | GPT-4o | Claude-3.5 Sonnet |
|:-------------------:|:-----------:|:--------------:|:--------------------------:|:------------:|:------:|:-----------------:|
| MMMU (CoT) | 52.5 | 50.4 | 50.7 | 54.4 | 68.6 | 68.0 |
| Mathvista (CoT) | 58.0 | 44.8 | 56.9 | 57.2 | 64.6 | 64.4 |
| ChartQA (CoT) | 81.8 | 69.6 | 78.0 | 66.9 | 85.1 | 87.6 |
| DocVQA (ANLS) | 90.7 | 74.6 | 79.5 | 91.6 | 88.9 | 90.3 |
| VQAv2 (VQA Match) | 78.6 | 68.4 | 65.5 | 83.8 | 77.8 | 70.7 |

We recommend using Pixtral with the vLLM library to implement production-ready inference pipelines. Also make sure you have `mistral-common >= 1.4.4` installed. You can also make use of a ready-to-go Docker image. You can pass multiple images per message and/or multi-turn conversations. You can find more examples and tests directly in vLLM (examples, tests). You can also use Pixtral in a server/client setting.

We recommend using mistral-inference to quickly try out / "vibe-check" Pixtral. Make sure to have `mistral-inference >= 1.4.1` installed; after installing it, a `mistral-chat` CLI command should be available in your environment. You can pass text and images or image URLs to the model in instruction-following mode, and you can also run the model in a Python shell.

The Pixtral model does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
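To illustrate the multimodal request shape, here is a sketch of an OpenAI-compatible chat payload mixing text and an image URL, as accepted by a vLLM server hosting Pixtral. The image URL is a placeholder and the model name is taken from this repo:

```python
# Sketch: an OpenAI-compatible chat message combining a text part
# and an image_url part, as accepted by a vLLM server hosting
# Pixtral. The image URL below is a placeholder.
def vision_message(text: str, image_url: str) -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

payload = {
    "model": "mistralai/Pixtral-12B-2409",
    "messages": [
        vision_message("Describe this chart.",
                       "https://example.com/chart.png"),
    ],
}
```

For multi-image or multi-turn conversations, append further `vision_message(...)` entries (or plain text messages) to the `messages` list.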
Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall

NaNK
license:apache-2.0
3,685
670

Mixtral-8x22B-v0.1

Model Card for Mixtral-8x22B

The Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. For full details of this model please read our release blog post.

Warning: This repo contains weights that are compatible with vLLM serving of the model as well as the Hugging Face transformers library. It is based on the original Mixtral torrent release, but the file format and parameter names are different.

By default, transformers will load the model in full precision. You may therefore want to further reduce the memory requirements to run the model through the optimizations offered in the HF ecosystem.

Notice: Mixtral-8x22B is a pretrained base model and therefore does not have any moderation mechanisms.

The Mistral AI Team: Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall

NaNK
license:apache-2.0
3,646
231

Ministral-3-3B-Reasoning-2512-GGUF

NaNK
license:apache-2.0
3,414
15

Devstral-Small-2507_gguf

> [!Note]
> At Mistral, we don't yet have much experience with providing GGUF-quantized checkpoints
> to the community, but we want to help improve the ecosystem going forward.
> If you encounter any problems with the provided checkpoints here, please open a discussion or pull request.

Devstral is an agentic LLM for software engineering tasks built through a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases and editing multiple files to power software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this benchmark.

This is the GGUF version of the Devstral-Small-2507 model. We released the BF16 weights as well as the following quantized formats:
- Q8_0
- Q5_K_M
- Q4_K_M

It is finetuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed from `Mistral-Small-3.1` before fine-tuning. For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Updates compared to `Devstral Small 1.0`:
- The performance has been increased; please refer to the benchmark results.
- `Devstral Small 1.1` is still great when paired with OpenHands. This new version also generalizes better to other prompts and coding environments.
- Supports Mistral's function calling format.

Key features:
- Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
- Lightweight: With its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
- Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
- Context window: A 128k context window.
- Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.

We recommend using Devstral with the OpenHands scaffold as explained here. To use it locally with a GGUF-quantized checkpoint, see the following section. Download the weights from Hugging Face and then run Devstral using the llama.cpp CLI or llama.cpp server.

You can also serve the model locally with LM Studio:
1. Download LM Studio and install it.
2. Install the `lms` CLI: `~/.lmstudio/bin/lms bootstrap`.
3. In a bash terminal, run `lms import Devstral-Small-2507-Q4_K_M.gguf` in the directory where you've downloaded the model checkpoint (e.g. `mistralai/Devstral-Small-2507_gguf`).
4. Open the LM Studio application and click the terminal icon to get into the developer tab. Click "Select a model to load" and select `Devstral Small 2507`. Toggle the status button to start the model, and in settings toggle "Serve on Local Network" on.
5. On the right tab, you will see an API identifier, which should be `devstral-small-2507`, and an API address under API Usage. Keep note of this address; we will use it in the next step.

You can now interact with the model served from LM Studio with OpenHands. Start the OpenHands server with Docker; the server will start at http://0.0.0.0:3000. Follow the instructions here to configure the LLM calls from OpenHands.
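As a sketch of the last step, a client can talk to the LM Studio endpoint through its OpenAI-compatible API. The base URL below uses LM Studio's default port and is an assumption; substitute the API address shown in LM Studio:

```python
# Sketch: build a chat-completion request against the model served
# by LM Studio (OpenAI-compatible API). The base URL is an
# assumption (LM Studio's default); use the address from the
# API Usage panel instead.
def chat_request(prompt: str) -> tuple[str, dict]:
    url = "http://localhost:1234/v1/chat/completions"
    body = {
        "model": "devstral-small-2507",
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, body

url, body = chat_request("List the files in this repository that define the CLI entry point.")
```

POST `body` as JSON to `url` with any HTTP client, or point an OpenAI SDK client at the same base URL.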

llama.cpp
2,973
42

Mistral-Large-3-675B-Instruct-2512-NVFP4

NaNK
license:apache-2.0
2,509
46

Ministral-3-8B-Reasoning-2512-GGUF

NaNK
license:apache-2.0
2,348
8

Devstral-Small-2505_gguf

NaNK
license:apache-2.0
903
75

Ministral-3-3B-Instruct-2512-ONNX

NaNK
license:apache-2.0
887
23

Mistral-Large-3-675B-Instruct-2512

NaNK
license:apache-2.0
829
190

Magistral-Small-2506_gguf

> [!Note]
> At Mistral, we don't yet have much experience with providing GGUF-quantized checkpoints
> to the community, but we want to help improve the ecosystem going forward.
> If you encounter any problems with the provided checkpoints here, please open a discussion or pull request.

Building upon Mistral Small 3.1 (2503), with added reasoning capabilities from SFT on Magistral Medium traces followed by RL, Magistral Small is a small, efficient reasoning model with 24B parameters. Magistral-Small-2506_gguf is a GGUF (quantized) version of Magistral-Small-2506.

Key features:
- Reasoning: Capable of long chains of reasoning traces before providing an answer.
- Lightweight: With its compact size of just 24 billion parameters, Magistral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM once quantized, making it an appropriate model for local deployment and on-device use.
- Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
- Context window: We recommend setting the context window to 40k. We converted the original weights to GGUF with this requirement. Theoretically, an even larger 128k context window is supported but untested.
- Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.

Please see the original model card for benchmark results.

We recommend using Magistral-Small-2506_gguf with llama.cpp. Follow the install or build instructions to get started. `Magistral-Small-2506_gguf` does not support function calling. In the rest of this usage guide, we assume you have the `llama-cli` and `llama-server` binaries available.

Download the weights from the Hugging Face Hub using the `huggingface-cli`. You can then interact with the model using llama.cpp's `llama-cli` tool. Make sure to add `--jinja` to use our tokenizer; this uses the default system prompt for Magistral.
The default system prompt is in English, but you can customize it by passing the `-sys "yoursystemprompt"` argument to `llama-cli`. By default, the context size of llama.cpp is 4096, but you can increase it to 40,960. We also recommend setting the temperature to `0.7` and `top_p` to `0.95`. Now you can pass the model the example prompts or your own prompts!

You can also use `llama-server` to run the model as a server. Make sure to add `--jinja` to use our tokenizer and to increase the context size to 40,960; as above, set the temperature to `0.7` and `top_p` to `0.95`. You can then interact with the model directly in the browser where it is served, via curl, or by using the OpenAI client.

Here is a question to help you test the model:
- `John is one of 4 children. The first sister is 4 years old. Next year, the second sister will be twice as old as the first sister. The third sister is two years older than the second sister. The third sister is half the age of her older brother. How old is John?`
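Putting the recommendations above together, a `llama-server` launch might look like the following command sketch. The checkpoint filename is an assumption; use whichever quantization you downloaded:

```shell
# Sketch: serve the GGUF checkpoint with the recommended settings.
# --jinja enables the bundled chat template, -c sets the context
# size, and the sampling flags match the recommendations above.
llama-server \
  -m Magistral-Small-2506_Q4_K_M.gguf \
  --jinja \
  -c 40960 \
  --temp 0.7 \
  --top-p 0.95 \
  --port 8080
```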

NaNK
llama.cpp
415
69

Magistral-Small-2507-GGUF

NaNK
llama.cpp
395
10

Mistral-Small-4-119B-2603-eagle

NaNK
license:apache-2.0
365
40

Mistral-Small-4-119B-2603-NVFP4

NaNK
license:apache-2.0
168
42

Mistral Nemo Instruct FP8 2407

The Mistral-Nemo-Instruct-FP8-2407 Large Language Model (LLM) is a quantized, instruction fine-tuned version of Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size. For more details about this model please refer to our release blog post.

Key features:
- Released under the Apache 2 License
- Pre-trained and instructed versions
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data
- Drop-in replacement for Mistral 7B

Model architecture: Mistral Nemo is a transformer model with the following architecture choices:
- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,336
- Activation function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Vocabulary size: 2^17 ≈ 128k
- Rotary embeddings (theta = 1M)

| Benchmark | Score |
| --- | --- |
| HellaSwag (0-shot) | 83.5% |
| Winogrande (0-shot) | 76.8% |
| OpenBookQA (0-shot) | 60.6% |
| CommonSenseQA (0-shot) | 70.4% |
| TruthfulQA (0-shot) | 50.3% |
| MMLU (5-shot) | 68.0% |
| TriviaQA (5-shot) | 73.8% |
| NaturalQuestions (5-shot) | 31.2% |

| Language | Score |
| --- | --- |
| French | 62.3% |
| German | 62.7% |
| Spanish | 64.6% |
| Italian | 61.3% |
| Portuguese | 63.3% |
| Russian | 59.2% |
| Chinese | 59.0% |
| Japanese | 59.0% |

You can use Mistral-Nemo-Instruct-FP8-2407 in a server/client setting. To query the server, you can use a simple Python snippet.

The Mistral Nemo Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
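The vocabulary size above is stated as a power of two; the arithmetic behind the "128k" shorthand is a one-liner:

```python
# The vocabulary size 2**17 works out to 131,072 tokens,
# commonly rounded to "128k" (128 * 1024).
vocab_size = 2 ** 17
k_tokens = vocab_size // 1024  # 128 "k" tokens
```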
Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall

NaNK
license:apache-2.0
146
22

Mistral-Large-3-675B-Instruct-2512-BF16

NaNK
license:apache-2.0
80
10

Pixtral-Large-Instruct-2411

65
424

Mistral-Large-3-675B-Instruct-2512-Eagle

NaNK
license:apache-2.0
39
22

Pixtral-12B-Base-2409

NaNK
license:apache-2.0
38
104

Mistral-Large-3-675B-Base-2512

NaNK
license:apache-2.0
13
26

Leanstral-2603

license:apache-2.0
4
16