cyankiwi

125 models • 1 total models in database

Sort by:

Tongyi-DeepResearch-30B-A3B-AWQ-4bit

We present Tongyi DeepResearch, an agentic large language model featuring 30 billion total parameters, with only 3 billion activated per token. Developed by Tongyi Lab, the model is specifically designed for long-horizon, deep information-seeking tasks. Tongyi-DeepResearch demonstrates state-of-the-art performance across a range of agentic search benchmarks, including Humanity's Last Exam, BrowserComp, BrowserComp-ZH, WebWalkerQA, GAIA, xbench-DeepSearch and FRAMES. - ⚙️ Fully automated synthetic data generation pipeline: We design a highly scalable data synthesis pipeline, which is fully automatic and empowers agentic pre-training, supervised fine-tuning, and reinforcement learning. - 🔄 Large-scale continual pre-training on agentic data: Leveraging diverse, high-quality agentic interaction data to extend model capabilities, maintain freshness, and strengthen reasoning performance. - 🔁 End-to-end reinforcement learning: We employ a strictly on-policy RL approach based on a customized Group Relative Policy Optimization framework, with token-level policy gradients, leave-one-out advantage estimation, and selective filtering of negative samples to stabilize training in a non‑stationary environment. - 🤖 Agent Inference Paradigm Compatibility: At inference, Tongyi-DeepResearch is compatible with two inference paradigms: ReAct, for rigorously evaluating the model's core intrinsic abilities, and an IterResearch-based 'Heavy' mode, which uses a test-time scaling strategy to unlock the model's maximum performance ceiling. You can download the model then run the inference scipts in https://github.com/Alibaba-NLP/DeepResearch.

NaNK

license:apache-2.0

520

Kimi-Linear-48B-A3B-Instruct-AWQ-4bit

NaNK

license:mit

508

Qwen3.5-122B-A10B-AWQ-8bit

NaNK

license:apache-2.0

494

Qwen3-Coder-30B-A3B-Instruct-AWQ-8bit

NaNK

license:apache-2.0

439

Qwen3.5-397B-A17B-AWQ-4bit

NaNK

license:apache-2.0

392

Qwen3.5-2B-AWQ-BF16-INT4

NaNK

license:apache-2.0

375

Qwen3-Next-80B-A3B-Thinking-AWQ-8bit

NaNK

license:apache-2.0

348

GLM-4.5-Air-Derestricted-AWQ-4bit

NaNK

unlimited

275

Kimi-Linear-48B-A3B-Instruct-AWQ-8bit

- Quantization Method: cyankiwi AWQ v1.0 - Bits: 8 - Group Size: 32 - Calibration Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset - Quantization Tool: llm-compressor | Type | Kimi-Linear-48B-A3B-Instruct | Kimi-Linear-48B-A3B-Instruct-AWQ-8bit | |:---------------:|:----------------:|:----------------:| | Memory Size | 91.5 GB | 50.4 GB | | KV Cache per Token | 243.0 kB | 121.5 kB | | KV Cache per Context | 243.0 GB | 121.5 GB | | Benchmarks | Kimi-Linear-48B-A3B-Instruct | Kimi-Linear-48B-A3B-Instruct-AWQ-8bit | |:---------------:|:----------------:|:----------------:| | Perplexity | 1.54038 | 1.54041 | | GPQA Diamond | 51.0 | 57.1 | | AIME25 | 40.0 | 50.0 | (a) On MMLU-Pro (4k context length), Kimi Linear achieves 51.0 performance with similar speed as full attention. On RULER (128k context length), it shows Pareto-optimal performance (84.3) and 3.98x speedup. (b) Kimi Linear achieves 6.3x faster TPOT compared to MLA, offering significant speedups at long sequence lengths (1M tokens). Kimi Linear is a hybrid linear attention architecture that outperforms traditional full attention methods across various contexts, including short, long, and reinforcement learning (RL) scaling regimes. At its core is Kimi Delta Attention (KDA)—a refined version of Gated DeltaNet that introduces a more efficient gating mechanism to optimize the use of finite-state RNN memory. Kimi Linear achieves superior performance and hardware efficiency, especially for long-context tasks. It reduces the need for large KV caches by up to 75% and boosts decoding throughput by up to $6\times$ for contexts as long as 1M tokens. We open-source the KDA kernel in FLA, and release two versions model checkpoints trained with 5.7T tokens. | Model | #Total Params | #Activated Params | Context Length | Download Link | | :------------------: | :---------------: | :-------------------: | :----------------: | :------------------------------------------------------------------------------: | | Kimi-Linear-Base | 48B | 3B | 1M | 🤗 Hugging Face | | Kimi-Linear-Instruct | 48B | 3B | 1M | 🤗 Hugging Face | - Kimi Delta Attention (KDA): A linear attention mechanism that refines the gated delta rule with finegrained gating. - Hybrid Architecture: A 3:1 KDA-to-global MLA ratio reduces memory usage while maintaining or surpassing the quality of full attention. - Superior Performance: Outperforms full attention in a variety of tasks, including long-context and RL-style benchmarks on 1.4T token training runs with fair comparisons. - High Throughput: Achieves up to $6\times$ faster decoding and significantly reduces time per output token (TPOT). To use the Kimi Linear model, we recommend the following environment: `python` >= 3.10 `torch` >= 2.6 `fla-core` >= 0.4.0 For deployment, you can use the latest vllm to create an OpenAI-compatible API endpoint.

NaNK

license:mit

259

Step-3.5-Flash-AWQ-4bit

NaNK

license:apache-2.0

234

GLM-4.5V-AWQ-8bit

NaNK

license:mit

227

GLM-4.5V-AWQ-4bit

NaNK

license:mit

214

QwenLong-L1.5-30B-A3B-AWQ-4bit

NaNK

license:apache-2.0

199

GLM-4.7-AWQ-4bit

NaNK

license:mit

197

Ministral-3-8B-Reasoning-2512-AWQ-4bit

NaNK

license:apache-2.0

192

Ministral-3-14B-Instruct-2512-AWQ-8bit

NaNK

license:apache-2.0

176

ERNIE-4.5-VL-28B-A3B-Thinking-AWQ-8bit

NaNK

license:apache-2.0

171

Qwen3-Next-80B-A3B-Instruct-AWQ-8bit

NaNK

license:apache-2.0

159

MiniMax-M2-AWQ-4bit

NaNK

license:mit

159

GLM-5.1-AWQ-4bit

NaNK

license:mit

154

Qwen3-VL-30B-A3B-Thinking-AWQ-8bit

NaNK

license:apache-2.0

141

Qwen3-30B-A3B-Thinking-2507-AWQ-8bit

NaNK

license:apache-2.0

141

GLM-4.5-Air-AWQ-8bit

NaNK

license:mit

117

Solar-Open-100B-AWQ-8bit

NaNK

—

107

Ministral-3-14B-Reasoning-2512-AWQ-8bit

NaNK

license:apache-2.0

104

Ministral-3-8B-Reasoning-2512-AWQ-8bit

NaNK

license:apache-2.0

104

Qwen3-Nemotron-32B-RLBFF-AWQ-4bit

- Quantization Method: AWQ - Bits: 4 - Group Size: 32 - Calibration Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset - Quantization Tool: llm-compressor Qwen3-Nemotron-32B-RLBFF is a large language model that leverages Qwen/Qwen3-32B as the foundation and is fine-tuned to improve the quality of LLM-generated responses in the default thinking mode. Given a conversation with multiple turns between user and assistant and a user-specified principle, it generates a response the final user turn. This is a research model described in and is released to support the following research paper: https://arxiv.org/abs/2509.21319 As of 24 Sep 2025, this model achieves Arena Hard V2 of 55.6% and WildBench Score of 70.33% and MT Bench of 9.50. This means that our model is substantially improved over the initial Qwen3-32B model and has similar performance compared to DeepSeek R1 and O3-mini at less than 5% of the inference cost (as indicated on openrouter). GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License. Qwen3-Nemotron-32B-RLBFF generates an response to a user query. HuggingFace 10/27/2025 via https://huggingface.co/nvidia/Qwen3-Nemotron-32B-RLBFF RLBFF HelpSteer3 HelpSteer3-Preference HelpSteer2-Preference SteerLM method HelpSteer HelpSteer2 Arena Hard V2, WildBench and MT Bench are common benchmarks for measuring general-domain model capabilities. | Model | MT Bench (GPT-4-Turbo) | Arena Hard v2 (95% CI) | WildBench Overall | Creative | Planning | Data Analy. | Info. Seek. | Coding | In/M | Out/M | Cost | |:--------------------------------------------|:------------------:|:----------------------:|:----------------:|:---------:|:---------:|:-----------:|:-----------:|:-------:|:----:|:----:|:----:| | Qwen3-Nemotron-32B-RLBFF | 9.50 | 55.6 (-1.6 / +1.4) | 70.33 | 71.73 | 70.73 | 69.37 | 68.96 | 70.94 | 0.018 | 0.072 | 1× | | Qwen3-32B | 9.38 | 44.0 (-1.6 / +1.5) | 67.57 | 68.63 | 67.95 | 64.68 | 66.78 | 69.53 | 0.018 | 0.072 | 1× | | o3-mini | 9.26 | 50.0 (-0.0 / +0.0) | 71.64 | 69.04 | 72.44 | 74.37 | 65.81 | 73.21 | 1.1 | 4.4 | 61× | | Claude-3.7-Sonnet (Thinking) | 8.93 | 54.2 (-2.0 / +1.8) | 65.45 | 66.72 | 65.94 | 63.59 | 63.08 | 67.36 | 3 | 15 | 188× | | DeepSeek R1 | 9.49 | 57.4 (-2.0 / +2.0) | 64.24 | 70.75 | 66.29 | 59.20 | 68.56 | 61.04 | 0.4 | 2 | 25× | Model Architecture: Architecture Type: Transformer Network Architecture: Qwen3 We developed this model using Qwen/Qwen3-32B as its foundation. This model contains 32 billion parameters. Input: Input Type(s): Text Input Format: String Input Parameters: One Dimensional (1D) Other Properties Related to Input: Max of 128k tokens (but trained only on conversations up to 4K tokens) Output: Output Type(s): Text Output Format: String Output Parameters: One-Dimensional (1D) Other Properties Related to Output: Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions. Software Integration: Runtime Engine(s): [NeMo-RL - 0.3] Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Hopper NVIDIA Turing You can use the model using HuggingFace Transformers library with 1 or more 80GB GPUs (NVIDIA Ampere or newer) with at least 70GB of free disk space to accomodate the download. Alternatively, you can use vLLM for accelerated inference. This code has been tested on Transformers v4.57.0, torch v2.3.0a0+40ec155e58.nv24.3 and 1 H100 80GB GPUs, but any setup that supports Qwen/Qwen3-32B should support this model as well. If you run into problems, you can consider doing pip install -U transformers. Dataset Name: HelpSteer3 Dataset Link: https://huggingface.co/datasets/nvidia/HelpSteer3 Data Collection Method by dataset [Hybrid: Human, Synthetic] Properties: 77,564 prompt-responses, each annotated with up to 3 annotations of free-text feedback (each being 50-250 words long) elaborating upon the overall helpfulness of the response. Dataset Name: HelpSteer3 Dataset Link: https://huggingface.co/datasets/nvidia/HelpSteer3 Data Collection Method by dataset [Hybrid: Human, Synthetic] Properties: 4,078 prompt-responses, each annotated with up to 3 annotations of free-text feedback (each being 50-250 words long) elaborating upon the overall helpfulness of the response. Dataset Name: Arena Hard Auto V2 Dataset Link: https://github.com/lmarena/arena-hard-auto Data Collection Method by dataset [Hybrid: Human, Synthetic] Labeling Method by dataset [Hybrid: Human, Synthetic] Dataset Name: WildBench Dataset Link: https://github.com/allenai/WildBench Data Collection Method by dataset [Hybrid: Human, Synthetic] Labeling Method by dataset [Hybrid: Human, Synthetic] Dataset Name: MT-Bench Dataset Link: https://huggingface.co/spaces/lmsys/mt-bench Data Collection Method by dataset [Hybrid: Human, Synthetic] Labeling Method by dataset [Hybrid: Human, Synthetic] Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety and Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here. If you find this model useful, please cite the following work:

NaNK

—

102

GLM-4.6-AWQ-4bit

- Quantization Method: AWQ - Bits: 4 - Group Size: 32 - Calibration Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset - Quantization Tool: llm-compressor 📖 Check out the GLM-4.6 technical blog , technical report(GLM-4.5) , and Zhipu AI technical documentation . Compared with GLM-4.5, GLM-4.6 brings several key improvements: Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Superior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code、Cline、Roo Code and Kilo Code, including improvements in generating visually polished front-end pages. Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability. More capable agents: GLM-4.6 exhibits stronger performance in tool using and search-based agents, and integrates more effectively within agent frameworks. Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios. We evaluated GLM-4.6 across eight public benchmarks covering agents, reasoning, and coding. Results show clear gains over GLM-4.5, with GLM-4.6 also holding competitive advantages over leading domestic and international models such as DeepSeek-V3.1-Terminus and Claude Sonnet 4. Both GLM-4.5 and GLM-4.6 use the same inference method. For general evaluations, we recommend using a sampling temperature of 1.0. For code-related evaluation tasks (such as LCB), it is further recommended to set: - For tool-integrated reasoning, please refer to this doc. - For search benchmark, we design a specific format for searching toolcall in thinking mode to support search agent, please refer to this. for the detailed template.

NaNK

license:mit

Qwen3-VL-235B-A22B-Instruct-AWQ-4bit

- Quantization Method: cyankiwi AWQ v1.0 - Bits: 4 - Group Size: 32 - Calibration Dataset: HuggingFaceM4/FineVision - Quantization Tool: llm-compressor Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on‑demand deployment. Visual Agent: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks. Visual Coding Boost: Generates Draw.io/HTML/CSS/JS from images/videos. Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI. Long Context & Video Understanding: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing. Enhanced Multimodal Reasoning: Excels in STEM/Math—causal analysis and logical, evidence-based answers. Upgraded Visual Recognition: Broader, higher-quality pretraining is able to “recognize everything”—celebrities, anime, products, landmarks, flora/fauna, etc. Expanded OCR: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing. Text Understanding on par with pure LLMs: Seamless text–vision fusion for lossless, unified comprehension. 1. Interleaved-MRoPE: Full‑frequency allocation over time, width, and height via robust positional embeddings, enhancing long‑horizon video reasoning. 2. DeepStack: Fuses multi‑level ViT features to capture fine‑grained details and sharpen image–text alignment. 3. Text–Timestamp Alignment: Moves beyond T‑RoPE to precise, timestamp‑grounded event localization for stronger video temporal modeling. This is the weight repository for Qwen3-VL-235B-A22B-Instruct. Below, we provide simple examples to show how to use Qwen3-VL with 🤖 ModelScope and 🤗 Transformers. The code of Qwen3-VL has been in the latest Hugging Face transformers and we advise you to build from source with command: Here we show a code snippet to show how to use the chat model with `transformers`: If you find our work helpful, feel free to give us a cite.

NaNK

license:apache-2.0

Olmo-3-32B-Think-AWQ-4bit

NaNK

license:apache-2.0

Ministral-3-3B-Instruct-2512-AWQ-8bit

NaNK

license:apache-2.0

Qwen3-VL-235B-A22B-Thinking-AWQ-4bit

- Quantization Method: cyankiwi AWQ v1.0 - Bits: 4 - Group Size: 32 - Calibration Dataset: 5CD-AI/LLaVA-CoT-o1-Instruct - Quantization Tool: llm-compressor Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on‑demand deployment. Visual Agent: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks. Visual Coding Boost: Generates Draw.io/HTML/CSS/JS from images/videos. Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI. Long Context & Video Understanding: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing. Enhanced Multimodal Reasoning: Excels in STEM/Math—causal analysis and logical, evidence-based answers. Upgraded Visual Recognition: Broader, higher-quality pretraining is able to “recognize everything”—celebrities, anime, products, landmarks, flora/fauna, etc. Expanded OCR: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing. Text Understanding on par with pure LLMs: Seamless text–vision fusion for lossless, unified comprehension. 1. Interleaved-MRoPE: Full‑frequency allocation over time, width, and height via robust positional embeddings, enhancing long‑horizon video reasoning. 2. DeepStack: Fuses multi‑level ViT features to capture fine‑grained details and sharpen image–text alignment. 3. Text–Timestamp Alignment: Moves beyond T‑RoPE to precise, timestamp‑grounded event localization for stronger video temporal modeling. This is the weight repository for Qwen3-VL-235B-A22B-Thinking. Below, we provide simple examples to show how to use Qwen3-VL with 🤖 ModelScope and 🤗 Transformers. The code of Qwen3-VL has been in the latest Hugging face transformers and we advise you to build from source with command: Here we show a code snippet to show you how to use the chat model with `transformers`: If you find our work helpful, feel free to give us a cite.

NaNK

license:apache-2.0

GLM-5-AWQ-4bit

NaNK

license:mit

MiniMax-M2-BF16

- Base Model: MiniMaxAI/MiniMax-M2 - Conversion Tool: DeepSeek-V3 Today, we release and open source MiniMax-M2, a Mini model built for Max coding & agentic workflows. MiniMax-M2 redefines efficiency for agents. It's a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today's leading models, but in a streamlined form factor that makes deployment and scaling easier than ever. Superior Intelligence. According to benchmarks from Artificial Analysis, MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use. Its composite score ranks #1 among open-source models globally. Advanced Coding. Engineered for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, coding-run-fix loops, and test-validated repairs. Strong performance on Terminal-Bench and (Multi-)SWE-Bench–style tasks demonstrates practical effectiveness in terminals, IDEs, and CI across languages. Agent Performance. MiniMax-M2 plans and executes complex, long-horizon toolchains across shell, browser, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, maintains evidence traceable, and gracefully recovers from flaky steps. Efficient Design. With 10 billion activated parameters (230 billion in total), MiniMax-M2 delivers lower latency, lower cost, and higher throughput for interactive agents and batched sampling—perfectly aligned with the shift toward highly deployable models that still shine on coding and agentic tasks. These comprehensive evaluations test real-world end-to-end coding and agentic tool use: editing real repos, executing commands, browsing the web, and delivering functional solutions. Performance on this suite correlates with day-to-day developer experience in terminals, IDEs, and CI. | Benchmark | MiniMax-M2 | Claude Sonnet 4 | Claude Sonnet 4.5 | Gemini 2.5 Pro | GPT-5 (thinking) | GLM-4.6 | Kimi K2 0905 | DeepSeek-V3.2 | |-----------|------------|-----------------|-------------------|-----------------|------------------|---------|---------------|----------------| | SWE-bench Verified | 69.4 | 72.7 | 77.2 | 63.8 | 74.9 | 68 | 69.2 | 67.8 | | Multi-SWE-Bench | 36.2 | 35.7 | 44.3 | / | / | 30 | 33.5 | 30.6 | | SWE-bench Multilingual | 56.5 | 56.9 | 68 | / | / | 53.8 | 55.9 | 57.9 | | Terminal-Bench | 46.3 | 36.4 | 50 | 25.3 | 43.8 | 40.5 | 44.5 | 37.7 | | ArtifactsBench | 66.8 | 57.3 | 61.5 | 57.7 | 73 | 59.8 | 54.2 | 55.8 | | BrowseComp | 44 | 12.2 | 19.6 | 9.9 | 54.9 | 45.1 | 14.1 | 40.1 | | BrowseComp-zh | 48.5 | 29.1 | 40.8 | 32.2 | 65 | 49.5 | 28.8 | 47.9 | | GAIA (text only) | 75.7 | 68.3 | 71.2 | 60.2 | 76.4 | 71.9 | 60.2 | 63.5 | | xbench-DeepSearch | 72 | 64.6 | 66 | 56 | 77.8 | 70 | 61 | 71 | | HLE (w/ tools) | 31.8 | 20.3 | 24.5 | 28.4 | 35.2 | 30.4 | 26.9 | 27.2 | | τ²-Bench | 77.2 | 65.5 | 84.7 | 59.2 | 80.1 | 75.9 | 70.3 | 66.7 | | FinSearchComp-global | 65.5 | 42 | 60.8 | 42.6 | 63.9 | 29.2 | 29.5 | 26.2 | | AgentCompany | 36 | 37 | 41 | 39.3 | / | 35 | 30 | 34 | >Notes: Data points marked with an asterisk () are taken directly from the model's official tech report or blog. All other metrics were obtained using the evaluation methods described below. >- SWE-bench Verified: We use the same scaffold as R2E-Gym (Jain et al. 2025) on top of OpenHands to test with agents on SWE tasks. All scores are validated on our internal infrastructure with 128k context length, 100 max steps, and no test-time scaling. All git-related content is removed to ensure agent sees only the code at the issue point. >- Multi-SWE-Bench & SWE-bench Multilingual: All scores are averaged across 8 runs using the claude-code CLI (300 max steps) as the evaluation scaffold. >- Terminal-Bench: All scores are evaluated with the official claude-code from the original Terminal-Bench repository(commit `94bf692`), averaged over 8 runs to report the mean pass rate. >- ArtifactsBench: All Scores are computed by averaging three runs with the official implementation of ArtifactsBench, using the stable Gemini-2.5-Pro as the judge model. >- BrowseComp & BrowseComp-zh & GAIA (text only) & xbench-DeepSearch: All scores reported use the same agent framework as WebExplorer (Liu et al. 2025), with minor tools description adjustment. We use the 103-sample text-only GAIA validation subset following WebExplorer (Liu et al. 2025). >- HLE (w/ tools): All reported scores are obtained using search tools and a Python tool. The search tools employ the same agent framework as WebExplorer (Liu et al. 2025), and the Python tool runs in a Jupyter environment. We use the text-only HLE subset. >- τ²-Bench: All scores reported use "extended thinking with tool use", and employ GPT-4.1 as the user simulator. >- FinSearchComp-global: Official results are reported for GPT-5-Thinking, Gemini 2.5 Pro, and Kimi-K2. Other models are evaluated using the open-source FinSearchComp (Hu et al. 2025) framework using both search and Python tools, launched simultaneously for consistency. >- AgentCompany: All scores reported use OpenHands 0.42 agent framework. We align with Artificial Analysis, which aggregates challenging benchmarks using a consistent methodology to reflect a model’s broader intelligence profile across math, science, instruction following, coding, and agentic tool use. | Metric (AA) | MiniMax-M2 | Claude Sonnet 4 | Claude Sonnet 4.5 | Gemini 2.5 Pro | GPT-5 (thinking) | GLM-4.6 | Kimi K2 0905 | DeepSeek-V3.2 | |-----------------|----------------|---------------------|------------------------|---------------------|----------------------|-------------|------------------|-------------------| | AIME25 | 78 | 74 | 88 | 88 | 94 | 86 | 57 | 88 | | MMLU-Pro | 82 | 84 | 88 | 86 | 87 | 83 | 82 | 85 | | GPQA-Diamond | 78 | 78 | 83 | 84 | 85 | 78 | 77 | 80 | | HLE (w/o tools) | 12.5 | 9.6 | 17.3 | 21.1 | 26.5 | 13.3 | 6.3 | 13.8 | | LiveCodeBench (LCB) | 83 | 66 | 71 | 80 | 85 | 70 | 61 | 79 | | SciCode | 36 | 40 | 45 | 43 | 43 | 38 | 31 | 38 | | IFBench | 72 | 55 | 57 | 49 | 73 | 43 | 42 | 54 | | AA-LCR | 61 | 65 | 66 | 66 | 76 | 54 | 52 | 69 | | τ²-Bench-Telecom | 87 | 65 | 78 | 54 | 85 | 71 | 73 | 34 | | Terminal-Bench-Hard | 24 | 30 | 33 | 25 | 31 | 23 | 23 | 29 | | AA Intelligence | 61 | 57 | 63 | 60 | 69 | 56 | 50 | 57 | >AA: All scores of MiniMax-M2 aligned with Artificial Analysis Intelligence Benchmarking Methodology (https://artificialanalysis.ai/methodology/intelligence-benchmarking). All scores of other models reported from https://artificialanalysis.ai/. By maintaining activations around 10B , the plan → act → verify loop in the agentic workflow is streamlined, improving responsiveness and reducing compute overhead: - Faster feedback cycles in compile-run-test and browse-retrieve-cite chains. - More concurrent runs on the same budget for regression suites and multi-seed explorations. - Simpler capacity planning with smaller per-request memory and steadier tail latency. In short: 10B activations = responsive agent loops + better unit economics. If you need frontier-style coding and agents without frontier-scale costs, MiniMax-M2 hits the sweet spot: fast inference speeds, robust tool-use capabilities, and a deployment-friendly footprint. We look forward to your feedback and to collaborating with developers and researchers to bring the future of intelligent collaboration one step closer. - Our product MiniMax Agent, built on MiniMax-M2, is now publicly available and free for a limited time: https://agent.minimax.io/ - The MiniMax-M2 API is now live on the MiniMax Open Platform and is free for a limited time: https://platform.minimax.io/docs/guides/text-generation - The MiniMax-M2 model weights are now open-source, allowing for local deployment and use: https://huggingface.co/MiniMaxAI/MiniMax-M2. Download the model from HuggingFace repository: https://huggingface.co/MiniMaxAI/MiniMax-M2. We recommend using the following inference frameworks (listed alphabetically) to serve the model: We recommend using SGLang to serve MiniMax-M2. SGLang provides solid day-0 support for MiniMax-M2 model. Please refer to our SGLang Deployment Guide for more details, and thanks so much for our collaboration with the SGLang team. We recommend using vLLM to serve MiniMax-M2. vLLM provides efficient day-0 support of MiniMax-M2 model, check https://docs.vllm.ai/projects/recipes/en/latest/MiniMax/MiniMax-M2.html for latest deployment guide. We also provide our vLLM Deployment Guide. Inference Parameters We recommend using the following parameters for best performance: `temperature=1.0`, `topp = 0.95`, `topk = 40`. IMPORTANT: MiniMax-M2 is an interleaved thinking model. Therefore, when using it, it is important to retain the thinking content from the assistant's turns within the historical messages. In the model's output content, we use the ` ... ` format to wrap the assistant's thinking content. When using the model, you must ensure that the historical content is passed back in its original format. Do not remove the ` ... ` part, otherwise, the model's performance will be negatively affected.

NaNK

license:mit

Ministral-3-3B-Reasoning-2512-AWQ-4bit

NaNK

license:apache-2.0

GLM-4.5-AWQ-4bit

NaNK

license:mit

MiroThinker-v1.0-30B-AWQ-4bit

NaNK

license:mit

Magistral-Small-2507-AWQ-4bit

NaNK

license:apache-2.0

GLM-4.6V-AWQ-8bit

NaNK

license:mit

INTELLECT-3.1-AWQ-4bit

NaNK

license:mit

Nemotron-Orchestrator-8B-AWQ-8bit

NaNK

—

nomos-1-AWQ-4bit

NaNK

license:apache-2.0

Olmo-3-32B-Think-AWQ-8bit

NaNK

license:apache-2.0

MiniMax-M2-REAP-162B-A10B-AWQ-4bit

NaNK

—

Qwen3-Nemotron-32B-RLBFF-AWQ-8bit

- Quantization Method: AWQ - Bits: 8 - Group Size: 32 - Calibration Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset - Quantization Tool: llm-compressor Qwen3-Nemotron-32B-RLBFF is a large language model that leverages Qwen/Qwen3-32B as the foundation and is fine-tuned to improve the quality of LLM-generated responses in the default thinking mode. Given a conversation with multiple turns between user and assistant and a user-specified principle, it generates a response the final user turn. This is a research model described in and is released to support the following research paper: https://arxiv.org/abs/2509.21319 As of 24 Sep 2025, this model achieves Arena Hard V2 of 55.6% and WildBench Score of 70.33% and MT Bench of 9.50. This means that our model is substantially improved over the initial Qwen3-32B model and has similar performance compared to DeepSeek R1 and O3-mini at less than 5% of the inference cost (as indicated on openrouter). GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License. Qwen3-Nemotron-32B-RLBFF generates an response to a user query. HuggingFace 10/27/2025 via https://huggingface.co/nvidia/Qwen3-Nemotron-32B-RLBFF RLBFF HelpSteer3 HelpSteer3-Preference HelpSteer2-Preference SteerLM method HelpSteer HelpSteer2 Arena Hard V2, WildBench and MT Bench are common benchmarks for measuring general-domain model capabilities. | Model | MT Bench (GPT-4-Turbo) | Arena Hard v2 (95% CI) | WildBench Overall | Creative | Planning | Data Analy. | Info. Seek. | Coding | In/M | Out/M | Cost | |:--------------------------------------------|:------------------:|:----------------------:|:----------------:|:---------:|:---------:|:-----------:|:-----------:|:-------:|:----:|:----:|:----:| | Qwen3-Nemotron-32B-RLBFF | 9.50 | 55.6 (-1.6 / +1.4) | 70.33 | 71.73 | 70.73 | 69.37 | 68.96 | 70.94 | 0.018 | 0.072 | 1× | | Qwen3-32B | 9.38 | 44.0 (-1.6 / +1.5) | 67.57 | 68.63 | 67.95 | 64.68 | 66.78 | 69.53 | 0.018 | 0.072 | 1× | | o3-mini | 9.26 | 50.0 (-0.0 / +0.0) | 71.64 | 69.04 | 72.44 | 74.37 | 65.81 | 73.21 | 1.1 | 4.4 | 61× | | Claude-3.7-Sonnet (Thinking) | 8.93 | 54.2 (-2.0 / +1.8) | 65.45 | 66.72 | 65.94 | 63.59 | 63.08 | 67.36 | 3 | 15 | 188× | | DeepSeek R1 | 9.49 | 57.4 (-2.0 / +2.0) | 64.24 | 70.75 | 66.29 | 59.20 | 68.56 | 61.04 | 0.4 | 2 | 25× | Model Architecture: Architecture Type: Transformer Network Architecture: Qwen3 We developed this model using Qwen/Qwen3-32B as its foundation. This model contains 32 billion parameters. Input: Input Type(s): Text Input Format: String Input Parameters: One Dimensional (1D) Other Properties Related to Input: Max of 128k tokens (but trained only on conversations up to 4K tokens) Output: Output Type(s): Text Output Format: String Output Parameters: One-Dimensional (1D) Other Properties Related to Output: Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions. Software Integration: Runtime Engine(s): [NeMo-RL - 0.3] Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Hopper NVIDIA Turing You can use the model using HuggingFace Transformers library with 1 or more 80GB GPUs (NVIDIA Ampere or newer) with at least 70GB of free disk space to accomodate the download. Alternatively, you can use vLLM for accelerated inference. This code has been tested on Transformers v4.57.0, torch v2.3.0a0+40ec155e58.nv24.3 and 1 H100 80GB GPUs, but any setup that supports Qwen/Qwen3-32B should support this model as well. If you run into problems, you can consider doing pip install -U transformers. Dataset Name: HelpSteer3 Dataset Link: https://huggingface.co/datasets/nvidia/HelpSteer3 Data Collection Method by dataset [Hybrid: Human, Synthetic] Properties: 77,564 prompt-responses, each annotated with up to 3 annotations of free-text feedback (each being 50-250 words long) elaborating upon the overall helpfulness of the response. Dataset Name: HelpSteer3 Dataset Link: https://huggingface.co/datasets/nvidia/HelpSteer3 Data Collection Method by dataset [Hybrid: Human, Synthetic] Properties: 4,078 prompt-responses, each annotated with up to 3 annotations of free-text feedback (each being 50-250 words long) elaborating upon the overall helpfulness of the response. Dataset Name: Arena Hard Auto V2 Dataset Link: https://github.com/lmarena/arena-hard-auto Data Collection Method by dataset [Hybrid: Human, Synthetic] Labeling Method by dataset [Hybrid: Human, Synthetic] Dataset Name: WildBench Dataset Link: https://github.com/allenai/WildBench Data Collection Method by dataset [Hybrid: Human, Synthetic] Labeling Method by dataset [Hybrid: Human, Synthetic] Dataset Name: MT-Bench Dataset Link: https://huggingface.co/spaces/lmsys/mt-bench Data Collection Method by dataset [Hybrid: Human, Synthetic] Labeling Method by dataset [Hybrid: Human, Synthetic] Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety and Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here. If you find this model useful, please cite the following work:

NaNK

—

JanusCoder-14B-AWQ-4bit

- Quantization Method: AWQ - Bits: 4 - Group Size: 32 - Calibration Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset - Quantization Tool: llm-compressor 💻Github Repo • 🤗Model Collections • 📜Technical Report We introduce JanusCoder and JanusCoderV, a suite of open-source foundational models designed to establish a unified visual-programmatic interface for code intelligence. This model suite is built upon open-source language models (such as Qwen3-8B and 14B) and multimodal models (such as Qwen2.5-VL and InternVL3.5-8B). The JanusCoder series is trained on JANUSCODE-800K—the largest multimodal code corpus to date, generated by an innovative synthesis toolkit, covering everything from standard charts to complex interactive Web UIs and code-driven animations. This enables the models to uniformly handle diverse visual-programmatic tasks, such as generating code from textual instructions, visual inputs, or a combination of both, rather than building specialized models for isolated tasks. JanusCoder excels at flexible content generation (like data visualizations and interactive front-ends) as well as precise, program-driven editing of visual effects and complex animation construction. | Model Name | Description | Download | | --- | --- | --- | | JanusCoder-8B | 8B text model based on Qwen3-8B. | 🤗 Model | | 👉 JanusCoder-14B | 14B text model based on Qwen3-14B. | 🤗 Model | | JanusCoderV-7B | 7B multimodal model based on Qwen2.5-VL-7B. | 🤗 Model | | JanusCoderV-8B | 8B multimodal model based on InternVL3.5-8B. | 🤗 Model | We evaluate the JanusCoder model on various benchmarks that span code interlligence tasks on multiple PLs: | Model | JanusCoder-14B | Qwen3-14B | Qwen2.5-Coder-32B-Instruct | LLaMA3-8B-Instruct | GPT-4o | | --- | --- | --- | --- | --- | --- | | PandasPlotBench (Task) | 86 | 78 | 82 | 69 | 85 | | ArtifactsBench | 41.1 | 36.5 | 35.5 | 36.5 | 37.9 | | DTVBench (Manim) | 8.41 | 6.63 | 9.61 | 4.92 | 10.60 | | DTVBench (Wolfram) | 5.97 | 5.08 | 4.98 | 3.15 | 5.97 | The following provides demo code illustrating how to generate text using JanusCoder-14B. > Please use transformers >= 4.55.0 to ensure the model works normally. Citation 🫶 If you are interested in our work or find the repository / checkpoints / benchmark / data helpful, please consider using the following citation format when referencing our papers:

NaNK

license:apache-2.0

JanusCoder-14B-AWQ-8bit

- Quantization Method: AWQ - Bits: 8 - Group Size: 32 - Calibration Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset - Quantization Tool: llm-compressor 💻Github Repo • 🤗Model Collections • 📜Technical Report We introduce JanusCoder and JanusCoderV, a suite of open-source foundational models designed to establish a unified visual-programmatic interface for code intelligence. This model suite is built upon open-source language models (such as Qwen3-8B and 14B) and multimodal models (such as Qwen2.5-VL and InternVL3.5-8B). The JanusCoder series is trained on JANUSCODE-800K—the largest multimodal code corpus to date, generated by an innovative synthesis toolkit, covering everything from standard charts to complex interactive Web UIs and code-driven animations. This enables the models to uniformly handle diverse visual-programmatic tasks, such as generating code from textual instructions, visual inputs, or a combination of both, rather than building specialized models for isolated tasks. JanusCoder excels at flexible content generation (like data visualizations and interactive front-ends) as well as precise, program-driven editing of visual effects and complex animation construction. | Model Name | Description | Download | | --- | --- | --- | | JanusCoder-8B | 8B text model based on Qwen3-8B. | 🤗 Model | | 👉 JanusCoder-14B | 14B text model based on Qwen3-14B. | 🤗 Model | | JanusCoderV-7B | 7B multimodal model based on Qwen2.5-VL-7B. | 🤗 Model | | JanusCoderV-8B | 8B multimodal model based on InternVL3.5-8B. | 🤗 Model | We evaluate the JanusCoder model on various benchmarks that span code interlligence tasks on multiple PLs: | Model | JanusCoder-14B | Qwen3-14B | Qwen2.5-Coder-32B-Instruct | LLaMA3-8B-Instruct | GPT-4o | | --- | --- | --- | --- | --- | --- | | PandasPlotBench (Task) | 86 | 78 | 82 | 69 | 85 | | ArtifactsBench | 41.1 | 36.5 | 35.5 | 36.5 | 37.9 | | DTVBench (Manim) | 8.41 | 6.63 | 9.61 | 4.92 | 10.60 | | DTVBench (Wolfram) | 5.97 | 5.08 | 4.98 | 3.15 | 5.97 | The following provides demo code illustrating how to generate text using JanusCoder-14B. > Please use transformers >= 4.55.0 to ensure the model works normally. Citation 🫶 If you are interested in our work or find the repository / checkpoints / benchmark / data helpful, please consider using the following citation format when referencing our papers:

NaNK

license:apache-2.0

JoyAI-LLM-Flash-AWQ-4bit

NaNK

—

Nanbeige4.1-3B-AWQ-4bit

NaNK

llama

Trinity-Mini-AWQ-4bit

NaNK

license:apache-2.0

INTELLECT-3-AWQ-8bit

NaNK

license:mit

nomos-1-AWQ-8bit

NaNK

license:apache-2.0

Nanbeige4.1-3B-AWQ-8bit

NaNK

llama

JanusCoder-8B-AWQ-8bit

- Quantization Method: AWQ - Bits: 8 - Group Size: 32 - Calibration Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset - Quantization Tool: llm-compressor 💻Github Repo • 🤗Model Collections • 📜Technical Report We introduce JanusCoder and JanusCoderV, a suite of open-source foundational models designed to establish a unified visual-programmatic interface for code intelligence. This model suite is built upon open-source language models (such as Qwen3-8B and 14B) and multimodal models (such as Qwen2.5-VL and InternVL3.5-8B). The JanusCoder series is trained on JANUSCODE-800K—the largest multimodal code corpus to date, generated by an innovative synthesis toolkit, covering everything from standard charts to complex interactive Web UIs and code-driven animations. This enables the models to uniformly handle diverse visual-programmatic tasks, such as generating code from textual instructions, visual inputs, or a combination of both, rather than building specialized models for isolated tasks. JanusCoder excels at flexible content generation (like data visualizations and interactive front-ends) as well as precise, program-driven editing of visual effects and complex animation construction. | Model Name | Description | Download | | --- | --- | --- | | 👉 JanusCoder-8B | 8B text model based on Qwen3-8B. | 🤗 Model | | JanusCoder-14B | 14B text model based on Qwen3-14B. | 🤗 Model | | JanusCoderV-7B | 7B multimodal model based on Qwen2.5-VL-7B. | 🤗 Model | | JanusCoderV-8B | 8B multimodal model based on InternVL3.5-8B. | 🤗 Model | We evaluate the JanusCoder model on various benchmarks that span code interlligence tasks on multiple PLs: | Model | JanusCoder-8B | Qwen3-8B | Qwen2.5-Coder-7B-Instruct | LLaMA3-8B-Instruct | GPT-4o | | --- | --- | --- | --- | --- | --- | | PandasPlotBench (Task) | 80 | 74 | 76 | 69 | 85 | | ArtifactsBench | 39.6 | 36.5 | 26.0 | 36.5 | 37.9 | | DTVBench (Manim) | 9.70 | 6.20 | 8.56 | 4.92 | 10.60 | | DTVBench (Wolfram) | 6.07 | 5.18 | 4.04 | 3.15 | 5.97 | The following provides demo code illustrating how to generate text using JanusCoder-8B. > Please use transformers >= 4.55.0 to ensure the model works normally. Citation 🫶 If you are interested in our work or find the repository / checkpoints / benchmark / data helpful, please consider using the following citation format when referencing our papers:

NaNK

license:apache-2.0

INTELLECT-3-AWQ-4bit

NaNK

license:mit

JanusCoder-8B-AWQ-4bit

- Quantization Method: AWQ - Bits: 4 - Group Size: 32 - Calibration Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset - Quantization Tool: llm-compressor 💻Github Repo • 🤗Model Collections • 📜Technical Report We introduce JanusCoder and JanusCoderV, a suite of open-source foundational models designed to establish a unified visual-programmatic interface for code intelligence. This model suite is built upon open-source language models (such as Qwen3-8B and 14B) and multimodal models (such as Qwen2.5-VL and InternVL3.5-8B). The JanusCoder series is trained on JANUSCODE-800K—the largest multimodal code corpus to date, generated by an innovative synthesis toolkit, covering everything from standard charts to complex interactive Web UIs and code-driven animations. This enables the models to uniformly handle diverse visual-programmatic tasks, such as generating code from textual instructions, visual inputs, or a combination of both, rather than building specialized models for isolated tasks. JanusCoder excels at flexible content generation (like data visualizations and interactive front-ends) as well as precise, program-driven editing of visual effects and complex animation construction. | Model Name | Description | Download | | --- | --- | --- | | 👉 JanusCoder-8B | 8B text model based on Qwen3-8B. | 🤗 Model | | JanusCoder-14B | 14B text model based on Qwen3-14B. | 🤗 Model | | JanusCoderV-7B | 7B multimodal model based on Qwen2.5-VL-7B. | 🤗 Model | | JanusCoderV-8B | 8B multimodal model based on InternVL3.5-8B. | 🤗 Model | We evaluate the JanusCoder model on various benchmarks that span code interlligence tasks on multiple PLs: | Model | JanusCoder-8B | Qwen3-8B | Qwen2.5-Coder-7B-Instruct | LLaMA3-8B-Instruct | GPT-4o | | --- | --- | --- | --- | --- | --- | | PandasPlotBench (Task) | 80 | 74 | 76 | 69 | 85 | | ArtifactsBench | 39.6 | 36.5 | 26.0 | 36.5 | 37.9 | | DTVBench (Manim) | 9.70 | 6.20 | 8.56 | 4.92 | 10.60 | | DTVBench (Wolfram) | 6.07 | 5.18 | 4.04 | 3.15 | 5.97 | The following provides demo code illustrating how to generate text using JanusCoder-8B. > Please use transformers >= 4.55.0 to ensure the model works normally. Citation 🫶 If you are interested in our work or find the repository / checkpoints / benchmark / data helpful, please consider using the following citation format when referencing our papers:

NaNK

license:apache-2.0