cpatonn

137 models • 2 total models in database

Sort by:

Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit

Method vllm-project/llm-compressor and nvidia/Llama-Nemotron-Post-Training-Dataset were used to quantize the original model. For further quantization arguments and configurations information, please visit config.json and recipe.yaml. Inference Please install the latest vllm releases for better support: Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit example usage: Qwen3-Coder is available in multiple sizes. Today, we're excited to introduce Qwen3-Coder-30B-A3B-Instruct. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements: - Significant Performance among open models on Agentic Coding, Agentic Browser-Use, and other foundational coding tasks. - Long-context Capabilities with native support for 256K tokens, extendable up to 1M tokens using Yarn, optimized for repository-scale understanding. - Agentic Coding supporting for most platform such as Qwen Code, CLINE, featuring a specially designed function call format. Qwen3-Coder-30B-A3B-Instruct has the following features: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Number of Parameters: 30.5B in total and 3.3B activated - Number of Layers: 48 - Number of Attention Heads (GQA): 32 for Q and 4 for KV - Number of Experts: 128 - Number of Activated Experts: 8 - Context Length: 262,144 natively. NOTE: This model supports only non-thinking mode and does not generate `` `` blocks in its output. Meanwhile, specifying `enablethinking=False` is no longer required. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation. We advise you to use the latest version of `transformers`. Define Tools tools=[ { "type":"function", "function":{ "name": "squarethenumber", "description": "output the square of the number.", "parameters": { "type": "object", "required": ["inputnum"], "properties": { 'inputnum': { 'type': 'number', 'description': 'inputnum is a number that will be squared' } }, } } } ] import OpenAI Define LLM client = OpenAI( # Use a custom endpoint compatible with OpenAI API baseurl='http://localhost:8000/v1', # apibase apikey="EMPTY" ) messages = [{'role': 'user', 'content': 'square the number 1024'}] completion = client.chat.completions.create( messages=messages, model="Qwen3-Coder-30B-A3B-Instruct", maxtokens=65536, tools=tools, ) @misc{qwen3technicalreport, title={Qwen3 Technical Report}, author={Qwen Team}, year={2025}, eprint={2505.09388}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2505.09388}, } ```

cpatonn

Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit

GLM-4.5-Air-AWQ-4bit

Qwen3-Next-80B-A3B-Instruct-AWQ-8bit

Qwen3-30B-A3B-Instruct-2507-AWQ-4bit

Qwen3-VL-32B-Thinking-AWQ-4bit

Qwen3-Next-80B-A3B-Instruct-AWQ-4bit

Qwen3-Next-80B-A3B-Thinking-AWQ-4bit

InternVL3_5-38B-AWQ-4bit

Qwen3-VL-30B-A3B-Instruct-AWQ-4bit

NVIDIA-Nemotron-Nano-12B-v2-AWQ-8bit

granite-4.0-h-micro-AWQ-4bit

Qwen3-VL-8B-Instruct-AWQ-8bit

Qwen3-VL-8B-Instruct-AWQ-4bit

InternVL3_5-14B-AWQ-4bit

Llama-3_3-Nemotron-Super-49B-v1_5-AWQ-4bit

Qwen3-4B-Instruct-2507-AWQ-4bit

Qwen3-30B-A3B-Thinking-2507-AWQ-4bit

Qwen3-Omni-30B-A3B-Instruct-AWQ-4bit

Devstral-Small-2507-AWQ-4bit

Qwen3-VL-4B-Instruct-AWQ-4bit

NVIDIA-Nemotron-Nano-12B-v2-AWQ-4bit

Qwen3-VL-8B-Thinking-AWQ-4bit

Qwen3-Omni-30B-A3B-Captioner-AWQ-4bit

Qwen3-VL-32B-Instruct-AWQ-4bit

granite-4.0-h-small-AWQ-4bit

Qwen3-VL-30B-A3B-Thinking-AWQ-4bit

NVIDIA-Nemotron-Nano-9B-v2-AWQ-4bit

Magistral-Small-2509-AWQ-4bit

Qwen3-Omni-30B-A3B-Thinking-AWQ-4bit

Ling-flash-2.0-AWQ-8bit

Qwen3-30B-A3B-Instruct-2507-AWQ-8bit

Qwen3-VL-8B-Thinking-AWQ-8bit

Qwen3-4B-Thinking-2507-AWQ-4bit

Qwen3-Coder-30B-A3B-Instruct-GPTQ-4bit

GLM-4.5V-AWQ-4bit

Qwen3-Omni-30B-A3B-Instruct-AWQ-8bit

Qwen3-Coder-30B-A3B-Instruct-AWQ-8bit

Magistral-Small-2507-AWQ-4bit

Qwen3-VL-4B-Thinking-AWQ-4bit

granite-4.0-h-tiny-AWQ-4bit

GLM-4.5V-AWQ-8bit

Apriel-1.5-15b-Thinker-AWQ-8bit

Apriel-1.5-15b-Thinker-AWQ-4bit

GLM-4.5-Air-GPTQ-4bit

Kimi-Dev-72B-AWQ-4bit

GLM-4.5-Air-AWQ-8bit

KAT-Dev-AWQ-4bit

Hermes-4-70B-AWQ-4bit

Qwopus3.5-27B-v3-AWQ-INT8-INT4

Qwen3-VL-4B-Instruct-AWQ-8bit

Qwen3-Next-80B-A3B-Thinking-AWQ-8bit

KAT-Dev-72B-Exp-AWQ-4bit

Qwen3-30B-A3B-Thinking-2507-AWQ-8bit

Magistral-Small-2509-AWQ-8bit

Qwen3-VL-30B-A3B-Instruct-AWQ-8bit

NVIDIA-Nemotron-Nano-9B-v2-AWQ-8bit

Ring-flash-2.0-AWQ-8bit

Qwen3-VL-32B-Instruct-AWQ-8bit

Qwopus3.5-27B-v3-AWQ-BF16-INT8

Qwen3-Coder-30B-A3B-Instruct-GPTQ-8bit

Qwen3-4B-Instruct-2507-AWQ-8bit

Ring-flash-2.0-AWQ-4bit

GLM-4.5-AWQ-4bit

Qwen3-VL-30B-A3B-Thinking-AWQ-8bit

granite-4.0-h-small-AWQ-8bit

Ling-mini-2.0-AWQ-4bit

InternVL3_5-14B-AWQ-8bit

Qwen3-VL-32B-Thinking-AWQ-8bit

Ring-mini-2.0-AWQ-4bit

Apertus-8B-Instruct-2509-GPTQ-4bit

gpt-oss-20b-BF16

Qwen3-Omni-30B-A3B-Thinking-AWQ-8bit

Qwen3-VL-4B-Thinking-AWQ-8bit

Ling-flash-2.0-AWQ-4bit

InternVL3_5-38B-AWQ-8bit

granite-4.0-h-tiny-AWQ-8bit

Qwopus3.5-27B-v3-AWQ-BF16-INT4

Tongyi-DeepResearch-30B-A3B-AWQ-8bit

DeepSeek-V3.1-GPTQ-4bit