Qwen3-Next-80B-A3B-Instruct-int4-mixed-AutoRound

By: Intel
License: Apache-2.0
Parameters: 80B
Downloads: 34
Type: Language Model
Quick Summary

This model is a mixed int4 quantization of Qwen/Qwen3-Next-80B-A3B-Instruct, with group size 128 and symmetric quantization, generated by intel/auto-round via RTN (round-to-nearest, with no algorithm tuning).
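To make the terms above concrete, here is a minimal, self-contained sketch of symmetric group-wise int4 RTN quantization. This is an illustration only, not the AutoRound implementation; the function names are hypothetical, and real groups hold 128 weights rather than the toy five used here.

```python
# Illustrative sketch of symmetric round-to-nearest (RTN) int4 quantization.
# Hypothetical helper names; NOT the AutoRound implementation.

def quantize_group_sym_int4(weights):
    """Quantize one group of weights to signed int4 via symmetric RTN."""
    # Symmetric scheme: the zero-point is 0, and one scale per group maps
    # the largest absolute weight onto the int4 grid.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs else 1.0  # clip to [-8, 7] below
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from int4 codes and the group scale."""
    return [v * scale for v in q]

# Toy group of weights (a real group would contain 128 values).
group = [0.02, -0.51, 0.33, 0.70, -0.07]
q, s = quantize_group_sym_int4(group)
print("codes:", q, "scale:", s)  # codes: [0, -5, 3, 7, -1] scale: 0.1
```

"Mixed" in the model name means not every layer is stored at int4; some sensitive layers are typically kept at higher precision.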

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 75GB+ RAM
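A quick back-of-envelope check on the figures above: at 4 bits per weight, the 80B parameters alone occupy roughly 40 GB, so the 75GB+ recommendation leaves headroom for layers kept at higher precision, the KV cache, and runtime overhead. The script below is just that arithmetic, not a measurement.

```python
# Back-of-envelope weight memory for an 80B-parameter int4 checkpoint.
# Ignores mixed-precision layers, KV cache, and framework overhead.
params = 80e9          # 80B parameters
bits_per_param = 4     # int4 weights
weight_gb = params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB
print(f"~{weight_gb:.0f} GB for quantized weights alone")  # ~40 GB
```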

Code Examples

How To Use (Python, transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Intel/Qwen3-Next-80B-A3B-Instruct-int4-mixed-AutoRound"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
  model_name,
  dtype="auto",
  device_map="auto",
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
"""
content: A large language model (LLM) is a type of artificial intelligence system trained on vast amounts of text data to understand and generate human-like language. These models, such as GPT, PaLM, or LLaMA, use deep learning architectures—typically based on the transformer network—to predict the next word in a sequence, enabling them to answer questions, write essays, translate languages, and even code. LLMs learn patterns, context, and relationships in language without explicit programming, making them versatile tools for a wide range of natural language tasks. Their scale—often with billions or trillions of parameters—allows them to capture nuanced linguistic features, though they also require significant computational resources and raise important ethical and safety considerations.
"""
