Qwen3-Next-80B-A3B-Instruct-int4-mixed-AutoRound

By: Intel
License: Apache-2.0
Parameters: 80B
Downloads: 34
Type: Language Model
Quick Summary

This model is a mixed int4 quantization of Qwen/Qwen3-Next-80B-A3B-Instruct, with group size 128 and symmetric quantization, generated by intel/auto-round via RTN (round-to-nearest, with no algorithm tuning).
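To make the terms above concrete, here is a minimal, self-contained sketch of symmetric group-wise int4 RTN quantization. This is an illustration only, not the AutoRound implementation; the function names are hypothetical, and real groups hold 128 weights rather than the toy five used here.

```python
# Illustrative sketch of symmetric round-to-nearest (RTN) int4 quantization.
# Hypothetical helper names; NOT the AutoRound implementation.

def quantize_group_sym_int4(weights):
    """Quantize one group of weights to signed int4 via symmetric RTN."""
    # Symmetric scheme: the zero-point is 0, and one scale per group maps
    # the largest absolute weight onto the int4 grid.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs else 1.0  # clip to [-8, 7] below
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from int4 codes and the group scale."""
    return [v * scale for v in q]

# Toy group of weights (a real group would contain 128 values).
group = [0.02, -0.51, 0.33, 0.70, -0.07]
q, s = quantize_group_sym_int4(group)
print("codes:", q, "scale:", s)  # codes: [0, -5, 3, 7, -1] scale: 0.1
```

"Mixed" in the model name means not every layer is stored at int4; some sensitive layers are typically kept at higher precision.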

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 75GB+ RAM
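A quick back-of-envelope check on the figures above: at 4 bits per weight, the 80B parameters alone occupy roughly 40 GB, so the 75GB+ recommendation leaves headroom for layers kept at higher precision, the KV cache, and runtime overhead. The script below is just that arithmetic, not a measurement.

```python
# Back-of-envelope weight memory for an 80B-parameter int4 checkpoint.
# Ignores mixed-precision layers, KV cache, and framework overhead.
params = 80e9          # 80B parameters
bits_per_param = 4     # int4 weights
weight_gb = params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB
print(f"~{weight_gb:.0f} GB for quantized weights alone")  # ~40 GB
```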

Code Examples

How To Use (Python, transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Intel/Qwen3-Next-80B-A3B-Instruct-int4-mixed-AutoRound"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
  model_name,
  dtype="auto",
  device_map="auto",
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
"""
content: A large language model (LLM) is a type of artificial intelligence system trained on vast amounts of text data to understand and generate human-like language. These models, such as GPT, PaLM, or LLaMA, use deep learning architectures—typically based on the transformer network—to predict the next word in a sequence, enabling them to answer questions, write essays, translate languages, and even code. LLMs learn patterns, context, and relationships in language without explicit programming, making them versatile tools for a wide range of natural language tasks. Their scale—often with billions or trillions of parameters—allows them to capture nuanced linguistic features, though they also require significant computational resources and raise important ethical and safety considerations.
"""
