Qwen3-Next-80B-A3B-Instruct-int4-mixed-AutoRound
license:apache-2.0
by Intel
Language Model
OTHER
80B params
New
34 downloads
Early-stage
Edge AI:
Mobile
Laptop
Server
179GB+ RAM
Quick Summary
This model is a mixed int4 quantization of Qwen/Qwen3-Next-80B-A3B-Instruct with group size 128 and symmetric quantization. It was generated by intel/auto-round using RTN (round-to-nearest, with no algorithm tuning).
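To make the terms in the summary concrete, here is a minimal sketch of symmetric round-to-nearest quantization with one scale per group of 128 weights. It is an illustration only and not the intel/auto-round implementation; the function names (`quantize_group_rtn`, `quantize_rtn`, `dequantize`) and the random test weights are assumptions made for this example.

```python
import random


def quantize_group_rtn(values, bits=4):
    """Symmetric round-to-nearest quantization of one group (illustrative helper)."""
    qmax = 2 ** (bits - 1) - 1   # 7 for int4
    qmin = -(2 ** (bits - 1))    # -8 for int4
    max_abs = max(abs(v) for v in values)
    # Symmetric scheme: no zero point, a single scale maps max |w| onto qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() sends exact ties to the even integer, still round-to-nearest.
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return codes, scale


def quantize_rtn(weights, group_size=128, bits=4):
    """Split a flat weight list into groups and quantize each one independently."""
    if len(weights) % group_size != 0:
        raise ValueError("weight count must be a multiple of group_size")
    return [
        quantize_group_rtn(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate weights as code * scale."""
    return [code * scale for codes, scale in groups for code in codes]


# Hypothetical data: 256 small random weights, i.e. two groups of 128.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(256)]
groups = quantize_rtn(weights)
restored = dequantize(groups)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
largest_scale = max(scale for _, scale in groups)
```

Because no value is clipped under this choice of scale, the reconstruction error of each weight is at most half of its group's scale. RTN applies this rounding directly, with no tuning step.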
Device Compatibility
Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum Recommended: 75GB+ RAM
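As a rough cross-check on these figures, the weight storage of an int4 model can be estimated from the parameter count. The sketch below assumes every one of the 80B parameters is stored at 4 bits with one 16-bit scale per group of 128. Both are simplifying assumptions: the scale width is not stated in the summary, and a "mixed" scheme may keep some layers at higher precision. The function name `estimate_weight_bytes` is hypothetical.

```python
def estimate_weight_bytes(n_params, bits=4, group_size=128, scale_bits=16):
    """Back-of-envelope weight footprint; scale_bits=16 is an assumption."""
    code_bytes = n_params * bits / 8                        # packed int4 codes
    scale_bytes = (n_params / group_size) * scale_bits / 8  # one scale per group
    return code_bytes + scale_bytes


total_gb = estimate_weight_bytes(80e9) / 1e9
print(f"{total_gb:.2f} GB")  # prints "41.25 GB"
```

This counts weights only. Higher-precision layers, activations, the KV cache, and framework overhead all add to it, so the estimate is a lower bound and not a derivation of the recommended RAM figure.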
Code Examples
How To Use (Python, transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Intel/Qwen3-Next-80B-A3B-Instruct-int4-mixed-AutoRound"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto",
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# keep only the newly generated tokens, dropping the prompt tokens
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
"""
content: A large language model (LLM) is a type of artificial intelligence system trained on vast amounts of text data to understand and generate human-like language. These models, such as GPT, PaLM, or LLaMA, use deep learning architectures—typically based on the transformer network—to predict the next word in a sequence, enabling them to answer questions, write essays, translate languages, and even code. LLMs learn patterns, context, and relationships in language without explicit programming, making them versatile tools for a wide range of natural language tasks. Their scale—often with billions or trillions of parameters—allows them to capture nuanced linguistic features, though they also require significant computational resources and raise important ethical and safety considerations.
"""