Qwen3-8B-FP8-dynamic
19.7K
11
8.0B
32 languages
license:apache-2.0
by
RedHatAI
Language Model
OTHER
8B params
Fair
20K downloads
Community-tested
Edge AI:
Mobile
Laptop
Server
18GB+ RAM
Mobile
Laptop
Server
Quick Summary
AI model with specialized capabilities.
Device Compatibility
Mobile
4-6GB RAM
Laptop
16GB RAM
Server
GPU
Minimum Recommended
8GB+ RAM
Code Examples
Deploymentpythontransformers
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
model_id = "RedHatAI/Qwen3-8B-FP8-dynamic"
number_gpus = 1
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=20, min_p=0, max_tokens=256)
messages = [
{"role": "user", "content": prompt}
]
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [{"role": "user", "content": "Give me a short introduction to large language model."}]
prompts = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
llm = LLM(model=model_id, tensor_parallel_size=number_gpus)
outputs = llm.generate(prompts, sampling_params)
generated_text = outputs[0].outputs[0].text
print(generated_text)Creationpythontransformers
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model
model_stub = "Qwen/Qwen3-8B"
model_name = model_stub.split("/")[-1]
model = AutoModelForCausalLM.from_pretrained(model_stub)
tokenizer = AutoTokenizer.from_pretrained(model_stub)
# Configure the quantization algorithm and scheme
recipe = QuantizationModifier(
ignore=["lm_head"],
targets="Linear",
scheme="FP8_dynamic",
)
# Apply quantization
oneshot(
model=model,
recipe=recipe,
)
# Save to disk in compressed-tensors format
save_path = model_name + "-FP8-dynamic"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model and tokenizer saved to: {save_path}")Deploy This Model
Production-ready deployment in minutes
Together.ai
Instant API access to this model
Production-ready inference API. Start free, scale to millions.
Try Free APIReplicate
One-click model deployment
Run models in the cloud with simple API. No DevOps required.
Deploy NowDisclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.