Qwen3-4B-Thinking-2507-quantized.w4a16
51
license:apache-2.0
by
RedHatAI
Language Model
OTHER
4B params
New
51 downloads
Early-stage
Edge AI:
Mobile
Laptop
Server
9GB+ RAM
Mobile
Laptop
Server
Quick Summary
AI model with specialized capabilities.
Device Compatibility
Mobile
4-6GB RAM
Laptop
16GB RAM
Server
GPU
Minimum Recommended
4GB+ RAM
Code Examples
text
lm_eval --model local-chat-completions \
--tasks mmlu_pro_chat \
--model_args "model=RedHatAI/Qwen3-4B-Thinking-2507.w4a16,max_length=262144,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=64,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=1200" \
--num_fewshot 0 \
--apply_chat_template \
--gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,top_k=20,min_p=0.0,max_gen_toks=64000text
lm_eval --model local-chat-completions \
--tasks ifeval \
--model_args "model=RedHatAI/Qwen3-4B-Thinking-2507.w4a16,max_length=262144,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=64,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=1200" \
--num_fewshot 0 \
--apply_chat_template \
--gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,top_k=20,min_p=0.0,max_gen_toks=64000text
lm_eval --model local-chat-completions \
--tasks gsm8k_platinum_cot_llama \
--model_args "model=RedHatAI/Qwen3-4B-Thinking-2507.w4a16,max_length=262144,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=64,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=1200" \
--num_fewshot 0 \
--apply_chat_template \
--gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,top_k=20,min_p=0.0,max_gen_toks=64000Deploy This Model
Production-ready deployment in minutes
Together.ai
Instant API access to this model
Production-ready inference API. Start free, scale to millions.
Try Free APIReplicate
One-click model deployment
Run models in the cloud with simple API. No DevOps required.
Deploy NowDisclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.