IFEval Leaderboard
Instruction Following Evaluation
Why This Matters
Reliable instruction following is crucial for automation and production systems
Good Scores
70%+ is reliable, 80%+ is very dependable, 85%+ is production-ready
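The thresholds above can be read as a simple lookup. The sketch below is only an illustration of that reading: the function name `reliability_tier` is hypothetical, and the label for scores under 70 is a placeholder, since this page does not name that range.

```python
def reliability_tier(score: float) -> str:
    """Map an IFEval score (0-100) to the tier labels quoted on this page."""
    if score >= 85:
        return "production-ready"
    if score >= 80:
        return "very dependable"
    if score >= 70:
        return "reliable"
    # Placeholder label: the page gives no name for scores under 70.
    return "below 70"


if __name__ == "__main__":
    # Peak and average scores reported on this page.
    for s in (89.98, 45.64):
        print(s, reliability_tier(s))
```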
Use Cases
- Automated workflows
- Code generation
- Document formatting
- API integrations
Peak Score
89.98
Average
45.64
Models Tested
53,438
Median Score
45.48
Efficiency Leaders
Best performance per billion parameters - The smart choices
ChatWaifu_v1.4
100.0M params • Score: 56.91
Efficiency
569.06
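The efficiency figure appears to be the score divided by the parameter count in billions. A minimal sketch under that assumption follows; the function name `efficiency` is hypothetical and not taken from the site.

```python
def efficiency(score: float, params: float) -> float:
    """Return score per billion parameters (assumed definition)."""
    if params <= 0:
        raise ValueError("params must be positive")
    return score / (params / 1e9)


if __name__ == "__main__":
    # ChatWaifu_v1.4 as listed: 100.0M params, score 56.91.
    print(round(efficiency(56.91, 100.0e6), 2))
```

With the displayed values this gives roughly 569.1, close to the listed 569.06. The small gap presumably comes from the page computing efficiency on an unrounded score.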
Performance by Model Size
How different size classes perform on this benchmark
medium
Avg Score: 48.46
large
Avg Score: 51.80
xlarge
Avg Score: 64.33
🏆 Open Source Champions
Top permissively licensed models
📈 Most Downloaded Models
Popularity meets performance
📄 License Analysis
Performance by license type
🔧 Framework Analysis
Performance by framework
About IFEval
IFEval tests a model's ability to follow specific instructions precisely, including formatting requirements, length constraints, and structural specifications. This precision is critical for reliability in real-world applications.
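To make the idea concrete, the sketch below shows how length, formatting, and structural constraints can be checked programmatically. It is not the IFEval implementation: the helper names (`max_words`, `bullet_count`, `starts_with`, `followed_fraction`) and the sample response are hypothetical, chosen only to illustrate the kind of rule-based check involved.

```python
import re
from typing import Callable, List


def max_words(response: str, limit: int) -> bool:
    """Length constraint: at most `limit` whitespace-separated words."""
    return len(response.split()) <= limit


def bullet_count(response: str, expected: int) -> bool:
    """Formatting constraint: exactly `expected` lines starting with '-' or '*'."""
    bullets = [ln for ln in response.splitlines() if re.match(r"^\s*[-*]\s+", ln)]
    return len(bullets) == expected


def starts_with(response: str, prefix: str) -> bool:
    """Structural constraint: the response begins with a given prefix."""
    return response.lstrip().startswith(prefix)


def followed_fraction(response: str, checks: List[Callable[[str], bool]]) -> float:
    """Fraction of instruction checks that the response satisfies."""
    if not checks:
        raise ValueError("at least one check is required")
    return sum(1 for check in checks if check(response)) / len(checks)


if __name__ == "__main__":
    response = "Summary\n- fast\n- cheap\n- simple"
    checks = [
        lambda r: max_words(r, 20),
        lambda r: bullet_count(r, 3),
        lambda r: starts_with(r, "Summary"),
    ]
    print(followed_fraction(response, checks))
```

Because every check is deterministic, a score built this way needs no human or model judge.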
Complete Leaderboard
Top 50 models ranked by IFEval performance
Llama-3.3-70B-Instruct
89.98
Score
Llama-3.1-70B-Instruct
86.69
Score
calme-2.1-qwen2.5-72b
86.62
Score