OpenLLM v2 Average Leaderboard
Average score across OpenLLM Leaderboard v2 benchmarks
Why This Matters
Holistic performance metric - best single indicator of overall model capability
Good Scores
50%+ is capable, 60%+ is strong, 70%+ is state-of-the-art
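The rule of thumb above can be sketched as a small helper. This is an illustrative function of my own, not part of the leaderboard; the tier names and cutoffs come straight from the line above.

```python
def capability_tier(score_pct: float) -> str:
    """Map an OpenLLM v2 average score (in percent) to the
    page's rule-of-thumb capability tier."""
    if score_pct >= 70:
        return "state-of-the-art"
    if score_pct >= 60:
        return "strong"
    if score_pct >= 50:
        return "capable"
    return "below the capable threshold"

print(capability_tier(52.08))  # → capable
```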
Use Cases
- General comparison
- Model selection
- Performance tracking
- Capability assessment
Peak Score: 52.08
Average: 21.63
Models Tested: 53,438
Median Score: 21.73
Efficiency Leaders
Best performance per billion parameters (benchmark score divided by parameter count in billions) - the smart choices
ChatWaifu_v1.4
100.0M params • Score: 25.71 • Efficiency: 257.07
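The efficiency figure appears to be the benchmark score divided by the parameter count in billions; a minimal sketch, assuming that definition (the page's 257.07 for a 25.71 score at 0.1B parameters suggests the underlying score is stored at slightly higher precision than displayed):

```python
def efficiency(score: float, params_billions: float) -> float:
    """Benchmark score per billion parameters (assumed definition)."""
    return score / params_billions

# ChatWaifu_v1.4: 25.71 points at 100.0M (0.1B) params
print(round(efficiency(25.71, 0.1), 2))  # → 257.1
```

Note how heavily this metric favors tiny models: a 100M-parameter model with a modest score outranks a 78B model with more than twice the score.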
Performance by Model Size
How different size classes perform on this benchmark
medium: Avg Score 21.00
large: Avg Score 29.05
xlarge: Avg Score 37.18
🏆 Open Source Champions
Top permissively licensed models
📈 Most Downloaded Models
Popularity meets performance
📄 License Analysis
Performance by license type
🔧 Framework Analysis
Performance by framework
About OpenLLM v2 Average
The OpenLLM v2 Average combines scores from multiple benchmarks including MMLU Pro, GPQA, BBH, MATH Lvl5, MuSR, and IFEval to provide a holistic measure of model performance across diverse capabilities.
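The combination described above can be sketched as an equal-weight mean over the six benchmarks. This is an assumption for illustration: the actual leaderboard may normalize each benchmark (e.g. against a random-guess baseline) before averaging, so treat this as a simplified model, not the official formula.

```python
# The six benchmarks named in the description above.
BENCHMARKS = ["MMLU-Pro", "GPQA", "BBH", "MATH Lvl 5", "MuSR", "IFEval"]

def openllm_v2_average(scores: dict) -> float:
    """Equal-weight mean of the six benchmark scores (each 0-100).

    Assumes scores are already normalized; the real leaderboard's
    normalization may differ.
    """
    missing = set(BENCHMARKS) - set(scores)
    if missing:
        raise ValueError(f"missing benchmark scores: {sorted(missing)}")
    return sum(scores[b] for b in BENCHMARKS) / len(BENCHMARKS)

example = {"MMLU-Pro": 60.0, "GPQA": 30.0, "BBH": 55.0,
           "MATH Lvl 5": 40.0, "MuSR": 35.0, "IFEval": 80.0}
print(openllm_v2_average(example))  # → 50.0
```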
Test These Models Yourself
Run benchmarks on your own data with these platforms
Together.ai
Instant API access to this model
Production-ready inference API. Start free, scale to millions.
Try Free API

Modal
Run this model on serverless GPU
Deploy in seconds with $30 free credits. Pay only for what you use.
Get $30 Free Credits

RunPod
Rent GPU starting at $0.34/hour
Deploy on cloud GPU or serverless. 70% cheaper than AWS.
Start from $0.34/hr

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.
Complete Leaderboard
Top 50 models ranked by OpenLLM v2 Average performance
Calme 3.2 Instruct 78b • Score: 52.08
calme-3.1-instruct-78b • Score: 51.29
CalmeRys-78B-Orpo-v0.1 • Score: 51.23