290,000+ Benchmark Results
AI Model Benchmarks
Compare performance across industry-standard benchmarks. Updated daily with the latest scores from 40,000+ models.
MMLU Pro
Professional knowledge & reasoning across 14 subject areas
41,103 models
View Rankings →
Arena Elo
Human preference rankings from 500K+ votes
2,391 models
View Rankings →
BIG-Bench Hard
Complex reasoning tasks beyond average human performance
41,103 models
View Rankings →
GPQA
Graduate-level expert knowledge evaluation
41,103 models
View Rankings →
MATH Level 5
Competition-level mathematics problems
41,103 models
View Rankings →
MuSR
Multistep soft reasoning benchmark
41,103 models
View Rankings →
IFEval
Instruction-following precision
41,103 models
View Rankings →
OpenLLM v2 Avg
Aggregate score across all benchmarks
41,103 models
View Rankings →
Why Benchmarks Matter
Benchmarks provide objective, reproducible metrics for comparing AI models across different architectures, sizes, and training approaches. We aggregate scores from trusted evaluators to help you make informed decisions.
What We Track
- ✓ Official benchmark scores with sources
- ✓ Daily updates from major leaderboards
- ✓ Historical performance tracking
Our Approach
- ✓ No composite scores: see raw performance per benchmark
- ✓ Context matters: licensing, pricing, model size
- ✓ Quantization-aware comparisons
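As a toy illustration of why raw per-benchmark scores matter more than a single composite: two models can share an identical average while excelling at very different tasks. The model names and scores below are invented for illustration and are not real leaderboard data.

```python
# Hypothetical benchmark scores (invented for illustration only).
scores = {
    "model_a": {"MMLU Pro": 80.0, "MATH Level 5": 40.0, "IFEval": 60.0},
    "model_b": {"MMLU Pro": 60.0, "MATH Level 5": 80.0, "IFEval": 40.0},
}

def average(model: str) -> float:
    """Composite score: a plain mean across all benchmarks."""
    vals = scores[model].values()
    return sum(vals) / len(vals)

# Both composites come out identical...
print(average("model_a"))  # 60.0
print(average("model_b"))  # 60.0
# ...yet model_a is far stronger on MMLU Pro and model_b on MATH Level 5,
# which is exactly the trade-off a single aggregate number hides.
```

A buyer who only needs competition-math performance would pick the "wrong" model if shown composites alone, which is why each benchmark is ranked separately here.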