Methodology
Benchmarks Overview
We present benchmark results as reported by model authors and trusted evaluators. Each score is accompanied by its source on the corresponding model page.
- MMLU: Massive Multitask Language Understanding (knowledge and reasoning)
- HellaSwag: Commonsense reasoning and sentence completion
- HumanEval: Code generation, scored as pass@1 (see the estimator sketch after this list)
- GSM8K: Grade-school math word problems
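For reference, pass@1 for HumanEval is conventionally computed with the unbiased pass@k estimator from the original HumanEval paper (Chen et al., 2021). The Python sketch below shows that calculation; the function and variable names are chosen for illustration, not taken from any particular evaluation harness.

```python
from math import prod

def pass_at_k(n: int, c: int, k: int) -> float:
    # n = completions sampled per problem, c = completions that pass the unit
    # tests, k = evaluation budget. Returns the unbiased estimate of
    # pass@k = 1 - C(n - c, k) / C(n, k); with k = 1 this reduces to c / n.
    if n - c < k:
        return 1.0
    return 1.0 - prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Example: 3 of 10 sampled completions pass -> pass@1 estimate of 0.3
print(pass_at_k(n=10, c=3, k=1))
```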
How We Use Benchmarks
Benchmarks are helpful signals, but not the whole story. We combine reported scores with real-world usage context, licensing, pricing, and context window size to help you choose the right model for your use case (a filtering sketch follows the list below).
- Scores are shown with sources whenever available
- Comparisons highlight per-benchmark strengths rather than a single composite rank
- Scores for open-source models can vary with quantization and inference setup
- We plan to incorporate community evals and standardized harnesses over time
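To make the selection criteria above concrete, here is a minimal, hypothetical Python sketch of how reported scores, pricing, licensing, and context window might be combined into a shortlist. The data model, thresholds, model names, and scores are all invented for illustration; they are not our production schema or real benchmark results.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ModelEntry:
    # Hypothetical record; field names are illustrative only.
    name: str
    license: str              # e.g. "apache-2.0" or "proprietary"
    context_window: int       # tokens
    price_per_mtok: float     # USD per million input tokens
    scores: Dict[str, float]  # benchmark name -> reported score (sources tracked separately)

def shortlist(models: List[ModelEntry],
              min_context: int = 32_000,
              max_price: float = 5.0,
              benchmarks: Tuple[str, ...] = ("MMLU", "HumanEval")) -> List[ModelEntry]:
    # Filter on practical constraints first, then rank by the benchmarks that
    # matter for the use case instead of collapsing everything into one number.
    eligible = [
        m for m in models
        if m.context_window >= min_context
        and m.price_per_mtok <= max_price
        and all(b in m.scores for b in benchmarks)
    ]
    return sorted(eligible,
                  key=lambda m: tuple(m.scores[b] for b in benchmarks),
                  reverse=True)

# Invented example data: model-b is dropped by the context and price filters.
models = [
    ModelEntry("model-a", "apache-2.0", 128_000, 0.5, {"MMLU": 0.78, "HumanEval": 0.71}),
    ModelEntry("model-b", "proprietary", 8_000, 10.0, {"MMLU": 0.86, "HumanEval": 0.90}),
]
print([m.name for m in shortlist(models)])  # ['model-a']
```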