GPQA Leaderboard
Graduate-Level Google-Proof Q&A Benchmark
Why This Matters
Graduate-level scientific knowledge - essential for research and specialized domains
Good Scores
35% or higher is good, 45% or higher is excellent, and 50% or higher is exceptional. Domain experts score about 65%.
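The thresholds above can be read as a simple banding rule. The following is a minimal sketch, assuming the input is an accuracy on the same percentage scale as the thresholds; the function name `score_band` and the label "below good" are hypothetical and not part of the leaderboard.

```python
def score_band(score_pct: float) -> str:
    """Map a GPQA accuracy (in percent) to the bands listed under Good Scores.

    Assumption: score_pct is on the same percentage scale as the thresholds.
    The label "below good" is made up here for scores under 35%.
    """
    if score_pct >= 50:
        return "exceptional"
    if score_pct >= 45:
        return "excellent"
    if score_pct >= 35:
        return "good"
    return "below good"


if __name__ == "__main__":
    for pct in (30.0, 35.0, 47.5, 65.0):
        print(pct, score_band(pct))
```

Checking the highest threshold first keeps each band to a single comparison, so a score of exactly 45 falls into "excellent" rather than "good".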
Use Cases
- Scientific research tools
- Technical documentation
- Academic assistance
- Expert systems
Peak Score
29.42
Average
6.57
Models Tested
53,438
Median Score
5.93
Efficiency Leaders
Best performance per billion parameters - The smart choices
ChatWaifu_v1.4
100.0M params • Score: 7.61
Efficiency
76.06
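The efficiency figure appears to be the score divided by the parameter count in billions. Here is a minimal sketch under that assumption; the function name `efficiency` is hypothetical, and the listed value of 76.06 presumably comes from an unrounded score, so the rounded inputs shown on the page give a slightly different result.

```python
def efficiency(score: float, params: float) -> float:
    """Return score per billion parameters.

    Assumption: efficiency = score / (parameter count / 1e9).
    `params` is the raw parameter count, e.g. 100.0e6 for 100.0M.
    """
    if params <= 0:
        raise ValueError("params must be positive")
    return score / (params / 1e9)


if __name__ == "__main__":
    # ChatWaifu_v1.4 as displayed: 100.0M params, score 7.61
    print(round(efficiency(7.61, 100.0e6), 2))
```

Because the metric divides by model size, very small models can rank highly here even when their absolute score is low.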
Performance by Model Size
How different size classes perform on this benchmark
large
Avg Score: 10.72
🏆 Open Source Champions
Top permissively licensed models
📈 Most Downloaded Models
Popularity meets performance
📄 License Analysis
Performance by license type
🔧 Framework Analysis
Performance by framework
About GPQA
GPQA is a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. Questions are designed to be difficult for laypersons but answerable by experts in the field.
Test These Models Yourself
Run benchmarks on your own data with these platforms
Together.ai
Instant API access to this model
Production-ready inference API. Start free, scale to millions.
Modal
Run this model on serverless GPU
Deploy in seconds with $30 free credits. Pay only for what you use.
RunPod
Rent GPU starting at $0.34/hour
Deploy on cloud GPU or serverless. 70% cheaper than AWS.
Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.
Complete Leaderboard
Top 50 models ranked by GPQA performance
L3.3-MS-Nevoria-70b
29.42
Score
L3.3-Nevoria-R1-70b
29.19
Score
70B-L3.3-Cirrus-x1
26.62
Score