Reasoning

BIG-Bench Hard Leaderboard

Challenging tasks from BIG-Bench that require advanced reasoning

Why This Matters

Measures a model's ability to solve problems that earlier language models could not; a strong score suggests real multi-step reasoning capability

Good Scores

50%+ is competent, 65%+ is strong, 75%+ is exceptional
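The thresholds above amount to a simple lookup. Here is a minimal sketch, assuming scores are percentages in [0, 100]. The function name `score_tier` and the "below competent" label for scores under 50% are hypothetical additions, because the page defines tiers only from 50% upward.

```python
def score_tier(score: float) -> str:
    """Map a BBH score (percent) to the tiers listed on this page.

    The "below competent" label for scores under 50% is a hypothetical
    addition; the page only defines tiers from 50% upward.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be a percentage in [0, 100], got {score}")
    # Check the highest threshold first so each score gets exactly one tier.
    if score >= 75.0:
        return "exceptional"
    if score >= 65.0:
        return "strong"
    if score >= 50.0:
        return "competent"
    return "below competent"
```

For example, `score_tier(65.47)` returns `"strong"`, which is where the leaderboard's listed score falls.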

Use Cases

  • Complex decision support
  • Strategic planning tools
  • Advanced problem solving
  • Logic-based applications

About BIG-Bench Hard

BIG-Bench Hard (BBH) is a suite of 23 challenging BIG-Bench tasks where prior language models did not outperform average human-rater performance. It tests complex reasoning, world knowledge, and multi-step problem solving.
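One common way to turn results on the 23 tasks into one number is to score each task by exact-match accuracy and take the unweighted mean across tasks. The sketch below assumes that scheme. The page does not say how its Score is computed, and some leaderboards normalize scores differently, so treat this as illustrative. The function names and the toy predictions are hypothetical.

```python
from statistics import mean


def task_accuracy(predictions: list[str], targets: list[str]) -> float:
    """Exact-match accuracy for one task, ignoring surrounding whitespace."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need equal-length, non-empty prediction and target lists")
    hits = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return hits / len(targets)


def bbh_score(per_task: dict[str, tuple[list[str], list[str]]]) -> float:
    """Unweighted mean of per-task accuracy, expressed as a percentage.

    Assumption: every task counts equally regardless of how many
    examples it has.
    """
    return 100.0 * mean(task_accuracy(p, t) for p, t in per_task.values())


# Hypothetical toy results for two BBH tasks (a real run would cover all 23).
example = {
    "boolean_expressions": (["True", "False"], ["True", "True"]),  # 50% correct
    "date_understanding": (["(A)"], ["(A)"]),                       # 100% correct
}
print(bbh_score(example))  # prints 75.0
```

Averaging per task rather than per example keeps tasks with many examples from dominating the aggregate.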

Last updated: 1/14/2026

Test These Models Yourself

Run benchmarks on your own data with these platforms

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Modal

Run this model on serverless GPU

Most Popular

Deploy in seconds with $30 free credits. Pay only for what you use.

RunPod

Rent GPUs starting at $0.34/hour

Best Value

Deploy on cloud GPU or serverless. 70% cheaper than AWS.

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.

Complete Leaderboard

Top 50 models ranked by BIG-Bench Hard performance

#1

T3Q Qwen2.5 14b V1.0 E3

JungZoona · license: apache-2.0

Score: 65.47
