MuSR Leaderboard
Multistep Soft Reasoning Benchmark
Why This Matters
Multi-step logical reasoning is essential for complex problem-solving and analysis.
Good Scores
55%+ is good, 65%+ is strong, 75%+ is excellent
Use Cases
- Business analytics
- Strategy development
- Causal analysis
- Complex investigations
Peak Score
38.69
Average
9.93
Models Tested
53,438
Median Score
10.15
Efficiency Leaders
Best performance per billion parameters - The smart choices
ChatWaifu_v1.4
100.0M params • Score: 20.02
Efficiency
200.25
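The efficiency figure above reads as score divided by parameter count in billions. The following is a minimal sketch under that assumption; the function name `efficiency` is hypothetical and the page does not document its exact formula. With the rounded values shown (score 20.02, 100.0M parameters) it gives roughly 200.2, so the listed 200.25 was presumably computed from unrounded inputs.

```python
def efficiency(score: float, params: float) -> float:
    """Return benchmark score per billion parameters.

    Assumed formula (not confirmed by the page): score / (params / 1e9).
    """
    if params <= 0:
        raise ValueError("params must be positive")
    return score / (params / 1e9)


# Rounded values from the listed entry: 100.0M params, score 20.02.
print(round(efficiency(20.02, 100e6), 2))
```

Because the divisor is the parameter count, very small models can post high efficiency values even when their absolute score is modest.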
Performance by Model Size
How different size classes perform on this benchmark
large
Avg Score: 13.53
xlarge
Avg Score: 16.44
🏆 Open Source Champions
Top permissively licensed models
📈 Most Downloaded Models
Popularity meets performance
📄 License Analysis
Performance by license type
🔧 Framework Analysis
Performance by framework
About MuSR
MuSR evaluates complex reasoning that requires multiple steps of inference and soft reasoning across diverse scenarios. It tests a model's ability to chain logical steps together.
Test These Models Yourself
Run benchmarks on your own data with these platforms
Together.ai
Instant API access to this model
Production-ready inference API. Start free, scale to millions.
Modal
Run this model on serverless GPU
Deploy in seconds with $30 free credits. Pay only for what you use.
RunPod
Rent GPU starting at $0.34/hour
Deploy on cloud GPU or serverless. 70% cheaper than AWS.
Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.
Complete Leaderboard
Top 50 models ranked by MuSR performance
T3Q Qwen2.5 14b V1.0 E3
38.69
Score
Calme 3.2 Instruct 78b
38.53
Score
calme-3.1-instruct-78b
36.50
Score