# P1-30B-A3B
**P1: Mastering Physics Olympiads with Reinforcement Learning**

*A high-performance mid-scale model for physics reasoning*

P1-30B-A3B is the mid-size variant of the P1 series, a family of high-performance open-source language models specialized in physics reasoning. Built on Qwen3-30B-A3B-Thinking-2507 and refined through multi-stage reinforcement learning on curated physics competition data, P1-30B-A3B achieves strong results while keeping computational requirements modest, making it accessible to researchers working on physics problems.

## Highlights

- 🥈 IPhO 2025 Silver-tier Performance: strong competitive showing at the International Physics Olympiad (18.5/30 points)
- 🥇 HiPhO Excellence: 8 gold medals, 4 silver medals, and 1 bronze medal across 13 physics contests

## IPhO 2025 Results

| Model | Score | Medal |
|:-----:|:-----:|:-----:|
| P1-30B-A3B | 18.5 | 🥈 Silver |
| DeepSeek-R1 | 18.5 | 🥈 Silver |
| Qwen3-235B-A22B-Thinking-2507 | 17.1 | 🥈 Silver |
| Qwen3-30B-A3B-Thinking-2507 | 15.6 | 🥈 Silver |

## HiPhO Benchmark Results

| Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) |
|:--------:|:----------:|:---------------:|:-----------:|:--------------------:|
| Overall Score | 32.5 | 33.5 | 32.9 | 29.9 |
| Gold Medals (🥇) | 8 | 10 | 9 | 6 |
| Silver Medals (🥈) | 4 | 3 | 3 | 6 |
| Bronze Medals (🥉) | 1 | 0 | 1 | 1 |
| Total Contests | 13 | 13 | 13 | 13 |

## General Capabilities

Beyond physics reasoning, P1 improves across multiple domains. As shown below, P1-30B-A3B outperforms its base model Qwen3-30B-A3B-Thinking-2507 on math, coding, and STEM benchmarks, demonstrating that its physics-focused training generalizes well.

| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
| Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 |
| P1-30B-A3B | 91.0 | 91.0 | 76.9 | 74.4 | 14.3 | 68.1 | 77.0 |

## Acknowledgements

We are grateful to the open-source community for their invaluable contributions.
Special thanks to:

- Qwen3 - for providing the foundational base models that powered our research
- slime - for their innovative work on an efficient reinforcement learning framework that powered our training pipeline
- verl - for the versatile reinforcement learning framework that enabled our training pipeline
- sglang - for the efficient LLM serving and inference infrastructure
- Megatron-LM - for the large-scale model training framework
# P1-235B-A22B
**P1: Mastering Physics Olympiads with Reinforcement Learning**

*Achieving a gold medal at the International Physics Olympiad (IPhO 2025)*

P1-235B-A22B is the flagship model of the P1 series, a state-of-the-art open-source large language model specialized in physics reasoning. Built on Qwen3-235B-A22B-Thinking-2507 and tuned through multi-stage reinforcement learning on curated physics competition data, P1-235B-A22B marks a historic achievement as the first open-source model to win gold at the International Physics Olympiad (IPhO 2025).

## Highlights

- 🏆 IPhO 2025 Gold Medal: first open-source model to achieve gold medal status (21.2/30 points)
- 🥇 HiPhO Benchmark Leader: 12 gold medals and 1 silver medal across 13 top international physics contests
- 🥇 Overall Champion: when paired with the PhysicsMinions multi-agent system, achieves the #1 HiPhO ranking with 38.4 points, surpassing Gemini-2.5-Pro (37.7) and GPT-5 (37.4)

## IPhO 2025 Results

| Model | Score | Medal | Rank |
|:-----:|:-----:|:-----:|:----:|
| P1-235B-A22B + PhysicsMinions | 23.2 | 🥇 Gold | 1st |
| Gemini-2.5-Pro | 22.2 | 🥇 Gold | 2nd |
| GPT-5 | 22.3 | 🥇 Gold | 3rd |
| P1-235B-A22B | 21.2 | 🥇 Gold | 4th |

## HiPhO Benchmark Results

| Category | P1-235B-A22B | P1-235B-A22B + PhysicsMinions | Gemini-2.5-Pro | GPT-5 |
|:--------:|:------------:|:-----------------------------:|:--------------:|:-----:|
| Overall Score | 35.9 | 38.4 🏆 | 37.7 | 37.4 |
| Gold Medals (🥇) | 12 | 12 | 12 | 11 |
| Silver Medals (🥈) | 1 | 1 | 1 | 2 |
| Total Contests | 13 | 13 | 13 | 13 |

## General Capabilities

P1-235B-A22B retains strong general capabilities across a range of benchmarks. As shown below, it outperforms its base model Qwen3-235B-A22B-Thinking-2507 on several tasks, including AIME24, AIME25, GPQA, and HLE, while remaining comparable on the rest, further supporting the generalization of the P1 series.
| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
| Qwen3-235B-A22B-Thinking-2507 (Base) | 94.6 | 94.2 | 81.7 | 79.4 | 17.5 | 76.2 | 80.3 |
| P1-235B-A22B | 95.0 | 95.0 | 80.8 | 81.4 | 19.1 | 75.8 | 79.8 |

## Acknowledgements

We are grateful to the open-source community for their invaluable contributions. Special thanks to:

- Qwen3 - for providing the foundational base models that powered our research
- slime - for their innovative work on an efficient reinforcement learning framework that powered our training pipeline
- verl - for the versatile reinforcement learning framework that enabled our training pipeline
- sglang - for the efficient LLM serving and inference infrastructure
- Megatron-LM - for the large-scale model training framework