Ziyao1010

2 models • 1 total models in database

FL Qwen2.5 1.5B Math Demo

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the DigitalLearningGmbH/MATH-lighteval dataset. It has been trained using TRL. This model was trained with GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.

- TRL: 0.18.0
- Transformers: 4.52.3
- PyTorch: 2.6.0
- Datasets: 4.1.1
- Tokenizers: 0.21.4
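For readers unfamiliar with GRPO, the core idea is that several completions are sampled for the same prompt and each one is scored relative to the others in its group. The sketch below illustrates only that group-relative normalization step. It is not the TRL implementation or this model's training code: the function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of completions sampled for ONE prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumption: population std and a small eps to avoid division by zero;
    a real trainer may use a different estimator or epsilon.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards: 1.0 for a correct final answer, 0.0 otherwise.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that score above their group's mean receive a positive advantage and those below receive a negative one. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.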

Qwen2.5-1.5B-Open-R1-GRPO-Math

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct. It has been trained using TRL. This model was trained with GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.

- TRL: 0.18.0
- Transformers: 4.52.3
- PyTorch: 2.6.0
- Datasets: 4.1.1
- Tokenizers: 0.21.4
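The framework versions listed above can be compared against a local environment before trying to reproduce or load the model. The following is a minimal sketch that assumes the usual PyPI distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`). The helper `version_mismatches` and its `lookup` parameter are hypothetical names introduced here, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed on the model card; the PyPI names are an assumption.
LISTED = {
    "trl": "0.18.0",
    "transformers": "4.52.3",
    "torch": "2.6.0",
    "datasets": "4.1.1",
    "tokenizers": "0.21.4",
}


def version_mismatches(listed=LISTED, lookup=version):
    """Return {package: (listed, installed)} for every package that differs.

    `installed` is None when the package is not installed. Local build
    suffixes such as "+cu124" are ignored when comparing.
    """
    mismatches = {}
    for package, wanted in listed.items():
        try:
            found = lookup(package).split("+")[0]
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: listed {wanted}, installed {found}")
```

An empty result means the installed packages match the listed versions. A mismatch does not necessarily prevent the model from loading, so treat the output as a hint rather than a hard requirement.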
