Ziyao1010
2 models • 1 total models in database
FL Qwen2.5 1.5B Math Demo
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the DigitalLearningGmbH/MATH-lighteval dataset. It was trained with TRL using GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. Framework versions: TRL 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 4.1.1, Tokenizers 0.21.4.
89
2
Qwen2.5-1.5B-Open-R1-GRPO-Math
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct. It was trained with TRL using GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. Framework versions: TRL 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 4.1.1, Tokenizers 0.21.4.
3
0
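Both models listed here were trained with GRPO. As a rough illustration of the core idea, the sketch below computes group-relative advantages: each completion's reward is normalized against the other completions sampled for the same prompt. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for this example; this is not the TRL implementation and not the training code used for these models.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    eps is an assumed small constant that avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example group: four sampled answers to one math problem, rewarded 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers receive positive advantages and incorrect ones negative, so the policy update favors completions that beat their own group's average without needing a separate value model.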