Dongwei
Rationalyst_reasoning_datasets
Qwen-2.5-7B
DeepSeek-R1-Distill-Qwen-7B-GRPO
A GRPO fine-tune of the distilled deepseek-ai/DeepSeek-R1-Distill-Qwen-7B model, trained on the DigitalLearningGmbH/MATH-lighteval dataset with the transformers library.
Qwen-2.5-7B_Base_Math_smalllr
DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math
Model Card for DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the DigitalLearningGmbH/MATH-lighteval dataset. It was trained using TRL with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Framework versions:
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
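The card above does not say which reward GRPO was trained against; a common choice for MATH-style datasets is exact match on the final \boxed{...} answer. A minimal stdlib sketch under that assumption (boxed_answer and correctness_reward are illustrative names, not TRL APIs):

```python
import re


def boxed_answer(text):
    """Return the content of the last \\boxed{...} in a completion, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def correctness_reward(completion, gold):
    """1.0 if the completion's boxed answer exactly matches the gold answer, else 0.0."""
    pred = boxed_answer(completion)
    return 1.0 if pred is not None and pred == gold.strip() else 0.0


# Example: a completion ending in a boxed answer scored against the reference.
print(correctness_reward("... so the answer is \\boxed{42}.", "42"))  # 1.0
```

Exact string match is deliberately strict; real MATH evaluation harnesses usually normalize equivalent forms (e.g. 0.5 vs 1/2) before comparing.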
Qwen-2.5-7B_Base_Math_smallestlr
Qwen-2.5-7B_Base_Math_smallestlr_newdata
Model Card for Qwen-2.5-7B_Base_Math_smallestlr_newdata
This model is a fine-tuned version of Qwen/Qwen2.5-Math-7B on the DigitalLearningGmbH/MATH-lighteval dataset. It was trained using TRL with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Framework versions:
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Qwen-2.5-7B_Base_Math_smalllr_longer
DeepSeek-R1-Distill-Qwen-7B-GRPO_Math
Model Card for DeepSeek-R1-Distill-Qwen-7B-GRPO_Math
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B on the DigitalLearningGmbH/MATH-lighteval dataset. It was trained using TRL with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Framework versions:
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0
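GRPO, as introduced in the DeepSeekMath paper cited by these cards, samples a group of G completions per prompt and normalizes each reward within its group, avoiding a learned value model. A minimal stdlib sketch of that group-relative advantage computation (function name is illustrative, not a TRL API):

```python
def group_relative_advantages(rewards):
    """Normalize rewards within one group of G completions for the same prompt.

    Each completion's advantage is (r - mean) / std over the group, so the
    policy is pushed toward completions that beat their siblings.
    """
    g = len(rewards)
    mean = sum(rewards) / g
    std = (sum((r - mean) ** 2 for r in rewards) / g) ** 0.5
    if std == 0:
        return [0.0] * g  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]


# Example: two correct and two incorrect completions in a group of four.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

With a binary correctness reward like the MATH setup, a group that is all right or all wrong yields zero advantage everywhere, which is why mixed-difficulty prompts matter for GRPO training.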
Qwen2.5-1.5B-Open-R1-GRPO_Math
This model is a fine-tuned version of Qwen/Qwen2.5-Math-1.5B on the DigitalLearningGmbH/MATH-lighteval dataset. It was trained using TRL with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Framework versions:
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
DeepSeek-R1-Distill-Qwen-7B-GRPO_Math_lowlr
Qwen2.5-1.5B-Open-R1-GRPO_Math_smalllr
Qwen2.5-1.5B-Open-R1-GRPO
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained using TRL with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Framework versions:
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0