Thrillcrazyer
Qwen-1.5B_GSPO
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the DeepMath-103k dataset. It was trained with TRL using GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.

Framework versions:
- TRL: 0.24.0
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu128
- Datasets: 4.2.0
- Tokenizers: 0.22.1
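The core idea of GRPO, as described in the DeepSeekMath paper, is to drop the learned value baseline: for each prompt, a group of completions is sampled and each completion's reward is normalized by the group's mean and standard deviation to form its advantage. A minimal sketch of that normalization in plain Python (function name and toy rewards are illustrative, not taken from this repository):

```python
import math

def group_relative_advantages(rewards, eps=1e-4):
    """Group-relative advantages: normalize each reward by the mean and
    standard deviation of its own sampling group (the GRPO baseline)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# One group of 4 completions sampled for the same prompt,
# scored by a binary correctness reward:
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct completions receive positive advantages and incorrect ones negative, with no critic network involved.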
Qwen-1.5B_THIP
Qwen-1.5B_GRPO
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the DeepMath-103k dataset. It was trained with TRL using GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.

Framework versions:
- TRL: 0.24.0
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu128
- Datasets: 4.2.0
- Tokenizers: 0.22.1
Qwen-1.5B_THIP_GRPO
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the DeepMath-103k dataset. It was trained with TRL using GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.

Framework versions:
- TRL: 0.23.1
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu128
- Datasets: 4.2.0
- Tokenizers: 0.22.1
QWEN7_THIP
TACReward7B
Qwen-1.5B_THIP2
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the DeepMath-103k dataset. It was trained with TRL using GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.

Framework versions:
- TRL: 0.24.0
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu128
- Datasets: 4.2.0
- Tokenizers: 0.22.1
Qwen-1.5B_DRGRPO
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained with TRL using GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.

Framework versions:
- TRL: 0.24.0
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu128
- Datasets: 4.2.0
- Tokenizers: 0.22.1
Qwen-7B_SFT
Qwen-7B_THIP
Qwen-7B_THIP_1204
Qwen3-1.7B_ver0.2_5000
QWEN3-1.7B_LDS
QWEN3-8B_LDS_ver2
PHI4_LDS
Qwen-1.5B_SFT
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained with TRL.

Framework versions:
- TRL: 0.24.0
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu128
- Datasets: 4.2.0
- Tokenizers: 0.22.1
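Unlike the GRPO checkpoints above, an SFT model is trained with the standard next-token cross-entropy objective over target responses. A toy illustration of that per-token loss in plain Python (function name, vocabulary size, and probabilities are illustrative only, not from this repository):

```python
import math

def next_token_nll(probs, target_ids):
    """Average negative log-likelihood of the target tokens under the
    model's predicted next-token distributions (the SFT loss)."""
    return -sum(math.log(p[t]) for p, t in zip(probs, target_ids)) / len(target_ids)

# Two prediction steps over a toy 3-token vocabulary;
# the target tokens are ids 0 and 1:
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = next_token_nll(probs, [0, 1])
```

Minimizing this quantity pushes probability mass onto the reference tokens, which is all supervised fine-tuning does at the objective level.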
Qwen-1.5B_THIP_1204
QWEN3-1.7B_LDS_ver2
LDS_4B
Qwen-1.5B_THIP_GSPO_PMREWARD
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the DeepMath-103k dataset. It was trained with TRL using GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.

Framework versions:
- TRL: 0.22.2
- Transformers: 4.55.4
- PyTorch: 2.7.1
- Datasets: 3.6.0
- Tokenizers: 0.21.4
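TRL's GRPO trainer scores sampled completions with plain Python reward functions that take a batch of completions and return one float per completion. A minimal exact-match reward in that style for math answers (the \boxed{...} extraction and all names here are illustrative assumptions, not the reward actually used for these checkpoints):

```python
import re

def accuracy_reward(completions, answers):
    """Return 1.0 when the last \\boxed{...} value in a completion matches
    the reference answer, else 0.0 (illustrative GRPO-style reward)."""
    rewards = []
    for completion, answer in zip(completions, answers):
        found = re.findall(r"\\boxed\{([^}]*)\}", completion)
        rewards.append(1.0 if found and found[-1].strip() == answer else 0.0)
    return rewards

scores = accuracy_reward(
    ["The result is \\boxed{42}.", "I think \\boxed{41}."],
    ["42", "42"],
)
```

Binary correctness rewards like this pair naturally with the group-relative advantage normalization GRPO performs over each sampled group.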