Thrillcrazyer

25 models

Qwen-1.5B_GSPO

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the DeepMath-103k dataset, trained with TRL using GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Framework versions: TRL 0.24.0, Transformers 4.57.1, PyTorch 2.8.0+cu128, Datasets 4.2.0, Tokenizers 0.22.1.
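A checkpoint like this can be loaded with the standard transformers auto classes. The hub id below is an assumption inferred from this listing, and the imports are deferred so the sketch can be read without transformers installed:

```python
# Minimal sketch: loading one of the listed checkpoints with transformers.
# MODEL_ID is an assumed hub path based on this listing; adjust as needed.
MODEL_ID = "Thrillcrazyer/Qwen-1.5B_GSPO"

def load(model_id: str = MODEL_ID):
    # Imports deferred so the file is inspectable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling `load()` downloads the weights from the Hub on first use and caches them locally.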

Qwen-1.5B_THIP

Qwen-1.5B_GRPO

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the DeepMath-103k dataset, trained with TRL using GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Framework versions: TRL 0.24.0, Transformers 4.57.1, PyTorch 2.8.0+cu128, Datasets 4.2.0, Tokenizers 0.22.1.
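A GRPO run of this shape can be set up with TRL's `GRPOTrainer`. This is a hedged sketch, not the author's actual training setup: the reward function and config values below are illustrative assumptions, and the imports are deferred so the sketch is readable without trl installed:

```python
# Illustrative sketch of a TRL GRPO run like the one described above.
# The reward function and config values are assumptions, not the
# author's actual setup.

def format_reward(completions, **kwargs):
    # Toy reward: favour completions that wrap the final answer in \boxed{}.
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

def build_trainer(train_dataset):
    # Imports deferred so the sketch is inspectable without trl installed.
    from trl import GRPOConfig, GRPOTrainer
    args = GRPOConfig(output_dir="Qwen-1.5B_GRPO", num_generations=8)
    return GRPOTrainer(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        reward_funcs=format_reward,
        args=args,
        train_dataset=train_dataset,  # expects a "prompt" column
    )
```

`GRPOTrainer` accepts either a model id string or an instantiated model; for math tasks the reward function would typically also check answer correctness, not just formatting.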

Qwen-1.5B_THIP_GRPO

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the DeepMath-103k dataset, trained with TRL using GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Framework versions: TRL 0.23.1, Transformers 4.57.1, PyTorch 2.8.0+cu128, Datasets 4.2.0, Tokenizers 0.22.1.

QWEN7_THIP

TACReward7B

Qwen-1.5B_THIP2

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the DeepMath-103k dataset, trained with TRL using GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Framework versions: TRL 0.24.0, Transformers 4.57.1, PyTorch 2.8.0+cu128, Datasets 4.2.0, Tokenizers 0.22.1.

Qwen-1.5B_DRGRPO

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, trained with TRL using GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Framework versions: TRL 0.24.0, Transformers 4.57.1, PyTorch 2.8.0+cu128, Datasets 4.2.0, Tokenizers 0.22.1.

Qwen-7B_SFT

Qwen-7B_THIP

Qwen-7B_THIP_1204

Qwen3-1.7B_ver0.2_5000

QWEN3-1.7B_LDS

QWEN3-8B_LDS_ver2

PHI4_LDS

Qwen-1.5B_SFT

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, trained with TRL. Framework versions: TRL 0.24.0, Transformers 4.57.1, PyTorch 2.8.0+cu128, Datasets 4.2.0, Tokenizers 0.22.1.
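A supervised fine-tune like this one would typically use TRL's `SFTTrainer`. The sketch below is an assumption about the setup, not the author's actual configuration; imports are deferred so it can be read without trl installed:

```python
# Illustrative sketch of a TRL supervised fine-tuning (SFT) run.
# The config values are assumptions, not the author's actual setup.

def build_sft_trainer(train_dataset):
    # Imports deferred so the sketch is inspectable without trl installed.
    from trl import SFTConfig, SFTTrainer
    args = SFTConfig(output_dir="Qwen-1.5B_SFT")
    return SFTTrainer(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        args=args,
        train_dataset=train_dataset,  # e.g. a dataset with a "messages" column
    )
```

Calling `build_sft_trainer(ds).train()` would then run the fine-tune and write checkpoints to `output_dir`.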

Qwen-1.5B_THIP_1204

QWEN3-1.7B_LDS_ver2

LDS_4B

License: apache-2.0

Qwen-1.5B_THIP_GSPO_PMREWARD

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the DeepMath-103k dataset, trained with TRL using GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Framework versions: TRL 0.22.2, Transformers 4.55.4, PyTorch 2.7.1, Datasets 3.6.0, Tokenizers 0.21.4.

Qwen4B_ver0.2

QWEN7_GRPO

Qwen-1.5B_THIP_RLOO

Qwen-7B_TAC_GRPO

Qwen-7B_TAC_GSPO
