Dongwei
Rationalyst_reasoning_datasets
Qwen-2.5-7B
DeepSeek-R1-Distill-Qwen-7B-GRPO
A GRPO fine-tune of the distilled deepseek-ai/DeepSeek-R1-Distill-Qwen-7B model, trained on the DigitalLearningGmbH/MATH-lighteval dataset with the transformers library.
Qwen-2.5-7B_Base_Math_smalllr
DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math
Model Card for DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the DigitalLearningGmbH/MATH-lighteval dataset. It was trained using TRL with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Framework versions:
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
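The card above does not say which reward GRPO was trained against; a common choice for MATH-style datasets is exact match on the final \boxed{...} answer. A minimal stdlib sketch under that assumption (boxed_answer and correctness_reward are illustrative names, not TRL APIs):

```python
import re


def boxed_answer(text):
    """Return the content of the last \\boxed{...} in a completion, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def correctness_reward(completion, gold):
    """1.0 if the completion's boxed answer exactly matches the gold answer, else 0.0."""
    pred = boxed_answer(completion)
    return 1.0 if pred is not None and pred == gold.strip() else 0.0


# Example: a completion ending in a boxed answer scored against the reference.
print(correctness_reward("... so the answer is \\boxed{42}.", "42"))  # 1.0
```

Exact string match is deliberately strict; real MATH evaluation harnesses usually normalize equivalent forms (e.g. 0.5 vs 1/2) before comparing.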
Qwen-2.5-7B_Base_Math_smallestlr
Qwen-2.5-7B_Base_Math_smallestlr_newdata
Model Card for Qwen-2.5-7B_Base_Math_smallestlr_newdata
This model is a fine-tuned version of Qwen/Qwen2.5-Math-7B on the DigitalLearningGmbH/MATH-lighteval dataset. It was trained using TRL with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Framework versions:
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Qwen-2.5-7B_Base_Math_smalllr_longer
DeepSeek-R1-Distill-Qwen-7B-GRPO_Math
Model Card for DeepSeek-R1-Distill-Qwen-7B-GRPO_Math
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B on the DigitalLearningGmbH/MATH-lighteval dataset. It was trained using TRL with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Framework versions:
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0
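GRPO, as introduced in the DeepSeekMath paper cited by these cards, samples a group of G completions per prompt and normalizes each reward within its group, avoiding a learned value model. A minimal stdlib sketch of that group-relative advantage computation (function name is illustrative, not a TRL API):

```python
def group_relative_advantages(rewards):
    """Normalize rewards within one group of G completions for the same prompt.

    Each completion's advantage is (r - mean) / std over the group, so the
    policy is pushed toward completions that beat their siblings.
    """
    g = len(rewards)
    mean = sum(rewards) / g
    std = (sum((r - mean) ** 2 for r in rewards) / g) ** 0.5
    if std == 0:
        return [0.0] * g  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]


# Example: two correct and two incorrect completions in a group of four.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

With a binary correctness reward like the MATH setup, a group that is all right or all wrong yields zero advantage everywhere, which is why mixed-difficulty prompts matter for GRPO training.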
Qwen2.5-1.5B-Open-R1-GRPO_Math
This model is a fine-tuned version of Qwen/Qwen2.5-Math-1.5B on the DigitalLearningGmbH/MATH-lighteval dataset. It was trained using TRL with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Framework versions:
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
DeepSeek-R1-Distill-Qwen-7B-GRPO_Math_lowlr
Qwen2.5-1.5B-Open-R1-GRPO_Math_smalllr
Qwen2.5-1.5B-Open-R1-GRPO
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained using TRL with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Framework versions:
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0