Yukang

34 models · sorted by downloads
Llama-2-7b-longlora-32k-ft (llama) · 1,245 downloads · 5 likes
LongAlpaca-70B (llama) · 699 downloads · 21 likes
LongAlpaca-7B (llama) · 660 downloads · 15 likes
Llama-2-13b-chat-longlora-32k-sft (llama) · 644 downloads · 22 likes
Llama-2-13b-longlora-64k (llama) · 640 downloads · 10 likes
Llama-2-7b-longlora-100k-ft (llama) · 634 downloads · 52 likes
Llama-2-70b-chat-longlora-32k-sft (llama) · 633 downloads · 10 likes
Llama-2-13b-longlora-16k-ft (llama) · 628 downloads · 3 likes
Llama-2-7b-longlora-16k-ft (llama) · 626 downloads · 2 likes
LongAlpaca-13B (llama) · 624 downloads · 14 likes
Llama-2-13b-longlora-32k-ft (llama) · 537 downloads · 10 likes
LongAlpaca-13B-16k (llama) · 252 downloads · 4 likes
LongAlpaca-70B-16k (llama) · 60 downloads · 2 likes
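
The LongLoRA/LongAlpaca checkpoints above are Llama-2-family models with extended context windows (the suffix encodes the target length, e.g. 32k or 100k tokens). A minimal loading sketch, assuming standard transformers usage and that the repos live under the Yukang namespace:

```python
# Hedged sketch: load one of the extended-context checkpoints listed above.
# The repo id is an assumption (page title "Yukang" + model name as shown).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Yukang/Llama-2-7b-longlora-32k-ft"  # 32k-token context variant
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # requires accelerate; shards across available GPUs
)
```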

Qwen2.5-3B-Open-R1-Code-GRPO · 12 downloads · 0 likes

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on the open-r1/verifiable-coding-problems-python dataset. It was trained with TRL using GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Framework versions: TRL 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 3.6.0, Tokenizers 0.21.1.
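
The recipe these GRPO model cards describe maps onto TRL's GRPOTrainer. A rough sketch under stated assumptions: the dataset column name ("problem") and the reward function are illustrative placeholders, not the author's actual setup:

```python
# Hedged sketch of GRPO fine-tuning with TRL, as the model card describes.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("open-r1/verifiable-coding-problems-python", split="train")
# GRPOTrainer expects a "prompt" column; the source column name is assumed here.
dataset = dataset.map(lambda row: {"prompt": row["problem"]})

def brevity_reward(completions, **kwargs):
    # Placeholder reward: GRPO only needs one scalar score per completion.
    # Real "verifiable coding" training would execute and check the generated code.
    return [-len(c) / 1000.0 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",  # base model named in the card
    reward_funcs=brevity_reward,
    args=GRPOConfig(output_dir="Qwen2.5-3B-Open-R1-Code-GRPO"),
    train_dataset=dataset,
)
trainer.train()
```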

Llama-2-7b-longlora-32k (llama) · 8 downloads · 7 likes

Qwen2.5-7B-Open-R1-GRPO · 5 downloads · 0 likes

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the open-r1/OpenR1-Math-220k dataset. It was trained with TRL using GRPO (the DeepSeekMath method cited above). Framework versions: TRL 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 3.6.0, Tokenizers 0.21.1.

Qwen2.5-14B-Open-R1-GRPO · 5 downloads · 0 likes

Llama-2-70b-chat-longlora-32k (llama) · 3 downloads · 9 likes

Qwen2.5-3B-Open-R1-GRPO · 3 downloads · 0 likes

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on the open-r1/OpenR1-Math-220k dataset. It was trained with TRL using GRPO (the DeepSeekMath method cited above). Framework versions: TRL 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 3.6.0, Tokenizers 0.21.1.

Llama-2-70b-longlora-32k (llama) · 2 downloads · 18 likes
Llama-2-13b-longlora-32k (llama) · 2 downloads · 5 likes
LongAlpaca-7B-16k (llama) · 2 downloads · 5 likes
Llama-2-7b-longlora-8k-ft (llama) · 2 downloads · 3 likes
Llama-2-13b-longlora-16k (llama) · 2 downloads · 2 likes
Llama-2-13b-longlora-18k-ft (llama) · 2 downloads · 0 likes
zephyr-7b-sft-full · 2 downloads · 0 likes

Qwen2.5-32B-Open-R1-GRPO · 2 downloads · 0 likes

This model was trained with TRL using GRPO (the DeepSeekMath method cited above); its card does not specify the base model. Framework versions: TRL 0.21.0, Transformers 4.52.3, PyTorch 2.7.0, Datasets 3.6.0, Tokenizers 0.21.4.

zephyr-7b-dpo-full · 1 download · 1 like
LongAlpaca-70B-lora · 0 downloads · 8 likes
Llama-2-7b-longlora-8k (llama) · 0 downloads · 5 likes
Llama-2-7b-longlora-16k (llama) · 0 downloads · 2 likes
Llama-2-13b-longlora-8k (llama) · 0 downloads · 2 likes
Llama-2-13b-longlora-8k-ft (llama) · 0 downloads · 2 likes
FocalsConv · 0 downloads · 1 like