Yukang

34 models · sorted by downloads
Llama-2-7b-longlora-32k-ft (llama) · 1,245 downloads · 5 likes
LongAlpaca-70B (llama) · 699 downloads · 21 likes
LongAlpaca-7B (llama) · 660 downloads · 15 likes
Llama-2-13b-chat-longlora-32k-sft (llama) · 644 downloads · 22 likes
Llama-2-13b-longlora-64k (llama) · 640 downloads · 10 likes
Llama-2-7b-longlora-100k-ft (llama) · 634 downloads · 52 likes
Llama-2-70b-chat-longlora-32k-sft (llama) · 633 downloads · 10 likes
Llama-2-13b-longlora-16k-ft (llama) · 628 downloads · 3 likes
Llama-2-7b-longlora-16k-ft (llama) · 626 downloads · 2 likes
LongAlpaca-13B (llama) · 624 downloads · 14 likes
Llama-2-13b-longlora-32k-ft (llama) · 537 downloads · 10 likes
LongAlpaca-13B-16k (llama) · 252 downloads · 4 likes
LongAlpaca-70B-16k (llama) · 60 downloads · 2 likes
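
The LongLoRA/LongAlpaca checkpoints above are Llama-2-family models with extended context windows (the suffix encodes the target length, e.g. 32k or 100k tokens). A minimal loading sketch, assuming standard transformers usage and that the repos live under the Yukang namespace:

```python
# Hedged sketch: load one of the extended-context checkpoints listed above.
# The repo id is an assumption (page title "Yukang" + model name as shown).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Yukang/Llama-2-7b-longlora-32k-ft"  # 32k-token context variant
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # requires accelerate; shards across available GPUs
)
```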

Qwen2.5-3B-Open-R1-Code-GRPO · 12 downloads · 0 likes

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on the open-r1/verifiable-coding-problems-python dataset. It was trained with TRL using GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Framework versions: TRL 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 3.6.0, Tokenizers 0.21.1.
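
The recipe these GRPO model cards describe maps onto TRL's GRPOTrainer. A rough sketch under stated assumptions: the dataset column name ("problem") and the reward function are illustrative placeholders, not the author's actual setup:

```python
# Hedged sketch of GRPO fine-tuning with TRL, as the model card describes.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("open-r1/verifiable-coding-problems-python", split="train")
# GRPOTrainer expects a "prompt" column; the source column name is assumed here.
dataset = dataset.map(lambda row: {"prompt": row["problem"]})

def brevity_reward(completions, **kwargs):
    # Placeholder reward: GRPO only needs one scalar score per completion.
    # Real "verifiable coding" training would execute and check the generated code.
    return [-len(c) / 1000.0 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",  # base model named in the card
    reward_funcs=brevity_reward,
    args=GRPOConfig(output_dir="Qwen2.5-3B-Open-R1-Code-GRPO"),
    train_dataset=dataset,
)
trainer.train()
```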

Llama-2-7b-longlora-32k (llama) · 8 downloads · 7 likes

Qwen2.5-7B-Open-R1-GRPO · 5 downloads · 0 likes

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the open-r1/OpenR1-Math-220k dataset. It was trained with TRL using GRPO (the DeepSeekMath method cited above). Framework versions: TRL 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 3.6.0, Tokenizers 0.21.1.

Qwen2.5-14B-Open-R1-GRPO · 5 downloads · 0 likes

Llama-2-70b-chat-longlora-32k (llama) · 3 downloads · 9 likes

Qwen2.5-3B-Open-R1-GRPO · 3 downloads · 0 likes

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on the open-r1/OpenR1-Math-220k dataset. It was trained with TRL using GRPO (the DeepSeekMath method cited above). Framework versions: TRL 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 3.6.0, Tokenizers 0.21.1.

Llama-2-70b-longlora-32k (llama) · 2 downloads · 18 likes
Llama-2-13b-longlora-32k (llama) · 2 downloads · 5 likes
LongAlpaca-7B-16k (llama) · 2 downloads · 5 likes
Llama-2-7b-longlora-8k-ft (llama) · 2 downloads · 3 likes
Llama-2-13b-longlora-16k (llama) · 2 downloads · 2 likes
Llama-2-13b-longlora-18k-ft (llama) · 2 downloads · 0 likes
zephyr-7b-sft-full · 2 downloads · 0 likes

Qwen2.5-32B-Open-R1-GRPO · 2 downloads · 0 likes

This model was trained with TRL using GRPO (the DeepSeekMath method cited above); its card does not specify the base model. Framework versions: TRL 0.21.0, Transformers 4.52.3, PyTorch 2.7.0, Datasets 3.6.0, Tokenizers 0.21.4.

zephyr-7b-dpo-full · 1 download · 1 like
LongAlpaca-70B-lora · 0 downloads · 8 likes
Llama-2-7b-longlora-8k (llama) · 0 downloads · 5 likes
Llama-2-7b-longlora-16k (llama) · 0 downloads · 2 likes
Llama-2-13b-longlora-8k (llama) · 0 downloads · 2 likes
Llama-2-13b-longlora-8k-ft (llama) · 0 downloads · 2 likes
FocalsConv · 0 downloads · 1 like