yueqis

101 models
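
The entries below are model cards captured from a Hugging Face profile listing. Assuming these repositories live under the `yueqis` namespace on the Hugging Face Hub (an assumption, not confirmed by the page itself), a minimal sketch for enumerating them programmatically with `huggingface_hub`:

```python
from huggingface_hub import HfApi

# Enumerate every public model under the assumed "yueqis" namespace,
# most-downloaded first, mirroring the ordering of the listing below.
api = HfApi()
for m in api.list_models(author="yueqis", sort="downloads", direction=-1):
    print(m.id, m.downloads, m.likes)
```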

web-qwen-coder-7b-3epochs-30k-5e-5

llama-factory · 536 downloads · 0 likes

Web Qwen Coder 32b 2epochs 30k 5e 5

529 downloads · 1 like

Web Qwen Coder 14b 3epochs 25k 5e 5

This model is a fine-tuned version of yueqis/web-qwen-coder-14b-3epochs-25k-5e-5 on the web dataset. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 64), the adamw_torch_fused optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.57.1, PyTorch 2.8.0+cu128, Datasets 4.0.0, Tokenizers 0.22.1.

llama-factory · 405 downloads · 1 like
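
These checkpoints appear to be full fine-tunes saved in the standard Transformers format, so loading one for inference should follow the usual `transformers` pattern. A minimal sketch, assuming the repo id `yueqis/web-qwen-coder-14b-3epochs-25k-5e-5` from the listing above and that the checkpoint keeps the Qwen chat template (both assumptions, not confirmed here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id taken from the listing above; swap in any other checkpoint.
model_id = "yueqis/web-qwen-coder-14b-3epochs-25k-5e-5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Qwen-style checkpoints normally ship a chat template; the prompt is illustrative.
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```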

Web Qwen Coder 32b 3epochs 30k 5e 5

371 downloads · 1 like

non_web_sweagent-qwen-coder-32b-3epochs-20k-5e-5

360 downloads · 0 likes

full-qwen-coder-32b-3epochs-28k-5e-5

330 downloads · 0 likes

Sweagent Qwen Coder 32b 3epochs 32k 5e 5

llama-factory · 249 downloads · 1 like

Web Qwen Coder 32b 1epoch 30k 5e 5

232 downloads · 1 like

swe_only-qwen-coder-32b-3epochs-20k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-32B-Instruct on the swe_only dataset. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 32 devices, gradient_accumulation_steps 16 (total_train_batch_size 512, total_eval_batch_size 256), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 224 downloads · 0 likes
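
As a sanity check on the reported numbers, the total_train_batch_size in these cards is just the per-device batch size multiplied by the device count and gradient_accumulation_steps: 1 × 32 × 16 = 512 for the 32-GPU runs, and 1 × 8 × 16 = 128 for the 8-GPU runs further down the list.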

non_web-qwen-coder-32b-3epochs-30k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-32B-Instruct on the non_web dataset. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 32 devices, gradient_accumulation_steps 16 (total_train_batch_size 512, total_eval_batch_size 256), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 218 downloads · 0 likes

swe_only_sweagent-qwen-coder-32b-3epochs-20k-5e-5

182 downloads · 0 likes

non_web_sweagent-qwen-coder-7b-3epochs-30k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the non_web_sweagent dataset. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 32 devices, gradient_accumulation_steps 16 (total_train_batch_size 512, total_eval_batch_size 256), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 168 downloads · 0 likes

swe_only_original-qwen-coder-7b-3epochs-30k-5e-5

llama-factory · 163 downloads · 0 likes

web-qwen-coder-7b-1epoch-30k-5e-5

158 downloads · 0 likes

non_web-qwen-coder-32b-3epochs-20k-5e-5

149 downloads · 0 likes

non_web-qwen-coder-7b-3epochs-30k-5e-5

llama-factory · 142 downloads · 0 likes

web-qwen-coder-14b-2epochs-25k-5e-5

134 downloads · 0 likes

web-qwen-coder-14b-1epoch-25k-5e-5

132 downloads · 0 likes

non_web-qwen-coder-14b-3epochs-25k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-14B-Instruct on the non_web dataset. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 32 devices, gradient_accumulation_steps 16 (total_train_batch_size 512, total_eval_batch_size 256), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 125 downloads · 0 likes

non_web_sweagent-qwen-coder-14b-3epochs-24k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-14B-Instruct on the non_web_sweagent dataset. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 32 devices, gradient_accumulation_steps 16 (total_train_batch_size 512, total_eval_batch_size 256), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 123 downloads · 0 likes

Non Web Mcp Qwen Coder 7b 1epoch 30k 5e 5

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the non_web_mcp dataset. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 64), the adamw_torch_fused optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.57.1, PyTorch 2.8.0+cu128, Datasets 4.0.0, Tokenizers 0.22.1.

llama-factory · 118 downloads · 1 like

web-qwen-coder-7b-2epochs-30k-5e-5

114 downloads · 0 likes

swe_only_original-qwen-coder-14b-3epochs-25k-5e-5

llama-factory · 113 downloads · 0 likes

swe_only_sweagent-qwen-coder-7b-3epochs-30k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the swe_only_sweagent dataset. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 32 devices, gradient_accumulation_steps 16 (total_train_batch_size 512, total_eval_batch_size 256), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 111 downloads · 0 likes

swe_only_original_5k-qwen-coder-7b-3epochs-30k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the swe_only_original_5k dataset. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 32 devices, gradient_accumulation_steps 16 (total_train_batch_size 512, total_eval_batch_size 256), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 104 downloads · 0 likes

non_web-qwen3-8b-3epochs-25k-5e-5

This model is a fine-tuned version of Qwen/Qwen3-8B on the non_web dataset. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 32 devices, gradient_accumulation_steps 16 (total_train_batch_size 512, total_eval_batch_size 256), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 94 downloads · 0 likes

openhands-llama3_nemotron_49b-1epochs-30k-5e-5

92 downloads · 0 likes

swe_only-qwen-coder-7b-3epochs-30k-5e-5

llama-factory · 84 downloads · 0 likes

full_sft_non_web-qwen3-8b-25k-1e-5

This model is a fine-tuned version of Qwen/Qwen3-8B on the full_sft_non_web dataset. It achieves a loss of 0.3317 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 17 downloads · 0 likes

full_sft_v0.4_args

This model is a fine-tuned version of Qwen/Qwen3-8B on the full_sft_v0.4 dataset. It achieves a loss of 0.3446 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 4 (total_train_batch_size 32, total_eval_batch_size 8), the adamw_torch_fused optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Intermediate results:

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3427        | 0.4049 | 1000 | 0.3676          |
| 0.3254        | 0.8097 | 2000 | 0.3476          |

Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 14 downloads · 0 likes

swe_original-qwen-7b-30k

llama-factory · 14 downloads · 0 likes

unified-agent-sample-30k-4epoch

13 downloads · 0 likes

go-browse-wa-final-2epochs-2e05lr-len24k

This model is a fine-tuned version of Qwen/Qwen3-8B on the go-browse-wa dataset. It achieves a loss of 0.3853 on the evaluation set. Training used learning_rate 2e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 2 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 13 downloads · 0 likes

go-browse-trim-2epochs-2e05lr-len24k

This model is a fine-tuned version of Qwen/Qwen3-8B on the go-browse dataset. It achieves a loss of 0.1270 on the evaluation set. Training used learning_rate 2e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 2 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 13 downloads · 0 likes

unified-agent-sample-30k-3epoch

11 downloads · 0 likes

go-browse-trim-2epochs-mask_history-2e05lr-len24k

llama-factory · 11 downloads · 0 likes

full_sft_v0_4

This model is a fine-tuned version of Qwen/Qwen3-8B on the v0.4 dataset. It achieves a loss of 0.3654 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 11 downloads · 0 likes

full_sft_v0_4_14B

This model is a fine-tuned version of Qwen/Qwen3-14B on the full_sft_v0.4 dataset. It achieves a loss of 0.2949 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 4 (total_train_batch_size 32, total_eval_batch_size 8), the adamw_torch_fused optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Intermediate results:

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3214        | 0.4049 | 1000 | 0.3258          |
| 0.3208        | 0.8097 | 2000 | 0.2988          |

Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 11 downloads · 0 likes

non_web_sft_v0_4_qwen3_8B_24K

This model is a fine-tuned version of Qwen/Qwen3-8B on the non_web_sft_v0.4 dataset. It achieves a loss of 0.3102 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 11 downloads · 0 likes

non_web_sft_v0_4_qwen_8B_24K

llama-factory · 11 downloads · 0 likes

swe_only_sweagent-qwen3-8b

This model is a fine-tuned version of Qwen/Qwen3-8B on the swe_only_sweagent dataset. It achieves a loss of 0.2691 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 11 downloads · 0 likes

full_sft_sweagent-qwen3-8b

llama-factory · 11 downloads · 0 likes

swe_only_think-qwen3-8b

llama-factory · 11 downloads · 0 likes

full_sft_v0_4_mcp-qwen3-8b

llama-factory · 11 downloads · 0 likes

swe_only_mcp-qwen3-8b

This model is a fine-tuned version of Qwen/Qwen3-8B on the swe_only_mcp dataset. It achieves a loss of 0.2769 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 11 downloads · 0 likes

unified-agent-swe-mix-sample

This model is a fine-tuned version of Qwen/Qwen3-8B on the swe-mix-sample dataset. It achieves a loss of 0.1673 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 10 downloads · 0 likes

swe_only-qwen-7b-30k

llama-factory · 10 downloads · 0 likes

swe_only-qwen-7b-30k_1

llama-factory · 8 downloads · 0 likes

swe_only_sweagent-qwen-7b-30k

llama-factory · 7 downloads · 0 likes

go-browse-trim-2epochs-2e05lr-len24k-ckpt100

6 downloads · 0 likes

full_sft_sweagent-qwen-7b-30k

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_sweagent dataset. It achieves a loss of 0.2499 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 6 downloads · 0 likes

swe_only_sweagent_func_thoughts-qwen-7b-30k

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the swe_only_sweagent dataset. It achieves a loss of 0.2917 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 6 downloads · 0 likes

unified-agent-5epoch

llama-factory · 5 downloads · 0 likes

unified-agent-sample-30k-2epoch

5 downloads · 0 likes

go-browse-wa-len24k-grad4-lr2e-5-2epochs

llama-factory · 5 downloads · 0 likes

unified-agent-agenttuning-sample

4 downloads · 0 likes

web-sample

This model is a fine-tuned version of Qwen/Qwen3-8B on the web-sample dataset. It achieves a loss of 0.4692 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

Qwen3-8B-30K-sft-nopacking

This model is a fine-tuned version of Qwen/Qwen3-8B on the unified_agent_30K dataset. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 16 devices, gradient_accumulation_steps 8 (total_train_batch_size 128, total_eval_batch_size 128), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.1, PyTorch 2.6.0+cu124, Datasets 2.21.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

unified-agent-SWE-smith_5kTrajectories-3epochs

This model is a fine-tuned version of Qwen/Qwen3-8B on the SWE-smith_5kTrajectories dataset. It achieves a loss of 0.1589 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

Qwen3-14B-30K-sft-nopacking-lr5e-6

llama-factory · 4 downloads · 0 likes

qwen8b-web-long-sample

llama-factory · 4 downloads · 0 likes

Qwen3-14B-30K-sft-nopacking-lr1e-5-single-node

This model is a fine-tuned version of Qwen/Qwen3-14B on the unified_agent_30K dataset. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 64), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.1, PyTorch 2.6.0+cu124, Datasets 2.21.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

Qwen3-14B-30K-sft-nopacking-lr1e-5-single-node-val-split

This model is a fine-tuned version of Qwen/Qwen3-14B on the unified_agent_30K dataset. It achieves a loss of 0.3302 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.1, PyTorch 2.6.0+cu124, Datasets 2.21.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

v0_3-len24k-sample-80k

This model is a fine-tuned version of Qwen/Qwen3-8B on the v0.2 dataset. It achieves a loss of 0.3129 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

swe_original-qwen-7b-25k-3epochs-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the swe_original dataset. It achieves a loss of 0.0027 on the evaluation set. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

full_sft_non_web-qwen-7b-25k-3epochs-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_non_web dataset. It achieves a loss of 0.2291 on the evaluation set. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

full_sft_non_web-qwen-7b-25k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_non_web dataset. It achieves a loss of 0.2784 on the evaluation set. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

full_sft-qwen-7b-25k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_v0.4 dataset. It achieves a loss of 0.3212 on the evaluation set. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

full_sft_mcp_10k-qwen-7b-30k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_mcp dataset. It achieves a loss of 0.4245 on the evaluation set. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

full_assistant_owl-qwen-7b-30k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_assistant_owl dataset. It achieves a loss of 0.0646 on the evaluation set. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

full_sft_non_web-qwen-coder-7b-3epochs-30k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the full_sft_non_web dataset. It achieves a loss of 0.2527 on the evaluation set. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 4 downloads · 0 likes

unified-agent-sample

llama-factory · 3 downloads · 1 like

unified-agent-30k

3 downloads · 0 likes

unified-agent-sample-60k

llama-factory · 3 downloads · 0 likes

unified-agent-sample-80k

This model is a fine-tuned version of Qwen/Qwen3-8B on the sample-80k dataset. It achieves a loss of 0.3979 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 3 downloads · 0 likes

Qwen3-14B-30K-sft

This model is a fine-tuned version of Qwen/Qwen3-14B on the unified_agent_30K dataset. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 16 devices, gradient_accumulation_steps 4 (total_train_batch_size 64, total_eval_batch_size 128), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.1, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.1.

llama-factory · 3 downloads · 0 likes

Qwen3-8B-30K-sft

This model is a fine-tuned version of Qwen/Qwen3-8B on the unified_agent_30K dataset. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 16 devices, gradient_accumulation_steps 2 (total_train_batch_size 32, total_eval_batch_size 128), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.1, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.1.

llama-factory · 3 downloads · 0 likes

unified-agent-code-mix-sample

This model is a fine-tuned version of Qwen/Qwen3-8B on the code-mix-sample dataset. It achieves a loss of 0.3868 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 3 downloads · 0 likes

unified-agent-SWE-smith_5kTrajectories-2epochs

llama-factory · 3 downloads · 0 likes

unified-agent-SWE-smith_5kTrajectories

This model is a fine-tuned version of Qwen/Qwen3-8B on the SWE-smith_5kTrajectories dataset. It achieves a loss of 0.2062 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 3 downloads · 0 likes

qwen14b-sample-30k

llama-factory · 3 downloads · 0 likes

web-long-sample

llama-factory · 3 downloads · 0 likes

go-browse-wa

This model is a fine-tuned version of Qwen/Qwen3-8B on the go-browse-wa dataset. It achieves a loss of 0.4563 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 3 downloads · 0 likes

go-browse-wa-2epochs

llama-factory · 3 downloads · 0 likes

unified_agent_data_30k_filtered_polished

This model is a fine-tuned version of Qwen/Qwen3-8B on the unified_agent_data_30k_filtered_polished dataset. It achieves a loss of 0.4546 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 3 downloads · 0 likes

nnetnav-2epochs

This model is a fine-tuned version of Qwen/Qwen3-8B on the nnetnav-live and nnetnav-wa datasets. It achieves a loss of 0.9091 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 2 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 3 downloads · 0 likes

swe_original-qwen-7b-25k

llama-factory · 3 downloads · 0 likes

swe_original-qwen-7b-30k-3epochs-5e-5

llama-factory · 3 downloads · 0 likes

full_user_owl-qwen-7b-30k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_user_owl dataset. It achieves a loss of 0.3807 on the evaluation set. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 3 downloads · 0 likes

non_web_mcp-qwen3-8b-28k-5e-5

This model is a fine-tuned version of Qwen/Qwen3-8B on the non_web_mcp dataset. It achieves a loss of 0.2480 on the evaluation set. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 3 downloads · 0 likes

Qwen3-14B-30K-sft-nopacking-lr1e-6

llama-factory · 2 downloads · 0 likes

web-sample-filtered-len24k

This model is a fine-tuned version of Qwen/Qwen3-8B on the web-sample-filtered dataset. It achieves a loss of 0.5447 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 2 downloads · 0 likes

swe_only_sweagent-qwen-7b-3epochs-30k-5e-5

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the swe_only_sweagent dataset. It achieves a loss of 0.0843 on the evaluation set. Training used learning_rate 5e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 3 epochs. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 2 downloads · 0 likes

full_sft_non_web-qwen-7b-3epochs-30k-5e-5

llama-factory · 2 downloads · 0 likes

full_sft_mcp-qwen-7b-30k-5e-5

llama-factory · 2 downloads · 0 likes

full_sft_mcp_1k-qwen-7b-30k-5e-5

llama-factory · 2 downloads · 0 likes

Qwen3-14B-30K-sft-nopacking

This model is a fine-tuned version of Qwen/Qwen3-14B on the unified_agent_30K dataset. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 8, seed 42, multi-GPU training across 16 devices, gradient_accumulation_steps 8 (total_train_batch_size 128, total_eval_batch_size 128), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.1, PyTorch 2.6.0+cu124, Datasets 2.21.0, Tokenizers 0.21.1.

llama-factory · 1 download · 0 likes

go-browse-wa-2epochs-2.5

llama-factory · 1 download · 0 likes

unified_agent_data_30k_filtered_unpolished

llama-factory · 1 download · 0 likes

full_sft_non_web-qwen-7b-25k-2epochs-5e-5

1 download · 0 likes

agenttuning-qwen3-8b-16k-1e-5

This model is a fine-tuned version of Qwen/Qwen3-8B on the agenttuning dataset. It achieves a loss of 0.3331 on the evaluation set. Training used learning_rate 1e-05, train_batch_size 1, eval_batch_size 1, seed 42, multi-GPU training across 8 devices, gradient_accumulation_steps 16 (total_train_batch_size 128, total_eval_batch_size 8), the adamw_torch optimizer (betas 0.9/0.999, epsilon 1e-08), a cosine lr_scheduler with warmup ratio 0.05, and 1 epoch. Framework versions: Transformers 4.51.3, PyTorch 2.7.0+cu126, Datasets 3.5.0, Tokenizers 0.21.1.

llama-factory · 1 download · 0 likes