yueqis
web-qwen-coder-7b-3epochs-30k-5e-5
Web Qwen Coder 32b 2epochs 30k 5e 5
Web Qwen Coder 14b 3epochs 25k 5e 5
This model is a fine-tuned version of yueqis/web-qwen-coder-14b-3epochs-25k-5e-5 on the web dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1
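The batch-size figures in these cards are internally consistent: the total train batch size is the per-device batch size times the gradient-accumulation steps times the device count. A minimal sketch of that arithmetic (the function name is ours, for illustration):

```python
# Sketch: how the "total_train_batch_size" values in these cards are derived.
def total_train_batch_size(train_batch_size, gradient_accumulation_steps, num_devices):
    # effective batch = per-device batch * accumulation steps * number of GPUs
    return train_batch_size * gradient_accumulation_steps * num_devices

# Values from the card above: 1 * 16 * 8 = 128
print(total_train_batch_size(1, 16, 8))   # -> 128
# The 32-GPU runs below: 1 * 16 * 32 = 512
print(total_train_batch_size(1, 16, 32))  # -> 512
```

The same identity checks out for every entry on this page.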
Web Qwen Coder 32b 3epochs 30k 5e 5
non_web_sweagent-qwen-coder-32b-3epochs-20k-5e-5
full-qwen-coder-32b-3epochs-28k-5e-5
Sweagent Qwen Coder 32b 3epochs 32k 5e 5
Web Qwen Coder 32b 1epoch 30k 5e 5
swe_only-qwen-coder-32b-3epochs-20k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-32B-Instruct on the swe_only dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
non_web-qwen-coder-32b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-32B-Instruct on the non_web dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only_sweagent-qwen-coder-32b-3epochs-20k-5e-5
non_web_sweagent-qwen-coder-7b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the non_web_sweagent dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only_original-qwen-coder-7b-3epochs-30k-5e-5
web-qwen-coder-7b-1epoch-30k-5e-5
non_web-qwen-coder-32b-3epochs-20k-5e-5
non_web-qwen-coder-7b-3epochs-30k-5e-5
web-qwen-coder-14b-2epochs-25k-5e-5
web-qwen-coder-14b-1epoch-25k-5e-5
non_web-qwen-coder-14b-3epochs-25k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-14B-Instruct on the non_web dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
non_web_sweagent-qwen-coder-14b-3epochs-24k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-14B-Instruct on the non_web_sweagent dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
Non Web Mcp Qwen Coder 7b 1epoch 30k 5e 5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the non_web_mcp dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1
web-qwen-coder-7b-2epochs-30k-5e-5
swe_only_original-qwen-coder-14b-3epochs-25k-5e-5
swe_only_sweagent-qwen-coder-7b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the swe_only_sweagent dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only_original_5k-qwen-coder-7b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the swe_only_original_5k dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
non_web-qwen3-8b-3epochs-25k-5e-5
This model is a fine-tuned version of Qwen/Qwen3-8B on the non_web dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
openhands-llama3_nemotron_49b-1epochs-30k-5e-5
swe_only-qwen-coder-7b-3epochs-30k-5e-5
full_sft_non_web-qwen3-8b-25k-1e-5
This model is a fine-tuned version of Qwen/Qwen3-8B on the full_sft_non_web dataset. It achieves the following results on the evaluation set:
- Loss: 0.3317

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_v0.4_args
This model is a fine-tuned version of Qwen/Qwen3-8B on the full_sft_v0.4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3446

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Training results:

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3427        | 0.4049 | 1000 | 0.3676          |
| 0.3254        | 0.8097 | 2000 | 0.3476          |

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
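Every run on this page uses `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.05`, i.e. linear warmup over the first 5% of optimizer steps followed by cosine decay to zero. A minimal re-implementation of that schedule for illustration (not the trainer's exact code; the step counts are taken from the table above):

```python
import math

def cosine_with_warmup(step, total_steps, peak_lr, warmup_ratio=0.05):
    """Linear warmup to peak_lr over warmup_ratio of training, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Roughly 2470 total steps (step 1000 fell at epoch 0.4049), peak LR 1e-05:
# warmup ends near step 123, and the LR returns to ~0 at the final step.
print(cosine_with_warmup(123, 2470, 1e-05))   # peak
print(cosine_with_warmup(2470, 2470, 1e-05))  # end of training
```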
swe_original-qwen-7b-30k
unified-agent-sample-30k-4epoch
go-browse-wa-final-2epochs-2e05lr-len24k
This model is a fine-tuned version of Qwen/Qwen3-8B on the go-browse-wa dataset. It achieves the following results on the evaluation set:
- Loss: 0.3853

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
go-browse-trim-2epochs-2e05lr-len24k
This model is a fine-tuned version of Qwen/Qwen3-8B on the go-browse dataset. It achieves the following results on the evaluation set:
- Loss: 0.1270

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
unified-agent-sample-30k-3epoch
go-browse-trim-2epochs-mask_history-2e05lr-len24k
full_sft_v0_4
This model is a fine-tuned version of Qwen/Qwen3-8B on the v0.4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3654

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_v0_4_14B
This model is a fine-tuned version of Qwen/Qwen3-14B on the full_sft_v0.4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2949

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Training results:

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3214        | 0.4049 | 1000 | 0.3258          |
| 0.3208        | 0.8097 | 2000 | 0.2988          |

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
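The loss table lets us back out the approximate training-set size. Assuming the Epoch column is fractional progress through one epoch, step 1000 at epoch 0.4049 implies about 2470 optimizer steps per epoch; with a total train batch size of 32 that is roughly 79k examples. A back-of-envelope sketch:

```python
# Back-of-envelope: infer training-set size from the loss table above.
# Assumes the "Epoch" column is fractional progress through one epoch.
steps_per_epoch = 1000 / 0.4049      # step 1000 occurred at epoch 0.4049
examples = steps_per_epoch * 32      # total_train_batch_size is 32
print(round(steps_per_epoch), round(examples))
```

This is an estimate, not a figure stated by the cards themselves.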
non_web_sft_v0_4_qwen3_8B_24K
This model is a fine-tuned version of Qwen/Qwen3-8B on the non_web_sft_v0.4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3102

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
non_web_sft_v0_4_qwen_8B_24K
swe_only_sweagent-qwen3-8b
This model is a fine-tuned version of Qwen/Qwen3-8B on the swe_only_sweagent dataset. It achieves the following results on the evaluation set:
- Loss: 0.2691

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_sweagent-qwen3-8b
swe_only_think-qwen3-8b
full_sft_v0_4_mcp-qwen3-8b
swe_only_mcp-qwen3-8b
This model is a fine-tuned version of Qwen/Qwen3-8B on the swe_only_mcp dataset. It achieves the following results on the evaluation set:
- Loss: 0.2769

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
unified-agent-swe-mix-sample
This model is a fine-tuned version of Qwen/Qwen3-8B on the swe-mix-sample dataset. It achieves the following results on the evaluation set:
- Loss: 0.1673

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only-qwen-7b-30k
swe_only-qwen-7b-30k_1
swe_only_sweagent-qwen-7b-30k
go-browse-trim-2epochs-2e05lr-len24k-ckpt100
full_sft_sweagent-qwen-7b-30k
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_sweagent dataset. It achieves the following results on the evaluation set:
- Loss: 0.2499

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only_sweagent_func_thoughts-qwen-7b-30k
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the swe_only_sweagent dataset. It achieves the following results on the evaluation set:
- Loss: 0.2917

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
unified-agent-5epoch
unified-agent-sample-30k-2epoch
go-browse-wa-len24k-grad4-lr2e-5-2epochs
unified-agent-agenttuning-sample
web-sample
This model is a fine-tuned version of Qwen/Qwen3-8B on the web-sample dataset. It achieves the following results on the evaluation set:
- Loss: 0.4692

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
Qwen3-8B-30K-sft-nopacking
This model is a fine-tuned version of Qwen/Qwen3-8B on the unifiedagent30K dataset. The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 128
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.1
unified-agent-SWE-smith_5kTrajectories-3epochs
This model is a fine-tuned version of Qwen/Qwen3-8B on the SWE-smith_5kTrajectories dataset. It achieves the following results on the evaluation set:
- Loss: 0.1589

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
Qwen3-14B-30K-sft-nopacking-lr5e-6
qwen8b-web-long-sample
Qwen3-14B-30K-sft-nopacking-lr1e-5-single-node
This model is a fine-tuned version of Qwen/Qwen3-14B on the unifiedagent30K dataset. The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.1
Qwen3-14B-30K-sft-nopacking-lr1e-5-single-node-val-split
This model is a fine-tuned version of Qwen/Qwen3-14B on the unifiedagent30K dataset. It achieves the following results on the evaluation set:
- Loss: 0.3302

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.1
v0_3-len24k-sample-80k
This model is a fine-tuned version of Qwen/Qwen3-8B on the v0.2 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3129

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_original-qwen-7b-25k-3epochs-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the swe_original dataset. It achieves the following results on the evaluation set:
- Loss: 0.0027

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_non_web-qwen-7b-25k-3epochs-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_non_web dataset. It achieves the following results on the evaluation set:
- Loss: 0.2291

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_non_web-qwen-7b-25k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_non_web dataset. It achieves the following results on the evaluation set:
- Loss: 0.2784

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft-qwen-7b-25k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_v0.4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3212

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_mcp_10k-qwen-7b-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_mcp dataset. It achieves the following results on the evaluation set:
- Loss: 0.4245

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_assistant_owl-qwen-7b-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_assistant_owl dataset. It achieves the following results on the evaluation set:
- Loss: 0.0646

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_non_web-qwen-coder-7b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the full_sft_non_web dataset. It achieves the following results on the evaluation set:
- Loss: 0.2527

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
unified-agent-sample
unified-agent-30k
unified-agent-sample-60k
unified-agent-sample-80k
This model is a fine-tuned version of Qwen/Qwen3-8B on the sample-80k dataset. It achieves the following results on the evaluation set:
- Loss: 0.3979

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
Qwen3-14B-30K-sft
This model is a fine-tuned version of Qwen/Qwen3-14B on the unifiedagent30K dataset. The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
Qwen3-8B-30K-sft
This model is a fine-tuned version of Qwen/Qwen3-8B on the unifiedagent30K dataset. The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 128
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
unified-agent-code-mix-sample
This model is a fine-tuned version of Qwen/Qwen3-8B on the code-mix-sample dataset. It achieves the following results on the evaluation set:
- Loss: 0.3868

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
unified-agent-SWE-smith_5kTrajectories-2epochs
unified-agent-SWE-smith_5kTrajectories
This model is a fine-tuned version of Qwen/Qwen3-8B on the SWE-smith_5kTrajectories dataset. It achieves the following results on the evaluation set:
- Loss: 0.2062

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
qwen14b-sample-30k
web-long-sample
go-browse-wa
This model is a fine-tuned version of Qwen/Qwen3-8B on the go-browse-wa dataset. It achieves the following results on the evaluation set:
- Loss: 0.4563

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
go-browse-wa-2epochs
unified_agent_data_30k_filtered_polished
This model is a fine-tuned version of Qwen/Qwen3-8B on the unified_agent_data_30k_filtered_polished dataset. It achieves the following results on the evaluation set:
- Loss: 0.4546

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
nnetnav-2epochs
This model is a fine-tuned version of Qwen/Qwen3-8B on the nnetnav-live and nnetnav-wa datasets. It achieves the following results on the evaluation set:
- Loss: 0.9091

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_original-qwen-7b-25k
swe_original-qwen-7b-30k-3epochs-5e-5
full_user_owl-qwen-7b-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_user_owl dataset. It achieves the following results on the evaluation set:
- Loss: 0.3807

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
non_web_mcp-qwen3-8b-28k-5e-5
This model is a fine-tuned version of Qwen/Qwen3-8B on the non_web_mcp dataset. It achieves the following results on the evaluation set:
- Loss: 0.2480

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
Qwen3-14B-30K-sft-nopacking-lr1e-6
web-sample-filtered-len24k
This model is a fine-tuned version of Qwen/Qwen3-8B on the web-sample-filtered dataset. It achieves the following results on the evaluation set:
- Loss: 0.5447

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only_sweagent-qwen-7b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the swe_only_sweagent dataset. It achieves the following results on the evaluation set:
- Loss: 0.0843

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

Framework versions:
- Transformers 4.51.3
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_non_web-qwen-7b-3epochs-30k-5e-5
full_sft_mcp-qwen-7b-30k-5e-5
full_sft_mcp_1k-qwen-7b-30k-5e-5
Qwen3-14B-30K-sft-nopacking
This model is a fine-tuned version of Qwen/Qwen3-14B on the unifiedagent30K dataset.

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 128
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.51.1
- Pytorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.1
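The hyperparameter fields in these cards map one-to-one onto `transformers.TrainingArguments` parameters. A hedged sketch of that mapping for the run above (the cards do not show the actual launcher config, so the kwargs below are the standard TrainingArguments names, not a copy of the original setup):

```python
# Sketch: the card's hyperparameters expressed as TrainingArguments kwargs.
# Assumes the run used the Hugging Face Trainer; comments note the
# corresponding card field where the name differs.
training_kwargs = {
    "learning_rate": 1e-05,
    "per_device_train_batch_size": 1,  # train_batch_size
    "per_device_eval_batch_size": 8,   # eval_batch_size
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,              # lr_scheduler_warmup_ratio
    "num_train_epochs": 1.0,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
}

# With 16 devices, the effective train batch is 1 * 16 * 8 = 128,
# matching total_train_batch_size in the card.
# args = TrainingArguments(output_dir="out", **training_kwargs)
```

Note that total_train_batch_size and total_eval_batch_size are derived values reported by the Trainer, not arguments you pass in.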
go-browse-wa-2epochs-2.5
unified_agent_data_30k_filtered_unpolished
full_sft_non_web-qwen-7b-25k-2epochs-5e-5
agenttuning-qwen3-8b-16k-1e-5
This model is a fine-tuned version of Qwen/Qwen3-8B on the agenttuning dataset. It achieves the following results on the evaluation set:
- Loss: 0.3331

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
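Every run above uses lr_scheduler_type: cosine with lr_scheduler_warmup_ratio: 0.05, i.e. a linear warmup over the first 5% of steps followed by a half-cosine decay to zero. A minimal sketch of that recipe (an approximation of the common schedule shape, not the exact Transformers implementation):

```python
import math

def lr_at(step, total_steps, peak_lr=1e-5, warmup_ratio=0.05):
    """Cosine schedule with linear warmup: ramp to peak_lr over the
    warmup fraction, then decay along a half cosine to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Warmup start, peak at end of warmup, zero at the final step.
print(lr_at(0, 1000), lr_at(50, 1000), lr_at(1000, 1000))
```

The 5e-05 runs follow the same curve with `peak_lr=5e-5`; only the peak differs between the cards.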