yueqis
web-qwen-coder-7b-3epochs-30k-5e-5
Web Qwen Coder 32b 2epochs 30k 5e 5
Web Qwen Coder 14b 3epochs 25k 5e 5
This model is a fine-tuned version of yueqis/web-qwen-coder-14b-3epochs-25k-5e-5 on the web dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1
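The batch-size figures in these cards are internally consistent: the total train batch size is the per-device batch size times the gradient-accumulation steps times the device count. A minimal sketch of that arithmetic (the function name is ours, for illustration):

```python
# Sketch: how the "total_train_batch_size" values in these cards are derived.
def total_train_batch_size(train_batch_size, gradient_accumulation_steps, num_devices):
    # effective batch = per-device batch * accumulation steps * number of GPUs
    return train_batch_size * gradient_accumulation_steps * num_devices

# Values from the card above: 1 * 16 * 8 = 128
print(total_train_batch_size(1, 16, 8))   # -> 128
# The 32-GPU runs below: 1 * 16 * 32 = 512
print(total_train_batch_size(1, 16, 32))  # -> 512
```

The same identity checks out for every entry on this page.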
Web Qwen Coder 32b 3epochs 30k 5e 5
non_web_sweagent-qwen-coder-32b-3epochs-20k-5e-5
full-qwen-coder-32b-3epochs-28k-5e-5
Sweagent Qwen Coder 32b 3epochs 32k 5e 5
Web Qwen Coder 32b 1epoch 30k 5e 5
swe_only-qwen-coder-32b-3epochs-20k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-32B-Instruct on the swe_only dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
non_web-qwen-coder-32b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-32B-Instruct on the non_web dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only_sweagent-qwen-coder-32b-3epochs-20k-5e-5
non_web_sweagent-qwen-coder-7b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the non_web_sweagent dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only_original-qwen-coder-7b-3epochs-30k-5e-5
web-qwen-coder-7b-1epoch-30k-5e-5
non_web-qwen-coder-32b-3epochs-20k-5e-5
non_web-qwen-coder-7b-3epochs-30k-5e-5
web-qwen-coder-14b-2epochs-25k-5e-5
web-qwen-coder-14b-1epoch-25k-5e-5
non_web-qwen-coder-14b-3epochs-25k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-14B-Instruct on the non_web dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
non_web_sweagent-qwen-coder-14b-3epochs-24k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-14B-Instruct on the non_web_sweagent dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
Non Web Mcp Qwen Coder 7b 1epoch 30k 5e 5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the non_web_mcp dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1
web-qwen-coder-7b-2epochs-30k-5e-5
swe_only_original-qwen-coder-14b-3epochs-25k-5e-5
swe_only_sweagent-qwen-coder-7b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the swe_only_sweagent dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only_original_5k-qwen-coder-7b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the swe_only_original_5k dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
non_web-qwen3-8b-3epochs-25k-5e-5
This model is a fine-tuned version of Qwen/Qwen3-8B on the non_web dataset. The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- total_eval_batch_size: 256
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
openhands-llama3_nemotron_49b-1epochs-30k-5e-5
swe_only-qwen-coder-7b-3epochs-30k-5e-5
full_sft_non_web-qwen3-8b-25k-1e-5
This model is a fine-tuned version of Qwen/Qwen3-8B on the full_sft_non_web dataset. It achieves the following results on the evaluation set:
- Loss: 0.3317

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_v0.4_args
This model is a fine-tuned version of Qwen/Qwen3-8B on the full_sft_v0.4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3446

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Training results:

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3427        | 0.4049 | 1000 | 0.3676          |
| 0.3254        | 0.8097 | 2000 | 0.3476          |

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
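Every run on this page uses `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.05`, i.e. linear warmup over the first 5% of optimizer steps followed by cosine decay to zero. A minimal re-implementation of that schedule for illustration (not the trainer's exact code; the step counts are taken from the table above):

```python
import math

def cosine_with_warmup(step, total_steps, peak_lr, warmup_ratio=0.05):
    """Linear warmup to peak_lr over warmup_ratio of training, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Roughly 2470 total steps (step 1000 fell at epoch 0.4049), peak LR 1e-05:
# warmup ends near step 123, and the LR returns to ~0 at the final step.
print(cosine_with_warmup(123, 2470, 1e-05))   # peak
print(cosine_with_warmup(2470, 2470, 1e-05))  # end of training
```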
swe_original-qwen-7b-30k
unified-agent-sample-30k-4epoch
go-browse-wa-final-2epochs-2e05lr-len24k
This model is a fine-tuned version of Qwen/Qwen3-8B on the go-browse-wa dataset. It achieves the following results on the evaluation set:
- Loss: 0.3853

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
go-browse-trim-2epochs-2e05lr-len24k
This model is a fine-tuned version of Qwen/Qwen3-8B on the go-browse dataset. It achieves the following results on the evaluation set:
- Loss: 0.1270

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
unified-agent-sample-30k-3epoch
go-browse-trim-2epochs-mask_history-2e05lr-len24k
full_sft_v0_4
This model is a fine-tuned version of Qwen/Qwen3-8B on the v0.4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3654

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_v0_4_14B
This model is a fine-tuned version of Qwen/Qwen3-14B on the full_sft_v0.4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2949

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Training results:

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3214        | 0.4049 | 1000 | 0.3258          |
| 0.3208        | 0.8097 | 2000 | 0.2988          |

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
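The loss table lets us back out the approximate training-set size. Assuming the Epoch column is fractional progress through one epoch, step 1000 at epoch 0.4049 implies about 2470 optimizer steps per epoch; with a total train batch size of 32 that is roughly 79k examples. A back-of-envelope sketch:

```python
# Back-of-envelope: infer training-set size from the loss table above.
# Assumes the "Epoch" column is fractional progress through one epoch.
steps_per_epoch = 1000 / 0.4049      # step 1000 occurred at epoch 0.4049
examples = steps_per_epoch * 32      # total_train_batch_size is 32
print(round(steps_per_epoch), round(examples))
```

This is an estimate, not a figure stated by the cards themselves.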
non_web_sft_v0_4_qwen3_8B_24K
This model is a fine-tuned version of Qwen/Qwen3-8B on the non_web_sft_v0.4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3102

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
non_web_sft_v0_4_qwen_8B_24K
swe_only_sweagent-qwen3-8b
This model is a fine-tuned version of Qwen/Qwen3-8B on the swe_only_sweagent dataset. It achieves the following results on the evaluation set:
- Loss: 0.2691

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_sweagent-qwen3-8b
swe_only_think-qwen3-8b
full_sft_v0_4_mcp-qwen3-8b
swe_only_mcp-qwen3-8b
This model is a fine-tuned version of Qwen/Qwen3-8B on the swe_only_mcp dataset. It achieves the following results on the evaluation set:
- Loss: 0.2769

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
unified-agent-swe-mix-sample
This model is a fine-tuned version of Qwen/Qwen3-8B on the swe-mix-sample dataset. It achieves the following results on the evaluation set:
- Loss: 0.1673

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only-qwen-7b-30k
swe_only-qwen-7b-30k_1
swe_only_sweagent-qwen-7b-30k
go-browse-trim-2epochs-2e05lr-len24k-ckpt100
full_sft_sweagent-qwen-7b-30k
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_sweagent dataset. It achieves the following results on the evaluation set:
- Loss: 0.2499

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only_sweagent_func_thoughts-qwen-7b-30k
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the swe_only_sweagent dataset. It achieves the following results on the evaluation set:
- Loss: 0.2917

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
unified-agent-5epoch
unified-agent-sample-30k-2epoch
go-browse-wa-len24k-grad4-lr2e-5-2epochs
unified-agent-agenttuning-sample
web-sample
This model is a fine-tuned version of Qwen/Qwen3-8B on the web-sample dataset. It achieves the following results on the evaluation set:
- Loss: 0.4692

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
Qwen3-8B-30K-sft-nopacking
This model is a fine-tuned version of Qwen/Qwen3-8B on the unifiedagent30K dataset. The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 128
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.1
unified-agent-SWE-smith_5kTrajectories-3epochs
This model is a fine-tuned version of Qwen/Qwen3-8B on the SWE-smith_5kTrajectories dataset. It achieves the following results on the evaluation set:
- Loss: 0.1589

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
Qwen3-14B-30K-sft-nopacking-lr5e-6
qwen8b-web-long-sample
Qwen3-14B-30K-sft-nopacking-lr1e-5-single-node
This model is a fine-tuned version of Qwen/Qwen3-14B on the unifiedagent30K dataset. The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.1
Qwen3-14B-30K-sft-nopacking-lr1e-5-single-node-val-split
This model is a fine-tuned version of Qwen/Qwen3-14B on the unifiedagent30K dataset. It achieves the following results on the evaluation set:
- Loss: 0.3302

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.1
v0_3-len24k-sample-80k
This model is a fine-tuned version of Qwen/Qwen3-8B on the v0.2 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3129

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_original-qwen-7b-25k-3epochs-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the swe_original dataset. It achieves the following results on the evaluation set:
- Loss: 0.0027

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_non_web-qwen-7b-25k-3epochs-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_non_web dataset. It achieves the following results on the evaluation set:
- Loss: 0.2291

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_non_web-qwen-7b-25k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_non_web dataset. It achieves the following results on the evaluation set:
- Loss: 0.2784

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft-qwen-7b-25k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_v0.4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3212

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_mcp_10k-qwen-7b-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_sft_mcp dataset. It achieves the following results on the evaluation set:
- Loss: 0.4245

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_assistant_owl-qwen-7b-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_assistant_owl dataset. It achieves the following results on the evaluation set:
- Loss: 0.0646

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_non_web-qwen-coder-7b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the full_sft_non_web dataset. It achieves the following results on the evaluation set:
- Loss: 0.2527

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
unified-agent-sample
unified-agent-30k
unified-agent-sample-60k
unified-agent-sample-80k
This model is a fine-tuned version of Qwen/Qwen3-8B on the sample-80k dataset. It achieves the following results on the evaluation set:
- Loss: 0.3979

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
Qwen3-14B-30K-sft
This model is a fine-tuned version of Qwen/Qwen3-14B on the unifiedagent30K dataset. The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
Qwen3-8B-30K-sft
This model is a fine-tuned version of Qwen/Qwen3-8B on the unifiedagent30K dataset. The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 128
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
unified-agent-code-mix-sample
This model is a fine-tuned version of Qwen/Qwen3-8B on the code-mix-sample dataset. It achieves the following results on the evaluation set:
- Loss: 0.3868

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
unified-agent-SWE-smith_5kTrajectories-2epochs
unified-agent-SWE-smith_5kTrajectories
This model is a fine-tuned version of Qwen/Qwen3-8B on the SWE-smith_5kTrajectories dataset. It achieves the following results on the evaluation set:
- Loss: 0.2062

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
qwen14b-sample-30k
web-long-sample
go-browse-wa
This model is a fine-tuned version of Qwen/Qwen3-8B on the go-browse-wa dataset. It achieves the following results on the evaluation set:
- Loss: 0.4563

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
go-browse-wa-2epochs
unified_agent_data_30k_filtered_polished
This model is a fine-tuned version of Qwen/Qwen3-8B on the unified_agent_data_30k_filtered_polished dataset. It achieves the following results on the evaluation set:
- Loss: 0.4546

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
nnetnav-2epochs
This model is a fine-tuned version of Qwen/Qwen3-8B on the nnetnav-live and nnetnav-wa datasets. It achieves the following results on the evaluation set:
- Loss: 0.9091

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2

Framework versions:
- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_original-qwen-7b-25k
swe_original-qwen-7b-30k-3epochs-5e-5
full_user_owl-qwen-7b-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the full_user_owl dataset. It achieves the following results on the evaluation set:
- Loss: 0.3807

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
non_web_mcp-qwen3-8b-28k-5e-5
This model is a fine-tuned version of Qwen/Qwen3-8B on the non_web_mcp dataset. It achieves the following results on the evaluation set:
- Loss: 0.2480

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
Qwen3-14B-30K-sft-nopacking-lr1e-6
web-sample-filtered-len24k
This model is a fine-tuned version of Qwen/Qwen3-8B on the web-sample-filtered dataset. It achieves the following results on the evaluation set:
- Loss: 0.5447

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
swe_only_sweagent-qwen-7b-3epochs-30k-5e-5
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the swe_only_sweagent dataset. It achieves the following results on the evaluation set:
- Loss: 0.0843

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

Framework versions:
- Transformers 4.51.3
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
full_sft_non_web-qwen-7b-3epochs-30k-5e-5
full_sft_mcp-qwen-7b-30k-5e-5
full_sft_mcp_1k-qwen-7b-30k-5e-5
Qwen3-14B-30K-sft-nopacking
This model is a fine-tuned version of Qwen/Qwen3-14B on the unifiedagent30K dataset.

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 128
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

Framework versions:
- Transformers 4.51.1
- Pytorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.1
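The hyperparameter fields in these cards map one-to-one onto `transformers.TrainingArguments` parameters. A hedged sketch of that mapping for the run above (the cards do not show the actual launcher config, so the kwargs below are the standard TrainingArguments names, not a copy of the original setup):

```python
# Sketch: the card's hyperparameters expressed as TrainingArguments kwargs.
# Assumes the run used the Hugging Face Trainer; comments note the
# corresponding card field where the name differs.
training_kwargs = {
    "learning_rate": 1e-05,
    "per_device_train_batch_size": 1,  # train_batch_size
    "per_device_eval_batch_size": 8,   # eval_batch_size
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,              # lr_scheduler_warmup_ratio
    "num_train_epochs": 1.0,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
}

# With 16 devices, the effective train batch is 1 * 16 * 8 = 128,
# matching total_train_batch_size in the card.
# args = TrainingArguments(output_dir="out", **training_kwargs)
```

Note that total_train_batch_size and total_eval_batch_size are derived values reported by the Trainer, not arguments you pass in.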
go-browse-wa-2epochs-2.5
unified_agent_data_30k_filtered_unpolished
full_sft_non_web-qwen-7b-25k-2epochs-5e-5
agenttuning-qwen3-8b-16k-1e-5
This model is a fine-tuned version of Qwen/Qwen3-8B on the agenttuning dataset. It achieves the following results on the evaluation set:
- Loss: 0.3331

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

Framework versions:
- Transformers 4.51.3
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
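Every run above uses lr_scheduler_type: cosine with lr_scheduler_warmup_ratio: 0.05, i.e. a linear warmup over the first 5% of steps followed by a half-cosine decay to zero. A minimal sketch of that recipe (an approximation of the common schedule shape, not the exact Transformers implementation):

```python
import math

def lr_at(step, total_steps, peak_lr=1e-5, warmup_ratio=0.05):
    """Cosine schedule with linear warmup: ramp to peak_lr over the
    warmup fraction, then decay along a half cosine to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Warmup start, peak at end of warmup, zero at the final step.
print(lr_at(0, 1000), lr_at(50, 1000), lr_at(1000, 1000))
```

The 5e-05 runs follow the same curve with `peak_lr=5e-5`; only the peak differs between the cards.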