sonthenguyen
NeuralHermes-2.5-Mistral-7B
zephyr-sft-bnb-4bit-DPO-mtbc-213steps
TrainOutput(global_step=213, training_loss=0.09253080396371667, metrics={'train_runtime': 1906.7032, 'train_samples_per_second': 1.791, 'train_steps_per_second': 0.112, 'total_flos': 0.0, 'train_loss': 0.09253080396371667, 'epoch': 0.4991212653778559})

---
base_model: unsloth/zephyr-sft-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
- dpo
---

- Developed by: sonthenguyen
- License: apache-2.0
- Finetuned from model: unsloth/zephyr-sft-bnb-4bit

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.
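The metrics in the TrainOutput above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the function name `summarize_run` and its parameter names are hypothetical, and the derived quantities (samples seen, effective batch size, dataset size) are inferred from the reported numbers, not values stated in the card. For the 213-step run the effective batch size works out to roughly 16 samples per optimizer step.

```python
def summarize_run(global_step, train_runtime, samples_per_second, epoch):
    """Derive rough run statistics from reported trainer metrics.

    All outputs are estimates inferred from the reported values;
    none of them is stated in the model card itself.
    """
    # Total samples processed = runtime (s) * throughput (samples/s).
    samples_seen = train_runtime * samples_per_second
    # Samples consumed per optimizer step (batch size * grad accumulation).
    effective_batch_size = samples_seen / global_step
    # If `epoch` is the fraction of the dataset covered, this estimates its size.
    estimated_dataset_size = samples_seen / epoch
    return {
        "samples_seen": samples_seen,
        "effective_batch_size": effective_batch_size,
        "estimated_dataset_size": estimated_dataset_size,
    }


# Values copied from the 213-step run reported above.
summary = summarize_run(
    global_step=213,
    train_runtime=1906.7032,
    samples_per_second=1.791,
    epoch=0.4991212653778559,
)
print(summary)
```

The same function can be applied to the other runs on this page by substituting their reported step count, runtime, throughput, and epoch fraction.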
zephyr-sft-bnb-4bit-DPO-mtbo-137steps
zephyr-sft-bnb-4bit-DPO-mtbr-180steps
TrainOutput(global_step=180, training_loss=0.06576051639137069, metrics={'train_runtime': 1641.641, 'train_samples_per_second': 1.752, 'train_steps_per_second': 0.11, 'total_flos': 0.0, 'train_loss': 0.06576051639137069, 'epoch': 0.5006954102920723})

---
base_model: unsloth/zephyr-sft-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
- dpo
---

- Developed by: sonthenguyen
- License: apache-2.0
- Finetuned from model: unsloth/zephyr-sft-bnb-4bit

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.