fblgit
UNA-SimpleSmaug-34b-v1beta
Scoring #1 34B model on 04-February-2024, outperforming its original base model Smaug-34B-v0.1 with `77.41` 😎

Oh, btw.. this one went thru SFT, so the abacus inside Smaug is back to normal.. you can further train/DPO it.. RESET!

UPDATES (March): still the undisputed 34B King. Smaug 70B is still the undisputed 70B King.

And people wonder why there is no UNA of Hermes or Smaug 70B. I don't think it is worth spending time on a model that is widely known for not being very useful, although UNA could likely fix some of its internal mess. As for Hermes, we chatted briefly a couple of times but nothing solid came of it. We would like to give excellent models a rebirth using UNA, just like we did with UNA-Dolphin, where we saw relevant performance gains in a short time.

- Applied UNA only on the attention layers, not on the MLPs
- Based on Smaug
- SimpleMath dataset
- Trained on Axolotl

Experiment

The goal here is to understand the impact of SimpleMath applied at the attention layers during an SFT session, and how it affects the neural network overall.

Results: improved mathematical and reasoning capabilities without degrading or losing previous training sessions.

And enjoy our ModelSimilarities detector tool https://github.com/fblgit/model-similarity where we confirmed numerically the blood ties of the model.

Evals

| Metric |Value|
|---------------------------------|----:|
|Avg. |77.41|
|AI2 Reasoning Challenge (25-Shot)|74.57|
|HellaSwag (10-Shot) |86.74|
|MMLU (5-Shot) |76.68|
|TruthfulQA (0-shot) |70.17|
|Winogrande (5-shot) |83.82|
|GSM8k (5-shot) |72.48|

Citations

To abacusai for making Smaug-34B, the Bagel, and all the magic behind the base model. If you use the model, provide a citation, even for merges or anything else.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric |Value|
|-------------------|----:|
|Avg. |23.12|
|IFEval (0-Shot) |45.56|
|BBH (3-Shot) |32.78|
|MATH Lvl 5 (4-Shot)| 0.15|
|GPQA (0-shot) | 8.95|
|MuSR (0-shot) |11.96|
|MMLU-PRO (5-shot) |39.33|
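To poke at the math and reasoning gains described above, here is a minimal sketch of loading the model with 🤗 Transformers. The repo id comes from this card; the dtype, device map, prompt and generation settings are illustrative defaults, and it assumes the tokenizer ships a chat template (otherwise format the prompt manually).

```python
# Minimal sketch (illustrative, not from the card): load the model and ask a math question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/UNA-SimpleSmaug-34b-v1beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 34B parameters; bf16 + multi-GPU keeps memory manageable
    device_map="auto",
)

messages = [{"role": "user", "content": "What is 37 * 48? Explain step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```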
una-cybertron-7b-v2-bf16
Model Card for una-cybertron-7b-v2-bf16 (UNA: Uniform Neural Alignment)

We strike back, introducing Cybertron 7B v2, a 7B MistralAI-based model, the best in its series. Trained with SFT, DPO and UNA (Uniform Neural Alignment) on multiple datasets. It scores EXACTLY #1 with a 69.67+ score on the HF Leaderboard, and #8 top score across ALL SIZES.

v1 Scoring #1 at 2 December 2023 with 69.43 ..a few models were released.. but only 1 can survive: CYBERTRON!

v2 Scoring #1 at 5 December 2023 with 69.67

| Model | Average | ARC (25-s) | HellaSwag (10-s) | MMLU (5-s) | TruthfulQA (MC) (0-s) | Winogrande (5-s) | GSM8K (5-s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mistralai/Mistral-7B-v0.1 | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |
| Intel/neural-chat-7b-v3-2 | 68.29 | 67.49 | 83.92 | 63.55 | 59.68 | 79.95 | 55.12 |
| perlthoughts/Chupacabra-7B-v2 | 63.54 | 66.47 | 85.17 | 64.49 | 57.6 | 79.16 | 28.35 |
| fblgit/una-cybertron-7b-v1-fp16 | 69.49 | 68.43 | 85.85 | 63.34 | 63.28 | 80.90 | 55.12 |
| fblgit/una-cybertron-7b-v2-bf16 | 69.67 | 68.26 | 85.?4 | 63.23 | 64.63 | 81.37 | 55.04 |

The model excels in mathematics, logic, and reasoning; overall it is very smart. It can reason deeply over the context and prompt, and gives the impression of not missing details.

Trained with the UNA: Uniform Neural Alignment technique (paper coming out soon).

- What is NOT UNA? It's not a merged-layers model. It's not SLERP or SLURP or similar.
- What is UNA? A formula & a technique to TAME models.
- When will the code and paper be released? When we have time; contribute and it'll be faster.

- Developed by: juanako.ai
- Author: Xavier M.
- Investors: CONTACT HERE
- Model type: MistralAI 7B
- Funded by: Cybertron's H100's, with a few hours of training.

Prompt

The model is very good and works well with almost any prompt, but the ChatML format and Alpaca System get the best results. Users also report that the exllamav2HF loader with the 8bpw-h8 exl2 quant and the simple-1 preset provides good results.

- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1

Citations

If you find Cybertron, Juanako or any of our models useful, especially if you use them for your big brand.. or you clone/merge my models, please cite.

Special thanks to @TheBloke & @bartowski for converting the models and their support to the community. Thank you!

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric |Value|
|---------------------------------|----:|
|Avg. |69.67|
|AI2 Reasoning Challenge (25-Shot)|68.26|
|HellaSwag (10-Shot) |85.85|
|MMLU (5-Shot) |63.23|
|TruthfulQA (0-shot) |64.63|
|Winogrande (5-shot) |80.98|
|GSM8k (5-shot) |55.04|
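Since the card recommends the ChatML format, a prompt for the model laid out in ChatML looks like this (the system and user text is just an illustrative placeholder):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Explain the difference between precision and recall in two sentences.<|im_end|>
<|im_start|>assistant
```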
UNAversal-2x7B-v1
UNA-POLAR-10.7B-InstructMath-v2
UNAversal-8x7B-v1beta
LUNA-SOLARkrautLM-Instruct
juanako-7b-UNA
This model is a fine-tuned version of fblgit/juanako-7b-UNA-v2-phase-1 on the HuggingFaceH4/ultrafeedback_binarized dataset. It outperforms most current Mistral-based models in many aspects and is the latest and most powerful juanako version as of now.

| Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ | Winogrande (5-s) | GSM8K (5-s) | DROP (3-s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mistralai/Mistral-7B-v0.1 | 50.32 | 59.58 | 83.31 | 64.16 | 42.15 | 78.37 | 18.12 | 6.14 |
| Intel/neural-chat-7b-v3-1 | 59.0 | 66.21 | 83.64 | 62.37 | 59.65 | 78.14 | 19.56 | 43.84 |
| fblgit/juanako-7b-UNA | 59.91 | 68.17 | 85.34 | 62.47 | 65.13 | 78.85 | 20.7 | 38.74 |

It scores 59.91 according to the HuggingFace LLM Leaderboard.
It scores 65.1 with the `big-refactor` branch of lm-eval-harness.

juanako uses UNA, Uniform Neural Alignment: a training technique that eases alignment between transformer layers, yet to be published.

Prompts

The following prompts showed positive results; it may depend on the task and needs further experimentation, but this should work for starters:

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 14
- gradient_accumulation_steps: 16
- total_train_batch_size: 224
- total_eval_batch_size: 14
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4795 | 0.2 | 56 | 0.4958 | -1.3684 | -2.6385 | 0.7552 | 1.2701 | -265.3887 | -241.2612 | -2.2572 | -2.4922 |
| 0.4642 | 0.4 | 112 | 0.4859 | -1.0380 | -1.9769 | 0.7273 | 0.9389 | -258.7718 | -237.9569 | -2.2414 | -2.4751 |
| 0.4758 | 0.61 | 168 | 0.4808 | -1.2594 | -2.3704 | 0.7343 | 1.1110 | -262.7074 | -240.1708 | -2.2305 | -2.4633 |
| 0.4549 | 0.81 | 224 | 0.4768 | -1.1906 | -2.3201 | 0.7552 | 1.1295 | -262.2044 | -239.4827 | -2.2284 | -2.4610 |

- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1

Thanks to all the brilliant humans behind the creation of AI; here are some of the ones we find relevant to our research. If you feel a citation is missing, please contact us.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric |Value|
|---------------------------------|----:|
|Avg. |67.46|
|AI2 Reasoning Challenge (25-Shot)|68.17|
|HellaSwag (10-Shot) |85.34|
|MMLU (5-Shot) |62.47|
|TruthfulQA (0-shot) |65.13|
|Winogrande (5-shot) |78.85|
|GSM8k (5-shot) |44.81|

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric |Value|
|-------------------|----:|
|Avg. |20.77|
|IFEval (0-Shot) |48.37|
|BBH (3-Shot) |30.42|
|MATH Lvl 5 (4-Shot)| 2.87|
|GPQA (0-shot) | 6.15|
|MuSR (0-shot) |17.16|
|MMLU-PRO (5-shot) |19.68|
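The reward and log-probability columns in the training log above are standard DPO metrics. Purely as a hedged illustration (this is not the original juanako training script, which used a patched Transformers 4.35.0-UNA), a comparable DPO run on ultrafeedback_binarized could be wired up with a recent TRL release roughly like this, reusing the hyperparameters listed above:

```python
# Hedged sketch of a DPO run on ultrafeedback_binarized with TRL.
# NOT the original juanako training script; only the hyperparameters mirror the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "fblgit/juanako-7b-UNA-v2-phase-1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = DPOConfig(
    output_dir="juanako-7b-UNA",
    per_device_train_batch_size=1,   # from the card
    gradient_accumulation_steps=16,  # from the card
    learning_rate=1e-4,              # from the card
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    num_train_epochs=1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,              # TRL builds a frozen reference copy when None
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # assumes a recent TRL release
)
trainer.train()
```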
una-cybertron-7b-v3-OMA
una-cybertron-7b-v1-fp16
una-xaberius-34b-v1beta
UNA-TheBeagle-7b-v1
UNA-TheBeagle-7b-v1

TheBeagle, a model of 7B parameters trained on The Bagel dataset, with DPO & UNA applied over a set of curated DPO pairs.

- Scored #1 on the HF Leaderboard, dramatic scores!!! 73 ARC, and very well balanced!

The dataset was generated using the original bagel code, including the decontamination step. As the base model, we used the latest Intel neural-chat model.

It performs very well in many tasks, but it's always better that you play with it yourself. Evals were run with vLLM, so expect them not to match exactly the ones shown on the board, but they won't be too far off :)

For this release, we only applied UNA thru the perceptrons. It was done at a 3.5e-7 learning rate, and the training loop code is also the original one of the bagel and transformers-4.35.2-UNA.

I'm not entirely sure of the prompt format, as we used the vanilla version of the bagel training code. But a good model should be able to generalize across different prompt formats, so feel free to give it a shot.

Remember, if you use UNA's models, cite it in your model card.

Limitations

Not for commercial use; only for academic & research purposes.
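Since the numbers above were produced with vLLM, here is a minimal vLLM inference sketch; the repo id comes from this card, while the prompt and sampling settings are purely illustrative:

```python
# Minimal vLLM sketch; prompt and sampling settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="fblgit/UNA-TheBeagle-7b-v1", dtype="bfloat16")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(
    ["Write a short proof that the square root of 2 is irrational."],
    params,
)
print(outputs[0].outputs[0].text)
```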
UNA-SOLAR-10.7B-Instruct-v1.0
cybertron-v4-qw7B-MGS
WE ARE BACK

Cybertron v4, #1 LLM in its class. Based on the amazing Qwen2.5 7B.

Here we use our novel approach called `MGS`. It's up to you to figure out what it means.

Cybertron V4 went thru SFT over `Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1`.

Quants

Available at https://huggingface.co/bartowski/cybertron-v4-qw7B-MGS-GGUF

MGS is, among other things.. a strategy for tackling corpora forgetting.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric |Value|
|-------------------|----:|
|Avg. |31.21|
|IFEval (0-Shot) |62.64|
|BBH (3-Shot) |37.04|
|MATH Lvl 5 (4-Shot)|27.72|
|GPQA (0-shot) | 8.05|
|MuSR (0-shot) |13.20|
|MMLU-PRO (5-shot) |38.59|

Thanks to @rombodawg for contributing a free-to-use inference space hosted at:

The following hyperparameters were used during training:
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.7405 | 0.0007 | 1 | 0.5760 |
| 0.6146 | 0.0502 | 71 | 0.5045 |
| 0.5908 | 0.1003 | 142 | 0.4930 |
| 0.5669 | 0.1505 | 213 | 0.4854 |
| 0.5575 | 0.2007 | 284 | 0.4811 |
| 0.535 | 0.2508 | 355 | 0.4765 |
| 0.5161 | 0.3010 | 426 | 0.4736 |
| 0.5268 | 0.3511 | 497 | 0.4726 |
| 0.5119 | 0.4013 | 568 | 0.4701 |
| 0.5329 | 0.4515 | 639 | 0.4687 |
| 0.5167 | 0.5016 | 710 | 0.4673 |
| 0.5105 | 0.5518 | 781 | 0.4660 |
| 0.5203 | 0.6020 | 852 | 0.4653 |
| 0.5035 | 0.6521 | 923 | 0.4646 |
| 0.4903 | 0.7023 | 994 | 0.4641 |
| 0.5031 | 0.7525 | 1065 | 0.4628 |
| 0.5147 | 0.8026 | 1136 | 0.4629 |
| 0.5037 | 0.8528 | 1207 | 0.4620 |
| 0.5029 | 0.9029 | 1278 | 0.4620 |
| 0.492 | 0.9531 | 1349 | 0.4621 |

- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
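If you use one of the GGUF quants linked above, a hedged llama-cpp-python sketch looks like this; the quant filename is hypothetical, so point it at whichever file you actually downloaded from the bartowski repo:

```python
# Hedged sketch for running a GGUF quant with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="cybertron-v4-qw7B-MGS-Q4_K_M.gguf",  # hypothetical local filename
    n_ctx=4096,
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me three creative uses for a paperclip."}]
)
print(response["choices"][0]["message"]["content"])
```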
TheBeagle-v2beta-32B-MGS
TheBeagle-v2beta-32B-MGS

This model is an experimental version of our latest innovation: `MGS`. It's up to you to figure out what it means, but it's very explicit.

We didn't apply our known `UNA` algorithm to the forward pass here, but the two are entirely compatible: they operate on different parts of the neural network and in different ways, though both can be seen as regularization techniques.

CHANGELOG

UPDATE 26/Oct:
- Updated `tokenizer_config.json` (from the base model)
- Regenerated quants (being uploaded)
- Re-submitted Leaderboard evaluation; MATH & IFEval have relevant updates
- Aligned LICENSE with `Qwen` terms

MGS

MGS stands for... Many-Geeks-Searching... and that's it.

Hint: `1+1 is 2, and 1+1 is not 3`

We still believe 1 epoch should be enough, so we did only 1 epoch.

Dataset

Used here the first decent (corpora & size) dataset on the hub: `Magpie-Align/Magpie-Pro-300K-Filtered`. Kudos to the Magpie team for contributing some decent stuff that I personally think is very good to ablate.

It achieves the following results on the evaluation set:
- Loss: 0.5378 (1 epoch), outperforming the baseline model.

Quants

On top of the Qwen LICENSE, we add an extra term requiring derivatives to include "Beagle" or "MGS" in the model name; this will help us track the study better. Thank you.

The following hyperparameters were used during training:
- learning_rate: 8e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 25
- num_epochs: 1

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 9.8642 | 0.0012 | 1 | 0.7195 |
| 2.077 | 0.0507 | 42 | 0.6161 |
| 1.0325 | 0.1014 | 84 | 0.6093 |
| 0.8945 | 0.1520 | 126 | 0.5962 |
| 0.8532 | 0.2027 | 168 | 0.5869 |
| 0.8185 | 0.2534 | 210 | 0.5805 |
| 0.81 | 0.3041 | 252 | 0.5719 |
| 0.7901 | 0.3548 | 294 | 0.5663 |
| 0.7766 | 0.4054 | 336 | 0.5618 |
| 0.7687 | 0.4561 | 378 | 0.5590 |
| 0.7443 | 0.5068 | 420 | 0.5564 |
| 0.7494 | 0.5575 | 462 | 0.5525 |
| 0.7787 | 0.6081 | 504 | 0.5485 |
| 0.7381 | 0.6588 | 546 | 0.5466 |
| 0.7359 | 0.7095 | 588 | 0.5444 |
| 0.7447 | 0.7602 | 630 | 0.5435 |
| 0.7378 | 0.8109 | 672 | 0.5415 |
| 0.7302 | 0.8615 | 714 | 0.5398 |
| 0.7476 | 0.9122 | 756 | 0.5391 |
| 0.715 | 0.9629 | 798 | 0.5378 |

Open LLM Leaderboard Evaluation Results (without chat template)

Detailed results can be found here

| Metric |Value|
|-------------------|----:|
|Avg. |40.29|
|IFEval (0-Shot) |45.03|
|BBH (3-Shot) |58.07|
|MATH Lvl 5 (4-Shot)|39.43|
|GPQA (0-shot) |20.13|
|MuSR (0-shot) |24.50|
|MMLU-PRO (5-shot) |54.57|

Thanks
- Qwen Team for their outstanding model
- MagPie Team for contributing plenty of datasets
- Cybertron Cloud Compute
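For reference, the `total_train_batch_size: 64` listed above follows directly from the per-device batch, device count, and gradient accumulation, and the scheduler is a cosine decay with 25 warmup steps. A small sketch of that arithmetic and schedule (the optimizer parameters are placeholders, and the total step count is only an approximation read off the training log):

```python
# Sketch: deriving total_train_batch_size and building the cosine schedule listed above.
# The model parameters are placeholders; only the hyperparameters mirror the card.
import torch
from transformers import get_cosine_schedule_with_warmup

per_device_batch = 2
num_devices = 8
grad_accum = 4
total_train_batch_size = per_device_batch * num_devices * grad_accum  # 2 * 8 * 4 = 64

params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder instead of real model weights
optimizer = torch.optim.AdamW(params, lr=8e-5, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=25,      # from the card
    num_training_steps=830,   # roughly one epoch, judging from the ~798 logged steps
)
```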
cybertron-v4-qw7B-UNAMGS
UNA IS BACK

Cybertron v4 UNA-MGS, based on the amazing Qwen2.5 7B.

SCORING #1 7-8B LLM WITH NO CONTAMINATION, 21.11.2024, with avg. 31.82

This special edition went thru UNA at the MLP layers, just like miniclaus-1.5B. Here we use our novel approach called `MGS`. It's up to you to figure out what it means. On top of that we used `UNA: Uniform Neural Alignment`.

Cybertron V4 went thru SFT with `MGS & UNA` over the `Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1` dataset.

Contamination Benchmark: https://gair-nlp.github.io/benbench/

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric |Value|
|-------------------|----:|
|Avg. |31.82|
|IFEval (0-Shot) |60.84|
|BBH (3-Shot) |37.71|
|MATH Lvl 5 (4-Shot)|29.91|
|GPQA (0-shot) |10.85|
|MuSR (0-shot) |12.69|
|MMLU-PRO (5-shot) |38.89|

We also followed the insights of https://arxiv.org/pdf/2410.21228.

The following hyperparameters were used during training:
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.7824 | 0.0003 | 1 | 0.5555 |
| 0.5489 | 0.0503 | 144 | 0.4848 |
| 0.5348 | 0.1006 | 288 | 0.4732 |
| 0.5256 | 0.1509 | 432 | 0.4670 |
| 0.5172 | 0.2012 | 576 | 0.4621 |
| 0.4882 | 0.2515 | 720 | 0.4578 |
| 0.4848 | 0.3018 | 864 | 0.4550 |
| 0.4678 | 0.3520 | 1008 | 0.4522 |
| 0.4686 | 0.4023 | 1152 | 0.4502 |
| 0.4775 | 0.4526 | 1296 | 0.4474 |
| 0.4464 | 0.5029 | 1440 | 0.4454 |
| 0.4772 | 0.5532 | 1584 | 0.4438 |
| 0.4546 | 0.6035 | 1728 | 0.4425 |
| 0.4661 | 0.6538 | 1872 | 0.4411 |
| 0.4569 | 0.7041 | 2016 | 0.4399 |
| 0.4529 | 0.7544 | 2160 | 0.4390 |
| 0.4409 | 0.8047 | 2304 | 0.4380 |
| 0.4405 | 0.8550 | 2448 | 0.4370 |
| 0.4642 | 0.9053 | 2592 | 0.4363 |
| 0.4566 | 0.9556 | 2736 | 0.4359 |

- PEFT 0.13.2
- Transformers 4.45.2 (UNA & MGS patch)
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
miniclaus-qw1.5B-UNAMGS
Trained with `Magpie-Align/Magpie-Pro-MT-300K-v0.1`

Using MGS & UNA (MLP) on this tiny but powerful model.

It achieves the following results on the evaluation set:
- Loss: 0.7193

Quants

Available at:
- https://huggingface.co/bartowski/miniclaus-qw1.5B-UNAMGS-GGUF
- https://huggingface.co/QuantFactory/miniclaus-qw1.5B-UNAMGS-GGUF

The following hyperparameters were used during training:
- train_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.1641 | 0.0007 | 1 | 0.8514 |
| 0.9246 | 0.0503 | 76 | 0.7921 |
| 0.8791 | 0.1006 | 152 | 0.7727 |
| 0.8507 | 0.1509 | 228 | 0.7611 |
| 0.8376 | 0.2012 | 304 | 0.7534 |
| 0.793 | 0.2515 | 380 | 0.7467 |
| 0.7834 | 0.3018 | 456 | 0.7421 |
| 0.7807 | 0.3521 | 532 | 0.7384 |
| 0.764 | 0.4023 | 608 | 0.7359 |
| 0.7738 | 0.4526 | 684 | 0.7320 |
| 0.7425 | 0.5029 | 760 | 0.7300 |
| 0.7519 | 0.5532 | 836 | 0.7279 |
| 0.7461 | 0.6035 | 912 | 0.7255 |
| 0.7489 | 0.6538 | 988 | 0.7245 |
| 0.7614 | 0.7041 | 1064 | 0.7222 |
| 0.7576 | 0.7544 | 1140 | 0.7222 |
| 0.7303 | 0.8047 | 1216 | 0.7209 |
| 0.7332 | 0.8550 | 1292 | 0.7199 |
| 0.7541 | 0.9053 | 1368 | 0.7202 |
| 0.7369 | 0.9556 | 1444 | 0.7193 |

- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1

Thanks
- Qwen Team for their outstanding model
- MagPie Team for contributing plenty of datasets
- Cybertron Cloud Compute
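At 1.5B parameters the model runs comfortably through a plain text-generation pipeline; a minimal sketch (repo id from this card, prompt and settings illustrative, assuming a recent Transformers release with chat-aware pipelines):

```python
# Minimal sketch: chat with the 1.5B model through a transformers pipeline.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="fblgit/miniclaus-qw1.5B-UNAMGS",
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the plot of Don Quixote in three sentences."}]
result = pipe(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```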
miniclaus-qw1.5B-UNAMGS-GRPO
This version is RL-trained with GRPO on GSM8k for 1400 steps using this code:

Trained with `Magpie-Align/Magpie-Pro-MT-300K-v0.1` and `GSM8k`.

Using MGS & UNA (MLP) on this tiny but powerful model, together with GRPO.

There is some score increase on GSM8k, GPQA & MuSR, but this doesn't happen in all checkpoints, and this is the checkpoint with the best marks.

Thanks
- DeepSeek Team for the GRPO research
- HuggingFace for adopting GRPO in TRL
- Qwen Team for their outstanding model
- MagPie Team for contributing plenty of datasets
- Cybertron Cloud Compute
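The actual training code is the one linked from this card. Purely as a hedged illustration of what a GRPO run on GSM8k with TRL's `GRPOTrainer` can look like (the reward function and every hyperparameter below are illustrative assumptions, not the original setup):

```python
# Hedged GRPO sketch with TRL on GSM8K. NOT the original training code.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda row: {"prompt": row["question"]})

def correctness_reward(completions, answer, **kwargs):
    """Toy reward: 1.0 if the completion mentions the gold final answer, else 0.0."""
    rewards = []
    for completion, gold in zip(completions, answer):
        final = gold.split("####")[-1].strip()  # GSM8K puts the final number after '####'
        rewards.append(1.0 if final in completion else 0.0)
    return rewards

args = GRPOConfig(
    output_dir="miniclaus-qw1.5B-UNAMGS-GRPO",
    max_steps=1400,                 # the card states 1400 GRPO steps
    num_generations=8,              # group size per prompt; illustrative
    per_device_train_batch_size=8,
)

trainer = GRPOTrainer(
    model="fblgit/miniclaus-qw1.5B-UNAMGS",  # the SFT model this card builds on
    reward_funcs=correctness_reward,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```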
UNA-ThePitbull-21.4B-v2
Introducing the best LLM in the industry. Nearly as good as a 70B, at just 21.4B, based on saltlux/luxia-21.4b-alignment-v1.0.

This model has not been poisoned to score high and be useless. We release it because it's the real deal of EQ & IQ all together in a crazy powerful, smart and conversational model.

Quant versions available at bartowski/UNA-ThePitbull-21.4B-v2-GGUF

For V2 we implemented a different UNA strategy and partially covered the MLPs and attention layers. We also performed further SFT over V1 and further DPO over V1, and we'll release some of those soon as well.

1. SFT over V1 with `Replete-AI/code_bagel_hermes-2.5` at 1.0e-4 till 5.0e-5 for 1 epoch
2. DPO with 1.0e-4 to min_lr 5.0e-5 for 1 epoch, using:
   - `mlabonne/orpo-dpo-mix-40k`
   - `jondurbin/py-dpo-v0.1`

Evaluations

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric |Value|
|---------------------------------|----:|
|Avg. |77.82|
|AI2 Reasoning Challenge (25-Shot)|77.73|
|HellaSwag (10-Shot) |91.79|
|MMLU (5-Shot) |68.25|
|TruthfulQA (0-shot) |78.24|
|Winogrande (5-shot) |87.37|
|GSM8k (5-shot) |63.53|

Can only be compared with its non-UNA base model: the original luxia-21.4b and ThePitbull-v1.

Citations
- mlabonne
- jondurbin & Replete-AI
- bartowski
- saltlux

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric |Value|
|-------------------|----:|
|Avg. |22.60|
|IFEval (0-Shot) |37.90|
|BBH (3-Shot) |46.79|
|MATH Lvl 5 (4-Shot)| 9.59|
|GPQA (0-shot) | 6.94|
|MuSR (0-shot) | 6.42|
|MMLU-PRO (5-shot) |27.95|
UNA-POLAR-10.7B-InstructMath-v1
UNA-34BeagleSimpleMath-32K-v1
UNA-dolphin-2.6-mistral-7b-dpo-laser
juanako-7b-v1
UNA-ThePitbull-21.4-v1
pancho-v1-qw25-3B-UNAMGS
This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct.

It achieves the following results on the evaluation set:
- Loss: 0.6555

Model description

Trained with MagPie:
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
- Magpie-Align/Magpie-Pro-MT-300K-v0.1

License & Derivatives

Any derivative (SFT, merges, etc.) using ANY layer from this model MUST include either `UNA` or `MGS` or `PANCHO` in its model name in order to obtain a LICENSE for derivatives of this model.

The following hyperparameters were used during training:
- learning_rate: 2e-05
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2127 | 0.0015 | 1 | 0.8711 |
| 0.9905 | 0.0509 | 35 | 0.7338 |
| 0.9685 | 0.1019 | 70 | 0.7114 |
| 0.9554 | 0.1528 | 105 | 0.6994 |
| 0.9077 | 0.2037 | 140 | 0.6915 |
| 0.9149 | 0.2547 | 175 | 0.6859 |
| 0.9363 | 0.3056 | 210 | 0.6795 |
| 0.8975 | 0.3566 | 245 | 0.6745 |
| 0.9095 | 0.4075 | 280 | 0.6709 |
| 0.9216 | 0.4584 | 315 | 0.6681 |
| 0.9143 | 0.5094 | 350 | 0.6666 |
| 0.8879 | 0.5603 | 385 | 0.6645 |
| 0.9194 | 0.6112 | 420 | 0.6625 |
| 0.9123 | 0.6622 | 455 | 0.6615 |
| 0.9056 | 0.7131 | 490 | 0.6591 |
| 0.9172 | 0.7641 | 525 | 0.6578 |
| 0.886 | 0.8150 | 560 | 0.6566 |
| 0.9155 | 0.8659 | 595 | 0.6568 |
| 0.9029 | 0.9169 | 630 | 0.6560 |
| 0.8942 | 0.9678 | 665 | 0.6555 |

- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
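The framework list above includes PEFT, which suggests the run was adapter-based; a hedged LoRA setup sketch on the stated base model follows (rank, alpha, and target modules are illustrative assumptions, not the card's actual configuration):

```python
# Hedged sketch: attaching a LoRA adapter to the stated base model with PEFT.
# All LoRA settings here are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

lora_config = LoraConfig(
    r=16,                                                     # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # shows how few parameters the adapter trains
```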