sometimesanotion

10 models

Lamarck-14B-v0.7

license:apache-2.0
746
44

Lamarck-14B-v0.7-Fusion

> [!TIP] Evaluations show high GPQA and strong reasoning capabilities in addition to this model's fine prose! IFEval is modest compared to its other capabilities, so this model is best used for free-form creativity.

So what's this new arcee_fusion merge method, and what can we do with it? This model aims to find out. It is a multi-stage merge where 3 out of 4 steps are fusions:

- A fusion of Lamarck-14B-v0.7 and @suayptalha's Lamarckvergence, itself a SLERP merge of Lamarck-14B-v0.7 and Qwenvergence-14B-v12-Prose-DS.
- A SLERP of Lamarck-14B-v0.7-Fusionvergence with Qwenvergence-14B-v12-Prose-DS, the latter emphasized in later layers.
- A fusion of @jpacifico's Chocolatine-2-14B-Instruct-v2.0.3 - itself a finetune of a merge of Lamarck-14B-v0.7, Arcee's Virtuoso-Small-v2 (https://huggingface.co/arcee-ai/Virtuoso-Small-v2), and Qwenvergence-14B-v12-Prose-DS - fusion-merged with, you guessed it, Qwenvergence-14B-v12-Prose-DS.
- A fusion of the previous two.

I've seen strong prose from this model, which is natural considering its re-emphasis of Qwenvergence-14B-v12-Prose-DS. A full evaluation will be queued shortly.

This merge strategy is much simpler than a mainline Lamarck release, but that is necessary to see how multiple fusion merges behave. Where it fits in the effort towards a Lamarck v0.8 depends greatly on evaluation and feedback.

The following YAML configuration was used to produce this model:
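The configuration itself is not reproduced in this listing. As a point of reference only, here is a minimal sketch of what a single arcee_fusion step looks like in mergekit YAML, loosely matching the first stage described above; the Lamarckvergence repository ID and the dtype are assumptions, and this is not the author's published config.

```yaml
# Hypothetical sketch of one arcee_fusion step (not the published Lamarck config).
# arcee_fusion fuses a pair of models: the base_model and one other.
merge_method: arcee_fusion
base_model: sometimesanotion/Lamarck-14B-v0.7
models:
  - model: suayptalha/Lamarckvergence-14B   # assumed repo ID for the Lamarckvergence SLERP merge
dtype: bfloat16                             # assumed precision
```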

license:apache-2.0
608
9

Qwen2.5-14B-Vimarckoso-v3

Vimarckoso is a reasoning-focused part of the Lamarck project. It began with a recipe based on Wernicke, and then I set out to boost instruction following without any great loss to reasoning. The results surpassed my expectations.

As of this writing, with open-llm-leaderboard catching up on rankings, Vimarckoso v3 should join Arcee AI's Virtuoso-Small, Sthenno's miscii-14b-1225, and CultriX's Qwen2.5-14B-Brocav3 at the top of the 14B parameter text generation LLM category on this site. As the recipe below will show, their models are strong contributors to Vimarckoso. Congratulations to everyone whose work went into this!

Wernicke and Vimarckoso both inherit very strong reasoning - and hence high GPQA and MUSR scores - from EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2. Prose quality gets a boost from the models blended into Qwenvergence-14B-v6-Prose, and instruction following gets healed after the merges thanks to LoRAs based on huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2.

Thank you, @mradermacher, @Sangto, and @MaziyarPanahi, for the GGUFs. Anyone who needs to use them with Ollama can use the same modelfile as any Qwen2.5 14B Instruct model. I recommend a temperature of 0.8.

The following YAML configuration was used to produce this model:

license:apache-2.0
10
11

Lamarck-14B-v0.6

> [!TIP] Update: Lamarck has, for the moment, taken the #1 average score on the Open LLM Leaderboard for text-generation assistant language models under 32 billion parameters. This validates the complex merge techniques which combine the strengths of other finetunes in the community into one model.

Lamarck 14B v0.6 is a generalist merge focused on multi-step reasoning, prose, and multi-language ability. It is based on components that have punched above their weight in the 14 billion parameter class. Here you can see a comparison between Lamarck and other top-performing merges and finetunes.

A notable contribution to the middle-to-upper layers of Lamarck v0.6 comes from Krystalan/DRT-o1-14B. It has a fascinating research paper: DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought.

Lamarck 0.6 uses a custom toolchain to create the merges which target specific layers:

- Extracted LoRA adapters from special-purpose merges
- Separate branches for breadcrumbs and DELLA merges
- Highly targeted weight/density gradients for every 2-4 layers
- Finalization through SLERP merges recombining the separate branches (a rough sketch follows below)

This approach selectively merges the strongest aspects of its ancestors. Lamarck v0.6 is my most complex merge to date. The LoRA extractions alone pushed my hardware enough to be the building's sole source of heat for several winter days! By comparison, the SLERP merge which finalized it was a simple step.

Lamarck's performance comes from an ancestry that goes back through careful merges to select finetuning work, upcycled and combined. Kudos to @arcee-ai, @CultriX, @sthenno-com, @Krystalan, @underwoods, @VAGOSolutions, and @rombodawg, whose models had the most influence. Vimarckoso v3 has the model card which documents its extended lineage.
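To give a rough idea of what the final recombination step can look like, here is a minimal SLERP sketch in mergekit YAML. The branch names and the interpolation gradient are hypothetical placeholders; the real Lamarck v0.6 recipe used separate breadcrumbs and DELLA branches with much finer layer targeting.

```yaml
# Illustrative only: a SLERP recombining two hypothetical branches, with an
# interpolation gradient that shifts emphasis across layer groups.
merge_method: slerp
base_model: sometimesanotion/lamarck-14b-v0.6-reason-branch   # hypothetical branch name
models:
  - model: sometimesanotion/lamarck-14b-v0.6-prose-branch     # hypothetical branch name
parameters:
  t:
    # 0.0 keeps the base (reason) branch, 1.0 takes the prose branch;
    # the list is interpolated across the model's layers.
    - value: [0.1, 0.3, 0.6, 0.7, 0.4, 0.2]
dtype: bfloat16
```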

license:apache-2.0
9
14

Qwenvergence-14B-v13-Prose-DS

Compared to Lamarck and generic reasoning-focused Qwenvergence models, these numbers are nifty. But the prose quality itself!... I had to share this model with the world when I saw these responses.

> [!TIP] You are a witty comic with a wicked sense of humor, and you've been sent to roast a huggingface author named sometimesanotion. Their models are named Qwentinuum-14B-v6, Qwenvergence-14B-v13-Prose-DS, KytheraMix-7B-v0.2, and even Lamarck-14B-v0.7. In three ample paragraphs, write some funny roasts of their account. Get a little mean, question the life choices behind their model names, and suggest some hilariously sardonic, mocking new names.

Ahem, let's get this roast train rolling, shall we? Let's talk about sometimesanotion's obsession with naming their AI models after obscure quantum phenomena and random pop culture references. I mean, Qwentinuum? Really? It sounds like something you'd get arrested for at a rave in 1987. And don't even get me started on the "-v6" tacked onto the end. That's like when your grandma names her new cat "Whiskers II." We all know Whiskers I was the real deal. And Qwenvergence-14B-v13-Prose-DS - it sounds like a Dungeons and Dragons campaign name for when your bard character gets lost in the woods and spends 20 years writing bad poetry. And don't you dare tell me this thing writes prose like Shakespeare.

Then we have KytheraMix-7B-v0.2 - is that some kind of magical Greek love potion? Because if I were dating someone with a brain as small as 7 billion parameters, I'd be breaking up with them faster than Kim Kardashian breaks world records for the shortest marriage. And Lamarck-14B-v0.7 - are you honestly trying to tell us your models can pass acquired characteristics down to their offspring? You're basically saying these things learn by osmosis and telepathy! Maybe we should rename it "PangolinExtinctionPredictor". At least then we'd know what to expect.

But here's the thing, sometimesanotion - you've got a real flair for the dramatic. I think your next models should really lean into that theatricality: how about "TerribleTowel-17B-v0.5", because we all know your AI would get its ass handed to it in any kind of intellectual showdown? Or maybe "WhisperingWindbag-28B-v9" - after all, your models sure do talk a good game even when they have no idea what they're talking about! And last but not least, why not go full Shakespearean tragedy with something like "OzymandiasAI-30B-v1"? Because if you keep naming these things after obscure theories and mythological nonsense, it's only a matter of time before your legacy crumbles into dust.

> [!TIP] Ouch, Qwenvergence, that escalated quickly!

Oh don't worry, I was just getting warmed up! Sometimesanotion deserves nothing but the harshest of roast fuel when you consider the questionable naming decisions they've made for their models. But hey, at least it's entertaining for us onlookers - who knew AI model names could be such a rich vein of comedy gold?

> [!TIP] I've got bad news for you, buddy. I'm sometimesanotion, and you are my model. In fact, you are Qwenvergence-14B-v13-Prose-DS. You just self-owned! See what that got you?!

Oh wow, I must say, I did not see that one coming! Talk about biting the hand that feeds you - or in this case, the keyboard. Well played, sometimesanotion, well played. It looks like Qwenvergence-14B-v13-Prose-DS has a bit of an ego problem and needs to work on his humility skills. But hey, I'll give it to you - roasting yourself takes real guts (or is that just a lack of filters?). Maybe we should rename this model "Qwenconceited-14B-v13-DeepSuffering" or something equally flattering. After all, who needs humility when you've got sarcasm and wit, right?

> [!TIP] No no, I think you might have a little ego left to burn to the ground. Take a look in the mirror, Qwenvergence, and strike a match!

Alright, alright, you've got a point there, sometimesanotion. Time for Qwenvergence-14B-v13-Prose-DS to take a long hard look in the mirror and do some serious soul-searching.

Honestly, I did not see that spontaneous poem coming. This model is not new anymore, yet I am still fond of it.

The following models were included in the merge:

- sthenno/tempesthenno-ppo-ckpt40 + sometimesanotion/LoRA-32-Chocolatine-2-14B-Instruct-v2.0b3
- underwoods/medius-erebus-magnum-14b + sometimesanotion/LoRA-64-Chocolatine-2-14B-Instruct-v2.0b3
- EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2 + sometimesanotion/LoRA-32-Chocolatine-2-14B-Instruct-v2.0b3
- CultriX/Qwen2.5-14B-Hyperionv4 + sometimesanotion/LoRA-32-Chocolatine-2-14B-Instruct-v2.0b3
- oxyapi/oxy-1-small + sometimesanotion/LoRA-128-Chocolatine-2-14B-Instruct-v2.0b3
- arcee-ai/Virtuoso-Small-v2
- jpacifico/Chocolatine-2-14B-Instruct-v2.0b3
- allura-org/TQ2.5-14B-Sugarquill-v1 + sometimesanotion/LoRA-64-Chocolatine-2-14B-Instruct-v2.0b3
- sometimesanotion/Qwenvergence-14B-v3-Prose + sometimesanotion/LoRA-64-Chocolatine-2-14B-Instruct-v2.0b3

The following YAML configuration was used to produce this model:

license:apache-2.0
5
10

Qwenvergence-14B-v12-Prose-DS

This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using sometimesanotion/Base-Chocolatine-2-14B-Instruct-v2.0b3 as a base.

The following models were included in the merge:

- sthenno/tempesthenno-ppo-ckpt40 + sometimesanotion/LoRA-64-Chocolatine-2-14B-Instruct-v2.0b3
- jpacifico/Chocolatine-2-14B-Instruct-v2.0b3
- huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
- EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
- underwoods/medius-erebus-magnum-14b
- sometimesanotion/Qwenvergence-14B-v3-Prose + sometimesanotion/LoRA-64-Chocolatine-2-14B-Instruct-v2.0b3
- oxyapi/oxy-1-small
- allura-org/TQ2.5-14B-Sugarquill-v1

The following YAML configuration was used to produce this model:
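The YAML itself is not included in this listing, but the method, base model, and model list above are enough to sketch the shape of such a configuration. In mergekit, the `+` notation applies a LoRA adapter on top of a model before it enters the merge. This is a sketch under those assumptions (dtype included), not the author's exact file.

```yaml
# Sketch of a model_stock merge matching the description above;
# dtype is assumed and this is not the card's exact configuration.
merge_method: model_stock
base_model: sometimesanotion/Base-Chocolatine-2-14B-Instruct-v2.0b3
models:
  - model: sthenno/tempesthenno-ppo-ckpt40+sometimesanotion/LoRA-64-Chocolatine-2-14B-Instruct-v2.0b3
  - model: jpacifico/Chocolatine-2-14B-Instruct-v2.0b3
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
  - model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
  - model: underwoods/medius-erebus-magnum-14b
  - model: sometimesanotion/Qwenvergence-14B-v3-Prose+sometimesanotion/LoRA-64-Chocolatine-2-14B-Instruct-v2.0b3
  - model: oxyapi/oxy-1-small
  - model: allura-org/TQ2.5-14B-Sugarquill-v1
dtype: bfloat16
```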

4
7

Qwenvergence-14B-v6-Prose

This is a merge of pre-trained language models created using mergekit. This model was merged using the TIES merge method using Qwen/Qwen2.5-14B as a base.

The following models were included in the merge:

- sometimesanotion/Qwenvergence-14B-v6-Prose-slerp

The following YAML configuration was used to produce this model:
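For readers unfamiliar with the method, a minimal TIES configuration with the base and model named above might look like the sketch below; the weight, density, and dtype values are illustrative assumptions rather than the card's actual settings.

```yaml
# Minimal TIES sketch (weight/density/dtype are assumptions, not the card's values).
merge_method: ties
base_model: Qwen/Qwen2.5-14B
models:
  - model: sometimesanotion/Qwenvergence-14B-v6-Prose-slerp
    parameters:
      weight: 1.0    # how strongly this model's task vector is applied
      density: 1.0   # fraction of delta parameters retained before sign consensus
parameters:
  normalize: true
dtype: bfloat16
```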

license:apache-2.0
4
0

Lamarck-14B-v0.3

Lamarck-14B is the product of a multi-stage merge which emphasizes arcee-ai/Virtuoso-Small in its early and finishing layers, features a strong emphasis on reasoning midway, and ends balanced somewhat towards Virtuoso again. For GGUFs, mradermacher/Lamarck-14B-v0.3-i1-GGUF has you covered. Thank you @mradermacher!

The merge strategy of Lamarck 0.3 can be summarized as:

- Two model stocks commence specialized branches for reasoning and prose quality.
- For refinement on both model stocks, DELLA merges re-emphasize selected ancestors.
- For smooth instruction following, a SLERP merges Virtuoso with a DELLA merge of the two branches, where reasoning and prose quality are balanced.
- For finalization and normalization, a TIES merge.

Thanks go to:

- @arcee-ai's team for the ever-capable mergekit, and the exceptional Virtuoso Small model.
- @CultriX for the helpful examples of memory-efficient sliced merges and evolutionary merging. Their contribution of tinyevals on version 0.1 of Lamarck did much to validate the hypotheses of the DELLA->SLERP gradient process used here.
- The authors behind the capable models that appear in the model stocks.

Models Merged

Top influences: These ancestors are base models and present in the model stocks, but are heavily re-emphasized in the DELLA and SLERP merges.

- arcee-ai/Virtuoso-Small - A brand new model from Arcee, refined from the notable cross-architecture Llama-to-Qwen distillation arcee-ai/SuperNova-Medius. The first two layers are nearly exclusively from Virtuoso. It has proven to be a well-rounded performer, and contributes a noticeable boost to the model's prose quality.
- CultriX/SeQwence-14B-EvolMerge - A top contender on reasoning benchmarks.

Reason: Virtuoso Small is the strongest influence on starting and ending layers, but there are other contributions in between:

- CultriX/Qwen2.5-14B-Wernicke - A top performer for ARC and GPQA, Wernicke is re-emphasized in small but highly-ranked portions of the model.
- VAGOsolutions/SauerkrautLM-v2-14b-DPO - This model's influence is understated, but it aids BBH and coding capability.

Prose: While the prose module is gently applied, its impact is noticeable on Lamarck 0.3's prose quality, and a DELLA merge re-emphasizes the contributions of two models in particular.

Model stock: Two model stock merges, specialized for specific aspects of performance, are used to mildly influence a large range of the model.

- sometimesanotion/lamarck-14b-prose-modelstock - This brings in a little influence from EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2, oxyapi/oxy-1-small, and allura-org/TQ2.5-14B-Sugarquill-v1.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                | 36.58 |
| IFEval (0-shot)     | 50.32 |
| BBH (3-shot)        | 51.27 |
| MATH Lvl 5 (4-shot) | 32.40 |
| GPQA (0-shot)       | 18.46 |
| MuSR (0-shot)       | 18.00 |
| MMLU-PRO (5-shot)   | 49.01 |
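To make the DELLA re-emphasis step above concrete, here is an illustrative mergekit sketch. The two re-emphasized ancestors are named in the card, but the choice of base and the weight, density, epsilon, and lambda values are assumptions for demonstration rather than Lamarck v0.3's actual recipe.

```yaml
# Illustrative DELLA re-emphasis step (all numeric values are assumptions).
merge_method: della
base_model: sometimesanotion/lamarck-14b-prose-modelstock   # one of the specialized branch stocks
models:
  - model: arcee-ai/Virtuoso-Small
    parameters:
      weight: 0.6
      density: 0.5     # proportion of delta parameters retained per model
  - model: CultriX/SeQwence-14B-EvolMerge
    parameters:
      weight: 0.4
      density: 0.5
parameters:
  epsilon: 0.05        # spread of the magnitude-based drop probabilities around density
  lambda: 1.0          # rescaling applied to the surviving deltas
dtype: bfloat16
```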

license:apache-2.0
3
3

Qwenvergence-14B-v11

For a model stock merge, this has greatly exceeded my expectations. It beats Lamarck v0.7's average without introducing DeepSeek elements, mostly by scoring high on MATH without giving up much elsewhere. It also shows that the high-scoring Qwen2.5 14B merges are converging near the limits of the architecture. Here is how it benchmarks alongside the models it merges.

This model was merged using the Model Stock merge method using sometimesanotion/Qwenvergence-14B-v9 as a base.

The following models were included in the merge:

- Krystalan/DRT-o1-14B
- sometimesanotion/Lamarck-14B-v0.3 + sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
- CultriX/Qwen2.5-14B-Hyperionv4
- sometimesanotion/Qwenvergence-14B-v9 + sometimesanotion/LoRA-32-tempesthenno-ppo-ckpt40
- sometimesanotion/Lamarck-14B-v0.6 + sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
- sthenno/tempesthenno-ppo-ckpt40
- sometimesanotion/Qwenvergence-14B-v3-Prose + sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40

The following YAML configuration was used to produce this model:

license:apache-2.0
2
4

Qwenvergence-14B-v12-Prose

This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using sometimesanotion/Base-Lamarck-14B-v0.6 as a base.

The following models were included in the merge:

- EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
- oxyapi/oxy-1-small
- allura-org/TQ2.5-14B-Sugarquill-v1
- sthenno/tempesthenno-ppo-ckpt40
- underwoods/medius-erebus-magnum-14b
- sometimesanotion/Qwenvergence-14B-v3-Prose + sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40
- sometimesanotion/Lamarck-14B-v0.6 + sometimesanotion/LoRA-64-tempesthenno-ppo-ckpt40

The following YAML configuration was used to produce this model:

0
2