MarsupialAI
Cydonia-22B-v1_iMat_GGUF
Monstral-123B-v2_GGUF
iMatrix GGUFs for https://huggingface.co/MarsupialAI/Monstral-123B-v2
Llama3_GGUF_Quant_Testing
Gemmasutra-Mini-2B-v1_iMatrix_GGUF
Big-Tiger-Gemma-27B-v1_iMatrix_GGUF
Gemmasutra-Pro-27B-v1_iMatrix_GGUF
Buddy-2B-v1_iMatrix_GGUF
Hercules-Qwen1.5-14B_iMatrix_GGUF
Rocinante-12B-v1_iMatrix_GGUF
Cat-Llama-3-70B-instruct_iMatrix_GGUF
Coomand-R-35B-v1_iMatrix_GGUF
Celeste-12B-V1.6_iMatrix_GGUF
MG-FinalMix-72B_iMatrix_GGUF
Fireplace-34b_iMatrix_GGUF
Moist-Miqu-70B-v1_iMatrix_GGUF
llama-3-70B-Instruct-abliterated_iMatrix_GGUF
Monstral-123B_iMat_GGUF
IxChel-L3-12B_iMatrix_GGUF
Llama-3SOME-8B-v1-BETA_iMatrix_GGUF
Mistral-Dory-12B_iMatrix_GGUF
Mini-Magnum-Unboxed-12B_iMatrix_GGUF
KunoichiVerse-7B_iMatrix_GGUF
Magnum-12b-v2_iMatrix_GGUF
Lumimaid-v0.2-12B_iMatrix_GGUF
Yi-34B-200k-v2_GGUF
L3.1-8B-Celeste-V1.5_iMatrix_GGUF
Lusca-33B_iMat_GGUF
Foredoomed-9B_iMatrix_GGUF
Yi-34B-200K-RPMerge_GGUF
Phi-3-mini-128k-instruct_iMatrix_GGUF
Captain-Adventure-32B_iMat_GGUF
Moistral-11B-v3_iMatrix_GGUF
Nautilus-70B-v0.1_iMat_GGUF
Llama-3.1-Nemotron-70B-Instruct_iMat_GGUF
Monstral-123B-v2
Qwen1.5-32B-Chat_iMatrix_GGUF
Yi-9B-200K_iMatrix_GGUF
Blossom-v5-32b_iMatrix_GGUF
Young-Children-Storyteller-Mistral-7B_iMatrix_GGUF
Buttocks-7B-v1.1_GGUF
Pygmalion-2-13b_iMatrix_GGUF
Garbage_9B_iMatrix_GGUF
Merged-RP-Stew-V2-34B_iMatrix_GGUF
Yi-6B-200k-v2_GGUF
Psyonic-Cetacean-20b-v2_iMatrix_GGUF
Moistral-11B-v2.1b-SOGGY_iMatrix_GGUF
Aqueducts-18B_iMatrix_GGUF
Moistral-11B-v4_iMatrix_GGUF
Qwen1.5-32B_iMatrix_GGUF
Cydonia-22B-v1.3_iMat_GGUF
GGUF quants of https://huggingface.co/TheDrummer/Cydonia-22B-v1.3. iMatrix generated using Kalomaze's groupsmerged.txt.
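The iMatrix workflow described above presumably follows llama.cpp's importance-matrix tooling. A minimal sketch, assuming current llama.cpp binary names and placeholder model filenames (neither is stated in the card):

```shell
# Sketch only: model filenames are placeholders; the calibration file is
# Kalomaze's groupsmerged.txt, as noted above.

# 1. Compute an importance matrix over the calibration text.
./llama-imatrix -m Cydonia-22B-v1.3-F16.gguf -f groupsmerged.txt -o imatrix.dat

# 2. Quantize with that importance matrix guiding per-weight precision.
./llama-quantize --imatrix imatrix.dat \
    Cydonia-22B-v1.3-F16.gguf Cydonia-22B-v1.3-Q4_K_S.gguf Q4_K_S
```

The importance matrix biases quantization error toward weights that matter less on the calibration text, which is why iMatrix quants tend to hold up better at small bit widths.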
SkunkApe-16b_iMatrix_GGUF
SkunkApe-14b_iMatrix_GGUF
Faro-Yi-34B-200K_iMatrix_GGUF
pippafeet-11B-0.2_iMatrix_GGUF
KobbleTiny-1.1B_iMatrix_GGUF
aanaphi2-v0.1_GGUF
HelloNurse-11b_GGUF
Melusine_103b_GGUF
NorLlama-3B_GGUF
JerseyDevil-14b_iMatrix_GGUF
KitchenSink_103b_iMatrix_GGUF
Moistral-11B-v1_iMatrix_GGUF
LaDameBlanche-v2-95b_iMatrix_GGUF
Dumbstral-169B_GGUF
Q4KS GGUF for https://huggingface.co/MarsupialAI/Dumbstral-169B. No imatrix, no other quant schemes. This is all I'm willing to do for a model that nobody can reasonably run. FSM help Bartowski and Mradermacher if they choose to run full quant sets for this bastard.
KitchenSink_103b
Yeet 51b 200k
This model is a rotating-stack merge of three Yi 34b 200k models in a 51b (90 layer) configuration. My reasoning behind this merge was twofold: I'd never seen a stacked merge made from 34b models, and I thought that maybe this could give near-70b performance, but with a much larger context window while still fitting within 48GB of VRAM.

I think the results are quite good. The model performs on par with many 70b models at RP, chat, and storywriting. At Q4KS it will fit into a pair of 24GB GPUs with 32k context. Coherency at 32k is excellent, and will probably remain very good well beyond that thanks to the 200k base training.

The gotcha here is speed. While it inferences as you'd expect for the model size, it's much slower than a similarly-sized 8x7b MoE. And while I personally find the output of this model to outperform any mixtral finetune I've seen so far, those finetunes are getting better all the time, and this really is achingly slow with a lot of context. I'm getting less than half a token per second on a pair of P40s with a full 32k prompt. But that's not to say this model (or even the 51b stack concept) is useless. If you're patient, you can get extremely good output with very deep context on attainable hardware. There are undoubtedly niche scenarios where this model or similarly-constructed models might be ideal.

Component models for the rotating stack are:
- adamo1139/Yi-34B-200K-AEZAKMI-v2
- brucethemoose/Yi-34B-200K-DARE-megamerge-v8
- taozi555/RpBird-Yi-34B-200k

This model is uncensored and capable of generating objectionable material. However, it is not an explicitly-NSFW model, and it has never "gone rogue" and tried to insert NSFW content into SFW prompts in my experience. As with any LLM, no factual claims made by the model should be taken at face value. You know that boilerplate safety disclaimer that most professional models have? Assume this has it too. This model is for entertainment purposes only.
FP16 and Q4KS GGUFs are located here: https://huggingface.co/MarsupialAI/Yeet51b200kGGUFQ4KSFP16

Prompt format
Seems to work fine with Alpaca prompts. Considering the variety of components, other formats are likely to work to some extent.

WTF is a rotating-stack merge?
Inspired by Undi's experiments with stacked merges, Jeb Carter found that output quality and model initiative could be significantly improved by reversing the model order in the stack, and then doing a linear merge between the original and reversed stacks. That is what I did here: I created three passthrough stacked merges using the three source models (rotating the model order in each stack), and then did a linear merge of all three stacks. The exact merge configs can be found in the recipe.txt file.
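The rotation-and-merge procedure above can be sketched in a few lines. This is an illustrative assumption of the recipe's shape, not the actual recipe.txt contents; the merge-method names mirror mergekit's "passthrough" and "linear" methods, and the config dicts and helper are hypothetical:

```python
# Component models named in the card; everything else here is illustrative.
MODELS = [
    "adamo1139/Yi-34B-200K-AEZAKMI-v2",
    "brucethemoose/Yi-34B-200K-DARE-megamerge-v8",
    "taozi555/RpBird-Yi-34B-200k",
]

def rotations(models):
    """Yield each cyclic rotation of the model order, one per stack."""
    n = len(models)
    for i in range(n):
        yield models[i:] + models[:i]

# Step 1: one passthrough (layer-stacked) merge per rotation.
stacks = [{"merge_method": "passthrough", "models": order}
          for order in rotations(MODELS)]

# Step 2: a linear merge of the three stacks, weighted equally.
final = {"merge_method": "linear",
         "models": [f"stack_{i}" for i in range(len(stacks))],
         "weights": [1 / len(stacks)] * len(stacks)}
```

Each stack contains all three source models, just in a different order, so the final linear merge averages out any bias from a model's position in the stack.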
LaDameBlanche-103b_iMatrix_GGUF
Monstral-123B
Yeet_51b_200k_GGUF_Q4KS_FP16
SkunkApe-16b
Cydonia-22B-v1.3_EXL2_4.5bpw
Lusca-33B
Dumbstral-169B
IxChel-L3-12B
Monstral-123B_4.0bpw_EXL2
4.0bpw EXL2 quant of https://huggingface.co/MarsupialAI/Monstral-123B. Default settings and dataset were used for measurements.
Cydonia-22B-v1.3_EXL2_5.5bpw_H8
Monstral-123B_3.5bpw_EXL2
LaDameBlanche-v2-95b
SkunkApe-14b
Aqueducts-18B
HelloNurse-11b
LaDameBlanche-103b
JerseyDevil-14b
Moistral-11B-v4_EXL2
Moistral-11B-v3_exl2
Llama-3SOME-8B-v1-BETA_6.9bpw_exl2
Smegmma-9B-v1_elx2
Cydonia-22B-v1_EXL2
UnslopNemo-12B-v3_EXL2_6bpw_H8
6.0bpw EXL2 quant of https://huggingface.co/TheDrummer/UnslopNemo-12B-v3