Lewdiculous
L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix
My GGUF-IQ-Imatrix quants for Sao10K/L3-8B-Stheno-v3.2. Sao10K with Stheno again, another banger! I recommend checking his page for feedback and support.

> [!IMPORTANT]
> **Quantization process:**
> For future reference, these quants have been done after the fixes from #6920 have been merged.
> Imatrix data was generated from the FP16-GGUF and conversions were done directly from the BF16-GGUF.
> This was a bit more disk and compute intensive but hopefully avoided any losses during conversion.
> If you notice any issues, let me know in the discussions.

> [!NOTE]
> **General usage:**
> Use the latest version of KoboldCpp.
> For 8GB VRAM GPUs, I recommend the Q4KM-imat (4.89 BPW) quant for up to 12288 context sizes (see the launch sketch at the end of this card).
>
> **Presets:**
> Some compatible SillyTavern presets can be found here (Virt's Roleplay Presets).
> Check discussions such as this one for other recommendations and samplers.

> [!TIP]
> **Personal support:**
> I apologize for disrupting your experience.
> Currently I'm working on moving to a better internet provider.
> If you want and you are able to...
> You can spare some change over here (Ko-fi).
>
> **Author support:**
> You can support the author at their own page.

Click here for the original model card information.

Support me here if you're interested:
Ko-fi: https://ko-fi.com/sao10k

*wink* Euryale v2?

I have done a test run with multiple variations of the models, merged back to its base at various weights, with different training runs too, and this sixth iteration is the one I like most.

**Changes compared to v3.1:**
- Included a mix of SFW and NSFW storywriting data, thanks to Gryphe.
- Included more instruct / assistant-style data.
- Further cleaned up roleplaying samples from c2 logs -> A few terrible, really bad samples escaped heavy filtering; a manual pass fixed it.
- Hyperparameter tinkering for training, resulting in lower loss levels.

**Testing notes - compared to v3.1:**
- Handles SFW / NSFW separately better. Not as overly excessive with NSFW now. Kinda balanced.
- Better at storywriting / narration.
- Better at assistant-type tasks.
- Better multi-turn coherency -> Reduced issues?
- Slightly less creative? A worthy tradeoff. Still creative.
- Better prompt / instruction adherence.
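As referenced in the usage note above, a minimal KoboldCpp launch sketch matching the 8GB VRAM recommendation; the quant filename is illustrative, and the flags assume a recent KoboldCpp build with CUDA:

```bash
# Hypothetical filename; matches the Q4KM-imat / 12288-context recommendation above.
# --gpulayers 99 offloads all layers, which an 8B Q4 quant should allow within 8GB.
python koboldcpp.py \
  --model ./L3-8B-Stheno-v3.2-Q4_K_M-imat.gguf \
  --contextsize 12288 \
  --gpulayers 99 \
  --usecublas
```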
MN-12B-Lyra-v4-GGUF-IQ-Imatrix
Captain-Eris_Violet-V0.420-12B-GGUF-ARM-Imatrix
Llama-3-Lumimaid-8B-v0.1-OAS-GGUF-IQ-Imatrix
> [!IMPORTANT]
> **Updated!**
> Version (v2) files added! With imatrix data generated from the FP16 and conversions directly from the BF16 (see the conversion sketch at the end of this card).
> This is more disk and compute intensive, so let's hope we get GPU inference support for BF16 models in llama.cpp.
> This hopefully avoids any losses in the model conversion, which has been a recently discussed topic regarding Llama-3 and GGUF.
> If you are able to test them and notice any issues, let me know in the discussions.

> [!IMPORTANT]
> **Relevant:**
> These quants have been done after the fixes from llama.cpp/pull/6920 have been merged.
> Use KoboldCpp version 1.64 or higher, and make sure you're up-to-date.

> [!TIP]
> I apologize for disrupting your experience.
> My upload speeds have been cooked and unstable lately.
> If you want and you are able to...
> You can support my various endeavors here (Ko-fi).

GGUF-IQ-Imatrix quants for NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS.

Author: "This model received the Orthogonal Activation Steering treatment, meaning it will rarely refuse any request."

> [!WARNING]
> Compatible SillyTavern presets here (simple) or here (Virt's Roleplay Presets - recommended).
> Use the latest version of KoboldCpp. Use the provided presets for testing.
> Feedback and support for the Authors is always welcome.
> If there are any issues or questions, let me know.

> [!NOTE]
> For 8GB VRAM GPUs, I recommend the Q4KM-imat (4.89 BPW) quant for up to 12288 context sizes.

Llama3 trained on our RP datasets. We tried to have a balance between the ERP and the RP: not too horny, but just enough. We also added some non-RP datasets, making the model less dumb overall. It should look like a 40%/60% ratio for Non-RP/RP+ERP data.

This model includes the new Luminae dataset from Ikari.

This model has received the Orthogonal Activation Steering treatment, meaning it will rarely refuse any request.

If you consider trying this model, please give us some feedback, either on the Community tab on hf or on our Discord Server.

This repo contains FP16 files of Lumimaid-8B-v0.1-OAS.

Training data used:
- Aesir datasets
- NoRobots
- limarp - 8k ctx
- toxic-dpo-v0.1-sharegpt
- ToxicQAFinal
- Luminae-i1 (70B/70B-alt) (i2 did not exist when the 70B started training) | Luminae-i2 (8B) (this one gave better results on the 8B) - Ikari's Dataset
- Squish42/bluemoon-fandom-1-1-rp-cleaned - 50% (randomly)
- NobodyExistsOnTheInternet/PIPPAsharegptv2test - 5% (randomly)
- cgato/SlimOrcaDedupCleaned - 5% (randomly)
- Airoboros (reduced)
- Capybara (reduced)
- Initial LumiMaid 8B Finetune
- Undi95/Llama-3-Unholy-8B-e4
- Undi95/Llama-3-LewdPlay-8B

IkariDev: Visit my retro/neocities style website please kek
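As a rough illustration of the dual-conversion flow described at the top of this card (FP16 used only for imatrix data, quants produced from BF16), here is a hedged sketch with the llama.cpp tools of that period; paths and the calibration file are illustrative:

```bash
# Sketch only: assumes a llama.cpp checkout from this period, where the tools
# were still named convert-hf-to-gguf.py, imatrix and quantize.
python convert-hf-to-gguf.py ./Llama-3-Lumimaid-8B-v0.1-OAS --outtype f16  --outfile model-F16.gguf
python convert-hf-to-gguf.py ./Llama-3-Lumimaid-8B-v0.1-OAS --outtype bf16 --outfile model-BF16.gguf

# Imatrix data is computed on the FP16 conversion...
./imatrix -m model-F16.gguf -f calibration-data.txt -o imatrix.dat

# ...while the released quants are produced directly from the BF16 conversion.
./quantize --imatrix imatrix.dat model-BF16.gguf model-Q4_K_M-imat.gguf Q4_K_M
```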
CaptainErisNebula-12B-Chimera-v1.1-GGUF-IQ-Imatrix
> [!TIP]
> # GGUF quants for Nitral-AI/CaptainErisNebula-12B-Chimera-v1.1's recipe.

> [!IMPORTANT]
> Author recommended initial SillyTavern presets:
> - Chimera: v1.1
> - (backup)(633788596ef1810cc9ea9abd375bbda3e568bd37)

> [!NOTE]
> ## This is an improvement on the previous experimental version.
> - Not "chaotic", and at a usable size for most people seeking to perform inference locally with good speeds.
> - The model does not show excessive alignment, so it should be good for most scenarios/writing situations.
> - Feel free to use some light system prompting to nudge it out of a blocker if needed.
> - It does well in adhering to characters and instructions.

Thank you so much, "crazy chef" and "mad scientist", Nitral!
Kunoichi-DPO-v2-7B-GGUF-Imatrix
> [!TIP]
> **Support:**
> My upload speeds have been cooked and unstable lately.
> Realistically I'd need to move to get a better provider.
> If you want and you are able to...
> You can support my various endeavors here (Ko-fi).
> I apologize for disrupting your experience.

GGUF-Imatrix quantizations for SanjiWatsuki/Kunoichi-DPO-v2-7B.

**What does "Imatrix" mean?**

It stands for Importance Matrix, a technique used to improve the quality of quantized models. The Imatrix is calculated based on calibration data, and it helps determine the importance of different model activations during the quantization process. The idea is to preserve the most important information during quantization, which can help reduce the loss of model performance. One of the benefits of using an Imatrix is that it can lead to better model performance, especially when the calibration data is diverse. More information: [[1]](https://github.com/ggerganov/llama.cpp/discussions/5006) [[2]](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)

If you want any specific quantization to be added, feel free to ask.

`Base⇢ GGUF(F16)⇢ Imatrix-Data(F16)⇢ GGUF(Imatrix-Quants)`

For --imatrix data, `imatrix-Kunoichi-DPO-v2-7B-F16.dat` was used (see the sketch after the benchmark tables below).

| Model | MT Bench | EQ Bench | MMLU | Logic Test |
|---|---:|---:|---:|---:|
| GPT-4-Turbo | 9.32 | - | - | - |
| GPT-4 | 8.99 | 62.52 | 86.4 | 0.86 |
| Kunoichi-DPO-v2-7B | 8.51 | 42.18 | 64.94 | 0.58 |
| Mixtral-8x7B-Instruct | 8.30 | 44.81 | 70.6 | 0.75 |
| Kunoichi-DPO-7B | 8.29 | 41.60 | 64.83 | 0.59 |
| Kunoichi-7B | 8.14 | 44.32 | 64.9 | 0.58 |
| Starling-7B | 8.09 | - | 63.9 | 0.51 |
| Claude-2 | 8.06 | 52.14 | 78.5 | - |
| Silicon-Maid-7B | 7.96 | 40.44 | 64.7 | 0.54 |
| Loyal-Macaroni-Maid-7B | 7.95 | 38.66 | 64.9 | 0.57 |
| GPT-3.5-Turbo | 7.94 | 50.28 | 70 | 0.57 |
| Claude-1 | 7.9 | - | 77 | - |
| Openchat-3.5 | 7.81 | 37.08 | 64.3 | 0.39 |
| Dolphin-2.6-DPO | 7.74 | 42.88 | 61.9 | 0.53 |
| Zephyr-7B-beta | 7.34 | 38.71 | 61.4 | 0.30 |
| Llama-2-70b-chat-hf | 6.86 | 51.56 | 63 | - |
| Neural-chat-7b-v3-1 | 6.84 | 43.61 | 62.4 | 0.30 |

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| Kunoichi-DPO-7B | 58.4 | 45.08 | 74 | 66.99 | 47.52 |
| Kunoichi-DPO-v2-7B | 58.31 | 44.85 | 75.05 | 65.69 | 47.65 |
| Kunoichi-7B | 57.54 | 44.99 | 74.86 | 63.72 | 46.58 |
| OpenPipe/mistral-ft-optimized-1218 | 56.85 | 44.74 | 75.6 | 59.89 | 47.17 |
| Silicon-Maid-7B | 56.45 | 44.74 | 74.26 | 61.5 | 45.32 |
| mlabonne/NeuralHermes-2.5-Mistral-7B | 53.51 | 43.67 | 73.24 | 55.37 | 41.76 |
| teknium/OpenHermes-2.5-Mistral-7B | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
| openchat/openchat3.5 | 51.34 | 42.67 | 72.92 | 47.27 | 42.51 |
| berkeley-nest/Starling-LM-7B-alpha | 51.16 | 42.06 | 72.72 | 47.33 | 42.53 |
| HuggingFaceH4/zephyr-7b-beta | 50.99 | 37.33 | 71.83 | 55.1 | 39.7 |

| Model | AlpacaEval2 | Length |
| --------------------------- | ----------- | ------ |
| GPT-4 | 23.58% | 1365 |
| GPT-4 0314 | 22.07% | 1371 |
| Mistral Medium | 21.86% | 1500 |
| Mixtral 8x7B v0.1 | 18.26% | 1465 |
| Kunoichi-DPO-v2 | 17.19% | 1785 |
| Claude 2 | 17.19% | 1069 |
| Claude | 16.99% | 1082 |
| Gemini Pro | 16.85% | 1315 |
| GPT-4 0613 | 15.76% | 1140 |
| Claude 2.1 | 15.73% | 1096 |
| Mistral 7B v0.2 | 14.72% | 1676 |
| GPT 3.5 Turbo 0613 | 14.13% | 1328 |
| LLaMA2 Chat 70B | 13.87% | 1790 |
| LMCocktail-10.7B-v1 | 13.15% | 1203 |
| WizardLM 13B V1.1 | 11.23% | 1525 |
| Zephyr 7B Beta | 10.99% | 1444 |
| OpenHermes-2.5-Mistral (7B) | 10.34% | 1107 |
| GPT 3.5 Turbo 0301 | 9.62% | 827 |
| Kunoichi-7B | 9.38% | 1492 |
| GPT 3.5 Turbo 1106 | 9.18% | 796 |
| GPT-3.5 | 8.56% | 1018 |
| Phi-2 DPO | 7.76% | 1687 |
| LLaMA2 Chat 13B | 7.70% | 1513 |
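A minimal sketch of the final step of the `Base⇢ GGUF(F16)⇢ Imatrix-Data(F16)⇢ GGUF(Imatrix-Quants)` pipeline above, assuming the llama.cpp `quantize` tool of that period and the imatrix file named earlier; the F16 GGUF filename is illustrative:

```bash
# Applies the pre-computed importance matrix while producing a K-quant.
./quantize --imatrix imatrix-Kunoichi-DPO-v2-7B-F16.dat \
  Kunoichi-DPO-v2-7B-F16.gguf Kunoichi-DPO-v2-7B-Q4_K_M-imat.gguf Q4_K_M
```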
Nyanade_Stunna-Maid-7B-v0.2-GGUF-IQ-Imatrix
> [!TIP]
> **Support:**
> My upload speeds have been cooked and unstable lately.
> Realistically I'd need to move to get a better provider.
> If you want and you are able to...
> You can support my various endeavors here (Ko-fi).
> I apologize for disrupting your experience.

#Roleplay #Multimodal #Vision #Based #Unhinged #Unaligned

In this repository you can find GGUF-IQ-Imatrix quants for ChaoticNeutrals/Nyanade_Stunna-Maid-7B-v0.2, and if needed you can get some basic SillyTavern presets here. If you have issues with repetitiveness or a lack of variety in responses, I recommend changing the Temperature to 1.15, MinP to 0.075, RepPen to 1.15 and RepPenRange to 1024.

> [!TIP]
> **Vision:**
> This is a #multimodal model that also has optional #vision capabilities. Expand the relevant sections below and read the full card information if you also want to make use of that functionality.
>
> **Quant options:**
> Reading below you can also find quant option recommendations for some common GPU VRAM capacities.

"Unhinged RP with the spice of the previous 0.420 remixes, 32k context and vision capabilities."

**General common recommendations:**

Assuming a context size of 8192 for simplicity, and 1GB of Operating System VRAM overhead with some safety margin to avoid overflowing buffers...

For 11-12GB VRAM:
A GPU with 11-12GB of VRAM capacity can comfortably use the Q6K-imat quant option and run it at good speeds. This is the same with or without using #vision capabilities.

For 8GB VRAM:
If not using #vision, for GPUs with 8GB of VRAM capacity the Q5KM-imat quant option will fit comfortably and should run at good speeds. If you are also using #vision from this model, opt for the Q4KM-imat quant option to avoid filling the buffers and potential slowdowns.

For 6GB VRAM:
If not using #vision, for GPUs with 6GB of VRAM capacity the IQ3M-imat quant option should fit comfortably and run at good speeds. If you are also using #vision from this model, opt for the IQ3XXS-imat quant option.

**Quantization information:**

The latest llama.cpp version available at the time was used, with imatrix-with-rp-ex.txt as calibration data.

**What does "Imatrix" mean?**

It stands for Importance Matrix, a technique used to improve the quality of quantized models. The Imatrix is calculated based on calibration data, and it helps determine the importance of different model activations during the quantization process. The idea is to preserve the most important information during quantization, which can help reduce the loss of model performance, especially when the calibration data is diverse. [[1]](https://github.com/ggerganov/llama.cpp/discussions/5006) [[2]](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)

> [!NOTE]
> For imatrix data generation, kalomaze's `groups_merged.txt` with additional roleplay chats was used; you can find it here for reference. This was just to add a bit more diversity to the data with the intended use case in mind.

**Examples:** how this works in practice in a roleplay chat, and how your SillyTavern Image Captions extension settings should look.

> [!WARNING]
> To use the multimodal capabilities of this model, such as vision, you also need to load the specified mmproj file; you can get it here or as uploaded in the mmproj folder in the repository.

1. Make sure you are using the latest version of KoboldCpp.
2. Load the mmproj file by using the corresponding section in the interface.
   2.1. For CLI users, you can load the mmproj file by adding the respective flag to your usual command:
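The exact command wasn't preserved here, but a minimal sketch using KoboldCpp's `--mmproj` flag, with illustrative filenames, would look like:

```bash
# Hypothetical filenames; --mmproj loads the vision projector alongside the model.
python koboldcpp.py \
  --model ./Nyanade_Stunna-Maid-7B-v0.2-Q4_K_M-imat.gguf \
  --mmproj ./mmproj-model-f16.gguf \
  --contextsize 8192 \
  --gpulayers 99
```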
Lumimaid-v0.2-8B-GGUF-IQ-Imatrix
My GGUF-IQ-Imatrix quants for NeverSleep/Lumimaid-v0.2-8B. I recommend checking their page for feedback and support.

> [!IMPORTANT]
> **Quantization process:**
> Imatrix data was generated from the FP16-GGUF and conversions were done directly from the BF16-GGUF.
> This is a bit more disk and compute intensive but hopefully avoids any losses during conversion.
> To run this model, please use the latest version of KoboldCpp.
> If you notice any issues, let me know in the discussions.

> [!NOTE]
> **Presets:**
> Llama-3.
>
> Some compatible SillyTavern presets can be found here (Virt's Roleplay Presets - v1.9).
> Check discussions such as this one and this one for other preset and sampler recommendations.
> Lower temperatures are recommended by the authors, so make sure to experiment.
>
> **General usage with KoboldCpp:**
> For 8GB VRAM GPUs, I recommend the Q4KM-imat (4.89 BPW) quant for up to 12288 context sizes without the use of `--quantkv`.
> Using `--quantkv 1` (≈Q8) or even `--quantkv 2` (≈Q4) can get you to 32K context sizes, with the caveat of not being compatible with Context Shifting; this is only relevant if you can manage to fill up that much context (see the launch sketch at the end of this card).
> Read more about it in the release here.

**General chart with relative quant performances:**

> [!NOTE]
> **Recommended read:**
>
> "Which GGUF is right for me? (Opinionated)" by Artefact2

> [!TIP]
> **Personal support:**
> I apologize for disrupting your experience.
> Eventually I may be able to use a dedicated server for this, but for now hopefully these quants are helpful.
> If you want and you are able to...
> You can spare some change over here (Ko-fi).
>
> **Author support:**
> You can support the authors at their pages/here.

This model is based on: Meta-Llama-3.1-8B-Instruct

Wandb: https://wandb.ai/undis95/Lumi-Llama-3-1-8B?nw=nwuserundis95

Lumimaid 0.1 -> 0.2 is a HUGE step up dataset wise. As some people have told us our models are sloppy, Ikari decided to say fuck it and literally nuke out all the chats with the most slop.

Our dataset stayed the same since day one: we added data over time, cleaned it, and repeated. After not releasing a model for a while because we were never satisfied, we think it's time to come back!

- Epiculous/Gnosis
- ChaoticNeutrals/Luminous_Opus
- ChaoticNeutrals/Synthetic-Dark-RP
- ChaoticNeutrals/Synthetic-RP
- Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
- Gryphe/Opus-WritingPrompts
- meseca/writing-opus-6k
- meseca/opus-instruct-9k
- PJMixers/grimulkan_theory-of-mind-ShareGPT
- NobodyExistsOnTheInternet/ToxicQAFinal
- Undi95/toxic-dpo-v0.1-sharegpt
- cgato/SlimOrcaDedupCleaned
- kalomaze/Opus_Instruct_25k
- Doctor-Shotgun/no-robots-sharegpt
- Norquinal/claude_multiround_chat_30k
- nothingiisreal/Claude-3-Opus-Instruct-15K
- All the Aesirs dataset, cleaned, unslopped
- All le luminae dataset, cleaned, unslopped
- Small part of Airoboros reduced

We sadly didn't find the sources of the following, DM us if you recognize your set!
- Opus_Instruct-v2-6.5K-Filtered-v2-sharegpt
- claude_sharegpt_trimmed
- CapybaraPureDecontaminated-ShareGPT_reduced

Datasets credits:
- Epiculous
- ChaoticNeutrals
- Gryphe
- meseca
- PJMixers
- NobodyExistsOnTheInternet
- cgato
- kalomaze
- Doctor-Shotgun
- Norquinal
- nothingiisreal

IkariDev: Visit my retro/neocities style website please kek
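As referenced in the usage note above, a hedged example of a 32K-context KoboldCpp launch using KV-cache quantization; the quant filename is illustrative:

```bash
# --quantkv requires FlashAttention in KoboldCpp and disables Context Shifting.
# --quantkv 1 is the ≈Q8 KV-cache option mentioned above (2 would be ≈Q4).
python koboldcpp.py \
  --model ./Lumimaid-v0.2-8B-Q4_K_M-imat.gguf \
  --contextsize 32768 \
  --flashattention \
  --quantkv 1 \
  --gpulayers 99
```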
Llama-3.1-8B-ArliAI-RPMax-v1.2-GGUF-IQ-ARM-Imatrix
L3-8B-Stheno-v3.1-GGUF-IQ-Imatrix
DS-R1-Qwen3-8B-ArliAI-RpR-v4-Small-GGUF-IQ-Imatrix
Fimbulvetr-11B-v2-GGUF-IQ-Imatrix
L3-8B-Stheno-v3.3-32K-GGUF-IQ-Imatrix
Lumimaid-v0.2-12B-GGUF-IQ-Imatrix
Erosumika-7B-v3-0.2-GGUF-IQ-Imatrix
This repo contains GGUF-IQ-Imatrix quantized model files for Erosumika-7B-v3-0.2.

**Quant options:** "Q4KM", "Q4KS", "IQ4XS", "Q5KM", "Q5KS", "Q6K", "Q8_0", "IQ3M", "IQ3S", "IQ3XXS"

**What does "Imatrix" mean?**

It stands for Importance Matrix, a technique used to improve the quality of quantized models. The Imatrix is calculated based on calibration data, and it helps determine the importance of different model activations during the quantization process. The idea is to preserve the most important information during quantization, which can help reduce the loss of model performance, especially when the calibration data is diverse. [[1]](https://github.com/ggerganov/llama.cpp/discussions/5006) [[2]](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)

For imatrix data generation, kalomaze's `groups_merged.txt` with added roleplay chats was used; you can find it here. This was just to add a bit more diversity to the data.

**Model Details**

The Mistral 0.2 version of Erosumika-7B-v3: a DARE TIES merge between Nitral's Kunocchini-7b, Endevor's InfinityRP-v1-7B and my FlatErosAlpha, a flattened (in order to keep the vocab size at 32000) version of tavtav's eros-7B-ALPHA. Alpaca and ChatML work best. Slightly smarter, with better prompt comprehension than the Mistral 0.1 Erosumika-7B-v3. 32k context should work.

**Limitations and biases**

The intended use-case for this model is fictional writing for entertainment purposes. Any other sort of usage is out of scope. It may produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. Outputs might often be factually wrong or misleading.
Kunocchini-7b-128k-test-GGUF-Imatrix
Violet_Magcap-12B-GGUF-IQ-Imatrix
Captain-Eris_Twighlight-V0.420-12B-GGUF-ARM-Imatrix
> [!TIP]
> **Updated!**
> Please grab the "v2" quants, remade with the new tokenizer settings to fix the endless generation issues.

> [!NOTE]
> **SillyTavern**
> The complete AIO recommended preset:
> v2-SillyTavern-Presets-AIO-2024-12-28.json

My GGUF-ARM-Imatrix quants of Captain-Eris_Twighlight-V0.420-12B.
Aura_L3_8B-GGUF-IQ-Imatrix
Llama-3.1-8B-Stheno-v3.4-GGUF-IQ-Imatrix
Aura_Uncensored_l3_8B-GGUF-IQ-Imatrix
MN-BackyardAI-Party-12B-v1-GGUF-IQ-ARM-Imatrix
Eris_PrimeV3-Vision-7B-GGUF-IQ-Imatrix
Mistral-Nemo-12B-ArliAI-RPMax-v1.2-GGUF-IQ-ARM-Imatrix
Llama-3.1-8B-ArliAI-Formax-v1.0-GGUF-IQ-ARM-Imatrix
My quants for ArliAI/Llama-3.1-8B-ArliAI-Formax-v1.0.

"Formax is a model that specializes in following response format instructions. Tell it the format of its response and it will follow it perfectly. Great for data processing and dataset creation tasks."

"It is also a highly uncensored model that will follow your instructions very well."
Poppy_Porpoise-0.72-L3-8B-GGUF-IQ-Imatrix
> [!TIP]
> **Support:**
> My upload speeds have been cooked and unstable lately.
> Realistically I'd need to move to get a better provider.
> If you want and you are able to, you can support that endeavor and others here (Ko-fi). I apologize for disrupting your experience.

GGUF-IQ-Imatrix quants for ChaoticNeutrals/Poppy_Porpoise-0.72-L3-8B.

"Poppy Porpoise" is a cutting-edge AI roleplay assistant based on the Llama 3 8B model, specializing in crafting unforgettable narrative experiences. With its advanced language capabilities, Poppy expertly immerses users in an interactive and engaging adventure, tailoring each adventure to their individual preferences.

Recommended ST Presets (updated for 0.72): Porpoise Presets

**Vision:**

To use the multimodal capabilities of this model and use vision, you need to load the specified mmproj file; this can be found inside this model repo (Llava MMProj).

You can load the mmproj by using the corresponding section in the interface.
CaptainErisNebula-12B-AOE-v1-GGUF-IQ-Imatrix
MN-12B-Mag-Mell-R1-GGUF-IQ-ARM-Imatrix
Llama-3-Soliloquy-8B-v2-GGUF-IQ-Imatrix
L3-TheSpice-8b-v0.8.3-GGUF-IQ-Imatrix
llama-3-Stheno-Mahou-8B-GGUF-IQ-Imatrix
Eris_PrimeV3.05-Vision-7B-GGUF-IQ-Imatrix
Halu-8B-Llama3-Blackroot-GGUF-IQ-Imatrix
L3-TheSpice-8b-v0.1.3-GGUF-IQ-Imatrix
Azure_Dusk-v0.2-GGUF-IQ-Imatrix
Visual-LaylelemonMaidRP-7B-GGUF-IQ-Imatrix
Aura_v2_7B-GGUF-IQ-Imatrix
BuRP_7B-GGUF-IQ-Imatrix
Average_Normie_v3.69_8B-GGUF-IQ-Imatrix
llama3-8B-DarkIdol-1.0-GGUF-IQ-Imatrix-Request
Infinitely-Laydiculous-9B-GGUF-IQ-Imatrix
Qwen2-7B-Instruct-abliterated-GGUF-IQ-Imatrix-Request
Eris_PrimeV4-Vision-32k-7B-GGUF-IQ-Imatrix
Irix-12B-Reasoner-v.0.2-GGUF-IQ-Imatrix
Poppy_Porpoise-v0.7-L3-8B-GGUF-IQ-Imatrix
Captain-Eris_Violet-GRPO-v0.420-GGUF-IQ-Imatrix
Hello, travelers! These are my GGUF-IQ-Imatrix quants of Captain-Eris_Violet-GRPO-v0.420.

> [!TIP]
> **Discussions**
> - General discussion and author feedback.
>
> Feedback is always welcome for potential issues with quants, and as a way to guide the author in future iterations. Your comments for them are appreciated!

> [!NOTE]
> **SillyTavern**
> - [[SillyTavern Presets]](https://huggingface.co/Lewdiculous/Captain-Eris_Violet-GRPO-v0.420-GGUF-IQ-Imatrix/tree/main/SillyTavern)
>
> Initially recommended master-import presets.
InfinityRP-v2-8B-GGUF-IQ-Imatrix
llama3-8B-aifeifei-1.1-GGUF-IQ-Imatrix
SOVL_Llama3_8B-GGUF-IQ-Imatrix
opus-v1.2-7b-GGUF-IQ-Imatrix
firefly-gemma-7b-GGUF-IQ-Imatrix
Kukul-Stanta-0.420-32k-7B-0.2-GGUF-IQ-Imatrix
Orthocopter_8B-GGUF-Imatrix
InfinityRP-v1-7B-GGUF-IQ-Imatrix
L3-Umbral-Mind-RP-v1.0-8B-GGUF-IQ-Imatrix
RP_Vision_7B-GGUF-IQ-Imatrix
Violet_Twilight-v0.2-GGUF-IQ-Imatrix
Chaos_RP_l3_8B-GGUF-IQ-Imatrix
Test1_SLIDE-GGUF-IQ-Imatrix
llama3-8B-aifeifei-1.0-GGUF-IQ-Imatrix
lwd-Mirau-7b-RP-Merged-GGUF-IQ-Imatrix
Bungo-L3-8B-GGUF-IQ-Imatrix-Request
Nyanade_Stunna-Maid-7B-GGUF-IQ-Imatrix
Nina-v2-7B-GGUF-IQ-Imatrix
llama-3-cat-8b-instruct-v1-GGUF-IQ-Imatrix
mini-magnum-12b-v1.1-GGUF-IQ-Imatrix
Loyal-Toppy-Bruins-Maid-7B-DARE-GGUF-Imatrix
L3.1-8B-Niitama-v1.1-GGUF-IQ-Imatrix
Poppy_Porpoise-1.0-L3-8B-GGUF-IQ-Imatrix
Eris-Lelanacles-7b-GGUF-IQ-Imatrix
Llama-3-8B-Irene-v0.1-GGUF-IQ-Imatrix
mistral-7b-v0.1-layla-v4-GGUF-IQ-Imatrix
Mahou-1.2-llama3-8B-GGUF-IQ-Imatrix
Eris_Remix_7B-GGUF-IQ-Imatrix
flammen13-mistral-7B-GGUF-IQ-Imatrix
LLaMa-3-CursedStock-v1.8-8B-GGUF-IQ-Imatrix-Request
Prima-LelantaclesV5-7b-GGUF
Puppy_Purpose_0.69-GGUF-IQ-Imatrix
BuRPInfinity_9B-GGUF-IQ-Imatrix
Captain-Eris-Diogenes_Twilight-V0.420-12B-GGUF-ARM-Imatrix
llama3-8B-aifeifei-1.3-GGUF-IQ-Imatrix
Eris_Floramix_DPO_7B-GGUF-Imatrix
Hathor-L3-8B-v.01-GGUF-IQ-Imatrix
DaturaCookie_7B-GGUF-IQ-Imatrix
ogno-monarch-jaskier-merge-7b-OH-PREF-DPO-GGUF-IQ-Imatrix
Eris-Daturamix-7b-v2-GGUF-IQ-Imatrix
Rawr_Llama3_8B-GGUF-IQ-Imatrix
Paradigm_Shift_7B-GGUF-IQ-Imatrix
Poppy_Porpoise-v0.4-L3-8B-GGUF-IQ-Imatrix
Eris_PrimeV4.69-Vision-32k-7B-GGUF-Imatrix
Erosumika-7B-GGUF-IQ-Imatrix
Eris_PrimeV3.075-Vision-7B-GGUF-IQ-Imatrix-Test
Layris_9B-GGUF-IQ-Imatrix
Nera_Noctis-12B-GGUF-ARM-Imatrix
FuseChat-Kunoichi-10.7B-GGUF-IQ-Imatrix
Average_Normie_l3_v1_8B-GGUF-IQ-Imatrix
Prima-LelantaclesV6-7b-GGUF-IQ-Imatrix
Poppy_Porpoise-v0.2-L3-8B-GGUF-IQ-Imatrix
experimental-lwd-Mirau-RP-14B-GGUF-IQ-Imatrix
Neural-SOVLish-Devil-8B-L3-GGUF-IQ-Imatrix
llama3-8B-aifeifei-1.2-GGUF-IQ-Imatrix
Eris_7B-GGUF-IQ-Imatrix
phencyclidine-8b-v1-GGUF-IQ-Imatrix
llama3-8B-feifei-1.0-GGUF-IQ-Imatrix
KukulStanta-7B-GGUF-IQ-Imatrix
opus-v1.2-llama-3-8b-GGUF-IQ-Imatrix
Eris-Beach_Day-7b-GGUF-IQ-Imatrix
Eris-Daturamix-7b-GGUF-IQ-Imatrix
Infinitely-Laydiculous-7b-longtext-GGUF-IQ-Imatrix
RoleBeagle-11B-GGUF-IQ-Imatrix
Eris_PrimeV4.20-Vision-32k-7B-GGUF-IQ-Imatrix
Infinitely-Laydiculous-7B-GGUF-IQ-Imatrix
Kool-Aid_7B-GGUF-IQ-Imatrix
WestLake-10.7B-v2-GGUF-IQ-Imatrix
Aurora_l3_8B-GGUF-IQ-Imatrix
InfinityNexus_9B-GGUF-IQ-Imatrix
Paradigm_7B-GGUF-IQ-Imatrix
DarkSapling-7B-v2.0-GGUF-IQ-Imatrix
Kunocchini-1.2-7b-longtext-GGUF-Imatrix
Persephone_7B-GGUF-IQ-Imatrix
Datura_7B-GGUF-Imatrix
Eris_PrimeV3.075-Vision-7B-Longtext-test-GGUF-IQ-Imatrix-Test
LemonadeRP-4.5.3-GGUF-IQ-Imatrix
Copium-Cola-9B-GGUF-IQ-Imatrix
Test0_SLIDE-GGUF-IQ-Imatrix
flammen10-mistral-7B-GGUF-IQ-Imatrix-Testing
Prima-LelantaclesV6.69-7b-GGUF-IQ-Imatrix
Aura_7B-GGUF-IQ-Imatrix
Bepis_9B-GGUF-IQ-Imatrix
Elly_7B-GGUF-IQ-Imatrix
Test2_SLIDE-GGUF-IQ-Imatrix
Poppy_Porpoise-v0.6-L3-8B-GGUF-IQ-Imatrix
Erosumika-7B-v2-GGUF-IQ-Imatrix
Asherah_7B-GGUF-IQ-Imatrix
Multi-Verse-RP-7B-GGUF-IQ-Imatrix
InfiniteBuRP_7B-GGUF-IQ-Imatrix
kuno-kunoichi-v1-DPO-v2-SLERP-7B-GGUF-IQ-Imatrix
duloxetine-4b-v1-GGUF-IQ-Imatrix
Moistral-11B-v2-GGUF-IQ-Imatrix-Testing
Sinerva_7B-GGUF-IQ-Imatrix
TheSpice-7b-v0.1.1-GGUF-IQ-Imatrix
EndlessRP-v3-7B-GGUF-Imatrix
Prodigy_7B-GGUF-Imatrix
Eris_PrimeV4-Vision-7B-GGUF-IQ-Imatrix
Mika-Longtext-7b-GGUF-IQ-Imatrix
Prima-LelantaclesV7-experimental-7b-GGUF-IQ-Imatrix
kukulemon-7B-GGUF-IQ-Imatrix
Prima-LelantaclesV4-7b-16k-GGUF
Sonya-7B-GGUF-IQ-Imatrix
flammen18X-mistral-7B-GGUF-IQ-Imatrix
T.E-8.1-GGUF-IQ-Imatrix-Request
Prima-LelantaclesV6.3-7b-GGUF-IQ-Imatrix
mistral-7b-v0.2-layla-v4-GGUF-IQ-Imatrix
Eris-Prime-Punch-9B-GGUF-IQ-Imatrix
rogue-enchantress-32k-7B-GGUF-IQ-Imatrix
Pasta-Lake-7b-GGUF
Mika-Lelantacles-7b-Longtext-GGUF-IQ-Imatrix
InfinityNoodleRP-7b-GGUF-IQ-Imatrix
Irene-RP-v2-7B-GGUF-IQ-Imatrix
lwd-Mirau-7b-RP-Merged
> [!NOTE]
> LoRA by mouseEliauk:
> https://modelscope.cn/models/mouseEliauk/mirau-7b-RP-base

> [!TIP]
> Experimental quants for testing:
> lwd-Mirau-7b-RP-Merged-GGUF-IQ-Imatrix

## Introduction

mirau-7b-RP-base is a first-person narrative language model that transforms simple user actions into vivid storytelling, complete with environmental descriptions, psychological activities, and plot progression. I call this concept "Action-to-Narrative Render" - a way to render actions into immersive narratives.

To ensure coherent storytelling, I developed a unique training method called "story flow chain of thought". In essence, it enables the model to weave each user input with the previous context, creating a continuous narrative flow. This makes it perfect for text-based adventures, mystery stories, or simply exploring your imagination.

You can have a try at ModelScope: mirau-RP-7b-base-demo

## ⚠️ Important Notes

This is a base version model - note that "base" here doesn't refer to a traditional pretrained base model, but rather indicates that this version:

- Only supports first-person narrative perspective
- Is not suitable for dialogue interactions (outputs may be unstable)
- Is best used for single-character narrative experiences

## Input Types

The model accepts various input commands, marked with parentheses ():

1. Basic Actions: Simple, everyday behaviors
   - Examples: `(I put on my clothes)`, `(I take a sip of water)`, `(I sit down)`
2. Exploration Actions: Interactions with the environment
   - Examples: `(I look around)`, `(I approach the wooden box)`, `(I push open the door)`
3. Inner Monologue: Character's thoughts and feelings
   - Examples: `(What's wrong here?)`, `(This feels strange)`
4. Observation: Focused attention on specific objects or scenes
   - Examples: `(examine the wooden box closely)`, `(listen to the surrounding sounds)`

## Deployment

```bash
pip install ms-swift[llm] -U
```

```bash
# --adapters mirau=mirau-7b-RP-base points at the LoRA you downloaded.
# --merge_lora true merges the LoRA into the model, resulting in approximately 14GB of weights.
RAY_memory_monitor_refresh_ms=0 CUDA_VISIBLE_DEVICES=0 swift deploy \
    --model_type qwen2_5 \
    --model qwen/Qwen2.5-7B-Instruct \
    --adapters mirau=mirau-7b-RP-base \
    --infer_backend vllm \
    --max_batch_size 1 \
    --max_length 8192 \
    --max_model_len 8192 \
    --port 8886 \
    --host 0.0.0.0 \
    --vllm_max_lora_rank 128 \
    --merge_lora true
```
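Once the `swift deploy` command above is running, the server should expose an OpenAI-compatible endpoint on the configured port; a hypothetical smoke test, assuming the served model name `mirau` follows from the `--adapters` mapping above:

```bash
# Hypothetical request against the deployment above; adjust host/port as needed.
curl http://127.0.0.1:8886/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mirau",
        "messages": [{"role": "user", "content": "(I look around)"}]
      }'
```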
llama.cpp-11077-test-01
SLIDE-v2-GGUF-IQ-Imatrix
fireblossom-32K-7B-GGUF-IQ-Imatrix
Model-Requests
> [!IMPORTANT]
> # Status:
> Quant-Requests are PAUSED momentarily due to external circumstances.
> I sincerely apologize for disrupting your experience!
>
> Only if you want to and you are able...
> You can support my personal endeavours here (Ko-fi).
> Eventually I want to be able to set aside resources for a dedicated infrastructure.
> In the meantime, I'll be working to provide whenever possible with the resources available at the time.

Welcome to my GGUF-IQ-Imatrix Model Quantization Requests card!

This card is meant only to request GGUF-IQ-Imatrix quants for models that meet the requirements below.

**Requirements to request GGUF-Imatrix model quantizations:**

For the model:
- Maximum model parameter size of ~~11B~~ 12B. A small note is that models larger than 8B parameters may take longer to process and upload than smaller ones. At the moment I am unable to accept requests for larger models due to hardware/time limitations.
- Preferably Mistral and Llama-3 based models in the creative/roleplay niche.

If you need quants for a bigger model, you can try requesting at mradermacher's. He's doing amazing work.

Important:
- Fill in the request template as outlined in the next section.

1. Open a New Discussion titled "`Request: Model-Author/Model-Name`", for example, "`Request: Nitral-AI/Infinitely-Laydiculous-7B`", without the quotation marks.
2. Include the following template in your new discussion post; you can just copy and paste it as is, and fill in the required information by replacing the {{placeholders}} (example request here):