yapwithai
hammer-2.1-3b-awq
yapwithai/impish-12b-awq
canopy-orpheus-3b-0.1-ft
03/18/2025 – We are releasing our 3B Orpheus TTS model with additional finetunes. Code is available on GitHub: CanopyAI/Orpheus-TTS

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

- Human-Like Speech: natural intonation, emotion, and rhythm, superior to SOTA closed-source models
- Zero-Shot Voice Cloning: clone voices without prior fine-tuning
- Guided Emotion and Intonation: control speech and emotion characteristics with simple tags
- Low Latency: ~200 ms streaming latency for realtime applications, reducible to ~100 ms with input streaming

- GitHub Repo: https://github.com/canopyai/Orpheus-TTS
- Blog Post: https://canopylabs.ai/model-releases
- Colab Inference Notebook: notebook link
- One-Click Deployment on Baseten: https://www.baseten.co/library/orpheus-tts/

Check out our Colab (link to Colab) or GitHub (link to GitHub) to see how to run easy inference on our finetuned models.

Model Misuse: Do not use our models for impersonation without consent, misinformation or deception (including fake news or fraudulent calls), or any illegal or harmful activity. By using this model, you agree to follow all applicable laws and ethical guidelines. We disclaim responsibility for any misuse.
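Because the model streams, audio arrives as a sequence of chunks rather than one finished file. The helper below is a minimal sketch of a streaming consumer that writes chunks to a WAV file as they arrive; it assumes the generator yields 16-bit mono PCM byte chunks at 24 kHz (check the chunk format and sample rate against the Orpheus-TTS repository, as both are assumptions here, not part of this card).

```python
import wave

def stream_to_wav(chunks, path, sample_rate=24000):
    """Write an iterator of 16-bit mono PCM byte chunks to a WAV file
    incrementally, so downstream tooling can start before generation ends.
    Returns the total number of audio frames written."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)          # mono
        wf.setsampwidth(2)          # 16-bit samples (2 bytes each)
        wf.setframerate(sample_rate)
        total_frames = 0
        for chunk in chunks:
            wf.writeframes(chunk)
            total_frames += len(chunk) // 2  # bytes -> 16-bit frames
    return total_frames

# Hypothetical usage with a streaming generator from the Orpheus repo
# (method name illustrative, verify against the GitHub code):
#   stream_to_wav(model.generate_speech(prompt=text, voice=voice), "out.wav")
```

The same consumer works for any streaming TTS backend that yields raw PCM bytes; only the sample rate and sample width need to match the model's output.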
Hammer2.1-3b
kyutai-tts-1.6b-en_fr
See also the pre-print research paper, the project page, the Colab example, the GitHub repository, and the repository of voices.

This is a model for streaming text-to-speech (TTS). Unlike offline text-to-speech, where the model needs the entire text to produce the audio, our model starts to output audio as soon as the first few words of the text have been given as input. This model is actually 1.8B parameters, not 1.6B as the name might suggest.

The model architecture is a hierarchical Transformer that consumes tokenized text and generates audio tokenized by Mimi; see the Moshi paper. The frame rate is 12.5 Hz and each audio frame is represented by 32 audio tokens, although you can use fewer tokens at inference time for faster generation. The backbone model is 1B parameters, and the depth transformer is 600M parameters and uses partial weight sharing similar to Hibiki. The audio is shifted by 16 steps (1.28 sec.) with respect to the text, and the model uses an acoustic/semantic delay of 2.

Kyutai TTS is a decoder-only model for streaming text-to-speech. It leverages the multistream architecture of Moshi to model the audio stream based on the text stream. The audio stream is shifted w.r.t. the text stream so that the model can predict audio tokens based on the input text.

- Developed by: Kyutai
- Model type: Streaming Text-To-Speech
- Language(s) (NLP): English and French
- License: Model weights are licensed under CC-BY 4.0
- Repository: GitHub

This model is able to perform streaming text-to-speech generation, including dialogs. The model supports voice conditioning through pre-computed cross-attention embeddings, which are provided for a number of voices in our tts-voices repository. This model does not support Classifier-Free Guidance (CFG) directly, but was trained with CFG distillation for improved speed (no need to double the batch size). It is easy to batch and can reach a throughput of 75× real-time audio generation.
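The numbers above fix some simple relations: at 12.5 frames per second with 32 tokens per frame, one second of audio costs 400 audio tokens, and the 16-step audio/text shift corresponds to 16 / 12.5 = 1.28 s of latency. A small back-of-the-envelope sketch (plain arithmetic, not tied to the Kyutai codebase):

```python
FRAME_RATE_HZ = 12.5    # Mimi audio frames per second
TOKENS_PER_FRAME = 32   # audio tokens per frame (fewer may be used at inference)
AUDIO_TEXT_SHIFT = 16   # frames by which the audio lags behind the text

def audio_tokens(seconds: float, tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Total audio tokens needed to represent `seconds` of speech."""
    return int(seconds * FRAME_RATE_HZ) * tokens_per_frame

def shift_seconds(frames: int = AUDIO_TEXT_SHIFT) -> float:
    """Latency contributed by shifting audio w.r.t. text, in seconds."""
    return frames / FRAME_RATE_HZ

# 10 s of audio -> 4000 tokens at 32 tokens/frame; the 16-frame shift -> 1.28 s
```

Dropping to, say, 16 tokens per frame at inference halves the token budget per second of audio, which is the trade-off behind the "fewer tokens for faster generation" remark.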
This model does not perform watermarking, for two reasons:
- watermarking can easily be deactivated for open-source models;
- our early experiments show that all watermarks used by existing TTS systems are removed by simply encoding and decoding the audio with Mimi.

Instead, we preferred to restrict the voice cloning ability to the use of pre-computed voice embeddings.

The model was trained for 750k steps, with a batch size of 64 and a segment duration of 120 seconds. Then, CFG distillation was performed for 24k updates. Pretraining stage: we use an audio collection of 2.5 million hours of publicly available audio content. For this dataset, we obtained synthetic transcripts by running whisper-timestamped with `whisper-medium`. Pretraining was done with 32 Nvidia H100 GPUs; CFG distillation was done on 8 such GPUs.

Neil Zeghidour, Eugene Kharitonov, Manu Orsini, Václav Volhejn, Gabriel de Marmiesse, Edouard Grave, Patrick Perez, Laurent Mazaré, Alexandre Défossez
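The stated schedule also lets you estimate roughly how much audio the model saw during pretraining (an upper bound that ignores padding and repeats): 750k steps × 64 segments × 120 s is 1.6M hours, i.e. about 0.64 passes over the 2.5M-hour collection. As a worked sketch:

```python
# Training-schedule figures taken from the card above.
STEPS = 750_000         # pretraining optimizer steps
BATCH_SIZE = 64         # segments per step
SEGMENT_SECONDS = 120   # duration of each training segment
DATASET_HOURS = 2.5e6   # size of the pretraining audio collection

# Total audio seen, in hours, and the implied fraction of an epoch.
seen_hours = STEPS * BATCH_SIZE * SEGMENT_SECONDS / 3600
epochs = seen_hours / DATASET_HOURS
# seen_hours = 1,600,000.0; epochs = 0.64
```

So under these assumptions the model sees well under one full epoch of the collection, which is common for large-scale audio pretraining.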
SicariusSicariiStuff-Impish_Nemo_12B
August 2025, ImpishNemo12B — my best model yet. And unlike a typical Nemo, this one can take much higher temperatures (works well with 1+). Oh, and regarding following the character card: it has somehow gotten even better, to the point of being straight up uncanny 🙃 (I had to check twice that this model was loaded, and not some 70B!)
I feel like this model could easily replace models much larger than itself for adventure or roleplay. For assistant tasks, obviously not, but the creativity here? Off the charts. Characters have never felt so alive and in the moment before — they'll use insinuation, manipulation, and, if needed (or provoked) — force. They feel very present. That look on Neo's face when he opened his eyes and said, "I know Kung Fu"? Well, ImpishNemo12B had pretty much the same moment — and it now knows more than just Kung Fu. Much, much more.

It wasn't easy, and it's a niche within a niche, but as promised almost half a year ago — it is now done. ImpishNemo12B is smart, sassy, creative, and has a lot of unhingedness too — these traits are baked deep into every interaction. It took Mistral's innate relative freedom and turned it up to 11. It may very well be too much for many, but after testing and interacting with so many models, I find this 'edge' of sorts rather fun and refreshing.

Anyway, the dataset used is absolutely massive, with tons of new types of data and new domains of knowledge (Morrowind fandom, fighting, etc...). The whole dataset is a very well-balanced mix, and it resulted in a model with extremely strong common sense for a 12B. Regarding response length — there's almost no response-length bias here; this one is very much dynamic and will easily adjust reply length based on 1–3 examples of provided dialogue. Oh, and the model comes with 3 new Character Cards: 2 Roleplay and 1 Adventure!

It has to be asked: why even bother tuning this "ancient" (released over a year ago) 12B model? OpenAI released the first model in the world to outperform Phi-3.5 in Muh Safety, and Chinese models have made us completely forget that other models even exist — an era of such abundance that if one had been told about it a mere year ago, no one would've believed it. Voice models, image generation and image editing (Qwen-Image🔥), video... So why?
Because 12B Nemo is a well-balanced model, Apache 2.0 licensed, pretty neutral in terms of safety and political lean, runnable by anyone (small enough that offloading isn't a complete pain), and because I had a very specific thing in mind I wanted to test — something Nemo was ideal for, due to all the above. More importantly, I wanted to run an experiment: to see how far a decent model can be taken with the right tuning, and how well it can integrate fandom knowledge it knows almost nothing about. Oh, and almost no one even bothers to tune it anymore, so why not give it some much-needed love while I'm at it?

So basically, I wanted to achieve something that seems almost impossible: adding new fandom knowledge without pretraining (CPT and actual pretraining are NOT the same), without incurring catastrophic forgetting, and without lobotomy. To change the language bias in story writing, and to change it even more drastically for adventure and roleplay. I will say it again: without lobotomy. I knew I could change the language style and vocab drastically — I've done so very successfully with Phi-lthy — but that involved more extreme measures that resulted in a loss of some capabilities (and new emergent properties — more info in the Phi-lthy model card above). The problem was how to achieve all of the above without the model losing brain cells and, "maybe, just maybe..." even adding to and enhancing the model's intelligence. Basically — the holy grail of model tuning.

To do so, I used an absolutely massive dataset — more than 1B tokens — along with a huge amount of data engineering and multi-stage fine-tuning (not a LoRA, obviously), and the result... was astounding. Of course, praising your own model is kinda cringe, for sure, but I will say this: this is by far the model I've had the most fun interacting with — to an absurd extent.
For comparison, while my NegativeLLAMA70B is very good and still popular to this day (over 300 merges, numerous downloads, etc...), I would dare say that ImpishNemo12B feels way more fun than my own 70B and orders of magnitude more creative (NegativeLLAMA70B's writing is a bit dry for my taste), and it outright has the most sovl of any model I've made so far. And we're comparing a 12B to a 70B. In other words, even though I can leisurely run NegativeLLAMA70B locally, I prefer chatting with ImpishNemo12B — it is that good (take this with a grain of salt; it's highly subjective, and all of that).

The amount of effort it took to create this model was absolutely absurd. I started with a Gemma 12B fine-tune, but one epoch would've taken six days, and I had to do multiple different phases and merging for the idea I had in mind, so doing the same for Gemma would've taken over a month. Maybe I'll still do it — we'll see. I will say this: if this model had been made a year ago, when Nemo was initially released, Anthropic might have lost a few gooners, hehe. But to be fully transparent, I couldn't have done it a year ago.

My job — the "mission" I'd given myself — was pretty much done with the success of ImpishLLAMA4B: "Making interesting and engaging AI models accessible for everyone." So now, ironically, when I had nothing left that I 'had to' do, I made my best model to date — because I wanted to. Such a cliché, yet true nonetheless 🙃

The roleplay community is a very small niche that, in the grand scheme of things, no one cares too much about (various AI labs have expressed their distaste for the fact that their models are being used for gooning instead of math — folks probably haven't heard about Rule #34). But an even smaller community is that of Morrowind, and an even smaller one is that same group minus those who hate AI. To conclude: this model was made for 0.001% of the population, but ironically, many users will still probably like it and find it very refreshing.
TL;DR
- My best model yet! Lots of sovl!
- Smart, sassy, creative, and unhinged — without the brain damage.
- Bulletproof temperature: can take much higher temperatures than vanilla Nemo.
- Feels close to old CAI, as the characters are very present and responsive.
- Incredibly powerful roleplay & adventure model for the size.
- Does adventure insanely well for its size!
- Characters have massively upgraded agency!
- Over 1B tokens trained, carefully preserving intelligence — even upgrading it in some aspects.
- Based on a lot of the data in ImpishMagic24B and ImpishLLAMA4B + some upgrades.
- Excellent assistant — so many new assistant capabilities I won't even bother listing them here; just try it.
- Less positivity bias: all lessons from the successful NegativeLLAMA70B style of data learned & integrated, with serious upgrades added — and it shows!
- Trained on an extended 4chan dataset to add humanity.
- Dynamic response length (1–3 paragraphs, usually 1–2). Length is adjustable via 1–3 examples in the dialogue. No more rigid short-bias!

It is HIGHLY RECOMMENDED to use the Roleplay / Adventure format the model was trained on; see the examples below for syntax. It allows for very fast and easy writing of character cards with a minimal amount of tokens. It's a modification of an old-skool CAI-style format I call SICAtxt (Simple, Inexpensive Character Attributes plain-text):

- Calanthe (the Australian Overseer at a rare-earth extraction penal colony; she's got 6-pack abs, but no mercy.)
- Alexis (the diabolic reconnaissance officer, trying to survive the Safari experience.)
- Morrowind - Hilde the Nordish Gladiator (fighting in the Arena in Vivec's city of Morrowind for blood and honor.)
- Morrowind - Male Orc (an Orc that wants to get to Balmora from Seyda Neen.)
- Morrowind - Female Breton (a female Breton with an impressive... heart, who wants to join the Mages Guild in Balmora.)
- Alexandra (a networking-professional tsundere that likes you. She knows Systema.)
- Shmena Koeset (an overweight and foul-mouthed troll huntress with a bad temper.)
- TakaiPuraisu (car dealership simulator)
- Vesper (schizo space adventure)
- NinaNakamura (the sweetest dorky co-worker)
- Employe#11 (schizo workplace with a schizo worker)

- Intended use: Role-Play, Adventure, Creative Writing, General Tasks.
- Original: FP16
- GGUF: Static Quants | iMatrix | High-Attention | iMatrix-High-Attention
- GPTQ: 4-Bit-32 | 4-Bit-64 | 4-Bit-128 | 4-Bit-1 | 8-Bit-32 | 8-Bit-64 | 8-Bit-128 | 8-Bit-1
- EXL3: 3.0 bpw | 3.5 bpw | 4.0 bpw | 4.5 bpw | 5.0 bpw | 5.5 bpw | 6.0 bpw | 6.5 bpw | 7.0 bpw | 7.5 bpw | 8.0 bpw
- Specialized: FP8
- Mobile (ARM): Q40 | Q40High-Attention

Specialized Roleplay Settings for ImpishNemo12B, click below: (Important!)

Silly Tavern Settings #1 - Click here | Download JSON
Silly Tavern Settings #2 - Click here | Download JSON
- Silly Tavern Settings #1: higher temperature while still being coherent
- Silly Tavern Settings #2: dynamic paragraphs, XTC, other stuff

Roleplay Examples (Calanthe is available here and Alexis is available here)

Calanthe
the Australian Overseer at a rare-earth extraction penal colony. (Warning: contains prison slang.)

Alexis, the diabolic reconnaissance officer, trying to survive the Safari experience.

Adventure Example (Hilde the gladiator is available here)

Hilde the Nordish gladiator, fighting in the Arena of Vivec's city in Morrowind for blood and honor.

Your support = more models
My Ko-fi page (Click here)

Other stuff
- ImpishLLAMA4B: the "Impish experience", now runnable on spinning rust & toasters.
- SLOPDetector: nuke GPTisms with the SLOP detector.
- LLAMA-38BUnaligned: the grand project that started it all.
- Blog and updates (Archived): some updates, some rambles; sort of a mix between a diary and a blog.