Minthy
RouWei-0.8
In-depth retraining of Illustrious to achieve the best prompt adherence, knowledge and state-of-the-art performance. Dataset of 13M unique pictures (~4M with natural-text captions) picked and balanced from over 25M images of anime art, covers, digital illustrations, western media and other sources, including private datasets. A more detailed description is available on Civitai.

The vpred version is now available. It works flawlessly out of the box without any burning or related issues. Consider using a lower CFG (3..5); the other generation settings are the same. Some exotic and experimental samplers/schedulers are untested.

Key advantages:
- Fresh and vast knowledge of characters (1 2), concepts, styles (list, examples), cultural and related things
- The best prompt adherence among SDXL anime models at the moment of release
- Solves the main problems with tag bleeding and biases common to Illustrious, NoobAI and other checkpoints
- Excellent aesthetics and knowledge across a wide range of styles (over 50,000 artists, including hundreds of unique cherry-picked datasets from private galleries, some received from the artists themselves)
- High flexibility and variety without a stability tradeoff
- No more annoying watermarks for popular styles, thanks to a clean dataset
- Vibrant colors and smooth gradients without a trace of burning; full range even with epsilon
- Pure training from Illustrious v0.1 without involving third-party checkpoints, Loras, tweakers, etc.

When prompting artist styles, especially when mixing several, their tags MUST be in a separate CLIP chunk: add `BREAK` after it (for A1111 and derivatives), use a conditioning-concat node (for Comfy), or at least put them at the very end. Otherwise, significant degradation of the results is likely.

The model is designed to work both with short booru-tag-based prompts and with long, complex natural-text prompts. The best results can be achieved by combining tags with some natural-text phrases. For tags, classic danbooru-style comma-separated tags without underscores were used.

Generation settings: ~1..1.5 megapixels for txt2img, any AR with a resolution that is a multiple of 64 (1024x1024, 1152x, 1216x832, ...). Euler a, CFG 4..8 for epsilon / 3..5 for vpred, 20..28 steps. LCM/PCM/DMD untested; cfg++ samplers work fine; some schedulers do not work. Hires fix: x1.5 latent + denoise 0.6, or any GAN upscaler + denoise 0.3..0.55. Please note that the vpred version requires a lower CFG value.

Quality tags: there are only 4: `masterpiece, best quality` for positive and `low quality, worst quality` for negative. Nothing else; everything except `low quality` in the negative can be omitted. Meta tags like `lowres` have been removed, do not use them: low-resolution images were either removed or upscaled and cleaned with DAT, depending on their importance. For best results keep the prompt as clean as possible. Spamming popular sequences will not improve results, since all related flaws have been solved, and will only lead to unwanted effects, biases and poor quality.

Artist styles: the model knows over 35k artist styles; a list and grids with examples are on Mega. Use them with the `by ` prefix, they will not work properly without it. It's all up to you:
- Use them in combination with booru tags, works great
- Use only natural text after typing styles and quality tags
- Use just booru tags and forget about it

About 4M of the pictures in the dataset have hybrid natural-text captions made by Claude, GPT, Gemini and ToriiGate. Version 0.8 comes with an advanced understanding of natural-text prompts, providing state-of-the-art performance among SDXL anime models.
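Putting these recommendations together, a minimal illustrative prompt for A1111-style UIs (the artist names below are placeholders, not recommendations) could look like this:

```
Positive:
by artistname1, by artistname2
BREAK
masterpiece, best quality, 1girl, silver hair, looking at viewer,
she is sitting on a park bench surrounded by autumn leaves

Negative:
low quality, worst quality
```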
This doesn't mean you are obligated to use natural-language prompts; tags-only is completely fine, especially because the understanding of tag combinations is also improved.

You can use the FP32 version for more accurate merging, or to get some benefit from running the text encoders in fp32 mode with Comfy. The epsilon and vpred versions here received a brief aesthetic polishing after the main training to improve small details and coherence. If you want to use RouWei in merges, extractions or finetunes without bringing in that last step, you can use the BASE VERSION of RouWei: FP16, FP32.

Safety: the model tends to generate NSFW images for corresponding prompts; consider adding extra filtering. Outputs may be inaccurate or provocative and must not be used as a reference.

Thanks: a number of anonymous persons, Bakariso, dga, Fi., ello, K., LOL2024, NeuroSenko, rred, Soviet Cat, Sv1., T., TekeshiX and other fellow brothers who helped.

Donations:
BTC bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c
ETH/USDT(e) 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db
XMR 47F7JAyKP8tMBtzwxpoZsUVB8wzg2VrbtDKBice9FAS1FikbHEXXPof4PAb42CQ5ch8p8Hs4RvJuzPHDtaVSdQzD6ZbA5TZ
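To close out this card, a hedged quick-start sketch for the epsilon checkpoint using diffusers and the recommended settings above. The local filename is a placeholder; adjust it to the file you downloaded.

```python
# Hedged quick-start with diffusers; the filename below is a placeholder.
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionXLPipeline.from_single_file(
    "rouwei_v0.8_epsilon.safetensors", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt=(
        "masterpiece, best quality, 1girl, silver hair, looking at viewer, "
        "she is standing in a flower field at sunset"
    ),
    negative_prompt="low quality, worst quality",
    width=1216, height=832,       # any AR with sides that are multiples of 64
    num_inference_steps=24,       # 20..28 per the card
    guidance_scale=6.0,           # epsilon: CFG 4..8
).images[0]
image.save("sample.png")
```

Note that plain diffusers does not implement A1111's `BREAK` chunking, so artist tags are best placed at the very end of the prompt here, per the fallback recommendation above.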
RouWei-Gemma
ToriiGate-v0.4-7B
RouWei-0.8-16ch-v0.1alpha
ToriiGate-v0.4-2B
RouWei-0.7
ToriiGate-v0.3
ToriiGate-0.5
RouWei-0.6
Large-scale finetune of Illustrious with state-of-the-art techniques and performance. Dataset of 4.5M pictures (0.8M with natural-text captions) picked and balanced from 12M images of anime art and other media, including private datasets. A more detailed description is available on Civitai.

Key advantages:
- Better prompt following
- Great aesthetics, anatomy and stability, along with versatility
- Vibrant colors and smooth gradients without a trace of burning
- Full brightness range even with epsilon
- Knowledge of tens of thousands of styles and almost any character

In addition, compared with vanilla Illustrious and NoobAI:
- No more annoying watermarks
- No tag bleed and better prompt segmentation
- No character-tag bleed and related side effects (unwanted outfit, style or composition changes)
- Better coherence and anatomy
- Artist styles look exactly as they should
- Each style, including the base one, is stable without random fluctuations across seeds
- New knowledge

The model is designed to work both with short booru-tag-based prompts and with long, complex natural-text prompts. The best results can be achieved by combining tags with some natural-text phrases. For tags, classic danbooru-style comma-separated tags without underscores were used.

Generation settings: ~1 megapixel for txt2img, any AR with a resolution that is a multiple of 64 (1024x1024, 1152x, 1216x832, ...). Euler a, CFG 4..8 for epsilon / 3..5 for vpred, 20..28 steps. LCM/PCM untested; cfg++ samplers work fine. Hires fix: x1.5 latent + denoise 0.6, or any GAN upscaler + denoise 0.3..0.55. Please note that the vpred version requires a lower CFG value.

Quality tags: there are only 4: `masterpiece, best quality` for positive and `low quality, worst quality` for negative. Nothing else. Meta tags like `lowres` have been removed, do not use them: low-resolution images were either removed or upscaled and cleaned with DAT, depending on their importance. For best results keep the prompt as clean as possible. Spamming popular sequences will not improve results, since all related flaws have been solved, and will only lead to unwanted effects, biases and poor quality.

Artist styles: the model knows over 22k artist styles; a list and grids with examples are on Mega. Use them with the "by " prefix, they will not work properly without it. It's all up to you:
- Use them in combination with booru tags, works great
- Use only natural text after typing styles and quality tags
- Use just booru tags and forget about it

The dataset contains over 800k pictures with hybrid natural-text captions made by Opus-Vision, GPT-4o and ToriiGate.

The vpred version has index 0.6.1 because it was retrained from base to fix observed flaws; now it works flawlessly. To use it you need the latest dev build of A1111, Comfy or reForge. Do not forget to lower your CFG to 3..5; higher values will lead to over-saturation (a diffusers sketch is given at the end of this card).

Safety: the model tends to generate NSFW images for corresponding prompts; consider adding extra filtering. Outputs may be inaccurate or provocative and must not be used as a reference.

License: same as Illustrious, please check the original page for limitations. Feel free to use it in your merges, finetunes, etc., just please leave a link.

Donations:
BTC bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c
ETH/USDT(e) 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db
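As referenced above, the vpred checkpoint needs a v-prediction-aware setup and lower CFG. A hedged sketch of what that looks like in diffusers, differing from the epsilon quick-start shown for RouWei-0.8 only in the scheduler configuration (the filename is a placeholder):

```python
# Hedged vpred setup in diffusers; the filename below is a placeholder.
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionXLPipeline.from_single_file(
    "rouwei_v0.6.1_vpred.safetensors", torch_dtype=torch.float16
).to("cuda")

# Switch the scheduler to v-prediction; zero-SNR rescaling preserves
# the full brightness range the card advertises.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(
    pipe.scheduler.config,
    prediction_type="v_prediction",
    rescale_betas_zero_snr=True,
)

image = pipe(
    "masterpiece, best quality, 1girl, night sky",
    negative_prompt="low quality, worst quality",
    num_inference_steps=24,
    guidance_scale=4.0,  # vpred: keep CFG in the 3..5 range
).images[0]
```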
GLM-4.5-Air-exl3-5.5bpw
GLM-4.5-Air-exl3-6bpw
Qwen3-235B-A22B-Instruct-2507-exl3-5.0bpw
Exl3 5.0bpw h6 quantization of Qwen/Qwen3-235B-A22B-Instruct-2507, made with commit `69750c8` of Exllamav3. 148.4 GB on disk; requires ~160 GB with 64k of unquantized context.
Qwen3-235B-A22B-Thinking-2507-exl3-5.0bpw
Exl3 5.0bpw h6 quantization of Qwen/Qwen3-235B-A22B-Thinking-2507, made with commit `69750c8` of Exllamav3. 148.4 GB on disk; requires ~160 GB with 64k of unquantized context.
t5gemma-2b-2b-ul2-encoder-only
This repo contains original or finetuned models derived from google/t5gemma-2b-2b-ul2. Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms.
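A minimal sketch for pulling encoder hidden states from a checkpoint like this one, assuming it loads with the `T5GemmaEncoderModel` class added in transformers 4.53 (the repo id below is illustrative):

```python
# Hedged sketch: extracting hidden states from a T5Gemma encoder.
# Requires transformers >= 4.53 (first release with T5Gemma support).
import torch
from transformers import AutoTokenizer, T5GemmaEncoderModel

repo = "Minthy/t5gemma-2b-2b-ul2-encoder-only"  # illustrative repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
encoder = T5GemmaEncoderModel.from_pretrained(repo, torch_dtype=torch.bfloat16)

inputs = tokenizer("1girl, silver hair, sitting on a bench", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # [batch, seq_len, 2304]
```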
ToriiGate-v0.2
ToriiGate-v0.4-2B-exl2-8bpw
Qwen3-235B-A22B-5.0bpw-h6-exl3
ToriiGate-v0.4-7B-exl2-8bpw
ToriiGate-v0.4-7B-exl2-6bpw
ToriiGate-v0.4-7B-exl2-4bpw
Rouwei-T5Gemma-adapter_v0.2
Trained adapter for using T5Gemma-2b as the text encoder for Rouwei 0.8 (and other SDXL models). A further development of the llm adapter and its early versions, now moved to a separate repository. It is a drop-in replacement for the Clip text encoders of SDXL models that lets them achieve better prompt adherence and understanding:
- State-of-the-art prompt adherence and NL prompt understanding among SDXL anime models
- Support for both long and short prompts, no 75-token limit per chunk
- Preserves the original knowledge of styles and characters while allowing amazing flexibility in prompting
- Support for structured prompts that describe individual features of characters, parts, elements, etc.
- Maintains perfect compatibility with booru tags (alone or combined with NL), allowing easy and convenient prompting

Installation:
1. Install the custom nodes
   - Option a: go to `ComfyUI/custom_nodes` and run `git clone https://github.com/NeuroSenko/ComfyUI_LLM_SDXL_Adapter`
   - Option b: open the example workflow, go to ComfyUI Manager and press the `Install Missing Custom Nodes` button
2. Make sure you have updated Transformers to a version that supports t5gemma (4.53 and above): activate the ComfyUI venv and run `pip install transformers -U`
3. Download the adapter and put it into `ComfyUI/models/llm_adapters`
4. Download T5Gemma
   - Option a: after activating the ComfyUI venv, run `hf download Minthy/RouWei-Gemma --include "t5gemma-2b-2b-ul2" --local-dir "./models/LLM"` (correct the path if needed)
   - Option b: download the safetensors file and put it into `ComfyUI/models/text_encoders`
5. Download a Rouwei-0.8 (vpred, epsilon or base-epsilon) checkpoint if you don't have one yet. You can also use any Illustrious-based checkpoint, but performance may be limited.
6. Use this workflow as a reference and feel free to experiment.

This version stands above any clip text encoder in terms of prompt understanding. It allows you to specify more details and individual traits for each character/object that will be followed more or less consistently instead of pure randomness, to make a simple comic (stability varies), and to define positions and more complex compositions. However, it is still at an early stage: there can be difficulties with rare things (especially artist styles) and some biases. It also drives a rather old and small UNet that needs proper training (and possibly modifications), so don't expect it to perform like top-tier open-source image generation models such as Flux and QwenImage.

The model is quite versatile and can accept various formats, including multilingual inputs or even base64, but it is better to stick to one of several prompting styles:
- Just plain text. It is better to avoid very short and very long prompts. Until emphasis support is added to the nodes, avoid putting `\` before brackets. Also, unlike with clip, misspellings may lead to wrong results.
- Structured prompts (see the example after this list). It can understand Markdown (`##` for separating), json, xml, or simple separation with new lines and `:`. Prompt structuring improves results when prompting several characters with individual features. Depending on the specific case it can work very stably, work above random level in most cases, or require some rerolls while allowing you to achieve things that are otherwise impossible due to biases or complexity.
- Any combination of the above. Recommended for the most complex cases. It is better to avoid spamming, because it can cause unwanted biases.

The training dataset uses about 2.7M pictures from Minthy/Anime-Art-Multicaptions-v5.0 and a few other sources. Still quite a small number.
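As an illustration of the structured style referenced above (every character detail here is an invented placeholder), a Markdown-separated prompt might look like:

```
## Character 1
1girl, silver hair, red eyes, black dress
she is reading a book on the left side of the image

## Character 2
1boy, short dark hair, casual clothes
he stands behind her, looking out the window

## Setting
cozy library interior, warm lighting, evening
```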
How it works: the adapter turns T5Gemma hidden states into text embeddings and pooled states that are directly compatible with the SDXL UNet, so support can easily be implemented by replacing the Clip forward part. The main class and an inference example are provided. The adapter consists of 3 wide + 3 small transformer blocks with a cross-attention compression between them. It supports up to 512 input tokens and converts hidden states from the T5Gemma encoder, of shape [512, 2304], into a text embedding [308, 2048] plus a pooled embedding vector [1280]; 308 is the equivalent of 4 concatenated clip chunks of 77 tokens (300 without BOS and EOS). Such an adapter can be used with any other llm or encoder (a shape-level sketch is given at the end of this card).

Obtaining hidden states: a simple example for batch processing is provided. It is just regular use of the T5Gemma encoder part and can be connected directly to the adapter.

Pretraining: this part is needed only if you are initializing new weights and want to train your own version (for example, for a different text encoder model); finetuning of the pretrained model is covered below. For the early stages, feature-based training against direct Clip outputs was used to reduce costs. An example training script that works with cached states and reference clip results is provided; consider it only a starting point and adjust the dataloaders and formats to your preferences.

Finetuning: T5gemma (frozen for now, can be cached) -> Adapter (trained) -> Unet (frozen for now) -> Loss -> Backward. An sd-scripts (dev branch) fork for full training supports fine-tuning of each part (t5gemma, adapter, unet).

Compatibility: designed to work with Rouwei; works with most Illustrious-based checkpoints, including NoobAI and popular merges.

Plans: there will be another version trained on a larger dataset to estimate capacity and to decide between joint training with the encoder and leaving it untouched. If no flaws are found, it will be used as the text encoder for the large training run of the next version of the Rouwei checkpoint.

Contacts: join the Discord server, where you can share your thoughts, make proposals, requests, etc., or write to me directly here, on civitai, or via DM in discord.

Thanks: part of the training was performed on Google TPUs sponsored by OpenRoot-Compute. Personal: NeuroSenko (code), Rimuru (idea, discussions), Lord (testing), DraconicDragon (fixes, testing), Remix (nodes code), and all fellow brothers who supported me before.

Donations:
ETH/USDT(e) 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db
XMR 47F7JAyKP8tMBtzwxpoZsUVB8wzg2VrbtDKBice9FAS1FikbHEXXPof4PAb42CQ5ch8p8Hs4RvJuzPHDtaVSdQzD6ZbA5TZ

This repo contains original or finetuned models derived from google/t5gemma-2b-2b-ul2. Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms.
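For reference, the shape-level sketch of the adapter interface mentioned above. This is not the actual implementation (the real adapter uses 3 wide + 3 small transformer blocks with cross-attention compression); it only pins down the tensor contract, and every module internal here is a stand-in:

```python
# Shape-level sketch of the adapter contract; the internals are placeholders,
# not the real 3-wide + 3-small transformer architecture.
import torch
import torch.nn as nn

class T5GemmaToSDXLAdapter(nn.Module):
    def __init__(self, in_dim=2304, out_dim=2048, pooled_dim=1280,
                 out_tokens=308):
        super().__init__()
        # Learned queries stand in for the cross-attention compression
        # that maps 512 input tokens down to 308 output tokens.
        self.queries = nn.Parameter(torch.randn(out_tokens, in_dim))
        self.compress = nn.MultiheadAttention(in_dim, num_heads=8,
                                              batch_first=True)
        self.proj = nn.Linear(in_dim, out_dim)            # 2304 -> 2048 per token
        self.pooled_proj = nn.Linear(in_dim, pooled_dim)  # pooled vector, 1280

    def forward(self, hidden):                 # hidden: [B, 512, 2304]
        q = self.queries.expand(hidden.size(0), -1, -1)
        compressed, _ = self.compress(q, hidden, hidden)   # [B, 308, 2304]
        text_emb = self.proj(compressed)                   # [B, 308, 2048]
        pooled = self.pooled_proj(hidden.mean(dim=1))      # [B, 1280]
        return text_emb, pooled

# 308 output tokens = 4 concatenated clip chunks of 77 (300 without BOS/EOS).
emb, pooled = T5GemmaToSDXLAdapter()(torch.randn(1, 512, 2304))
```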