The first 8-bit VibeVoice model that actually works
[License](LICENSE) · [Model on Hugging Face](https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8)
If you've tried other 8-bit quantized VibeVoice models, you probably got nothing but static noise. This one actually works.
The secret? Selective quantization: I only quantized the language model (the most robust part), while keeping audio-critical components (diffusion head, VAE, connectors) at full precision.
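The selective policy can be sketched as a simple filter over module names. This is a minimal illustration of the idea, not the actual quantization script; the prefixes below are hypothetical stand-ins, not VibeVoice's real module names.

```python
# Sketch of the selective-quantization policy: quantize only the language
# model's weights, keep the audio path at full precision.
# NOTE: these prefixes are hypothetical stand-ins for the real module names.
AUDIO_CRITICAL_PREFIXES = ("diffusion_head", "acoustic_vae", "connector")

def should_quantize(module_name: str) -> bool:
    """Return True only for modules that are safe to quantize to 8-bit."""
    return not module_name.startswith(AUDIO_CRITICAL_PREFIXES)
```

With `bitsandbytes`, the same effect can be achieved by passing the audio-critical module names to `BitsAndBytesConfig(load_in_8bit=True, llm_int8_skip_modules=[...])`.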
Results

- ✅ Perfect audio, identical to the original model
- ✅ 11.6 GB instead of 18.7 GB (-38%)
- ✅ Uses ~12 GB VRAM instead of 20 GB
- ✅ Works on 12 GB GPUs (RTX 3060, 4070 Ti, etc.)
Most 8-bit models you'll find online quantize everything aggressively. Result: audio components get quantized → numerical errors propagate → audio = pure noise.
I only quantized what can be safely quantized without losing quality.
Result: 52% of parameters quantized, 48% at full precision = perfect audio quality.
| Model | Size | Audio Quality | Status |
|-------|------|---------------|--------|
| Original VibeVoice | 18.7 GB | ⭐⭐⭐⭐⭐ | Full precision |
| Other 8-bit models | 10.6 GB | 💥 NOISE | ❌ Don't work |
| This model | 11.6 GB | ⭐⭐⭐⭐⭐ | ✅ Perfect |
+1.0 GB vs other 8-bit models = perfect audio instead of noise. Worth it.
2. Download this model to `ComfyUI/models/vibevoice/`
Minimum

- VRAM: 12 GB
- RAM: 16 GB
- GPU: NVIDIA with CUDA (required)
- Storage: 11 GB

Recommended

- VRAM: 16+ GB
- RAM: 32 GB
- GPU: RTX 3090/4090, A5000 or better
⚠️ Not supported: CPU, Apple Silicon (MPS), AMD GPUs
1. Requires NVIDIA GPU with CUDA - won't work on CPU or Apple Silicon
2. Inference only - don't use for fine-tuning
3. Requires:
   - `transformers>=4.51.3`
   - `bitsandbytes>=0.43.0`
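For use outside ComfyUI, loading the pre-quantized checkpoint might look like the sketch below. This is an assumption, not a verified recipe: the `AutoModel` class and the `trust_remote_code` flag are guesses at how a custom architecture on the Hub would load — if you use ComfyUI, rely on the node's own loader instead.

```python
# Hedged sketch: loading the pre-quantized checkpoint with transformers.
# Assumes the repo is loadable via transformers' Auto classes with remote code;
# the exact class may differ for VibeVoice's custom architecture.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    device_map="auto",       # place layers on the available CUDA device
    trust_remote_code=True,  # VibeVoice is not a stock transformers class
)
```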
✅ Use this 8-bit if:

- You have 12-16 GB VRAM
- You want maximum quality with reduced size
- You need a production-ready model
- You want the best size/quality balance

Use full precision (18.7 GB) if:

- You have unlimited VRAM (24+ GB)
- You're doing research requiring absolute precision

Use 4-bit NF4 (~6.6 GB) if:

- You only have 8-10 GB VRAM
- You can accept a small quality trade-off
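The decision guide above condenses into a tiny helper. A sketch only: `recommended_variant` is a hypothetical name, and its thresholds simply restate the VRAM ranges listed above.

```python
def recommended_variant(vram_gb: float) -> str:
    """Map available VRAM to the variant suggested in the guide above."""
    if vram_gb >= 24:
        return "full precision (18.7 GB)"
    if vram_gb >= 12:
        return "8-bit (this model, 11.6 GB)"
    if vram_gb >= 8:
        return "4-bit NF4 (~6.6 GB)"
    return "not enough VRAM for VibeVoice-Large"
```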
- Close other GPU applications
- Use `device_map="auto"`
- Reduce batch size to 1
This shouldn't happen! If it does:

1. Verify you downloaded the correct model
2. Update transformers: `pip install --upgrade transformers`
3. Check CUDA: `torch.cuda.is_available()` should return `True`
- Original Model - full precision base
- ComfyUI Node - ComfyUI integration
- Issues: GitHub Issues
- Questions: HuggingFace Discussions