# phazei/NSFW_MMaudio
This repository contains an FP16 safetensors version of the fine-tuned MMAudio model from cloud19/NSFWMMaudio, optimized for improved memory efficiency and faster loading times.

## Model Details

- Base Model: cloud19/NSFWMMaudio
- Original Project: hkchengrex/MMAudio
- Base Architecture: `large44k` (from the original MMAudio)
- Fine-tuning: Fine-tuned on NSFW content (see the base model for details)
- Optimization: Converted from an FP32 PyTorch checkpoint to FP16 safetensors
- Capabilities: Video-to-Audio, Image-to-Audio, Text-to-Audio
- Format: Safetensors (`.safetensors`)
- Precision: 16-bit floating point (FP16)

## Benefits

- ✅ ~50% smaller file size (FP32 → FP16 conversion)
- ✅ Faster loading with the safetensors format
- ✅ Lower GPU memory usage during inference
- ✅ Comparable output quality (minimal precision loss with FP16)
- ✅ Better compatibility with modern ML frameworks

## Usage

This model can be used as a drop-in replacement for the original model: load the safetensors file instead of the original PyTorch checkpoint.

System Requirements:

- GPU: 8-12 GB VRAM (reduced from 12-16 GB due to FP16 optimization)
- Python 3.10+
- PyTorch with CUDA support

For usage instructions, refer to the base model repository and simply swap in the FP16 safetensors weights when loading the model.

## Technical Details

- Original format: FP32 PyTorch (`.pth`), ~2.5 GB
- Optimized format: FP16 safetensors (`.safetensors`), ~1.25 GB
- Conversion method: direct FP32 → FP16 tensor conversion
- Quality impact: negligible quality loss in practice

## Limitations

- The same limitations as the base model apply.
- Content Warning: due to the NSFW nature of the fine-tuning dataset, the model may generate explicit or mature audio content. User discretion is advised.
- FP16 precision may introduce minimal numerical differences compared to FP32.

## Credits

- Base Model: cloud19/NSFWMMaudio
- Original MMAudio: hkchengrex/MMAudio
- Optimization: FP16 conversion for improved efficiency

All credit for the original architecture, fine-tuning, and model development goes to the respective authors. This repository only provides format optimization.
# HunyuanVideo Foley
This is an FP8-quantized version of tencent/HunyuanVideo-Foley, optimized for reduced VRAM usage while maintaining audio generation quality.

## Quantization Details

- Quantization method: FP8 (E5M2 and E4M3FN) weight-only quantization
- Layers quantized: transformer block weights only (attention and FFN layers)
- Preserved precision: normalization layers, embeddings, and biases remain in their original precision
- Expected VRAM savings: ~30-40% reduction compared to the BF16 original
- Memory usage: enables running on <12 GB GPUs when combined with other optimizations

## ComfyUI Integration

This model is specifically optimized for use with the ComfyUI-HunyuanVideo-Foley custom node, which provides:

- VRAM-friendly loading with ping-pong memory management
- Built-in FP8 support that automatically handles the quantized weights
- Torch compile integration for ~30% speed improvements after the first run
- Text-to-Audio and Video-to-Audio modes
- Batch generation with audio selection tools

Installation:

1. Install the ComfyUI node: ComfyUI-HunyuanVideo-Foley
2. Download this quantized model to `ComfyUI/models/foley/`
3. Enjoy <8 GB VRAM usage with high-quality audio generation

## VRAM Usage

Typical VRAM usage (5 s audio, 50 steps):

- Baseline (BF16): ~10-12 GB
- With FP8 quantization: ~8-10 GB
- Perfect for RTX 3080/4070 Ti and similar GPUs

## Compatibility

The FP8 weights can be used with any framework that supports automatic upcasting of FP8 to FP16/BF16 during computation. The quantized weights maintain compatibility with the original model architecture.

## Files

- `hunyuanvideofoleyfp8e4m3fn.safetensors` - main model weights in FP8 format

## Notes

- Quality: maintains audio generation quality comparable to the original model
- Speed: conversion overhead is minimal; actual generation speed depends on compute precision
- Memory: significant VRAM reduction makes the model accessible on consumer GPUs
- Compatibility: drop-in replacement for the original model weights

This quantization is based on tencent/HunyuanVideo-Foley.
## Attribution

Please refer to the original repository for:

- Model architecture details
- Training information
- License terms
- Citation information

## Quantization Strategy

The quantization uses a conservative approach that converts only transformer block weights while preserving precision-sensitive components:

- ✅ Converted: attention and FFN layer weights in transformer blocks
- ❌ Preserved: normalization layers, embeddings, projections, bias terms

This selective quantization strategy maintains model quality while maximizing memory savings.