Jimmi42

5 models • 1 total models in database

Sort by:

sarvam-m-4bit-mlx

Chatterbox Tts Apple Silicon Code

High-quality voice cloning with native Apple Silicon MPS GPU acceleration! This is an optimized version of ResembleAI's Chatterbox-TTS specifically adapted for Apple Silicon devices (M1/M2/M3/M4) with full MPS GPU support and intelligent text chunking for longer inputs. 🚀 Apple Silicon Optimization - Native MPS GPU Support: 2-3x faster inference on Apple Silicon - CUDA→MPS Device Mapping: Automatic tensor device conversion - Memory Efficient: Optimized for Apple Silicon memory architecture - Cross-Platform: Works on M1, M2, M3 chip families 🎯 Enhanced Functionality - Smart Text Chunking: Automatically splits long text at sentence boundaries - Voice Cloning: Upload reference audio to clone any voice (6+ seconds recommended) - High-Quality Output: Maintains original Chatterbox-TTS audio quality - Real-time Processing: Live progress tracking and chunk visualization 🎛️ Advanced Controls - Exaggeration: Control speech expressiveness (0.25-2.0) - Temperature: Adjust randomness and creativity (0.05-5.0) - CFG/Pace: Fine-tune generation speed and quality (0.2-1.0) - Chunk Size: Configurable text processing (100-400 characters) - Seed Control: Reproducible outputs with custom seeds Text Chunking Algorithm - Sentence Boundary Detection: Splits at `.!?` with context preservation - Fallback Splitting: Handles long sentences via comma and space splitting - Silence Insertion: Adds 0.3s gaps between chunks for natural flow - Batch Processing: Generates individual chunks then concatenates Our enhanced app.py includes: - 🍎 Apple Silicon Compatibility - Optimized for M1/M2/M3/M4 Macs - 📝 Smart Text Chunking with sentence boundary detection - 🎨 Professional Gradio UI with progress tracking - 🔧 Advanced Controls for exaggeration, temperature, CFG/pace - 🛡️ Error Handling with graceful CPU fallbacks - ⚡ Performance Optimizations and memory management 💡 Apple Silicon Note While your Mac has MPS GPU capability, chatterbox-tts currently has compatibility issues with MPS tensors. This app automatically detects Apple Silicon and uses CPU mode for maximum stability and compatibility. Basic Text-to-Speech 1. Enter your text in the input field 2. Click "🎵 Generate Speech" 3. Listen to the generated audio Voice Cloning 1. Upload a reference audio file (6+ seconds recommended) 2. Enter the text you want in that voice 3. Adjust exaggeration and other parameters 4. Generate your custom voice output Long Text Processing - The system automatically chunks text longer than 250 characters - Each chunk is processed separately then combined - Progress tracking shows chunk-by-chunk generation | Device | Speed Improvement | Memory Usage | Compatibility | |--------|------------------|--------------|---------------| | M1 Mac | ~2.5x faster | 50% less RAM | ✅ Full | | M2 Mac | ~3x faster | 45% less RAM | ✅ Full | | M3 Mac | ~3.2x faster | 40% less RAM | ✅ Full | | M4 Mac | 3.5x faster | 35% less RAM | ✅ MPS GPU | | Intel Mac | CPU only | Standard | ✅ Fallback | Minimum Requirements - macOS: 12.0+ (Monterey) - Python: 3.9-3.11 - RAM: 8GB - Storage: 5GB for models Recommended Setup - macOS: 13.0+ (Ventura) - Python: 3.11 - RAM: 16GB - Apple Silicon: M1/M2/M3/M4 chip - Storage: 10GB free space Model Loading Errors - Ensure internet connection for initial model download - Check that MPS is available: `torch.backends.mps.isavailable()` Memory Issues - Reduce chunk size in Advanced Options - Close other applications to free RAM - Use CPU fallback if needed Audio Problems - Install ffmpeg: `brew install ffmpeg` - Check audio file format (WAV recommended) - Ensure reference audio is 6+ seconds | Feature | Original Chatterbox | Apple Silicon Version | |---------|-------------------|----------------------| | Device Support | CUDA only | MPS + CUDA + CPU | | Text Length | Limited | Unlimited (chunking) | | Progress Tracking | Basic | Detailed per chunk | | Memory Usage | High | Optimized | | macOS Support | CPU only | Native GPU | | Installation | Complex | Streamlined | We welcome contributions! Areas for improvement: - MLX Integration: Native Apple framework support - Batch Processing: Multiple inputs simultaneously - Voice Presets: Pre-configured voice library - API Endpoints: REST API for programmatic access MIT License - feel free to use, modify, and distribute! - ResembleAI: Original Chatterbox-TTS implementation - Apple: MPS framework for Apple Silicon optimization - Gradio Team: Excellent web interface framework - PyTorch: MPS backend development For detailed implementation notes, see: - `APPLESILICONADAPTATIONSUMMARY.md` - Complete technical guide - `MLXvsPyTorchAnalysis.md` - Performance comparisons - `SETUPGUIDE.md` - Detailed installation instructions 🎙️ Experience the future of voice synthesis with native Apple Silicon acceleration! This Space demonstrates how modern AI models can be optimized for Apple's custom silicon, delivering superior performance while maintaining full compatibility and ease of use.

license:mit

MonkeyOCR-Apple-Silicon

license:mit

Parakeet-V3-MLX

—

youtube-transcriber-subtitles

High-performance YouTube video transcription with perfectly timed subtitles using Apple MLX and Parakeet v2 🚀 Try it Now • ✨ Features • 📖 Usage • 🛠️ Installation Transform any YouTube video segment into a transcribed video with perfectly synchronized subtitles in seconds! Built for Apple Silicon with cutting-edge speech recognition. ⚡️ Lightning Fast - ~0.3 seconds to transcribe 1-minute videos - Apple MLX optimized for M1/M2/M3 chips - Real-time processing with chunked inference 🎯 Pixel-Perfect Timing - Sentence-level timing from Parakeet v2 - No more early/late subtitles - perfect sync - Natural speech patterns preserved 🎬 Smart Video Processing - YouTube URL input - paste any video link - Precise time trimming - specify start/end times (MM:SS or HH:MM:SS) - Auto quality selection - best available video/audio 🎤 Advanced Speech Recognition - Parakeet TDT v2 model - NVIDIA's latest ASR - Conformer + RNNT architecture - not slow transformers - Chunked processing - handles long videos efficiently 📝 Subtitle Magic - Toggle ON/OFF - choose subtitled or clean video - Accurate timing - uses real speech timestamps - SRT format - standard subtitle file creation - Burned-in subtitles - embedded directly in video 🎨 Beautiful Interface - Gradio web UI - clean, modern design - Real-time progress - see processing status - Dual output - video player + text transcript 3. Open Browser Navigate to `http://127.0.0.1:7860` 4. Process Video 1. Paste YouTube URL 2. Set start/end times (e.g., "1:23" to "2:45") 3. Toggle subtitles ON/OFF 4. Click "Process Video" 5. Download your result! Prerequisites - Python 3.8+ - Apple Silicon Mac (M1/M2/M3) - for MLX acceleration - ffmpeg - for video processing - yt-dlp - for YouTube downloads Key Dependencies - `parakeet-mlx` - Apple MLX speech recognition - `gradio` - Web interface - `yt-dlp` - YouTube downloader - `mlx` - Apple's ML framework 🧠 Model Architecture - Parakeet TDT 0.6B v2 - 600M parameter model - Conformer encoder - superior to transformers on Mac - RNNT decoder - streaming-friendly architecture - MLX optimized - native Apple Silicon acceleration ⚙️ Processing Pipeline 1. Download video using yt-dlp 2. Trim to specified time range with ffmpeg 3. Extract audio at 16kHz mono WAV 4. Transcribe with chunked inference (120s chunks, 5s overlap) 5. Generate SRT subtitles with real timing 6. Embed subtitles using ffmpeg (optional) 7. Return video + transcript 📊 Performance - Speed: ~5-10x faster than real-time - Memory: Efficient chunked processing - Quality: State-of-the-art accuracy - Compatibility: Apple Silicon optimized 1. 🍴 Fork the repository 2. 🌟 Create a feature branch 3. ✨ Make your improvements 4. 🧪 Test thoroughly 5. 📤 Submit a pull request Ideas for Contributions - 🎨 Custom subtitle styling options - 🌍 Multi-language support - 📱 Mobile-friendly interface - 🎵 Audio-only processing mode - 📊 Batch processing for multiple videos - NVIDIA - Parakeet speech recognition models - Apple - MLX framework for efficient inference - Gradio - Beautiful web interfaces made simple - ffmpeg - The Swiss Army knife of multimedia - 🐛 Bug reports: Open an issue - 💡 Feature requests: Start a discussion - 📖 Documentation: Check this README first - 💬 Community: Join our discussions ⭐ Star this repo if it helped you create amazing transcribed videos! ⭐

—