wangkanai

57 models • 4 total models in database

Sort by:

wan22-fp16-encoders-gguf

High-precision FP16 text encoders for the WAN 2.2 (World Animated Network) video generation model in optimized GGUF format. These encoders provide enhanced text understanding and conditioning for high-quality text-to-video and image-to-video generation. This repository contains the UMT5-XXL text encoder component for WAN 2.2, optimized in FP16 precision using the GGUF format. The text encoder is a critical component that processes text prompts and generates embeddings that condition the video generation process. Key Features: - FP16 Precision: Full 16-bit floating point precision for maximum quality - GGUF Format: Efficient memory-mapped format for faster loading and lower memory overhead - UMT5-XXL Architecture: Extra-large unified multilingual T5 model for superior text understanding - WAN 2.2 Compatible: Designed specifically for WAN 2.2 video generation pipeline Capabilities: - Complex prompt understanding with nuanced semantic comprehension - Multilingual text encoding support - High-quality conditioning for video generation - Efficient inference with optimized format | File | Size | Format | Precision | Purpose | |------|------|--------|-----------|---------| | `umt5-xxl-encoder-f16.gguf` | 10.59 GB | GGUF | FP16 | UMT5-XXL text encoder | Minimum Requirements - VRAM: 12 GB GPU memory (for encoder only) - System RAM: 16 GB - Disk Space: 11 GB free space - GPU: NVIDIA GPU with CUDA support (recommended) Recommended Requirements - VRAM: 16+ GB GPU memory - System RAM: 32 GB - Disk Space: 20 GB free space (for encoder + model files) - GPU: NVIDIA RTX 3090/4090 or A100 Full WAN 2.2 Pipeline Requirements When using with complete WAN 2.2 model: - VRAM: 40+ GB (encoder + transformer + VAE) - System RAM: 64 GB - Disk Space: 100+ GB for complete pipeline Text Encoder Architecture - Model: UMT5-XXL (Unified Multilingual T5 Extra Large) - Parameters: ~11 billion parameters - Precision: FP16 (16-bit floating point) - Format: GGUF (GPT-Generated Unified Format) - Context Length: 512 tokens - Embedding Dimension: 4096 Format Details - GGUF Version: Compatible with llama.cpp and transformers GGUF loaders - Quantization: None (full FP16 precision maintained) - Memory Mapping: Enabled for efficient loading - Tensor Layout: Optimized for GPU inference Integration - Primary Framework: Diffusers (Hugging Face) - Compatible Libraries: transformers, llama.cpp, GGML - Pipeline: WAN 2.2 text-to-video and image-to-video - Device Support: CUDA, CPU (with reduced performance) Advantages Over Standard Formats - Faster Loading: Memory-mapped file format reduces loading time by 2-3x - Lower Memory Overhead: Efficient tensor storage reduces RAM usage during loading - Better Compatibility: Works with multiple inference frameworks (transformers, llama.cpp) - Simplified Distribution: Single-file format easier to manage and distribute Performance Characteristics - Loading Speed: ~5-10 seconds (vs 30-60 seconds for standard safetensors) - Memory Footprint: ~11 GB VRAM (vs ~13 GB for unoptimized formats) - Inference Speed: Equivalent to standard FP16 with optimized attention This model is released under a custom license. Please review the license terms before use: Key Terms: - Research and commercial use permitted with attribution - Modifications and derivatives allowed - Distribution of derivatives must maintain original attribution - No warranty provided; use at your own risk For complete license terms, visit: https://huggingface.co/Lightricks/wan-2.2 If you use these encoders in your research or projects, please cite: Official WAN 2.2 Resources - Main Model: Lightricks/wan-2.2 - Documentation: WAN 2.2 Model Card - Research Paper: WAN: World Animated Network Diffusers Library - Documentation: Hugging Face Diffusers - Installation: `pip install diffusers transformers accelerate` - WAN Pipeline Guide: Diffusers WAN Pipeline GGUF Format - GGML/GGUF Specification: ggerganov/ggml - llama.cpp: ggerganov/llama.cpp - Format Documentation: GGUF Format Spec Getting Help - Issues: Report issues on the WAN 2.2 repository - Discussions: Join Hugging Face community discussions - Discord: Lightricks AI community server For questions, issues, or collaboration inquiries: - Email: [email protected] - Website: https://www.lightricks.com/research - Hugging Face: https://huggingface.co/Lightricks Last Updated: October 2024 Model Version: WAN 2.2 Text Encoders FP16 Format Version: GGUF Repository Maintainer: Lightricks Research Team

wangkanai

wan22-fp16-encoders-gguf

qwen2.5-vl-7b-instruct

qwen2.5-vl-3b-instruct

qwen2.5-vl-32b-instruct

wan22-qx-encoders-gguf

qwen3-vl-2b-instruct

qwen3-vl-8b-thinking

qwen3-vl-4b-thinking

qwen3-vl-2b-thinking

wan22-fp16-i2v-gguf

wan22-fp8-i2v-gguf

wan22-fp32-encoders-gguf

qwen3-vl-4b-instruct

flux-dev-fp16

wan22-fp8-i2v-loras-nsfw

wan25-fp16-i2v

wan25-fp16-i2v-loras-nsfw

sdxl-fp8

wan22-fp16-i2v

wan22-fp16-i2v-loras

flux-dev-fp8

flux-upscale

wan21-lightx2v-i2v-14b-480p

wan21-lightx2v-i2v-14b-720p

wan21-lightx2v-t2v-14b-720p

sdxl-vae

sdxl-fp16

flux-dev-loras

flux-dev-loras-nsfw

sdxl-fp16-loras

sdxl-fp16-loras-nsfw

sdxl-fp8-loras

wan21-fp8-480p

wan21-vae

wan22-vae

wan25-vae

wan22-fp8-i2v

wan22-fp8-i2v-loras

wan22-fp8-t2v-loras

wan22-fp8-t2v-loras-nsfw

wan25-fp16-i2v-loras

wan25-fp8-i2v

wan25-fp8-i2v-loras

wan25-fp8-i2v-loras-nsfw

sdxl-fp8-loras-nsfw

wan21-fp16-480p

wan21-fp16-720p

wan21-fp16-loras

wan21-fp16-loras-nsfw

wan21-fp8-720p

wan21-fp8-loras

wan21-fp8-loras-nsfw

wan22-fp16-encoders

wan22-fp8-encoders

qwen3-vl-32b-instruct

qwen3-vl-32b-thinking

qwen3-vl-8b-instruct