lightx2v
Wan2.2-Distill-Loras
---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- LoRA
- video
- video generation
- lora
pipeline_tags:
- image-to-video
- text-to-video
base_model:
- Wan-AI/Wan2.2-I2V-A14B
library_name: diffusers
pipeline_tag: image-to-video
---
Qwen-Image-Lightning
---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen-Image
pipeline_tag: text-to-image
tags:
- Qwen-Image
- distillation
- LoRA
- lora
library_name: diffusers
---
Qwen-Image-Edit-2511-Lightning
Qwen-Image-2512-Lightning
Wan2.2-Distill-Models
⚡ High-Performance Video Generation with 4-Step Inference

Distillation-accelerated version of Wan2.2 - dramatically faster with excellent quality

[](https://huggingface.co/lightx2v/Wan2.2-Distill-Models) [](https://github.com/ModelTC/LightX2V) [](LICENSE)

Ultra-Fast Generation
- 4-step inference (vs. traditional 50+ steps)
- Approximately 2x faster with LightX2V than with ComfyUI
- Near real-time video generation capability

Flexible Options
- Dual noise control: high-noise and low-noise variants
- Multiple precision formats (BF16/FP8/INT8)
- Full 14B-parameter models

Memory Efficient
- FP8/INT8: ~50% size reduction
- CPU offload support
- Optimized for consumer GPUs

Easy Integration
- Compatible with the LightX2V framework
- ComfyUI support
- Simple configuration files

Image-to-Video (I2V) - 14B Parameters
Transform static images into dynamic videos with fine-grained quality control:
- High Noise: more creative, diverse outputs
- Low Noise: more faithful to the input, stable outputs

Text-to-Video (T2V) - 14B Parameters
Generate videos from text descriptions:
- High Noise: more creative, diverse outputs
- Low Noise: more stable and controllable outputs
- Full 14B-parameter model

| Precision | Model Identifier | Model Size | Framework | Quality vs Speed |
|:---------:|:-----------------|:----------:|:---------:|:-----------------|
| BF16 | `lightx2v4step` | ~28.6 GB | LightX2V | ★★★★★ Highest Quality |
| FP8 | `scaledfp8e4m3lightx2v4step` | ~15 GB | LightX2V | ★★★★ Excellent Balance |
| INT8 | `int8lightx2v4step` | ~15 GB | LightX2V | ★★★★ Fast & Efficient |
| FP8 ComfyUI | `scaledfp8e4m3lightx2v4stepcomfyui` | ~15 GB | ComfyUI | ★★★ ComfyUI Ready |

> 💡 Browse All Models: View Full Model Collection

LightX2V is a high-performance inference framework optimized for these models, approximately 2x faster than ComfyUI with better quantization accuracy. Highly recommended!
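As a sanity check on the sizes in the table above, the checkpoint size of a dense model follows directly from parameter count times bytes per parameter. This is a rough sketch (quantization scale factors and non-DiT tensors add a little overhead on top):

```python
# Back-of-the-envelope check of the table above: a 14B-parameter model stored
# in BF16 uses 2 bytes/param, while FP8/INT8 use 1 byte/param (~50% smaller).

def model_size_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate raw weight size in GiB for a dense model."""
    return num_params * bytes_per_param / 1024**3

params = 14e9
print(f"BF16: {model_size_gb(params, 2):.1f} GiB")      # ~26 GiB of raw weights
print(f"FP8/INT8: {model_size_gb(params, 1):.1f} GiB")  # ~13 GiB plus scale overhead
```

The listed ~28.6 GB / ~15 GB figures are slightly larger than these raw-weight estimates because checkpoints also carry embeddings, norms, and (for quantized formats) per-channel scales.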
> 💡 Tip: For T2V models, follow the same steps but replace `i2v` with `t2v` in the filenames. Alternatively, refer to the Quick Start Documentation to use Docker.

Choose the appropriate configuration based on your GPU memory:
- 80GB+ GPUs (A100/H100) - I2V: wanmoei2vdistill.json
- 24GB+ GPUs (RTX 4090) - I2V: wanmoei2vdistill4090.json

> 📝 Note: Update the model paths in the script to point to your Wan2.2 model. Also refer to the LightX2V Model Structure Documentation.

LightX2V Documentation
- Quick Start Guide: LightX2V Quick Start
- Complete Usage Guide: LightX2V Model Structure Documentation
- Configuration File Instructions: Configuration Files
- Quantized Model Usage: Quantization Documentation
- Parameter Offloading: Offload Documentation

Other Components: these models contain only the DiT weights. The following components are also needed at runtime:
- T5 text encoder
- CLIP vision encoder
- VAE encoder/decoder
- Tokenizer

Please refer to the LightX2V Documentation for instructions on organizing the complete model directory.

- GitHub Issues: https://github.com/ModelTC/LightX2V/issues
- HuggingFace: https://huggingface.co/lightx2v/Wan2.2-Distill-Models

If you find this project helpful, please give us a ⭐ on GitHub
Autoencoders
⚡ Efficient Video Autoencoder (VAE) Model Collection

From official models to LightX2V distilled, optimized versions - balancing quality, speed, and memory

[](https://huggingface.co/lightx2v) [](https://github.com/ModelTC/LightX2V) [](LICENSE)

For the VAE, the LightX2V team has carried out a series of deep optimizations, producing two major series - LightVAE and LightTAE - which significantly reduce memory consumption and improve inference speed while maintaining high quality.

Official VAE Series
- ✓ Best reconstruction accuracy
- ✓ Complete detail preservation
- ✗ Large memory usage (~8-12 GB)
- ✗ Slow inference speed

Open Source TAE Series
Features: Fastest Speed ★★★★★
- ✓ Minimal memory usage (~0.4 GB)
- ✓ Extremely fast inference
- ✗ Average quality ★★★
- ✗ Potential detail loss

LightVAE Series (Our Optimization)
Features: Best Balanced Solution
- ✓ Uses causal 3D convolutions (same as official)
- ✓ Quality close to official ★★★★
- ✓ Memory reduced by ~50% (~4-5 GB)
- ✓ Speed increased by 2-3x
- ✓ Balances quality, speed, and memory

LightTAE Series (Our Optimization)
Features: Fast Speed + Good Quality
- ✓ Minimal memory usage (~0.4 GB)
- ✓ Extremely fast inference
- ✓ Quality close to official ★★★★
- ✓ Significantly surpasses open source TAE

Wan2.1 Series Models

| Model Name | Type | Architecture | Description |
|:--------|:-----|:-----|:-----|
| `Wan2.1VAE` | Official VAE | Causal Conv3D | Wan2.1 official video VAE. Highest quality, large memory, slow speed |
| `taew21` | Open Source Small AE | Conv2D | Open source model based on taeHV. Small memory, fast speed, average quality |
| `lighttaew21` | LightTAE Series | Conv2D | Our distilled, optimized version based on `taew21`. Small memory, fast speed, quality close to official |
| `lightvaew21` | LightVAE Series | Causal Conv3D | WanVAE2.1 architecture pruned by 75%, then trained and distilled. Best balance: high quality + low memory + fast speed |

Wan2.2 Series Models

| Model Name | Type | Architecture | Description |
|:--------|:-----|:-----|:-----|
| `Wan2.2VAE` | Official VAE | Causal Conv3D | Wan2.2 official video VAE. Highest quality, large memory, slow speed |
| `taew22` | Open Source Small AE | Conv2D | Open source model based on taeHV. Small memory, fast speed, average quality |
| `lighttaew22` | LightTAE Series | Conv2D | Our distilled, optimized version based on `taew22`. Small memory, fast speed, quality close to official |

Wan2.1 Series Performance Comparison
- Precision: BF16
- Test Hardware: NVIDIA H100

| Speed | Wan2.1VAE | taew21 | lighttaew21 | lightvaew21 |
|:-----|:--------------|:------------|:---------------------|:-------------|
| Encode Speed | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014 s |
| Decode Speed | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697 s |

| GPU Memory | Wan2.1VAE | taew21 | lighttaew21 | lightvaew21 |
|:-----|:--------------|:------------|:---------------------|:-------------|
| Encode Memory | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
| Decode Memory | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |

Wan2.2 Series Performance Comparison
- Precision: BF16
- Test Hardware: NVIDIA H100

Video Reconstruction

| Speed | Wan2.2VAE | taew22 | lighttaew22 |
|:-----|:--------------|:------------|:---------------------|
| Encode Speed | 1.1369 s | 0.3499 s | 0.3499 s |
| Decode Speed | 3.1268 s | 0.0891 s | 0.0891 s |

| GPU Memory | Wan2.2VAE | taew22 | lighttaew22 |
|:-----|:--------------|:------------|:---------------------|
| Encode Memory | 6.1991 GB | 0.0064 GB | 0.0064 GB |
| Decode Memory | 12.3487 GB | 0.4120 GB | 0.4120 GB |

Pursuing Best Quality
Recommended: `Wan2.1VAE` / `Wan2.2VAE`
- ✓ Official model, the quality ceiling
- ✓ Highest reconstruction accuracy
- ✓ Suitable for final product output
- ⚠ Large memory usage (~8-12 GB)
- ⚠ Slow inference speed

Balancing Quality, Speed, and Memory
Recommended: `lightvaew21`
- ✓ Uses causal 3D convolutions (same as official)
- ✓ Excellent quality, close to official ★★★★
- ✓ Memory reduced by ~50% (~4-5 GB)
- ✓ Speed increased by 2-3x
Use Cases: daily production, strongly recommended

Speed + Quality Balance
Recommended: `lighttaew21` / `lighttaew22`
- ✓ Extremely low memory usage (~0.4 GB)
- ✓ Extremely fast inference
- ✓ Quality significantly surpasses open source TAE
- ✓ Close to official quality ★★★★

| Comparison | Open Source TAE | LightTAE (Ours) | Official VAE | LightVAE (Ours) |
|:------|:--------|:---------------------|:---------|:---------------------|
| Architecture | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
| Memory Usage | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
| Inference Speed | Extremely Fast ★★★★★ | Extremely Fast ★★★★★ | Slow ★★ | Fast ★★★★ |
| Generation Quality | Average ★★★ | Close to Official ★★★★ | Highest ★★★★★ | Close to Official ★★★★ |

Todo List
- [x] LightX2V integration
- [x] ComfyUI integration
- [ ] Training & Distillation Code

We provide a standalone script `vidrecon.py` to test VAE models independently. The script reads a video, encodes it through the VAE, then decodes it back to verify reconstruction quality. Script location: `LightX2V/lightx2v/models/videoencoders/hf/vidrecon.py`. For ComfyUI usage, please refer to https://github.com/ModelTC/ComfyUI-LightVAE

Compatibility
- Wan2.1-series VAEs only work with Wan2.1 backbone models
- Wan2.2-series VAEs only work with Wan2.2 backbone models
- Do not mix VAE and backbone models from different versions

Documentation Links
- LightX2V Quick Start: Quick Start Documentation
- Model Structure Description: Model Structure Documentation
- taeHV Project: GitHub - madebyollin/taeHV

Related Models
- Wan2.1 Backbone Models: Wan-AI Model Collection
- Wan2.2 Backbone Models: Wan-AI/Wan2.2-TI2V-5B
- LightX2V Optimized Models: lightx2v Model Collection

- GitHub Issues: https://github.com/ModelTC/LightX2V/issues
- HuggingFace: https://huggingface.co/lightx2v
- LightX2V Homepage: https://github.com/ModelTC/LightX2V

If you find this project helpful, please give us a ⭐ on GitHub
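The `vidrecon.py` round trip described above (read a video, encode it, decode it, compare) can be sketched with a generic harness. Everything here is illustrative: the toy average-pool "codec" and the NumPy types are stand-ins for a real VAE, and PSNR is one common reconstruction metric:

```python
import math
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio; higher means a more faithful reconstruction."""
    mse = float(np.mean((ref - rec) ** 2))
    return math.inf if mse == 0.0 else 10.0 * math.log10(peak**2 / mse)

def roundtrip(encode, decode, video: np.ndarray) -> float:
    """Encode a video to latents, decode it back, and score the reconstruction."""
    return psnr(video, decode(encode(video)))

# Toy stand-in codec: 2x spatial average-pool "encoder" and nearest-neighbour
# "decoder", just to exercise the harness on a (T, H, W, C) clip in [0, 1].
def encode(v: np.ndarray) -> np.ndarray:
    t, h, w, c = v.shape
    return v.reshape(t, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))

def decode(z: np.ndarray) -> np.ndarray:
    return z.repeat(2, axis=1).repeat(2, axis=2)

video = np.random.default_rng(0).random((4, 32, 32, 3))
print(f"PSNR: {roundtrip(encode, decode, video):.2f} dB")
```

A real test would swap in the actual VAE's encode/decode and a video loaded from disk, as `vidrecon.py` does.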
Wan2.1-Distill-Models
⚡ High-Performance Video Generation with 4-Step Inference

Distillation-accelerated versions of Wan2.1 - dramatically faster while maintaining exceptional quality

[](https://huggingface.co/lightx2v/Wan2.1-Distill-Models) [](https://github.com/ModelTC/LightX2V) [](LICENSE)

Ultra-Fast Generation
- 4-step inference (vs. traditional 50+ steps)
- Up to 2x faster than ComfyUI
- Real-time video generation capability

Flexible Options
- Multiple resolutions (480P/720P)
- Various precision formats (BF16/FP8/INT8)
- I2V and T2V support

Memory Efficient
- FP8/INT8: ~50% size reduction
- CPU offload support
- Optimized for consumer GPUs

Easy Integration
- Compatible with the LightX2V framework
- ComfyUI support available
- Simple configuration files

Image-to-Video (I2V)
Transform still images into dynamic videos
- 480P Resolution
- 720P Resolution

Text-to-Video (T2V)
Generate videos from text descriptions
- 14B Parameters
- High-quality synthesis

| Precision | Model Identifier | Model Size | Framework | Quality vs Speed |
|:---------:|:-----------------|:----------:|:---------:|:-----------------|
| BF16 | `lightx2v4step` | ~28-32 GB | LightX2V | ★★★★★ Highest quality |
| FP8 | `scaledfp8e4m3lightx2v4step` | ~15-17 GB | LightX2V | ★★★★ Excellent balance |
| INT8 | `int8lightx2v4step` | ~15-17 GB | LightX2V | ★★★★ Fast & efficient |
| FP8 ComfyUI | `scaledfp8e4m3lightx2v4stepcomfyui` | ~15-17 GB | ComfyUI | ★★★ ComfyUI ready |

> 💡 Explore all models: Browse Full Model Collection

LightX2V is a high-performance inference framework optimized for these models, approximately 2x faster than ComfyUI with better quantization accuracy. Highly recommended!
Or refer to the Quick Start Documentation to use Docker.

Choose the appropriate configuration based on your GPU memory:

For 80GB+ GPUs (A100/H100)
- I2V: wani2vdistill4stepcfg.json
- T2V: want2vdistill4stepcfg.json

For 24GB+ GPUs (RTX 4090)
- I2V: wani2vdistill4stepcfg4090.json
- T2V: want2vdistill4stepcfg4090.json

Documentation
- Quick Start Guide: LightX2V Quick Start
- Complete Usage Guide: LightX2V Model Structure Documentation
- Configuration Guide: Configuration Files
- Quantization Usage: Quantization Documentation
- Parameter Offload: Offload Documentation

Why LightX2V?
- Fast: approximately 2x faster than ComfyUI
- Optimized: deeply optimized for distilled models
- Memory Efficient: supports CPU offload and other memory optimization techniques
- Flexible: supports multiple quantization formats and configuration options

Community
- Issues: https://github.com/ModelTC/LightX2V/issues

1. Additional Components: these models contain only the DiT weights. You also need:
- T5 text encoder
- CLIP vision encoder
- VAE encoder/decoder
- Tokenizers

Refer to the LightX2V Documentation for how to organize the complete model directory.

If you find this project helpful, please give us a ⭐ on GitHub
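The GPU-memory-based config selection above can be written as a small helper. The file names are the ones listed in this card; the function itself and its threshold logic are an illustrative assumption, not part of LightX2V:

```python
# Pick a LightX2V config file (hypothetical helper) based on GPU memory and
# task, following the 80GB+/24GB+ recommendations in this card.

CONFIGS = {
    ("i2v", 80): "wani2vdistill4stepcfg.json",
    ("t2v", 80): "want2vdistill4stepcfg.json",
    ("i2v", 24): "wani2vdistill4stepcfg4090.json",
    ("t2v", 24): "want2vdistill4stepcfg4090.json",
}

def pick_config(task: str, gpu_mem_gb: int) -> str:
    """Return the recommended config for an 80GB+ (A100/H100) or 24GB+ (4090) GPU."""
    if task not in ("i2v", "t2v"):
        raise ValueError(f"unknown task: {task}")
    if gpu_mem_gb < 24:
        raise ValueError("these 4-step configs target GPUs with at least 24 GB")
    tier = 80 if gpu_mem_gb >= 80 else 24
    return CONFIGS[(task, tier)]

print(pick_config("i2v", 80))  # wani2vdistill4stepcfg.json
print(pick_config("t2v", 24))  # want2vdistill4stepcfg4090.json
```

Cards with other GPUs in between (e.g. 48 GB) fall back to the 4090-style configs, which trade speed for lower peak memory via offloading.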
Hy1.5-Quantized-Models
Wan2.1-Distill-Loras
Encoders
Wan2.2-Lightning
You're welcome to visit our GitHub repository for the latest model releases or to reproduce our results.

We are excited to release the distilled version of the Wan2.2 video generation model family, which offers the following advantages:
- Fast: video generation now requires only 4 steps, without the CFG trick, yielding a roughly 20x speed-up.
- High quality: the distilled model delivers visuals on par with the base model in most scenarios, and sometimes better.
- Complex motion generation: despite the reduction to just 4 steps, the model retains excellent motion dynamics in the generated scenes.

News
- Aug 08, 2025: Release of native ComfyUI workflows: Wan2.2-I2V-A14B-NFE4-V1 (Image-to-Video, I2V-V1-WF) and Wan2.2-T2V-A14B-NFE4-V1.1 (Text-to-Video, T2V-V1.1-WF).
- Aug 07, 2025: Release of Wan2.2-I2V-A14B-NFE4-V1.
- Aug 07, 2025: Release of Wan2.2-T2V-A14B-NFE4-V1.1. The generation quality of V1.1 is slightly better than V1.
- Aug 04, 2025: Release of Wan2.2-T2V-A14B-NFE4-V1.

The I2V videos below can be reproduced using examples/i2vpromptlist.txt and examples/i2vimagepathlist.txt. The T2V videos below can be reproduced using examples/promptlist.txt.

Known limitations: when the video contains elements with extremely large motion, the generated results may include artifacts; in some results, the direction of vehicles may be reversed.

Todo List
- [x] Wan2.2-T2V-A14B-4steps
- [x] Wan2.2-I2V-A14B-4steps
- [ ] Wan2.2-TI2V-5B-4steps

Please follow the Wan2.2 official GitHub to install the Python environment and download the base model.

This repository supports the `Wan2.2-T2V-A14B` Text-to-Video model and can generate video at both 480P and 720P resolutions, in either portrait or landscape orientation. To keep the implementation simple, we start with a basic version of the inference process that skips the prompt extension step.

> 💡 This command can run on a GPU with at least 80GB VRAM.
> 💡 If you encounter OOM (out-of-memory) issues, you can use the `--offloadmodel True`, `--convertmodeldtype` and `--t5cpu` options to reduce GPU memory usage.

- Multi-GPU inference using FSDP + DeepSpeed Ulysses: we use PyTorch FSDP and DeepSpeed Ulysses to accelerate inference.

Extending the prompt can effectively enrich the details in the generated videos, further enhancing video quality, so we recommend enabling prompt extension. We provide the following two methods:

- Use the Dashscope API for extension:
  - Apply for a `dashscope.apikey` in advance (EN | CN).
  - Set the environment variable `DASHAPIKEY` to your Dashscope API key. Users of Alibaba Cloud's international site also need to set the environment variable `DASHAPIURL` to 'https://dashscope-intl.aliyuncs.com/api/v1'. For more detailed instructions, please refer to the Dashscope documentation.
  - The `qwen-plus` model is used for text-to-video tasks and `qwen-vl-max` for image-to-video tasks.
  - You can change the extension model with the `--promptextendmodel` parameter.
- Use a local model for extension:
  - By default, the Qwen model on HuggingFace is used. Users can choose Qwen models or other models based on the available GPU memory.
  - For text-to-video tasks, you can use models such as `Qwen/Qwen2.5-14B-Instruct`, `Qwen/Qwen2.5-7B-Instruct`, and `Qwen/Qwen2.5-3B-Instruct`.
  - For image-to-video tasks, you can use models such as `Qwen/Qwen2.5-VL-7B-Instruct` and `Qwen/Qwen2.5-VL-3B-Instruct`.
  - Larger models generally provide better extension results but require more GPU memory.
  - You can change the extension model with the `--promptextendmodel` parameter, specifying either a local model path or a HuggingFace model id.

This repository supports the `Wan2.2-I2V-A14B` Image-to-Video model and can generate video at both 480P and 720P resolutions.

> This command can run on a GPU with at least 80GB VRAM.
> 💡 For the Image-to-Video task, the `size` parameter represents the area of the generated video, with the aspect ratio following that of the original input image.

- Multi-GPU inference using FSDP + DeepSpeed Ulysses

💡 The model can generate videos solely from the input image; you can use prompt extension to generate a prompt from the image.

> The process of prompt extension can be referenced here.

This repository supports the `Wan2.2-TI2V-5B` Text-Image-to-Video model and can generate video at 720P resolution.

> 💡 Unlike other tasks, the 720P resolution for the Text-Image-to-Video task is `1280×704` or `704×1280`.

> This command can run on a GPU with at least 24GB VRAM (e.g., an RTX 4090).

> 💡 If you are running on a GPU with at least 80GB VRAM, you can remove the `--offloadmodel True`, `--convertmodeldtype` and `--t5cpu` options to speed up execution.

> 💡 If the image parameter is configured, generation is image-to-video; otherwise it defaults to text-to-video.

> 💡 As with Image-to-Video, the `size` parameter represents the area of the generated video, with the aspect ratio following that of the original input image.

- Multi-GPU inference using FSDP + DeepSpeed Ulysses

> The process of prompt extension can be referenced here.

License Agreement

The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use them while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the license.
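Since the `size` parameter described above fixes the *area* of the output while the aspect ratio follows the input image, the actual width and height can be derived as below. The helper and its rounding to multiples of 16 are assumptions of this sketch, not the official implementation:

```python
import math

# Derive (width, height) from a fixed pixel area and an input aspect ratio,
# as the `size` parameter semantics above describe. Snapping dimensions to a
# multiple of 16 is an assumption typical of video latent models.

def fit_to_area(area: int, aspect_w: int, aspect_h: int, multiple: int = 16):
    ratio = aspect_w / aspect_h
    height = math.sqrt(area / ratio)
    width = height * ratio
    snap = lambda x: max(multiple, round(x / multiple) * multiple)
    return snap(width), snap(height)

# TI2V-5B's 720P area is 1280*704; a 16:9 input lands close to that shape:
print(fit_to_area(1280 * 704, 16, 9))   # -> (1264, 704)
print(fit_to_area(1280 * 704, 9, 16))   # -> (704, 1264)
```

Note the snapped result keeps the area approximately (not exactly) equal to the requested one.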
We built upon and reused code from the following projects: Wan2.1 and Wan2.2, licensed under the Apache License 2.0. We also adopt the evaluation text prompts from Movie Gen Bench, which is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) License; the original license can be found here. The selected prompts are further enhanced using the `Qwen/Qwen2.5-14B-Instruct` model.
Wan2.1 T2V 14B StepDistill CfgDistill
Wan2.1-T2V-14B-StepDistill-CfgDistill is an advanced text-to-video generation model built upon the Wan2.1-T2V-14B foundation. This approach allows the model to generate videos with significantly fewer inference steps (4 steps) and without classifier-free guidance, substantially reducing video generation time while maintaining high-quality outputs.

Our training code is modified from the Self-Forcing repository. We extended support for the Wan2.1-14B-T2V model and performed a 4-step bidirectional distillation process. The modified code is available at Self-Forcing-Plus.

Our inference framework utilizes lightx2v, a highly efficient inference engine that supports multiple models. This framework significantly accelerates the video generation process while maintaining high-quality output.

License Agreement

The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use them while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the license.

We would like to thank the contributors to the Wan2.1 and Self-Forcing repositories for their open research.
Wan2.1 I2V 14B 480P StepDistill CfgDistill Lightx2v
Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v is an advanced image-to-video generation model built upon the Wan2.1-I2V-14B-480P foundation. This approach allows the model to generate videos with significantly fewer inference steps (4 steps) and without classifier-free guidance, substantially reducing video generation time while maintaining high-quality outputs.

In this version, we added the following features:
1. Trained on higher-quality datasets for extended iterations.
2. New FP8 and INT8 quantized distillation models, which enable fast inference with lightx2v on an RTX 4060.

Our training code is modified from the Self-Forcing repository. We extended support for the Wan2.1-14B-I2V-480P model and performed a 4-step bidirectional distillation process. The modified code is available at Self-Forcing-Plus.

Our inference framework utilizes lightx2v, a highly efficient inference engine that supports multiple models. This framework significantly accelerates the video generation process while maintaining high-quality output.

We recommend using the LCM scheduler with the following settings:
- `shift=5.0`
- `guidancescale=1.0` (i.e., without CFG)

License Agreement

The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use them while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the license.

We would like to thank the contributors to the Wan2.1 and Self-Forcing repositories for their open research.
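The recommended `shift=5.0` corresponds to the timestep-shift commonly applied to flow-matching noise schedules. A sketch of the resulting 4-step sigma schedule, assuming the standard shift mapping `sigma' = s*sigma / (1 + (s-1)*sigma)` (an illustration, not the exact lightx2v scheduler implementation):

```python
# 4-step sigma schedule under the recommended shift=5.0 (guidance_scale=1.0,
# i.e. no CFG, does not affect the schedule). The shift mapping used here is
# the standard flow-matching timestep shift; treating the LCM scheduler this
# way is an illustrative assumption.

def shifted_sigmas(num_steps: int, shift: float = 5.0) -> list[float]:
    """Uniform sigmas from 1.0 down, remapped by the timestep shift."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps)]  # high -> low noise
    return [shift * s / (1 + (shift - 1) * s) for s in sigmas]

print([round(s, 3) for s in shifted_sigmas(4)])  # [1.0, 0.938, 0.833, 0.625]
```

A larger shift keeps the few available steps concentrated at high noise levels, which is where a 4-step distilled model does most of its work.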
Wan2.2 I2V A14B Moe Distill Lightx2v
Wan2.2-I2V-A14B-Moe-Distill-Lightx2v is an advanced image-to-video generation model built upon the Wan2.2-I2V-A14B foundation. This approach allows the model to generate videos with significantly fewer inference steps (4 steps in total: 2 for high noise and 2 for low noise) and without classifier-free guidance, substantially reducing video generation time while maintaining high-quality outputs. This version has the following features: 1. ...
Wan2.1-T2V-14B-CausVid
Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v
Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v is an advanced text-to-video generation model built upon the Wan2.1-T2V-14B foundation. This approach allows the model to generate videos with significantly fewer inference steps (4 steps) and without classifier-free guidance, substantially reducing video generation time while maintaining high-quality outputs.

In this version, we added the following features:
1. Trained on higher-quality datasets for extended iterations.
2. New FP8 and INT8 quantized distillation models, which enable fast inference with lightx2v on an RTX 4060.

Our training code is modified from the Self-Forcing repository. We extended support for the Wan2.1-14B-T2V model and performed a 4-step bidirectional distillation process. The modified code is available at Self-Forcing-Plus.

Our inference framework utilizes lightx2v, a highly efficient inference engine that supports multiple models. This framework significantly accelerates the video generation process while maintaining high-quality output.

We recommend using the LCM scheduler with the following settings:
- `shift=5.0`
- `guidancescale=1.0` (i.e., without CFG)

License Agreement

The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use them while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the license.

We would like to thank the contributors to the Wan2.1 and Self-Forcing repositories for their open research.
Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v
> Please note: the 720P distilled model = 720P original model + (480P step-distilled model - 480P original model), which means we did not train a native 720P model.
>
> The LoRA in this repository is completely identical to the LoRA in the 480P repository.

Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v is an advanced image-to-video generation model built upon the Wan2.1-I2V-14B-480P foundation. This approach allows the model to generate videos with significantly fewer inference steps (4 steps) and without classifier-free guidance, substantially reducing video generation time while maintaining high-quality outputs.

In this version, we added the following features:
1. Trained on higher-quality datasets for extended iterations.
2. New FP8 and INT8 quantized distillation models, which enable fast inference with lightx2v on an RTX 4060.

Our training code is modified from the Self-Forcing repository. We extended support for the Wan2.1-14B-I2V-480P model and performed a 4-step bidirectional distillation process. The modified code is available at Self-Forcing-Plus.

Our inference framework utilizes lightx2v, a highly efficient inference engine that supports multiple models. This framework significantly accelerates the video generation process while maintaining high-quality output.

We recommend using the LCM scheduler with the following settings:
- `shift=5.0`
- `guidancescale=1.0` (i.e., without CFG)

License Agreement

The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use them while ensuring that your usage complies with the provisions of this license.
You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the license. We would like to thank the contributors to the Wan2.1 and Self-Forcing repositories for their open research.