mit-han-lab
svdq-int4-flux.1-fill-dev
This repository has been deprecated and will be hidden in December 2025. Please use https://huggingface.co/nunchaku-tech/nunchaku-flux.1-fill-dev.

The FLUX.1 [dev] Model is licensed by Black Forest Labs Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs Inc. IN NO EVENT SHALL BLACK FOREST LABS INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.
nunchaku-flux.1-kontext-dev
This repository has been migrated to https://huggingface.co/nunchaku-tech/nunchaku-flux.1-kontext-dev and will be hidden in December 2025.

This repository contains Nunchaku-quantized versions of FLUX.1-Kontext-dev, capable of editing images based on text instructions. It is optimized for efficient inference with minimal loss in output quality.

- Developed by: Nunchaku Team
- Model type: image-to-image
- License: flux-1-dev-non-commercial-license
- Quantized from model: FLUX.1-Kontext-dev
- `svdq-int4r32-flux.1-kontext-dev.safetensors`: SVDQuant-quantized INT4 FLUX.1-Kontext-dev model, for non-Blackwell GPUs (pre-50-series).
- `svdq-fp4r32-flux.1-kontext-dev.safetensors`: SVDQuant-quantized NVFP4 FLUX.1-Kontext-dev model, for Blackwell GPUs (50-series).
- Inference Engine: nunchaku
- Quantization Library: deepcompressor
- Paper: SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- Demo: svdquant.mit.edu
- Diffusers Usage: See flux.1-kontext-dev.py; a hedged loading sketch also follows below. Check our tutorial for more advanced usage.
- ComfyUI Usage: See nunchaku-flux.1-kontext-dev.json.

The FLUX.1 [dev] Model is licensed by Black Forest Labs Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs Inc. IN NO EVENT SHALL BLACK FOREST LABS INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.
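The referenced flux.1-kontext-dev.py is not reproduced here. The following is a minimal sketch of how the quantized transformer might be wired into diffusers, assuming nunchaku exposes `NunchakuFluxTransformer2dModel` and diffusers provides `FluxKontextPipeline`; repo ids, file names, and the prompt are illustrative.

```python
# Hedged sketch only -- not the repository's flux.1-kontext-dev.py.
# Assumes: nunchaku provides NunchakuFluxTransformer2dModel, diffusers provides FluxKontextPipeline.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image
from nunchaku import NunchakuFluxTransformer2dModel

# INT4 checkpoint for pre-Blackwell GPUs; swap in the fp4r32 file on 50-series cards.
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/nunchaku-flux.1-kontext-dev"  # illustrative repo id; see the migration notice above
)
pipeline = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("input.png")  # the image to edit
edited = pipeline(image=image, prompt="Make the sky look like a sunset", guidance_scale=2.5).images[0]
edited.save("kontext-edit.png")
```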
dc-ae-f32c32-sana-1.0
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Figure 1: We address the reconstruction accuracy drop of high spatial-compression autoencoders.
Figure 2: DC-AE delivers significant training and inference speedup without performance drop.
Figure 3: DC-AE enables efficient text-to-image generation on a laptop.

We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoders deliver impressive results at moderate spatial compression ratios (e.g., 8x) but fail to maintain satisfactory reconstruction accuracy at high spatial compression ratios (e.g., 64x). We address this challenge with two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; and (2) Decoupled High-Resolution Adaptation, an efficient three-phase training strategy that mitigates the generalization penalty of high spatial-compression autoencoders. With these designs, we raise the autoencoder's spatial compression ratio to 128 while maintaining reconstruction quality. Applying DC-AE to latent diffusion models, we achieve significant speedup without an accuracy drop. For example, on ImageNet 512x512, DC-AE provides 19.1x inference speedup and 17.9x training speedup on an H100 GPU for UViT-H while achieving a better FID than the widely used SD-VAE-f8 autoencoder.

If DC-AE is useful or relevant to your research, please kindly recognize our contributions by citing our papers:
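The `-diffusers` checkpoints in this listing (e.g., dc-ae-f32c32-sana-1.0-diffusers) are meant to load through diffusers. Below is a minimal reconstruction sketch, assuming diffusers' `AutoencoderDC` class with `.encode(...).latent` and `.decode(...).sample`; the checkpoint name, input file, and preprocessing are illustrative, not the official DC-AE script.

```python
# Minimal DC-AE round-trip sketch, assuming the diffusers AutoencoderDC integration.
import torch
from diffusers import AutoencoderDC
from PIL import Image
from torchvision.transforms.functional import to_pil_image, to_tensor

device = "cuda"
dc_ae = AutoencoderDC.from_pretrained(
    "mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", torch_dtype=torch.float32
).to(device)

# Any RGB image; 512x512 keeps both sides divisible by the 32x spatial compression factor.
image = Image.open("input.png").convert("RGB").resize((512, 512))
x = to_tensor(image).unsqueeze(0).to(device) * 2.0 - 1.0  # scale pixels to [-1, 1]

with torch.no_grad():
    latent = dc_ae.encode(x).latent      # f32c32: 32x downsampling per spatial dim, 32 latent channels
    recon = dc_ae.decode(latent).sample  # decode the latent back to image space

to_pil_image((recon[0].clamp(-1.0, 1.0) + 1.0) / 2.0).save("dc_ae_reconstruction.png")
```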
nunchaku-flux.1-dev
svdq-int4-flux.1-schnell
StreamingVLM
dc-ae-f64c128-in-1.0-diffusers
dc-ae-f32c32-mix-1.0
dc-ae-f32c32-sana-1.1-diffusers
dc-ae-f32c32-in-1.0
nunchaku-flux.1-fill-dev
This repository has been migrated to https://huggingface.co/nunchaku-tech/nunchaku-flux.1-fill-dev and will be hidden in December 2025.

This repository contains Nunchaku-quantized versions of FLUX.1-Fill-dev, capable of filling areas in existing images based on a text description. It is optimized for efficient inference with minimal loss in output quality.

- Developed by: Nunchaku Team
- Model type: image-to-image
- License: flux-1-dev-non-commercial-license
- Quantized from model: FLUX.1-Fill-dev
- `svdq-int4r32-flux.1-fill-dev.safetensors`: SVDQuant-quantized INT4 FLUX.1-Fill-dev model, for non-Blackwell GPUs (pre-50-series).
- `svdq-fp4r32-flux.1-fill-dev.safetensors`: SVDQuant-quantized NVFP4 FLUX.1-Fill-dev model, for Blackwell GPUs (50-series).
- Inference Engine: nunchaku
- Quantization Library: deepcompressor
- Paper: SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- Demo: svdquant.mit.edu
- Diffusers Usage: See flux.1-fill-dev.py; a hedged loading sketch also follows below. Check our tutorial for more advanced usage.
- ComfyUI Usage: See nunchaku-flux.1-fill-dev.json.

The FLUX.1 [dev] Model is licensed by Black Forest Labs Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs Inc. IN NO EVENT SHALL BLACK FOREST LABS INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.
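The referenced flux.1-fill-dev.py is not reproduced here. A minimal inpainting sketch follows, assuming nunchaku's `NunchakuFluxTransformer2dModel` and diffusers' `FluxFillPipeline`; repo ids, file names, and the prompt are illustrative.

```python
# Hedged sketch only -- not the repository's flux.1-fill-dev.py.
# Assumes: nunchaku provides NunchakuFluxTransformer2dModel, diffusers provides FluxFillPipeline.
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image
from nunchaku import NunchakuFluxTransformer2dModel

transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/nunchaku-flux.1-fill-dev"  # illustrative; pick the int4r32 or fp4r32 file for your GPU
)
pipeline = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("scene.png")      # image to inpaint
mask = load_image("scene_mask.png")  # white = region to fill
result = pipeline(
    prompt="a small wooden cabin",
    image=image, mask_image=mask,
    guidance_scale=30.0, num_inference_steps=50,
).images[0]
result.save("fill-dev.png")
```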
svdq-int4-flux.1-dev
This repository has been deprecated and will be hidden in December 2025. Please use https://huggingface.co/nunchaku-tech/nunchaku-flux.1-dev.

The FLUX.1 [dev] Model is licensed by Black Forest Labs Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs Inc. IN NO EVENT SHALL BLACK FOREST LABS INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.
vila-u-7b-256
dc-ae-f32c32-sana-1.1
dc-ae-f64c128-in-1.0
Qwen2.5-32B-Eagle-RL
nunchaku-flux.1-schnell
This repository has been migrated to https://huggingface.co/nunchaku-tech/nunchaku-flux.1-schnell and will be hidden in December 2025.

This repository contains Nunchaku-quantized versions of FLUX.1-schnell, designed to generate high-quality images from text prompts. It is optimized for efficient inference with minimal loss in output quality.

- Developed by: Nunchaku Team
- Model type: text-to-image
- License: apache-2.0
- Quantized from model: FLUX.1-schnell
- `svdq-int4r32-flux.1-schnell.safetensors`: SVDQuant-quantized INT4 FLUX.1-schnell model, for non-Blackwell GPUs (pre-50-series).
- `svdq-fp4r32-flux.1-schnell.safetensors`: SVDQuant-quantized NVFP4 FLUX.1-schnell model, for Blackwell GPUs (50-series).
- Inference Engine: nunchaku
- Quantization Library: deepcompressor
- Paper: SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- Demo: svdquant.mit.edu
- Diffusers Usage: See flux.1-schnell.py; a hedged loading sketch also follows below. Check our tutorial for more advanced usage.
- ComfyUI Usage: See nunchaku-flux.1-schnell.json.
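The referenced flux.1-schnell.py is not reproduced here. A minimal text-to-image sketch follows, assuming nunchaku's `NunchakuFluxTransformer2dModel` can be passed as the transformer of diffusers' `FluxPipeline`; the repo id and prompt are illustrative.

```python
# Hedged sketch only -- not the repository's flux.1-schnell.py.
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel  # assumed nunchaku entry point

transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/nunchaku-flux.1-schnell"  # illustrative repo id
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# schnell is a few-step, guidance-distilled model: 4 steps, guidance_scale=0.
image = pipeline(
    "a cat holding a sign that says hello world",
    num_inference_steps=4, guidance_scale=0.0,
).images[0]
image.save("schnell.png")
```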
dc-ae-f64c128-mix-1.0
nunchaku-flux.1-canny-dev
This repository has been migrated to https://huggingface.co/nunchaku-tech/nunchaku-flux.1-canny-dev and will be hidden in December 2025.

This repository contains Nunchaku-quantized versions of FLUX.1-Canny-dev, capable of generating an image from a text description while following the structure of a given input image. It is optimized for efficient inference with minimal loss in output quality.

- Developed by: Nunchaku Team
- Model type: image-to-image
- License: flux-1-dev-non-commercial-license
- Quantized from model: FLUX.1-Canny-dev
- `svdq-int4r32-flux.1-canny-dev.safetensors`: SVDQuant-quantized INT4 FLUX.1-Canny-dev model, for non-Blackwell GPUs (pre-50-series).
- `svdq-fp4r32-flux.1-canny-dev.safetensors`: SVDQuant-quantized NVFP4 FLUX.1-Canny-dev model, for Blackwell GPUs (50-series).
- Inference Engine: nunchaku
- Quantization Library: deepcompressor
- Paper: SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- Demo: svdquant.mit.edu
- Diffusers Usage: See flux.1-canny-dev.py; a hedged loading sketch also follows below. Check our tutorial for more advanced usage.
- ComfyUI Usage: See nunchaku-flux.1-canny-dev.json.

The FLUX.1 [dev] Model is licensed by Black Forest Labs Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs Inc. IN NO EVENT SHALL BLACK FOREST LABS INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.
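The referenced flux.1-canny-dev.py is not reproduced here. A minimal structure-guided generation sketch follows, assuming nunchaku's `NunchakuFluxTransformer2dModel` and diffusers' `FluxControlPipeline`; the repo ids, control-image file, and prompt are illustrative.

```python
# Hedged sketch only -- not the repository's flux.1-canny-dev.py.
# Assumes diffusers provides FluxControlPipeline for the Canny/Depth dev variants.
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from nunchaku import NunchakuFluxTransformer2dModel

transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/nunchaku-flux.1-canny-dev"  # illustrative repo id
)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

control = load_image("edges.png")  # a pre-computed Canny edge map of the reference image
image = pipeline(
    prompt="a robot made of brass and glass",
    control_image=control, height=1024, width=1024,
    guidance_scale=30.0, num_inference_steps=50,
).images[0]
image.save("canny-dev.png")
```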
dc-ae-f32c32-sana-1.0-diffusers
svdq-fp4-flux.1-dev
Qwen2-VL-1.5B-Instruct
nunchaku-flux.1-depth-dev
This repository has been migrated to https://huggingface.co/nunchaku-tech/nunchaku-flux.1-depth-dev and will be hidden in December 2025.

This repository contains Nunchaku-quantized versions of FLUX.1-Depth-dev, capable of generating an image from a text description while following the structure of a given input image. It is optimized for efficient inference with minimal loss in output quality.

- Developed by: Nunchaku Team
- Model type: image-to-image
- License: flux-1-dev-non-commercial-license
- Quantized from model: FLUX.1-Depth-dev
- `svdq-int4r32-flux.1-depth-dev.safetensors`: SVDQuant-quantized INT4 FLUX.1-Depth-dev model, for non-Blackwell GPUs (pre-50-series).
- `svdq-fp4r32-flux.1-depth-dev.safetensors`: SVDQuant-quantized NVFP4 FLUX.1-Depth-dev model, for Blackwell GPUs (50-series).
- Inference Engine: nunchaku
- Quantization Library: deepcompressor
- Paper: SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- Demo: svdquant.mit.edu
- Diffusers Usage: See flux.1-depth-dev.py; a hedged loading sketch also follows below. Check our tutorial for more advanced usage.
- ComfyUI Usage: See nunchaku-flux.1-depth-dev.json.

The FLUX.1 [dev] Model is licensed by Black Forest Labs Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs Inc. IN NO EVENT SHALL BLACK FOREST LABS INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.
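Usage mirrors the Canny sketch above: only the checkpoints and the control image (a depth map instead of an edge map) change. A compact hedged sketch under the same assumptions:

```python
# Hedged sketch only -- not the repository's flux.1-depth-dev.py; same assumptions as the Canny sketch.
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from nunchaku import NunchakuFluxTransformer2dModel

transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/nunchaku-flux.1-depth-dev"  # illustrative repo id
)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

control = load_image("depth.png")  # a pre-computed depth map, e.g., from a monocular depth estimator
image = pipeline(
    prompt="a robot made of brass and glass",
    control_image=control, height=1024, width=1024,
    guidance_scale=10.0, num_inference_steps=50,
).images[0]
image.save("depth-dev.png")
```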
Qwen2.5-7B-Eagle-RL
svdq-int4-flux.1-depth-dev
nunchaku-shuttle-jaguar
svdq-fp4-flux.1-schnell
svdq-int4-flux.1-canny-dev
dc-ae-f32c32-in-1.0-256px
svdq-flux.1-schnell-pix2pix-turbo
opt-1.3b-smoothquant
svdq-fp4-flux.1-fill-dev
dc-ae-f128c512-in-1.0
dc-ae-lite-f32c32-sana-1.1-diffusers
dc-ae-f64c128-in-1.0-uvit-h-in-512px-train2000k
svdq-int4-sana-1600m
nunchaku-sana
svdq-fp4-flux.1-depth-dev
svdq-fp4-shuttle-jaguar
dc-ae-f32c32-in-1.0-diffusers
dc-ae-f64c128-in-1.0-uvit-h-in-512px
dc-ae-f64c128-mix-1.0-diffusers
dc-ae-f128c512-mix-1.0
dc-ae-f32c32-mix-1.0-diffusers
svdq-int4-shuttle-jaguar
dc-ae-f128c512-mix-1.0-diffusers
opt-125m-smoothquant
dc-ae-f32c32-in-1.0-usit-2b-in-512px
svdq-fp4-flux.1-canny-dev
dc-ae-lite-f32c32-sana-1.1
dc-ae-f128c512-in-1.0-diffusers
Llama-3-8B-Instruct-QServe
opt-13b-smoothquant
Llama-3-8B-Instruct-QServe-g128
nunchaku-t5
This repository has been migrated to https://huggingface.co/nunchaku-tech/nunchaku-t5 and will be hidden in December 2025.

This repository contains Nunchaku-quantized versions of T5-XXL, the encoder that turns the text prompt into embeddings; the quantized encoder reduces the model's memory footprint.

- Developed by: Nunchaku Team
- Model type: text-generation
- License: apache-2.0
- Quantized from model: t5v11xxl
- `awq-int4-flux.1-t5xxl.safetensors`: AWQ-quantized W4A16 T5-XXL model for FLUX.1.
- Inference Engine: nunchaku
- Quantization Library: deepcompressor
- Paper: SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- Demo: svdquant.mit.edu
- Diffusers Usage: See flux.1-dev-qencoder.py; a hedged loading sketch also follows below. Check our tutorial for more advanced usage.
- ComfyUI Usage: See nunchaku-flux.1-dev-qencoder.json.
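The referenced flux.1-dev-qencoder.py is not reproduced here. A minimal sketch follows, assuming nunchaku exposes a `NunchakuT5EncoderModel` that can stand in for FLUX's `text_encoder_2`; the repo ids and prompt are illustrative.

```python
# Hedged sketch only -- not the repository's flux.1-dev-qencoder.py.
# Assumes nunchaku provides NunchakuT5EncoderModel and NunchakuFluxTransformer2dModel.
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel, NunchakuT5EncoderModel

text_encoder_2 = NunchakuT5EncoderModel.from_pretrained(
    "mit-han-lab/nunchaku-t5"  # illustrative repo id; holds awq-int4-flux.1-t5xxl.safetensors
)
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/nunchaku-flux.1-dev"  # illustrative repo id
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    text_encoder_2=text_encoder_2,  # quantized T5-XXL lowers the text-encoder memory footprint
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipeline(
    "a watercolor painting of a lighthouse", num_inference_steps=50, guidance_scale=3.5
).images[0]
image.save("flux-dev-qencoder.png")
```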
Llama-3-8B-Instruct-QServe-W8A8
opt-6.7b-smoothquant
opt-30b-smoothquant
vicuna-13b-v1.3-4bit-g128-awq
Mistral-7B-v0.1-QServe
Llama-3-8B-QServe-g128
dc-ae-f32c32-in-1.0-dit-xl-in-512px
Llama-2-7B-QServe-g128
Llama-2-13B-QServe-g128
dc-ae-f32c32-in-1.0-uvit-s-in-512px
dc-ae-f32c32-in-1.0-usit-h-in-512px
dc-ae-f32c32-in-1.0-sit-xl-in-512px
Llama-3-8B-Instruct-Gradient-1048k-w8a8-per-channel-kv8-per-tensor
dc-ae-f32c32-in-1.0-uvit-h-in-512px
dc-ae-f64c128-in-1.0-uvit-2b-in-512px-train2000k
Mistral-7B-v0.1-QServe-g128
Yi-34B-QServe-g128
dc-ae-f64c128-in-1.0-uvit-2b-in-512px
dc-ae-f32c32-in-1.0-dit-xl-in-512px-trainbs1024
Llama-2-7B-QServe
Llama-3-8B-QServe
vicuna-7b-v1.5-QServe
vicuna-13b-v1.5-QServe
vicuna-7b-v1.5-QServe-g128
Llama-3-8B-Instruct-Gradient-1048k-w8a8kv4-per-channel
Llama-3-8B-Instruct-Gradient-4194k-w8a8kv4-per-channel
dc-ae-f32c32-in-1.0-uvit-2b-in-512px
dc-ae-f32c32-in-1.0-sana-cls-xl-in-512px
nunchaku
This repository has been migrated to https://huggingface.co/nunchaku-tech/nunchaku and will be hidden in December 2025. This repository provides pre-built wheels for nunchaku for both Linux and Windows platforms. For detailed information about available wheels, please visit our GitHub Releases page.