# End-to-End Tuned VAEs for Supercharging Text-to-Image Diffusion Transformers

🌐 Project Page &nbsp;·&nbsp; 🤗 Models &nbsp;·&nbsp; 📄 Paper &nbsp;·&nbsp; 📝 Blog Post

Xingjian Leng<sup>1,2</sup> · Jaskirat Singh<sup>1</sup> · Ryan Murdock<sup>2</sup> · Ethan Smith<sup>2</sup> · Rebecca Li<sup>2</sup> · Saining Xie<sup>3</sup> · Liang Zheng<sup>1</sup>

<sup>1</sup> Australian National University &nbsp; <sup>2</sup> Canva &nbsp; <sup>3</sup> New York University

*Done during internship at Canva.*

We present REPA-E for T2I, a family of end-to-end tuned VAEs designed to supercharge text-to-image generation training. These models consistently outperform Qwen-Image-VAE across all benchmarks (COCO-30K, DPG-Bench, GenAI-Bench, GenEval, and MJHQ-30K) without requiring any additional representation-alignment losses.

For training, we adopt the official REPA-E training code and optimize the Qwen-Image-VAE for 80 epochs with a batch size of 256 on the ImageNet-256 dataset. REPA-E training refines the VAE's latent-space structure and enables faster convergence in downstream text-to-image latent diffusion model training.

This repository provides `diffusers`-compatible weights for the end-to-end trained Qwen-Image-VAE. In addition, we release end-to-end trained variants of several other widely used VAEs to facilitate research and integration within text-to-image diffusion frameworks.

> Use `vae.encode(...)` / `vae.decode(...)` in your pipeline. (A full example is provided below.)

| Model | Hugging Face Link |
|-------|-------------------|
| E2E-FLUX-VAE | 🤗 REPA-E/e2e-flux-vae |
| E2E-SD-3.5-VAE | 🤗 REPA-E/e2e-sd3.5-vae |
| E2E-Qwen-Image-VAE | 🤗 REPA-E/e2e-qwenimage-vae |

## 📦 Requirements

The following packages are required to load and run the REPA-E VAEs with the `diffusers` library: at minimum, `torch` and `diffusers`.

## 🚀 Example Usage

Below is a minimal example showing how to load and use the REPA-E end-to-end trained Qwen-Image-VAE with `diffusers`:
# REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

Xingjian Leng<sup>1</sup> · Jaskirat Singh<sup>1</sup> · Yunzhong Hou<sup>1</sup> · Zhenchang Xing<sup>2</sup> · Saining Xie<sup>3</sup> · Liang Zheng<sup>1</sup>

<sup>1</sup> Australian National University &nbsp; <sup>2</sup> Data61-CSIRO &nbsp; <sup>3</sup> New York University &nbsp; *Project Leads*

🌐 Project Page &nbsp;·&nbsp; 🤗 Models &nbsp;·&nbsp; 📄 Paper

We address a fundamental question: can latent diffusion models and their VAE tokenizer be trained end-to-end? While training both components jointly with the standard diffusion loss is observed to be ineffective, often degrading final performance, we show that this limitation can be overcome using a simple representation-alignment (REPA) loss. Our proposed method, REPA-E, enables stable and effective joint training of both the VAE and the diffusion model. REPA-E significantly accelerates training, achieving over 17× speedup compared to REPA and 45× over the vanilla training recipe. Interestingly, end-to-end tuning also improves the VAE itself: the resulting E2E-VAE provides better latent structure and serves as a drop-in replacement for existing VAEs (e.g., SD-VAE), improving convergence and generation quality across diverse LDM architectures. Our method achieves state-of-the-art FID scores on ImageNet 256×256: 1.12 with CFG and 1.69 without CFG.

> **New in this release:** We are releasing the REPA-E E2E-VAE as a fully Hugging Face `AutoencoderKL` checkpoint, ready to use with `diffusers` out of the box.

We previously released the REPA-E VAE checkpoint, which required loading through the model class in our REPA-E repository. This new version provides a Hugging Face-compatible `AutoencoderKL` checkpoint that can be loaded directly via the `diffusers` API, with no extra code or custom wrapper needed. It offers plug-and-play compatibility with diffusion pipelines and can be seamlessly used to build or train new diffusion models.

> Use `vae.encode(...)` / `vae.decode(...)` in your pipeline. (A full example is provided below.)
## 📦 Requirements

The following packages are required to load and run the REPA-E VAEs with the `diffusers` library: at minimum, `torch` and `diffusers`.

## 🚀 Example Usage

Below is a minimal example showing how to load and use the REPA-E end-to-end trained IN-VAE with `diffusers`: