XCLiu
2_rectified_flow_from_sd_1_5
InstaFlow: 2-Rectified Flow fine-tuned from Stable Diffusion v1.5

2-Rectified Flow is a few-step text-to-image generative model fine-tuned from Stable Diffusion v1.5. We use text-conditioned reflow as described in our paper. Reflow has interesting theoretical properties. You may check this ICLR paper and this arXiv paper.

We compare SD 1.5+DPM-Solver and 2-Rectified Flow on random prompts from Diffusion DB, using the same random seeds for both models. We observe that the sampling trajectories of 2-Rectified Flow are straighter.

- Prompt: a renaissance portrait of dwayne johnson, art in the style of rembrandt.
- Prompt: a photo of a rabbit head on a grizzly bear body.

Training pipeline:

1. Reflow (Stage 1): We train the model using the text-conditioned reflow objective with a batch size of 64 for 70,000 iterations. The model is initialized from the pre-trained SD 1.5 weights. (11.2 A100 GPU days)
2. Reflow (Stage 2): We continue to train the model using the text-conditioned reflow objective with an increased batch size of 1024 for 25,000 iterations. (64 A100 GPU days)

Total Training Cost: It takes 75.2 A100 GPU days (11.2 + 64) to get 2-Rectified Flow.

The following metrics of 2-Rectified Flow are measured on MS COCO 2017 with 5,000 images and a 25-step Euler solver:

We evaluate the impact of the guidance scale on 2-Rectified Flow.
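
To make the Euler solver and the guidance scale concrete, here is a minimal NumPy sketch. It is not the released sampling code: `guided_velocity`, `euler_sample`, and `straight_velocity` are hypothetical names, the velocity field is a toy stand-in for the text-conditioned network, and the sketch assumes the usual conventions that sampling integrates from noise at t = 0 to an image at t = 1 and that guidance takes the standard classifier-free form.

```python
import numpy as np


def guided_velocity(v_cond, v_uncond, guidance_scale):
    """Classifier-free guidance (assumed form): start at the unconditional
    velocity and move guidance_scale times the gap toward the conditional one."""
    return v_uncond + guidance_scale * (v_cond - v_uncond)


def euler_sample(velocity_fn, x0, num_steps=25):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Toy stand-in for a trained model: a perfectly straight flow whose
# velocity is the constant displacement from the noise to the target.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4)
target = np.array([1.0, -2.0, 0.5, 3.0])


def straight_velocity(x, t):
    return target - noise


one_step = euler_sample(straight_velocity, noise, num_steps=1)
many_steps = euler_sample(straight_velocity, noise, num_steps=25)
```

For this perfectly straight toy flow, one Euler step and 25 Euler steps reach the same endpoint, which is the intuition behind why straighter trajectories make few-step sampling workable. A trained model is only approximately straight, so the step count still affects quality in practice.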