# LibreFLUX: A free, de-distilled FLUX model
LibreFLUX is an Apache 2.0 version of FLUX.1-schnell that provides the full T5 context length, uses attention masking, restores classifier-free guidance, and removes most of the FLUX aesthetic fine-tuning/DPO. That means it's a lot uglier than base FLUX, but it has the potential to be more easily fine-tuned to any new distribution. It keeps in mind the core tenets of open source software: it should be difficult to use, slower and clunkier than a proprietary solution, and have an aesthetic trapped somewhere inside the early 2000s.

> The image features a man standing confidently, wearing a simple t-shirt with a humorous and quirky message printed across the front. The t-shirt reads: "I de-distilled FLUX schnell into a slow, ugly model and all I got was this stupid t-shirt." The man's expression suggests a mix of pride and irony, as if he's aware of the complexity behind the statement, yet amused by the underwhelming reward. The background is neutral, keeping the focus on the man and his t-shirt, which pokes fun at the frustrating and often anticlimactic nature of technical processes or complex problem-solving, distilled into a comically understated punchline.

- LibreFLUX: A free, de-distilled FLUX model
- Usage
- Inference
- Fine-tuning
- Non-technical Report on Schnell De-distillation
- Why
- Restoring the Original Training Objective
- FLUX and Attention Masking
- Make De-distillation Go Fast and Fit in Small GPUs
- Selecting Better Layers to Train with LoKr
- Beta Timestep Scheduling and Timestep Stratification
- Datasets
- Training
- Post-hoc "EMA"
- Results
- Closing Thoughts
- Contacting Me and Grants
- Citation

## Usage

### Inference

To use the model, just call the custom pipeline using diffusers. It currently works with `diffusers==0.30.3` and will be updated to the latest diffusers soon. The model works best with a CFG scale of 2.0 to 5.0, so if you are getting blurry images or strange shadows, try turning down your CFG scale (`guidance_scale` in diffusers).
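Because classifier-free guidance has been restored, `guidance_scale` behaves as it does for non-distilled diffusion models: at each denoising step the sampler blends an unconditional prediction with a conditional one. A minimal sketch of that update (illustrative only — the function name is mine, and the custom pipeline handles this internally):

```python
import torch


def cfg_combine(pred_uncond: torch.Tensor,
                pred_cond: torch.Tensor,
                guidance_scale: float) -> torch.Tensor:
    # Classifier-free guidance: push the prediction away from the
    # unconditional output, in the direction of the conditional one.
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)


# A scale of 1.0 reduces to the plain conditional prediction;
# larger scales exaggerate the prompt's influence (and its artifacts).
uncond = torch.zeros(4)
cond = torch.ones(4)
print(cfg_combine(uncond, cond, 3.0))  # tensor([3., 3., 3., 3.])
```

Overly large scales amplify the difference term too aggressively, which is why blown-out or shadowy images are a sign to lower `guidance_scale`.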
Alternatively, you can also use higher CFG scales if you turn CFG off during the first couple of timesteps (`no_cfg_until_timestep=2` in the custom pipeline).

### Selecting Better Layers to Train with LoKr

```py
import torch


def approximate_normal_tensor(inp, target, scale=1.0):
    # Fill `target` in place with normally distributed noise whose norm,
    # mean, and std approximately match those of `inp`, damped by `scale`.
    tensor = torch.randn_like(target)
    desired_norm = inp.norm()
    desired_mean = inp.mean()
    desired_std = inp.std()

    current_norm = tensor.norm()
    tensor = tensor * (desired_norm / current_norm)
    current_std = tensor.std()
    tensor = tensor * (desired_std / current_std)
    tensor = tensor - tensor.mean() + desired_mean
    tensor.mul_(scale)

    # Write the perturbation back into the LoKr factor.
    target.copy_(tensor)


def init_lokr_network_with_perturbed_normal(lycoris, scale=1e-3):
    with torch.no_grad():
        for lora in lycoris.loras:
            lora.lokr_w1.fill_(1.0)
            approximate_normal_tensor(lora.org_weight, lora.lokr_w2, scale=scale)
```

### Beta Timestep Scheduling and Timestep Stratification

```py
from scipy.stats import beta as sp_beta

# Excerpt from the training loop: `bsz` is the per-process batch size.
alpha = 2.0
beta = 1.6
num_processes = self.accelerator.num_processes
process_index = self.accelerator.process_index
total_bsz = num_processes * bsz
start_idx = process_index * bsz
end_idx = (process_index + 1) * bsz
# Stratified sampling: each process draws from its own slice of [0, 1),
# then the Beta inverse CDF maps the draws to timestep sigmas.
indices = torch.arange(start_idx, end_idx, dtype=torch.float64)
u = torch.rand(bsz)
p = (indices + u) / total_bsz
sigmas = torch.from_numpy(
    sp_beta.ppf(p.numpy(), a=alpha, b=beta)
).to(device=self.accelerator.device)
```

### Post-hoc "EMA"

```py
import os

import torch
from safetensors.torch import load_file, save_file

# Linearly interpolate a sequence of saved checkpoints into one state dict.
first_checkpoint_file = checkpoint_files[0]
ema_state_dict = load_file(first_checkpoint_file)
for checkpoint_file in checkpoint_files[1:]:
    new_state_dict = load_file(checkpoint_file)
    for k in ema_state_dict.keys():
        ema_state_dict[k] = torch.lerp(
            ema_state_dict[k],
            new_state_dict[k],
            alpha,
        )
output_file = os.path.join(output_folder, f"alpha_linear_{alpha}.safetensors")
save_file(ema_state_dict, output_file)
```

## Citation

```
@misc{libreflux,
  author       = {James Carter},
  title        = {LibreFLUX: A free, de-distilled FLUX model},
  year         = {2024},
  publisher    = {Huggingface},
  journal      = {Huggingface repository},
  howpublished = {\url{https://huggingface.co/datasets/jimmycarter/libreflux}},
}
```