xinsir
controlnet-union-sdxl-1.0
ControlNet++: an all-in-one ControlNet for image generation and editing! The ProMax model has been released: 12 control types plus 5 advanced editing features, just try it!

Advantages of the model
- Uses bucket training (as in NovelAI), so it can generate high-resolution images at any aspect ratio.
- Trained on a large amount of high-quality data (over 10,000,000 images) covering a wide diversity of situations.
- Uses re-captioned prompts as in DALL·E 3, with CogVLM generating detailed descriptions, giving good prompt-following ability.
- Uses many useful training tricks, including but not limited to data augmentation, multiple losses, and multi-resolution training.
- Uses almost the same parameters as the original ControlNet, with no obvious increase in network parameters or computation.
- Supports 10+ control conditions, with no obvious performance drop on any single condition compared with training each independently.
- Supports multi-condition generation; condition fusion is learned during training, so there is no need to set hyperparameters or design special prompts.
- Compatible with other open-source SDXL models, such as BluePencilXL and CounterfeitXL, and with other LoRA models.

We designed a new architecture that supports 10+ control types for conditional text-to-image generation and can generate high-resolution images visually comparable with Midjourney. The network is based on the original ControlNet architecture, to which we add two new modules: 1. Extend the original ControlNet to support different image conditions with the same network parameters. 2. Support multiple input conditions without increasing the computational load, which is especially important for designers who want to edit an image in detail; different conditions share the same condition encoder, without adding extra computation or parameters. We ran thorough experiments on SDXL and achieved superior performance in both control ability and aesthetic score. We release the method and the model to the open-source community so that everyone can enjoy it.
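The bucket-training idea mentioned above can be sketched in plain Python: each training image is routed to the fixed-resolution bucket whose aspect ratio is closest to its own, so every batch has a consistent shape at roughly constant pixel count. The helper names (`make_buckets`, `nearest_bucket`) and the step/ratio values below are illustrative assumptions, not code from this repo.

```python
import math

def make_buckets(base=1024, step=64, max_ratio=2.0):
    """Enumerate (w, h) buckets with roughly base*base pixels and a bounded aspect ratio."""
    buckets = set()
    w = step
    while w <= base * max_ratio:
        # pick the height that keeps the pixel count near base*base, snapped to `step`
        h = int(base * base / w) // step * step
        if h >= step and max(w / h, h / w) <= max_ratio:
            buckets.add((w, h))
        w += step
    return sorted(buckets)

def nearest_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest (in log space) to the image's."""
    ratio = width / height
    return min(buckets, key=lambda wh: abs(math.log(wh[0] / wh[1]) - math.log(ratio)))

buckets = make_buckets()
# a 1920x1080 photo lands in a wide bucket near 16:9 instead of being center-cropped to a square
wide = nearest_bucket(1920, 1080, buckets)
```

Because every bucket has about the same pixel count, batches mix aspect ratios freely without padding or heavy cropping, which is what lets the model generate at arbitrary aspect ratios at inference time.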
Inference scripts and more details can be found at https://github.com/xinsir6/ControlNetPlus/tree/main. If you find it useful, please give me a star, thank you very much. The SDXL ProMax version has been released, enjoy it!

I am sorry that, because the project's revenue and expenditure are difficult to balance, the GPU resources have been assigned to other projects that are more likely to be profitable, and SD3 training is stopped until I find enough GPU support. I will try my best to find GPUs to continue training. If this brings you inconvenience, I sincerely apologize. I want to thank everyone who likes this project; your support is what keeps me going.

Note: the ProMax model is in the same Hugging Face model repo with a promax suffix; detailed instructions will be added later.

Advanced editing features in the ProMax model:
Tile Deblur
Tile Super Resolution
The following example shows upscaling from 1M resolution --> 9M resolution.
controlnet-openpose-sdxl-1.0
controlnet-tile-sdxl-1.0
Supports any aspect ratio and any upscale factor; the following is a 3 × 3 times example. Code reference: https://huggingface.co/TTPlanet/TTPLanetSDXLControlnetTileRealistic/blob/main/TTP_tile_preprocessor_v5.py and https://github.com/lllyasviel/ControlNet-v1-1-nightly/blob/main/gradio_tile.py. Performance may be unstable, and the next version is being optimized!
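As a rough sketch of what "any times upscale" means in practice: for a 3 × 3 times upscale, the target width and height are each tripled (1M pixels → 9M pixels), then snapped down to a multiple that the SDXL latent space can handle. `upscale_dims` is a hypothetical helper for illustration, not part of the referenced preprocessor scripts.

```python
def upscale_dims(width, height, scale, multiple=8):
    """Target size for a `scale`x upscale along each side, snapped down
    to a multiple of `multiple` (SDXL works on dimensions divisible by 8)."""
    w = int(width * scale) // multiple * multiple
    h = int(height * scale) // multiple * multiple
    return w, h

# 1024x1024 (about 1M pixels) upscaled 3x per side -> about 9M pixels
print(upscale_dims(1024, 1024, 3))  # (3072, 3072)
```

Non-square inputs work the same way; only the snapping to a multiple of 8 may shave a few pixels off one side.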
controlnet-canny-sdxl-1.0
controlnet-scribble-sdxl-1.0
This is an anyline model that can generate images comparable with Midjourney and supports any line type and any line width! The following five rows use different control lines, from top to bottom: Scribble, Canny, HED, PIDI, Lineart.

A general scribble model that can generate images comparable with Midjourney! Hello, I am very happy to announce the controlnet-scribble-sdxl-1.0 model, a very powerful ControlNet that can generate high-resolution images visually comparable with Midjourney. The model was trained on a large amount of high-quality data (over 10,000,000 images), carefully filtered and captioned (with a powerful VLM model). Besides, useful tricks were applied during training, including data augmentation, multiple losses, and multi-resolution training. Note that this model can achieve higher aesthetic performance than our controlnet-canny-sdxl-1.0 model. The model supports any type of line and any width of line; the sketch can be very simple, and so can the prompt. This model is more general and good at generating visually appealing images. The control ability is also strong: for example, if you are unsatisfied with some local region of the generated image, drawing a more precise sketch and giving a detailed prompt will help a lot. Note that the model also supports lineart and canny lines; try it and you will get a surprise!

- Developed by: xinsir
- Model type: ControlNet_SDXL
- License: apache-2.0
- Finetuned from model [optional]: stabilityai/stable-diffusion-xl-base-1.0
- Paper [optional]: https://arxiv.org/abs/2302.05543

Examples [Note: the following examples were all generated using stabilityai/stable-diffusion-xl-base-1.0 and xinsir/controlnet-scribble-sdxl-1.0]

prompt: purple feathered eagle with specks of light like stars in feathers.
It glows with arcane power

prompt: 17 year old girl with long dark hair in the style of realism with fantasy elements, detailed botanical illustrations, barbs and thorns, ethereal, magical, black, purple and maroon, intricate, photorealistic

prompt: a logo for a paintball field named district 7 on a white background featuring paintballs, that is bright and colourful, eye-catching and impactful

prompt: a photograph of a handsome crying blonde man with his face painted in the pride flag

prompt: concept art, a surreal magical Tome of the Sun God, the book binding appears to be made of solar fire and emits a holy, radiant glow, Age of Wonders, Unreal Engine v5

prompt: black Caribbean man walking balance front his fate chaos anarchy liberty independence force energy independence cinematic surreal beautiful rendition intricate sharp detail 8k

prompt: die hard nakatomi plaza, explosion at the top, vector, night scene

prompt: solitary glowing yellow tree in a desert. ultra wide shot. night time. hdr photography

```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers import DDIMScheduler, EulerAncestralDiscreteScheduler
from controlnet_aux import PidiNetDetector, HEDdetector
from diffusers.utils import load_image
from huggingface_hub import HfApi
from pathlib import Path
from PIL import Image
import torch
import numpy as np
import cv2
import os
import random


def nms(x, t, s):
    x = cv2.GaussianBlur(x.astype(np.float32), (0, 0), s)

    f1 = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=np.uint8)
    f2 = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8)
    f3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.uint8)
    f4 = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=np.uint8)

    y = np.zeros_like(x)
    for f in [f1, f2, f3, f4]:
        np.putmask(y, cv2.dilate(x, kernel=f) == x, x)

    z = np.zeros_like(y, dtype=np.uint8)
    z[y > t] = 255
    return z


controlnet_conditioning_scale = 1.0
prompt = "your prompt, the longer the better, you can describe it in as much detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'

eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")

controlnet = ControlNetModel.from_pretrained(
    "xinsir/controlnet-scribble-sdxl-1.0",
    torch_dtype=torch.float16
)

# when testing with another base model, you need to change the vae as well
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    safety_checker=None,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
)

# you can either use HED to generate a fake scribble from an image,
# or use a sketch image drawn entirely by yourself
if random.random() > 0.5:
    # Method 1
    # if you use HED, you should provide an image (real or anime); extract its HED
    # lines and use them as the scribble
    # For details on HED detection, refer to
    # https://github.com/lllyasviel/ControlNet/blob/main/gradio_fake_scribble2image.py
    # Below is an example using the diffusers HED detector
    # image_path = Image.open("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    image_path = cv2.imread("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    processor = HEDdetector.from_pretrained('lllyasviel/Annotators')
    controlnet_img = processor(image_path, scribble=False)
    controlnet_img.save("a hed detect path for an image")

    # the following processing simulates a human sketch; different thresholds
    # produce different line widths
    controlnet_img = np.array(controlnet_img)
    controlnet_img = nms(controlnet_img, 127, 3)
    controlnet_img = cv2.GaussianBlur(controlnet_img, (0, 0), 3)

    # higher threshold, thinner line
    random_val = int(round(random.uniform(0.01, 0.10), 2) * 255)
    controlnet_img[controlnet_img > random_val] = 255
    controlnet_img[controlnet_img < 255] = 0
```

In our evaluation, the model can generate visually appealing images from a simple sketch and a simple prompt. The model supports any type and width of line: a thick line gives coarse control that follows the prompt you write more, while a thin line gives strong control that follows the condition image more. The model can help you take a drawing from coarse to fine. It achieves a higher aesthetic score than xinsir/controlnet-canny-sdxl-1.0, but the control ability decreases a bit with thick lines.
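The threshold applied after the Gaussian blur is what sets the line width: the blur spreads each line's intensity outward, so a high threshold keeps only the strongest central pixels (thin line) while a low threshold also keeps the weaker halo (thick line). A minimal numpy illustration of this effect, independent of the pipeline above:

```python
import numpy as np

# a blurred 1-D edge response: intensity ramps down away from the line center
edge = np.array([10, 60, 140, 220, 140, 60, 10], dtype=np.uint8)

def binarize(img, threshold):
    """Keep pixels above `threshold` as line (255), drop the rest to 0."""
    out = np.zeros_like(img)
    out[img > threshold] = 255
    return out

thin = binarize(edge, 200)   # only the strongest response survives
thick = binarize(edge, 50)   # weaker neighbors survive too -> wider line
assert (thin == 255).sum() < (thick == 255).sum()
```

This is why the snippet above draws `random_val` from a range: randomizing the threshold during training exposes the model to many line widths, which is what makes "any width of lines" work at inference time.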
controlnet-depth-sdxl-1.0
anime-painter
Make everyone an anime painter, even if you don't know anything about drawing.

Controlnet-scribble-sdxl-1.0-anime

This is a controlnet-scribble-sdxl-1.0 model that can generate very high quality images from an anime sketch; it supports any type and any width of line. As you can see from the examples, the sketch can be very simple and unclear. Suppose you are just a child or a person who knows nothing about drawing: you can simply doodle and write some danbooru tags to generate a beautiful anime illustration. In our evaluation, the model achieves state-of-the-art performance, clearly better than the original SD1.5 scribble model trained by Lvmin Zhang [https://github.com/lllyasviel/ControlNet]. The model has been trained with complex tricks and a high-quality dataset; besides the aesthetic score, the prompt-following ability [proposed by OpenAI in the DALL·E 3 paper (https://cdn.openai.com/papers/dall-e-3.pdf)] and the image deformity rate [the probability that the images show abnormal human structure] also improve a lot.

The founder of Midjourney said that Midjourney can help those who don't know how to draw to draw, so it expands the boundaries of their imagination. We have a similar vision: we hope to let people who don't know anime or cartoons create their own characters in a simple way, to express themselves and unleash their creativity. AIGC will reshape the animation industry; the model we released can generate anime images with an average aesthetic score higher than almost all popular anime websites, so just enjoy it.

If you want to generate especially visually appealing images, you should use danbooru tags along with natural language. Because anime images are far less common than real images, you can't just use a natural-language input like "a girl walks in the street", as the information is too limited. Instead you should describe it in more detail, such as "a girl, blue shirt, white hair, black eyes, smile, pink flower, cherry blossoms ..."
In summary, you should first use tags to describe what is in the image [danbooru tags] and then describe what is happening in the image [natural language]; the more detail, the better. If you don't describe it clearly, the generated image will be driven largely by chance; in any case, it will fit the condition image you drew, and the edge detection will coincide between the condition and the generated image. The model can understand your drawing semantically to some degree and give you a result that is not bad. To the best of our knowledge, we haven't seen another SDXL scribble model in the open-source community; we are probably the first.

Attention: to generate anime images with our model, you need to choose an anime SDXL base model from Hugging Face [https://huggingface.co/models?pipeline_tag=text-to-image&sort=trending&search=blue] or civitai [https://civitai.com/search/models?baseModel=SDXL%201.0&sortBy=models_v8&query=anime]. The showcases listed here are based on CounterfeitXL [https://huggingface.co/gsdf/CounterfeitXL/tree/main]; different base models have different image styles, and you can use bluepencil or other models as well. The model was trained on a large amount of anime images, including almost all the anime images we could find on the Internet. We filtered them seriously to preserve the images with high visual quality, comparable to nijijourney or popular anime illustrations. We trained it with controlnet-sdxl-1.0 [https://arxiv.org/abs/2302.05543]; the technical details won't be disclosed in this report.
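The tags-first-then-natural-language recipe above can be mechanized with a tiny helper: danbooru tags describe what is in the image, then a trailing natural-language clause describes what is happening. `build_anime_prompt` is purely illustrative and not part of any released code.

```python
def build_anime_prompt(tags, scene=""):
    """Combine danbooru-style tags with an optional trailing natural-language clause."""
    parts = [t.strip() for t in tags if t.strip()]
    if scene:
        parts.append(scene.strip())
    return ", ".join(parts)

prompt = build_anime_prompt(
    ["1girl", "blue shirt", "white hair", "black eyes", "smile", "cherry blossoms"],
    scene="walking down a quiet street in spring",
)
# "1girl, blue shirt, white hair, black eyes, smile, cherry blossoms, walking down a quiet street in spring"
```

Tags carry the dense visual vocabulary the anime training data was captioned with, while the natural-language tail disambiguates composition and action; combining both gives the model more signal than either alone.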
- Developed by: xinsir - Model type: ControlNetSDXL - License: apache-2.0 - Finetuned from model [optional]: stabilityai/stable-diffusion-xl-base-1.0 - Paper [optional]: https://arxiv.org/abs/2302.05543 - Examples Display prompt: 1girl, breasts, solo, long hair, pointy ears, red eyes, horns, navel, sitting, cleavage, toeless legwear, hair ornament, smoking pipe, oni horns, thighhighs, detached sleeves, looking at viewer, smile, large breasts, holding smoking pipe, wide sleeves, bare shoulders, flower, barefoot, holding, nail polish, black thighhighs, jewelry, hair flower, oni, japanese clothes, fire, kiseru, very long hair, ponytail, black hair, long sleeves, bangs, red nails, closed mouth, toenails, navel cutout, cherry blossoms, water, red dress, fingernails prompt: 1girl, solo, blonde hair, weapon, sword, hair ornament, hair flower, flower, dress, holding weapon, holding sword, holding, gloves, breasts, full body, black dress, thighhighs, looking at viewer, boots, bare shoulders, bangs, medium breasts, standing, black gloves, short hair with long locks, thigh boots, sleeveless dress, elbow gloves, sidelocks, black background, black footwear, yellow eyes, sleeveless prompt: 1girl, solo, holding, white gloves, smile, purple eyes, gloves, closed mouth, balloon, holding microphone, microphone, blue flower, long hair, puffy sleeves, purple flower, blush, puffy short sleeves, short sleeves, bangs, dress, shoes, very long hair, standing, pleated dress, white background, flower, full body, blue footwear, one side up, arm up, hair bun, brown hair, food, mini crown, crown, looking at viewer, hair between eyes, heart balloon, heart, tilted headwear, single side bun, hand up prompt: tiger, 1boy, male focus, blue eyes, braid, animal ears, tiger ears, 2022, solo, smile, chinese zodiac, year of the tiger, looking at viewer, hair over one eye, weapon, holding, white tiger, grin, grey hair, polearm, arm up, white hair, animal, holding weapon, arm behind head, multicolored hair, 
holding polearm

prompt: 1boy, male child, glasses, male focus, shorts, solo, closed eyes, bow, bowtie, smile, open mouth, red bow, jacket, red bowtie, white background, shirt, happy, black shorts, child, simple background, long sleeves, ^^, short hair, white shirt, brown hair, black-framed eyewear, :d, facing viewer, black hair

prompt: solo, 1girl, swimsuit, blue eyes, plaid headwear, bikini, blue hair, virtual youtuber, side ponytail, looking at viewer, navel, grey bikini, ribbon, long hair, parted lips, blue nails, hat, breasts, plaid, hair ribbon, water, arm up, bracelet, star (symbol), cowboy shot, stomach, thigh strap, hair between eyes, beach, small breasts, jewelry, wet, bangs, plaid bikini, nail polish, grey headwear, blue ribbon, adapted costume, choker, ocean, bare shoulders, outdoors, beret

prompt: fruit, food, no humans, food focus, cherry, simple background, english text, strawberry, signature, border, artist name, cream

prompt: 1girl, solo, ball, swimsuit, bikini, mole, beachball, white bikini, breasts, hairclip, navel, looking at viewer, hair ornament, chromatic aberration, holding, holding ball, pool, cleavage, water, collarbone, mole on breast, blush, bangs, parted lips, bare shoulders, mole on thigh, bare arms, smile, large breasts, blonde hair, halterneck, hair between eyes, stomach

```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers import DDIMScheduler, EulerAncestralDiscreteScheduler
from controlnet_aux import PidiNetDetector, HEDdetector
from diffusers.utils import load_image
from huggingface_hub import HfApi
from pathlib import Path
from PIL import Image
import torch
import numpy as np
import cv2
import os
import random


def nms(x, t, s):
    x = cv2.GaussianBlur(x.astype(np.float32), (0, 0), s)

    f1 = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=np.uint8)
    f2 = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8)
    f3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.uint8)
    f4 = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=np.uint8)

    y = np.zeros_like(x)
    for f in [f1, f2, f3, f4]:
        np.putmask(y, cv2.dilate(x, kernel=f) == x, x)

    z = np.zeros_like(y, dtype=np.uint8)
    z[y > t] = 255
    return z


controlnet_conditioning_scale = 1.0
prompt = "your prompt, the longer the better, you can describe it in as much detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'

eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("gsdf/CounterfeitXL", subfolder="scheduler")

controlnet = ControlNetModel.from_pretrained(
    "xinsir/anime-painter",
    torch_dtype=torch.float16
)

# when testing with another base model, you need to change the vae as well
vae = AutoencoderKL.from_pretrained("gsdf/CounterfeitXL", subfolder="vae", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "gsdf/CounterfeitXL",
    controlnet=controlnet,
    vae=vae,
    safety_checker=None,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
)

# you can either use HED to generate a fake scribble from an image,
# or use a sketch image drawn entirely by yourself
if random.random() > 0.5:
    # Method 1
    # if you use HED, you should provide an image (real or anime); extract its HED
    # lines and use them as the scribble
    # For details on HED detection, refer to
    # https://github.com/lllyasviel/ControlNet/blob/main/gradio_fake_scribble2image.py
    # Below is an example using the diffusers HED detector
    image_path = Image.open("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    processor = HEDdetector.from_pretrained('lllyasviel/Annotators')
    controlnet_img = processor(image_path, scribble=False)
    controlnet_img.save("a hed detect path for an image")

    # the following processing simulates a human sketch; different thresholds
    # produce different line widths
    controlnet_img = np.array(controlnet_img)
    controlnet_img = nms(controlnet_img, 127, 3)
    controlnet_img = cv2.GaussianBlur(controlnet_img, (0, 0), 3)

    # higher threshold, thinner line
    random_val = int(round(random.uniform(0.01, 0.10), 2) * 255)
    controlnet_img[controlnet_img > random_val] = 255
    controlnet_img[controlnet_img < 255] = 0
```

In our evaluation, the model achieves a better aesthetic score on anime images than lllyasviel/control_v11p_sd15_scribble; we wanted to compare with other SDXL-1.0 scribble models but found none. The model is also better in control ability when tested with perceptual similarity, thanks to the bigger base model and complex data augmentation. Besides, the model has a lower rate of generating abnormal images, which tend to include abnormal human structures.
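The perceptual-similarity test of control ability mentioned above is not specified in detail. One crude, illustrative proxy is the intersection-over-union of binarized edge maps: binarize the condition scribble and the edges re-extracted from the generated image, and measure how well they overlap. `edge_iou` is an assumption of this write-up, not the metric the authors used.

```python
import numpy as np

def edge_iou(cond, gen, threshold=127):
    """Intersection-over-union of two binarized edge maps: a rough proxy
    for how closely a generated image follows the condition lines."""
    a = cond > threshold
    b = gen > threshold
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both maps empty: trivially identical
    return np.logical_and(a, b).sum() / union

# identical edge maps score 1.0; completely disjoint lines score 0.0
cond = np.zeros((4, 4), dtype=np.uint8); cond[1, :] = 255
shifted = np.zeros((4, 4), dtype=np.uint8); shifted[3, :] = 255
```

Higher IoU means the generated image's edges track the condition lines more closely, which matches the intuitive notion of "control ability" here.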