# duongve/NetaYume-Lumina-Image-2.0
---
pipeline_tag: text-to-image
license: apache-2.0
base_model:
- neta-art/Neta-Lumina
- Alpha-VLLM/Lumina-Image-2.0
tags:
- stable-diffusion
- text-to-image
- comfyui
- diffusion-single-file
---
## NetaYume-Lumina-Image-2.0-Diffusers-v35-pretrained
System Prompt (any of the following helps the model understand and align with your prompts):

- You are an advanced assistant designed to generate high-quality images from user prompts, utilizing danbooru tags to accurately guide the image creation process.
- You are an assistant designed to generate high-quality images based on user prompts and danbooru tags.
- You are an assistant designed to generate superior images with a superior degree of image-text alignment based on textual prompts or user prompts.
- You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts.

Recommended Settings:

- CFG: 4–7
- Sampling Steps: 40–50
- Sampler:
  - Euler a (scheduler: normal)
  - res_multistep (scheduler: linear_quadratic)

---

3. Acknowledgments

- narugo1992 – for the invaluable Danbooru dataset
- Alpha-VLLM – for creating a wonderful model!
- Neta.art and their team – for openly sharing an awesome model.
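The CFG value above controls how strongly sampling is pushed toward the prompt at each step. A minimal NumPy sketch of the classifier-free guidance combination rule (the arrays here are random stand-ins for real model predictions, not actual outputs):

```python
import numpy as np

def cfg_combine(uncond, cond, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the prompt-conditioned one by cfg_scale."""
    return uncond + cfg_scale * (cond - uncond)

rng = np.random.default_rng(0)
uncond = rng.standard_normal((4, 4))  # stand-in: unconditional prediction
cond = rng.standard_normal((4, 4))    # stand-in: prompt-conditioned prediction

# cfg_scale = 1 reproduces the conditional prediction exactly;
# the recommended 4-7 range extrapolates further toward the prompt.
assert np.allclose(cfg_combine(uncond, cond, 1.0), cond)
guided = cfg_combine(uncond, cond, 5.0)
```

Higher CFG values follow the prompt more literally at the cost of diversity, which is why the recommended range stops well short of extreme values.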
## Lumina Yume V0.1
This model is based on Lumina-Image-2.0, a 2-billion-parameter flow-based diffusion transformer (DiT). For more information, visit here. It was trained with the goal of generating not only realistic human images but also high-quality anime-style images. Despite being fine-tuned on a specific dataset, it retains a significant amount of knowledge from the base model.

Key Features:

- Anime support via Danbooru tags: easily generate anime-style images using familiar tagging systems.
- Improved spatial accuracy: enhanced ability to place objects and characters correctly based on detailed prompts.
- Preserved general knowledge: maintains a broad understanding from the base model, ensuring flexibility across domains.

Limitations:

- Text generation inside images is still inaccurate.
- Output image quality is currently moderate and may vary depending on the prompt.
- Understanding of specific character prompts via Danbooru tags is limited.

Notes:

- This is an experimental model, not the final release. I plan to update it with improved versions in the future.
- This model has been fine-tuned by me to suit my personal preferences. Since I have worked on it individually, any feedback or suggestions for improvement would be highly appreciated; your input will help me enhance future versions of the model. Thank you for your support!
- The file LumiYumev0.1bf16.safetensors is an all-in-one file containing the weights for the VAE, text encoder, and image backbone, ready to use with ComfyUI.

---

2. Model Components & Training Details

- Text Encoder: pre-trained Gemma-2-2b
- Variational Autoencoder: pre-trained VAE from Flux.1 dev
- Image Backbone: fine-tuned Lumina image backbone

The model was trained on a dataset containing approximately 30 million images.
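Conceptually, an all-in-one checkpoint like the one described above is a single state dict whose keys are namespaced per component. A plain-Python sketch of splitting such a dict by key prefix; the prefixes used here (`vae.`, `text_encoders.`, `model.`) are illustrative assumptions, not the file's actual key layout:

```python
def split_by_prefix(state_dict, prefixes):
    """Group checkpoint tensors into per-component dicts by key prefix.
    Keys matching no prefix are collected under 'other'."""
    groups = {p: {} for p in prefixes}
    groups["other"] = {}
    for key, tensor in state_dict.items():
        for prefix in prefixes:
            if key.startswith(prefix):
                groups[prefix][key[len(prefix):]] = tensor
                break
        else:
            groups["other"][key] = tensor
    return groups

# Toy state dict standing in for the real safetensors contents.
ckpt = {
    "vae.decoder.w": 1,
    "text_encoders.gemma2.w": 2,
    "model.layer0.w": 3,
}
parts = split_by_prefix(ckpt, ["vae.", "text_encoders.", "model."])
```

ComfyUI performs this kind of routing automatically when you load the all-in-one file as a checkpoint, so no manual splitting is normally needed.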
This dataset includes:

- Anime-style images labeled with Danbooru tags
- Real human images collected from the internet
- Images containing text (primarily short text snippets)
- Images annotated with detailed instance-location information to enhance spatial understanding

System Prompt (any of the following helps the model understand and align with your prompts):

- You are an advanced assistant designed to generate high-quality images from user prompts, utilizing danbooru tags to accurately guide the image creation process.
- You are an assistant designed to generate high-quality images based on user prompts and danbooru tags.
- You are an assistant designed to generate superior images with a superior degree of image-text alignment based on textual prompts or user prompts.
- You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts.

Recommended Settings:

- CFG: 3–6
- Sampling Steps: 40–50
- Sampler: Euler a

---

5. Acknowledgments

- narugo1992 – for the invaluable Danbooru dataset
- Alpha-VLLM – for creating a wonderful model!
- AngelBottomless and their team – for openly sharing their Lumina-Illustrious training experiments, which provided helpful insights during development.
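Lumina-Image-2.0 frontends typically prepend the chosen system prompt to the user prompt before text encoding. The `<Prompt Start>` delimiter in the sketch below matches what the reference Lumina 2 implementation uses, but treat the exact joining format as an assumption and check your inference frontend (ComfyUI and Diffusers handle this for you):

```python
def build_lumina_prompt(system_prompt, user_prompt):
    """Join the system prompt and user prompt with Lumina 2's
    delimiter text (assumed format; frontends normally do this)."""
    return f"{system_prompt} <Prompt Start> {user_prompt}"

sys_p = ("You are an assistant designed to generate high-quality images "
         "based on user prompts and danbooru tags.")
full = build_lumina_prompt(sys_p, "1girl, solo, cherry blossoms, masterpiece")
```

Whichever of the system prompts listed above you pick, keep it fixed across generations so that only the user prompt varies.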