SahilCarterr

10 models

Qwen-Image-Distill-Full

This model is a distilled and accelerated version of Qwen-Image. The original model requires 40 inference steps and uses classifier-free guidance (CFG), resulting in a total of 80 forward passes. The distilled model requires only 15 inference steps and no CFG, resulting in just 15 forward passes, roughly a 5× speed-up. The number of inference steps can be reduced further if needed, though generation quality may degrade. The training framework is built on DiffSynth-Studio. The training dataset consists of 16,000 images generated by the original model from prompts randomly sampled from DiffusionDB. Training took about 1 day on 8 × MI308X GPUs.

| | Original Model | Accelerated Model |
|-|-|-|
| Inference Steps | 40 | 15 |
| CFG Scale | 4 | 1 |
| Forward Passes | 80 | 15 |

(Example image comparisons omitted.)
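The step and forward-pass accounting above can be sketched with a few lines of arithmetic (the `forward_passes` helper is illustrative, not part of DiffSynth-Studio or Qwen-Image):

```python
def forward_passes(steps: int, use_cfg: bool) -> int:
    """Each inference step costs one forward pass, or two when
    classifier-free guidance runs a conditional and an unconditional pass."""
    return steps * (2 if use_cfg else 1)

original = forward_passes(40, use_cfg=True)    # 40 steps with CFG -> 80 passes
distilled = forward_passes(15, use_cfg=False)  # 15 steps, no CFG -> 15 passes
speedup = original / distilled                 # 80 / 15 is roughly 5.3x
print(original, distilled, round(speedup, 1))
```

This is why dropping CFG matters as much as cutting steps: it halves the cost per step on its own.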

410
11

Sentence-Equivalence-Evaluator

license:apache-2.0
5
0

codeparrot-ds

base_model:TheBloke/Llama-2-7B-Chat-GPTQ
4
0

BRIA-3.2

Bria 3.2 is the next-generation commercial-ready text-to-image model. With just 4 billion parameters, it provides exceptional aesthetics and text rendering, evaluated to be on par with leading open-source models while outperforming other licensed models. In addition to being built entirely on licensed data, 3.2 offers several advantages for enterprise and commercial use:

- Efficient Compute: the model is 3× smaller than equivalent models on the market (4B parameters vs. 12B for other open-source models).
- Architecture Consistency: same architecture as 3.1, ideal for users looking to upgrade without disruption.
- Fine-tuning Speedup: 2× faster fine-tuning on L40S and A100.

BRIA 3.2 is our latest text-to-image model explicitly designed for commercial applications. It combines technological innovation with ethical responsibility and legal security, setting a new standard in the AI industry. Bria AI licenses the foundation model with full legal liability coverage. Our dataset does not contain copyrighted materials such as fictional characters, logos, trademarks, public figures, harmful content, or privacy-infringing content. Join our Discord community for more information, tutorials, tools, and to connect with other users!

- 65% user preference for BRIA 3.2 over BRIA 3.1.
- 76% user preference for BRIA 3.2 over BRIA 2.3.
- Superior Text Rendering: the model is optimized to generate short text of 1-6 words; OCR score improved from 5% (3.1) to 70% (3.2).
- Consistent Prompt Alignment: maintains high-quality adherence to textual descriptions.

Get Access

Bria 3.2 is available everywhere you build, as source code and weights, ComfyUI nodes, or API endpoints.

- API Endpoint: Bria.ai, Fal.ai, Replicate
- ComfyUI: use it in workflows
- GitHub: github.com/Bria-AI/BRIA-3.2
- Interested in BRIA 3.2 source code and weights for commercial use? A purchase is required to license BRIA 3.2 for commercial use, ensuring royalty management with our data partners and full liability coverage.
- Are you a startup or a student? We encourage you to apply for our Startup Program to request access. This program is designed to support emerging businesses and academic pursuits with our cutting-edge technology.
- By submitting the form above, you agree to BRIA's Privacy Policy and Terms & Conditions.

- Architecture: 4B-parameter rectified-flow transformer with a T5 text encoder.
- Legally Compliant: offers full legal liability coverage for copyright and privacy infringements. Thanks to training on 100% licensed data from leading data partners, we ensure the ethical use of content.
- Patented Attribution Engine: our attribution engine is our way to compensate our data partners, powered by our proprietary and patented algorithms.
- Enterprise-Ready: specifically designed for business applications, Bria AI 3.2 delivers high-quality, compliant imagery for a variety of commercial needs.
- Customizable Technology: provides access to source code and weights for extensive customization, catering to specific business requirements.
- Developed by: BRIA AI
- Model type: Latent diffusion text-to-image model
- Resources for more information: BRIA AI

Some tips for using our text-to-image model at inference:

1. Using a negative prompt is recommended.
2. For fine-tuning, use zeros instead of the null text embedding.
3. Multiple aspect ratios are supported, but the resolution should total approximately `1024×1024 ≈ 1M` pixels, for example: `(1024, 1024), (1280, 768), (1344, 768), (832, 1216), (1152, 832), (1216, 832), (960, 1088)`.
4. Use 30-50 steps (higher is better).
5. Use a `guidance_scale` of 5.0.
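The ~1M-pixel rule in the inference tips can be sanity-checked with a small helper (the function name and tolerance are illustrative, not part of any BRIA API):

```python
# Aspect ratios listed in the model card's inference tips.
SUPPORTED_RESOLUTIONS = [
    (1024, 1024), (1280, 768), (1344, 768),
    (832, 1216), (1152, 832), (1216, 832), (960, 1088),
]

def near_one_megapixel(width: int, height: int, tolerance: float = 0.10) -> bool:
    """True when the pixel count is within `tolerance` of 1024 * 1024."""
    target = 1024 * 1024
    return abs(width * height - target) / target <= tolerance

# Every listed resolution lands within 10% of the 1M-pixel target.
assert all(near_one_megapixel(w, h) for w, h in SUPPORTED_RESOLUTIONS)
```

A custom aspect ratio can be checked the same way before passing it to the pipeline.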

2
0

Qwen-Image-Blockwise-ControlNet-Canny

This model is a structure control model for images, trained based on Qwen-Image. The model architecture is ControlNet, capable of controlling the generated image structure according to edge-detection (Canny) maps. The training framework is built on DiffSynth-Studio, and the dataset used is BLIP3o. (Example tables pairing structure maps with generated images omitted.)
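The structure maps this ControlNet consumes are edge images. As a rough illustration of how such a map is produced (gradient-magnitude thresholding in plain NumPy, a simplified stand-in for the full Canny algorithm, which adds Gaussian smoothing and hysteresis):

```python
import numpy as np

def edge_map(gray: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Binary edge map from the gradient magnitude of a [0, 1] grayscale image."""
    grad_rows, grad_cols = np.gradient(gray.astype(np.float64))
    magnitude = np.hypot(grad_rows, grad_cols)
    return (magnitude > threshold).astype(np.uint8)

# A sharp vertical boundary fires edges along the two transition columns.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = edge_map(img)
print(int(edges.sum()))  # 16: two columns of 8 edge pixels
```

In practice the map would be computed from a real photo (e.g. with OpenCV's Canny) and passed to the ControlNet as the structure condition.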

0
10

Qwen-Image-Edit-Lowres-Fix

0
9

Qwen-Image-Blockwise-ControlNet-Inpaint

This model is a local image redraw (inpainting) model trained based on Qwen-Image, with a ControlNet architecture, capable of redrawing local areas of an image. The training framework is built on DiffSynth-Studio, and the dataset used is Qwen-Image-Self-Generated-Dataset. The model is compatible with both Qwen-Image and Qwen-Image-Edit: it can perform local redrawing with Qwen-Image and edit specified areas with Qwen-Image-Edit.

|Input Prompt|Input Image|Redrawn Image|
|-|-|-|
|A robot with wings and a hat standing in a colorful garden with flowers and butterflies.|||
|A girl in a school uniform stands gracefully in front of a vibrant stained glass window with colorful geometric patterns.|||
|A small wooden boat battles against towering, crashing waves in a stormy sea.|||

(Example images omitted.)

Limitations

- Inpaint models based on the ControlNet structure may produce disharmonious boundaries between the redrawn and non-redrawn areas.
- The model is trained on rectangular-area redraw data, so its generalization to non-rectangular areas may not be optimal.
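Since the model is trained on rectangular redraw regions, the mask is usually just a filled box. A minimal sketch of building one (the helper and box convention are illustrative, not part of DiffSynth-Studio):

```python
import numpy as np

def rectangular_mask(height: int, width: int, box: tuple) -> np.ndarray:
    """Binary mask (1 = redraw) for a (top, left, bottom, right) box,
    matching the rectangular redraw regions the model was trained on."""
    top, left, bottom, right = box
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[top:bottom, left:right] = 1
    return mask

# Redraw the central 256x256 region of a 512x512 image.
mask = rectangular_mask(512, 512, (128, 128, 384, 384))
print(int(mask.sum()))  # 65536 = 256 * 256 masked pixels
```

Keeping the mask rectangular stays within the distribution the model saw during training, which is the safer choice given the limitation noted above.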

0
8

Qwen-Image-Blockwise-ControlNet-Depth

Qwen-Image Image Structure Control Model - Depth ControlNet

This model is a structure control model for images, trained based on Qwen-Image. The model architecture is ControlNet, which can control the generated image structure according to depth maps. The training framework is built on DiffSynth-Studio, and the dataset used is BLIP3o. (Example tables pairing structure maps with generated images omitted.)

0
6

Qwen Image In Context Control Union

This model is a LoRA for image structure control, trained based on Qwen-Image and adopting the In-Context technical approach. It supports multiple control conditions: canny, depth, lineart, softedge, normal, and openpose. The training framework is built on DiffSynth-Studio, and the dataset used is Qwen-Image-Self-Generated-Dataset. It is recommended to start the input prompt with "ContextControl. ". Please note that with openpose control, due to the particularity of this control type, it cannot achieve the same point-to-point control effect as the other control types. (Example images for each control condition omitted.)
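The recommended prompt prefix can be applied with a tiny helper (illustrative only; the function name is not part of any Qwen-Image API):

```python
PREFIX = "ContextControl. "

def with_context_control(prompt: str) -> str:
    """Prepend the recommended trigger prefix unless it is already present."""
    return prompt if prompt.startswith(PREFIX) else PREFIX + prompt

print(with_context_control("a cat sitting on a sofa"))
# ContextControl. a cat sitting on a sofa
```

Guarding against a double prefix matters when prompts are assembled programmatically from user input.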

0
5

Just-image-Transformer

0
1