gpustack

31 models

stable-diffusion-v3-5-large-turbo-GGUF

!!! Experimental: supported by gpustack/llama-box v0.0.75+ only !!!

Model creator: Stability AI
Original model: stable-diffusion-3.5-large-turbo
GGUF quantization: based on stable-diffusion.cpp ac54e, patched by llama-box.

| Quantization | OpenAI CLIP ViT-L/14 Quantization | OpenCLIP ViT-G/14 Quantization | Google T5-xxl Quantization | VAE Quantization |
| --- | --- | --- | --- | --- |
| FP16 | FP16 | FP16 | FP16 | FP16 |
| Q8_0 | FP16 | FP16 | Q8_0 | FP16 |
| (pure) Q8_0 | Q8_0 | Q8_0 | Q8_0 | FP16 |
| Q4_1 | FP16 | FP16 | Q8_0 | FP16 |
| Q4_0 | FP16 | FP16 | Q8_0 | FP16 |
| (pure) Q4_0 | Q4_0 | Q4_0 | Q4_0 | FP16 |

Stable Diffusion 3.5 Large Turbo is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with Adversarial Diffusion Distillation (ADD) that features improved performance in image quality, typography, complex prompt understanding, and resource efficiency, with a focus on fewer inference steps.

Please note: this model is released under the Stability Community License. Visit Stability AI to learn more, or contact us for commercial licensing details.

- Developed by: Stability AI
- Model type: MMDiT text-to-image generative model
- Model Description: This model generates images based on text prompts. It is an ADD-distilled Multimodal Diffusion Transformer that uses three fixed, pretrained text encoders, with QK-normalization.
- Community License: Free for research, non-commercial, and commercial use by organizations or individuals with less than $1M in total annual revenue. More details can be found in the Community License Agreement. Read more at https://stability.ai/license.
- For individuals and organizations with annual revenue above $1M: please contact us to obtain an Enterprise License.

For local or self-hosted use, we recommend ComfyUI for node-based UI inference, or diffusers or GitHub for programmatic use.

- QK Normalization: Implements the QK-normalization technique to improve training stability.
- Adversarial Diffusion Distillation (ADD): see the technical report; allows sampling with 4 steps at high image quality.
- Text Encoders:
  - CLIPs: OpenCLIP-ViT/G, CLIP-ViT/L, context length 77 tokens
  - T5: T5-xxl, context length 77/256 tokens at different stages of training

This model was trained on a wide variety of data, including synthetic data and filtered publicly available data. For more technical details of the original MMDiT architecture, please refer to the research paper. See the blog for our study of comparative performance in prompt adherence and aesthetic quality.

Using with Diffusers:

- Upgrade to the latest version of the 🧨 diffusers library.
- Reduce your VRAM usage so the model fits on low-VRAM GPUs.

Intended uses include the following:

- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models, including understanding the limitations of generative models.

All uses of the model must be in accordance with our Acceptable Use Policy. The model was not trained to produce factual or true representations of people or events; using the model to generate such content is therefore out of scope.

As part of our safety-by-design and responsible AI deployment approach, we take deliberate measures to ensure integrity starts at the early stages of development. We implement safety measures throughout the development of our models. We have implemented safety mitigations that are intended to reduce the risk of certain harms; however, we recommend that developers conduct their own testing and apply additional mitigations based on their specific use cases. For more about our approach to safety, please visit our Safety page. Our integrity evaluation methods include structured evaluations and red-teaming for certain harms. Testing was conducted primarily in English and may not cover all possible harms.
- Harmful content: We have used filtered data sets when training our models and implemented safeguards that attempt to strike the right balance between usefulness and preventing harm. However, this does not guarantee that all possible harmful content has been removed. All developers and deployers should exercise caution and implement content-safety guardrails based on their specific product policies and application use cases.
- Misuse: Technical limitations and developer and end-user education can help mitigate malicious applications of models. All users are required to adhere to our Acceptable Use Policy, including when applying fine-tuning and prompt-engineering mechanisms. Please reference the Stability AI Acceptable Use Policy for information on violative uses of our products.
- Privacy violations: Developers and deployers are encouraged to adhere to privacy regulations with techniques that respect data privacy.

Please report any issues with the model or contact us:

- Safety issues: [email protected]
- Security issues: [email protected]
- Privacy issues: [email protected]
- License and general: https://stability.ai/license
- Enterprise license: https://stability.ai/enterprise
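The quantization matrix above pairs each GGUF variant with fixed text-encoder and VAE quantizations. A minimal sketch transcribing it as a lookup table (plain Python; the key names are illustrative and not part of any gpustack or llama-box API):

```python
# Per-component quantization for each GGUF variant of
# stable-diffusion-3.5-large-turbo, transcribed from the table above.
# Key names (clip_l, clip_g, t5xxl, vae) are illustrative only.
QUANT_MATRIX = {
    "FP16":        {"clip_l": "FP16", "clip_g": "FP16", "t5xxl": "FP16", "vae": "FP16"},
    "Q8_0":        {"clip_l": "FP16", "clip_g": "FP16", "t5xxl": "Q8_0", "vae": "FP16"},
    "(pure) Q8_0": {"clip_l": "Q8_0", "clip_g": "Q8_0", "t5xxl": "Q8_0", "vae": "FP16"},
    "Q4_1":        {"clip_l": "FP16", "clip_g": "FP16", "t5xxl": "Q8_0", "vae": "FP16"},
    "Q4_0":        {"clip_l": "FP16", "clip_g": "FP16", "t5xxl": "Q8_0", "vae": "FP16"},
    "(pure) Q4_0": {"clip_l": "Q4_0", "clip_g": "Q4_0", "t5xxl": "Q4_0", "vae": "FP16"},
}

def components_for(variant: str) -> dict:
    """Return the text-encoder/VAE quantizations bundled with a variant."""
    return QUANT_MATRIX[variant]

# The VAE stays at FP16 in every variant; only the "(pure)" variants
# quantize the CLIP text encoders as well.
assert all(v["vae"] == "FP16" for v in QUANT_MATRIX.values())
```

Reading the table this way makes the pattern explicit: the non-pure Q4/Q8 variants quantize only the diffusion weights (plus T5 at Q8_0), while the "(pure)" variants also quantize the CLIP encoders.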

—
17,618
1

bge-reranker-v2-m3-GGUF

license:apache-2.0
7,489
19

FLUX.1-Fill-dev-GGUF

—
3,672
2

bge-m3-GGUF

license:mit
2,376
14

stable-diffusion-xl-base-1.0-GGUF

!!! Experimental: supported by gpustack/llama-box v0.0.75+ only !!!

Model creator: Stability AI
Original model: stable-diffusion-xl-base-1.0
GGUF quantization: based on stable-diffusion.cpp ac54e, patched by llama-box.
VAE from: madebyollin/sdxl-vae-fp16-fix.

| Quantization | OpenAI CLIP ViT-L/14 Quantization | OpenCLIP ViT-G/14 Quantization | VAE Quantization |
| --- | --- | --- | --- |
| FP16 | FP16 | FP16 | FP16 |
| Q8_0 | FP16 | FP16 | FP16 |
| Q4_1 | FP16 | FP16 | FP16 |
| Q4_0 | FP16 | FP16 | FP16 |

SDXL consists of an ensemble-of-experts pipeline for latent diffusion: in a first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/) specialized for the final denoising steps. Note that the base model can be used as a standalone module.

Alternatively, we can use a two-stage pipeline as follows: first, the base model is used to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as "img2img") to the latents generated in the first step, using the same prompt. This technique is slightly slower than the first one, as it requires more function evaluations.

Source code is available at https://github.com/Stability-AI/generative-models.

- Developed by: Stability AI
- Model type: Diffusion-based text-to-image generative model
- License: CreativeML Open RAIL++-M License
- Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
- Resources for more information: Check out our GitHub repository and the SDXL report on arXiv.
For research purposes, we recommend our `generative-models` GitHub repository (https://github.com/Stability-AI/generative-models), which implements the most popular diffusion frameworks (both training and inference) and to which new functionalities like distillation will be added over time. Clipdrop provides free SDXL inference.

- Repository: https://github.com/Stability-AI/generative-models
- Demo: https://clipdrop.co/stable-diffusion

User-preference evaluations compare SDXL (with and without refinement) against SDXL 0.9 and Stable Diffusion 1.5 and 2.1: the SDXL base model performs significantly better than the previous variants, and the model combined with the refinement module achieves the best overall performance.

In addition, make sure to install `transformers`, `safetensors`, `accelerate`, as well as the invisible watermark library.

With `torch >= 2.0`, you can speed up inference with `torch.compile`:

```py
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

If you are limited by GPU VRAM, enable CPU offloading instead of moving the whole pipeline to CUDA:

```diff
- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()
```

To run the model with OpenVINO:

```bash
pip install optimum[openvino]
```

```diff
- from diffusers import StableDiffusionXLPipeline
+ from optimum.intel import OVStableDiffusionXLPipeline

  model_id = "stabilityai/stable-diffusion-xl-base-1.0"
- pipeline = StableDiffusionXLPipeline.from_pretrained(model_id)
+ pipeline = OVStableDiffusionXLPipeline.from_pretrained(model_id)

  prompt = "A majestic lion jumping from a big stone at night"
  image = pipeline(prompt).images[0]
```

To run the model with ONNX Runtime:

```bash
pip install optimum[onnxruntime]
```

```diff
- from diffusers import StableDiffusionXLPipeline
+ from optimum.onnxruntime import ORTStableDiffusionXLPipeline

  model_id = "stabilityai/stable-diffusion-xl-base-1.0"
- pipeline = StableDiffusionXLPipeline.from_pretrained(model_id)
+ pipeline = ORTStableDiffusionXLPipeline.from_pretrained(model_id)

  prompt = "A majestic lion jumping from a big stone at night"
  image = pipeline(prompt).images[0]
```

You can find more examples in the Optimum documentation. The model is intended for research purposes only.
Possible research areas and tasks include:

- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.

The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out of scope for the abilities of this model.

Limitations:

- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to "A red cube on top of a blue sphere".
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.

Bias: while the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
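The base/refiner split described in the card hands off at a fraction of the denoising schedule (`denoising_end` on the base pipeline, `denoising_start` on the refiner, in diffusers terms). A minimal sketch of the step arithmetic, with the caveat that this is an illustration of the split, not the library's actual scheduler code:

```python
# Schematic of SDXL's ensemble-of-experts handoff: the base model runs the
# high-noise portion of the schedule and the refiner finishes the low-noise
# tail. The 0.8 default mirrors the commonly used denoising_end value;
# the step math here is illustrative, not diffusers internals.

def split_schedule(num_steps: int, handoff: float = 0.8) -> tuple[int, int]:
    """Split `num_steps` denoising steps at the `handoff` fraction."""
    base_steps = round(num_steps * handoff)
    refiner_steps = num_steps - base_steps
    return base_steps, refiner_steps

base, refiner = split_schedule(40, handoff=0.8)
# With 40 total steps and a 0.8 handoff, the base expert runs 32 steps
# and the refiner runs the remaining 8 on the same latents and prompt.
```

Because the refiner only handles the final low-noise steps, it can specialize in high-frequency detail while the base model handles global composition.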

—
1,982
3

FLUX.1-mini-GGUF

—
1,258
7

FLUX.1-lite-GGUF

—
1,231
4

FLUX.1-dev-GGUF

—
1,054
3

stable-diffusion-xl-1.0-turbo-GGUF

—
615
5

jina-reranker-v2-base-multilingual-GGUF

license:cc-by-nc-4.0
469
5

FLUX.1-schnell-GGUF

license:apache-2.0
450
1

stable-diffusion-v3-5-large-GGUF

—
422
8

stable-diffusion-v1-5-GGUF

—
374
2

jina-reranker-v1-tiny-en-GGUF

license:apache-2.0
351
3

jina-reranker-v1-turbo-en-GGUF

license:apache-2.0
331
1

stable-diffusion-xl-inpainting-1.0-GGUF

—
322
5

stable-diffusion-v2-1-GGUF

—
316
0

stable-diffusion-v3-5-medium-GGUF

—
309
3

stable-diffusion-xl-refiner-1.0-GGUF

—
298
4

stable-diffusion-v2-1-turbo-GGUF

—
294
1

CosyVoice2-0.5B

—
262
0

bce-reranker-base_v1-GGUF

license:apache-2.0
257
1

gte-multilingual-reranker-base-GGUF

license:apache-2.0
251
2

stable-diffusion-v1-4-GGUF

—
232
0

jina-embeddings-v2-base-zh-GGUF

license:apache-2.0
183
3

bce-embedding-base_v1-GGUF

license:apache-2.0
173
0

Llama-3.2-3B-Instruct-GGUF

llama
164
0

jina-embeddings-v2-base-en-GGUF

license:apache-2.0
154
0

stable-diffusion-v2-inpainting-GGUF

—
144
0

stable-diffusion-v3-medium-GGUF

—
141
1

stable-diffusion-v1-5-inpainting-GGUF

—
66
0