stabilityai

✓ Verified · AI Startup

Creators of Stable Diffusion and Stable LM

111 models • 36 total models in database

sd-turbo

SD-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. We release SD-Turbo as a research artifact, and to study small, distilled text-to-image models. For increased quality and prompt understanding, we recommend SDXL-Turbo.

Please note: For commercial use, please refer to https://stability.ai/license.

### Model Description

SD-Turbo is a distilled version of Stable Diffusion 2.1, trained for real-time synthesis. SD-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling large-scale foundational image diffusion models in 1 to 4 steps at high image quality. This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal, and combines this with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.

- Developed by: Stability AI
- Funded by: Stability AI
- Model type: Generative text-to-image model
- Finetuned from model: Stable Diffusion 2.1

For research purposes, we recommend our `generative-models` GitHub repository (https://github.com/Stability-AI/generative-models), which implements the most popular diffusion frameworks (both training and inference).

- Repository: https://github.com/Stability-AI/generative-models
- Paper: https://stability.ai/research/adversarial-diffusion-distillation
- Demo (for the bigger SDXL-Turbo): http://clipdrop.co/stable-diffusion-turbo

### Evaluation

The charts above evaluate user preference for SD-Turbo over other single- and multi-step models. Evaluated at a single step, SD-Turbo is preferred by human voters in terms of image quality and prompt following over LCM-Lora XL and LCM-Lora 1.5. Note: For increased quality, we recommend the bigger version SDXL-Turbo. For details on the user study, we refer to the research paper.
The model is intended for both non-commercial and commercial usage. Possible research areas and tasks include:

- Research on generative models.
- Research on real-time applications of generative models.
- Research on the impact of real-time generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.

For commercial use, please refer to https://stability.ai/membership.

SD-Turbo does not make use of `guidance_scale` or `negative_prompt`; we disable it with `guidance_scale=0.0`. The model preferably generates images of size 512x512, but higher image sizes work as well. A single step is enough to generate high-quality images. When using SD-Turbo for image-to-image generation, make sure that `num_inference_steps * strength` is greater than or equal to 1. The image-to-image pipeline will run for `int(num_inference_steps * strength)` steps, e.g. 0.5 * 2.0 = 1 step in our example below.

### Out-of-Scope Use

The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.

### Limitations

- The quality and prompt alignment is lower than that of SDXL-Turbo.
- The generated images are of a fixed resolution (512x512 pix), and the model does not achieve perfect photorealism.
- The model cannot render legible text.
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.

Check out https://github.com/Stability-AI/generative-models
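The single-step sampling and image-to-image step rule above can be sketched with the 🧨 diffusers `AutoPipelineForText2Image` API. This is a minimal sketch, not the card's original snippet; the heavy imports and the pipeline call live inside `generate()` so the step-count helper works without downloading any weights.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Image-to-image runs for int(num_inference_steps * strength) steps,
    so make sure num_inference_steps * strength >= 1."""
    return int(num_inference_steps * strength)

def generate(prompt: str):
    # Heavy dependencies imported lazily; calling this downloads the SD-Turbo weights.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    # SD-Turbo ignores guidance and negative prompts: disable guidance
    # with guidance_scale=0.0 and sample in a single step.
    return pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]

# The card's image-to-image example: 2 steps at strength 0.5 -> 1 actual step.
print(effective_steps(2, 0.5))  # -> 1
```

`effective_steps` just spells out the card's rounding rule; the pipeline call itself follows the standard diffusers usage for this checkpoint.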

2,754,387
426

stable-diffusion-xl-base-1.0

---
license: openrail++
tags:
- text-to-image
- stable-diffusion
---

2,710,209
7,121

stable-diffusion-2-1

OpenRail license. Tags: stable-diffusion, text-to-image. Pinned: true.

652,497
4,037

stable-diffusion-2-base

644,755
357

stable-diffusion-3-medium-diffusers

--- license: other license_name: stabilityai-nc-research-community license_link: LICENSE tags: - text-to-image - stable-diffusion extra_gated_prompt: >- By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE) and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy). extra_gated_fields: Name: text Email: text Country: country Organization or Affiliation: text Receive email updates and pro

491,657
419

stable-diffusion-xl-refiner-1.0

---
license: openrail++
tags:
- stable-diffusion
- image-to-image
---

435,581
1,990

sdxl-turbo

---
pipeline_tag: text-to-image
inference: false
license: other
license_name: sai-nc-community
license_link: https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.md
---

407,843
2,479

sdxl-vae

---
license: mit
tags:
- stable-diffusion
- stable-diffusion-diffusers
inference: false
---

388,599
714

stable-video-diffusion-img2vid

---
pipeline_tag: image-to-video
license: other
license_name: stable-video-diffusion-community
license_link: LICENSE.md
---

381,214
989

stable-diffusion-2-1-base

287,210
695

stable-diffusion-2

217,836
1,922

stable-diffusion-3.5-large

--- license: other license_name: stabilityai-ai-community license_link: LICENSE.md tags: - text-to-image - stable-diffusion - diffusers inference: true extra_gated_prompt: >- By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md) and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy). extra_gated_fields: Name: text Email: text Country: country Organization or Affiliation: text Rec

207,412
3,207

stable-diffusion-3.5-medium

--- license: other license_name: stabilityai-ai-community license_link: LICENSE.md tags: - text-to-image - stable-diffusion - diffusers inference: true extra_gated_prompt: >- By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/blob/main/LICENSE.md) and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy). extra_gated_fields: Name: text Email: text Country: country Organization or Affiliation: text Re

206,457
851

sd-vae-ft-mse

---
license: mit
tags:
- stable-diffusion
- stable-diffusion-diffusers
inference: false
---

175,698
395

stable-diffusion-2-inpainting

155,784
612

stable-video-diffusion-img2vid-xt

---
pipeline_tag: image-to-video
license: other
license_name: stable-video-diffusion-community
license_link: LICENSE.md
---

111,815
3,185

stable-audio-open-1.0

35,212
1,336

stable-virtual-camera

34,000
219

stable-video-diffusion-img2vid-xt-1-1

29,783
953

stable-diffusion-3.5-large-controlnet-depth

22,585
12

stable-diffusion-3.5-large-controlnet-canny

21,044
12

sd-vae-ft-ema

license:mit
20,868
132

stablelm-3b-4e1t

`StableLM-3B-4E1T` is a 3 billion parameter decoder-only language model pre-trained on 1 trillion tokens of diverse English and code datasets for 4 epochs. Get started generating text with `StableLM-3B-4E1T` by using the following code snippet:

### Model Details

- Developed by: Stability AI
- Model type: `StableLM-3B-4E1T` models are auto-regressive language models based on the transformer decoder architecture.
- Language(s): English
- Library: GPT-NeoX
- License: Model checkpoints are licensed under the Creative Commons license (CC BY-SA-4.0). Under this license, you must give credit to Stability AI, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests that Stability AI endorses you or your use.
- Contact: For questions and comments about the model, please email `[email protected]`

### Model Architecture

The model is a decoder-only transformer similar to the LLaMA (Touvron et al., 2023) architecture with the following modifications:

| Parameters | Hidden Size | Layers | Heads | Sequence Length |
|---------------|-------------|--------|-------|-----------------|
| 2,795,443,200 | 2560 | 32 | 32 | 4096 |

- Position Embeddings: Rotary Position Embeddings (Su et al., 2021) applied to the first 25% of head embedding dimensions for improved throughput following Black et al. (2022).
- Normalization: LayerNorm (Ba et al., 2016) with learned bias terms, as opposed to RMSNorm (Zhang & Sennrich, 2019).
- Tokenizer: GPT-NeoX (Black et al., 2022).

### Training

For complete dataset and training details, please see the StableLM-3B-4E1T Technical Report. The dataset is comprised of a filtered mixture of open-source large-scale datasets available on the HuggingFace Hub: Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023) and The Pile (Gao et al., 2020), both without the Books3 subset, and StarCoder (Li et al., 2023). Given the large amount of web data, we recommend fine-tuning the base StableLM-3B-4E1T for your downstream tasks.

The model is pre-trained on the aforementioned datasets in `bfloat16` precision, optimized with AdamW, and trained using the NeoX tokenizer with a vocabulary size of 50,257. We outline the complete hyperparameter choices in the project's GitHub repository (config).

- Hardware: `StableLM-3B-4E1T` was trained on the Stability AI cluster across 256 NVIDIA A100 40GB GPUs (AWS P4d instances). Training began on August 23, 2023, and took approximately 30 days to complete.
- Software: We use a fork of `gpt-neox` (EleutherAI, 2021), train under 2D parallelism (Data and Tensor Parallel) with ZeRO-1 (Rajbhandari et al., 2019), and rely on flash-attention as well as SwiGLU and Rotary Embedding kernels from FlashAttention-2 (Dao et al., 2023).

### Use and Limitations

The model is intended to be used as a foundational base model for application-specific fine-tuning. Developers must evaluate and fine-tune the model for safe performance in downstream applications.

### Limitations and Bias

As a base model, this model may exhibit unreliable, unsafe, or other undesirable behaviors that must be corrected through evaluation and fine-tuning prior to deployment. The pre-training dataset may have contained offensive or inappropriate content, even after applying data cleansing filters, which can be reflected in the model-generated text. We recommend that users exercise caution when using these models in production systems. Do not use the models if they are unsuitable for your application, or for any applications that may cause deliberate or unintentional harm to others.

### Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric | Value |
|---------------------------------|------:|
| Avg. | 46.58 |
| AI2 Reasoning Challenge (25-Shot) | 46.59 |
| HellaSwag (10-Shot) | 75.94 |
| MMLU (5-Shot) | 45.23 |
| TruthfulQA (0-shot) | 37.20 |
| Winogrande (5-shot) | 71.19 |
| GSM8k (5-shot) | 3.34 |
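The "Get started" snippet the card refers to did not survive extraction. Below is a minimal sketch of the standard `transformers` usage for this checkpoint (the model id comes from the card; the sampling parameters are illustrative), plus a small helper spelling out the rotary-embedding fraction from the architecture notes. The heavy imports live inside `generate()` so the helper runs without downloading weights.

```python
def rotary_dims(hidden_size: int, num_heads: int, fraction: float = 0.25) -> int:
    """Rotary embeddings are applied to the first 25% of each head's dimensions."""
    return int((hidden_size // num_heads) * fraction)

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Heavy dependencies imported lazily; calling this downloads ~3B parameters.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
    model = AutoModelForCausalLM.from_pretrained(
        "stabilityai/stablelm-3b-4e1t", torch_dtype=torch.bfloat16
    )
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         do_sample=True, temperature=0.75, top_p=0.95)
    return tok.decode(out[0], skip_special_tokens=True)

# Hidden size 2560 over 32 heads -> 80-dim heads, rotary on the first 20 dims.
print(rotary_dims(2560, 32))  # -> 20
```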

license:cc-by-sa-4.0
16,556
311

stable-diffusion-x4-upscaler

Stable Diffusion x4 upscaler model card

This model card focuses on the model associated with the Stable Diffusion Upscaler, available here. This model is trained for 1.25M steps on a 10M subset of LAION containing images `>2048x2048`. The model was trained on crops of size `512x512` and is a text-guided latent upscaling diffusion model. In addition to the textual input, it receives a `noise_level` as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule.

- Use it with the `stablediffusion` repository: download the `x4-upscaler-ema.ckpt` here.
- Use it with 🧨 `diffusers`

### Model Details

- Developed by: Robin Rombach, Patrick Esser
- Model type: Diffusion-based text-to-image generation model
- Language(s): English
- License: CreativeML Open RAIL++-M License
- Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).
- Resources for more information: GitHub Repository.
- Cite as:

      @InProceedings{Rombach_2022_CVPR,
          author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
          title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
          booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
          month     = {June},
          year      = {2022},
          pages     = {10684-10695}
      }

Use the 🤗 Diffusers library to run Stable Diffusion 2 in a simple and efficient manner.

Notes:

- Despite not being a dependency, we highly recommend you install xformers for memory-efficient attention (better performance).
- If you have low GPU RAM available, make sure to add `pipe.enable_attention_slicing()` after sending the pipeline to `cuda` for less VRAM usage (at the cost of speed).

### Direct Use

The model is intended for research purposes only.
Possible research areas and tasks include:

- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.

### Misuse, Malicious Use, and Out-of-Scope Use

Note: This section is originally taken from the DALLE-MINI model card, was used for Stable Diffusion v1, but applies in the same way to Stable Diffusion v2.

The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.

#### Out-of-Scope Use

The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.

#### Misuse and Malicious Use

Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

- Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc.
- Intentionally promoting or propagating discriminatory content or harmful stereotypes.
- Impersonating individuals without their consent.
- Sexual content without consent of the people who might see it.
- Mis- and disinformation
- Representations of egregious violence and gore
- Sharing of copyrighted or licensed material in violation of its terms of use.
- Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use.
### Limitations

- The model does not achieve perfect photorealism
- The model cannot render legible text
- The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”
- Faces and people in general may not be generated properly.
- The model was trained mainly with English captions and will not work as well in other languages.
- The autoencoding part of the model is lossy
- The model was trained on a subset of the large-scale dataset LAION-5B, which contains adult, violent and sexual content. To partially mitigate this, we have filtered the dataset using LAION's NSFW detector (see Training section).

### Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v2 was primarily trained on subsets of LAION-2B(en), which consists of images that are limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent.

### Training Data

The model developers used the following dataset for training the model:

- LAION-5B and subsets (details below). The training data is further filtered using LAION's NSFW detector, with a "punsafe" score of 0.1 (conservative). For more details, please refer to LAION-5B's NeurIPS 2022 paper and reviewer discussions on the topic.

### Training Procedure

Stable Diffusion v2 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder.
During training:

- Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
- Text prompts are encoded through the OpenCLIP-ViT/H text-encoder.
- The output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
- The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We also use the so-called v-objective, see https://arxiv.org/abs/2202.00512.

The checkpoints were trained as follows:

- `512-base-ema.ckpt`: 550k steps at resolution `256x256` on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with `punsafe=0.1` and an aesthetic score >= `4.5`. 850k steps at resolution `512x512` on the same dataset with resolution `>= 512x512`.
- `768-v-ema.ckpt`: Resumed from `512-base-ema.ckpt` and trained for 150k steps using a v-objective on the same dataset. Resumed for another 140k steps on a `768x768` subset of our dataset.
- `512-depth-ema.ckpt`: Resumed from `512-base-ema.ckpt` and finetuned for 200k steps. Added an extra input channel to process the (relative) depth prediction produced by MiDaS (`dpt_hybrid`), which is used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized.
- `512-inpainting-ema.ckpt`: Resumed from `512-base-ema.ckpt` and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA which, in combination with the latent VAE representations of the masked image, is used as an additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. The same strategy was used to train the 1.5-inpainting checkpoint.
- `x4-upscaling-ema.ckpt`: Trained for 1.25M steps on a 10M subset of LAION containing images `>2048x2048`.
The model was trained on crops of size `512x512` and is a text-guided latent upscaling diffusion model. In addition to the textual input, it receives a `noise_level` as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule.

- Hardware: 32 x 8 x A100 GPUs
- Optimizer: AdamW
- Gradient Accumulations: 1
- Batch: 32 x 8 x 2 x 4 = 2048
- Learning rate: warmup to 0.0001 for 10,000 steps and then kept constant

### Evaluation Results

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 DDIM sampling steps show the relative improvements of the checkpoints. Evaluated using 50 DDIM steps and 10,000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores.

### Environmental Impact

Stable Diffusion v1 Estimated Emissions: Based on that information, we estimate the following CO2 emissions using the Machine Learning Impact calculator presented in Lacoste et al. (2019). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact.

- Hardware Type: A100 PCIe 40GB
- Hours used: 200,000
- Cloud Provider: AWS
- Compute Region: US-east
- Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid): 15,000 kg CO2 eq.

### Citation

    @InProceedings{Rombach_2022_CVPR,
        author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
        title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        month     = {June},
        year      = {2022},
        pages     = {10684-10695}
    }

This model card was written by: Robin Rombach, Patrick Esser and David Ha, and is based on the Stable Diffusion v1 and DALL-E Mini model card.
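The upscaler usage described in this card (a text prompt, a low-resolution image, and the `noise_level` input) can be sketched with diffusers' `StableDiffusionUpscalePipeline`. This is a hedged sketch following the standard diffusers usage for this checkpoint; `noise_level=20` is an illustrative value, and the heavy imports live inside `upscale()` so the size helper runs without downloading weights.

```python
def upscaled_size(width: int, height: int, factor: int = 4) -> tuple:
    """The x4 upscaler multiplies each spatial dimension by 4."""
    return (width * factor, height * factor)

def upscale(prompt: str, low_res_img):
    # Heavy dependencies imported lazily; calling this downloads the upscaler weights.
    import torch
    from diffusers import StableDiffusionUpscalePipeline

    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
    ).to("cuda")
    pipe.enable_attention_slicing()  # lower VRAM usage, at the cost of speed
    # noise_level adds noise to the low-res input according to the
    # predefined diffusion schedule described in the card.
    return pipe(prompt=prompt, image=low_res_img, noise_level=20).images[0]

# A 128x128 input comes back at 512x512.
print(upscaled_size(128, 128))  # -> (512, 512)
```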

16,265
711

TripoSR

> Try our new model: SF3D, with several improvements such as faster generation and more game-ready assets. The model is available here, and we also have a demo.

TripoSR is a fast and feed-forward 3D generative model developed in collaboration between Stability AI and Tripo AI. We closely follow the LRM network architecture for the model design, and TripoSR incorporates a series of technical advancements over the LRM model in terms of both data curation as well as model and training improvements. For more technical details and evaluations, please refer to our tech report.

- Developed by: Stability AI, Tripo AI
- Model type: Feed-forward 3D reconstruction from a single image
- License: MIT
- Hardware: We train `TripoSR` for 5 days on 22 GPU nodes, each with 8 A100 40GB GPUs
- Repository: https://github.com/VAST-AI-Research/TripoSR
- Tech report: https://arxiv.org/abs/2403.02151
- Demo: https://huggingface.co/spaces/stabilityai/TripoSR

We use renders from the Objaverse dataset, utilizing our enhanced rendering method that more closely replicates the distribution of images found in the real world, significantly improving our model's ability to generalize. We selected a carefully curated subset of the Objaverse dataset for the training data, which is available under the CC-BY license. For usage instructions, please refer to our TripoSR GitHub repository.

The model should not be used to intentionally create or disseminate 3D models that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.

license:mit
15,708
567

stable-diffusion-3.5-large-turbo

14,450
642

stable-cascade

This model is built upon the Würstchen architecture, and its main difference to other models like Stable Diffusion is that it works in a much smaller latent space. Why is this important? The smaller the latent space, the faster you can run inference and the cheaper the training becomes. How small is the latent space? Stable Diffusion uses a compression factor of 8, resulting in a 1024x1024 image being encoded to 128x128. Stable Cascade achieves a compression factor of 42, meaning that it is possible to encode a 1024x1024 image to 24x24 while maintaining crisp reconstructions. The text-conditional model is then trained in the highly compressed latent space. Previous versions of this architecture achieved a 16x cost reduction over Stable Diffusion 1.5. Therefore, this kind of model is well suited for usages where efficiency is important. Furthermore, all known extensions like finetuning, LoRA, ControlNet, IP-Adapter, LCM etc. are possible with this method as well.

Stable Cascade is a diffusion model trained to generate images given a text prompt.

- Developed by: Stability AI
- Funded by: Stability AI
- Model type: Generative text-to-image model

For research purposes, we recommend our `StableCascade` Github repository (https://github.com/Stability-AI/StableCascade).

- Repository: https://github.com/Stability-AI/StableCascade
- Paper: https://openreview.net/forum?id=gU58d5QeGv

### Model Overview

Stable Cascade consists of three models: Stage A, Stage B and Stage C, representing a cascade to generate images, hence the name "Stable Cascade". Stages A & B are used to compress images, similar to the job of the VAE in Stable Diffusion. However, with this setup, a much higher compression of images can be achieved. While the Stable Diffusion models use a spatial compression factor of 8, encoding an image with resolution of 1024 x 1024 to 128 x 128, Stable Cascade achieves a compression factor of 42.
This encodes a 1024 x 1024 image to 24 x 24, while being able to accurately decode the image. This comes with the great benefit of cheaper training and inference. Furthermore, Stage C is responsible for generating the small 24 x 24 latents given a text prompt. The following picture shows this visually.

For this release, we are providing two checkpoints for Stage C, two for Stage B and one for Stage A. Stage C comes in a 1 billion and a 3.6 billion parameter version, but we highly recommend using the 3.6 billion version, as most work was put into its finetuning. The two versions for Stage B amount to 700 million and 1.5 billion parameters. Both achieve great results, however the 1.5 billion excels at reconstructing small and fine details. Therefore, you will achieve the best results if you use the larger variant of each. Lastly, Stage A contains 20 million parameters and is fixed due to its small size.

According to our evaluation, Stable Cascade performs best in both prompt alignment and aesthetic quality in almost all comparisons. The above picture shows the results from a human evaluation using a mix of parti-prompts (link) and aesthetic prompts. Specifically, Stable Cascade (30 inference steps) was compared against Playground v2 (50 inference steps), SDXL (50 inference steps), SDXL Turbo (1 inference step) and Würstchen v2 (30 inference steps).

Note: In order to use the `torch.bfloat16` data type with the `StableCascadeDecoderPipeline` you need to have PyTorch 2.2.0 or higher installed. This also means that using the `StableCascadeCombinedPipeline` with `torch.bfloat16` requires PyTorch 2.2.0 or higher, since it calls the `StableCascadeDecoderPipeline` internally. If it is not possible to install PyTorch 2.2.0 or higher in your environment, the `StableCascadeDecoderPipeline` can be used on its own with the `torch.float16` data type. You can download the full precision or bf16 variant weights for the pipeline and cast the weights to `torch.float16`.
Using the Lite Version of the Stage B and Stage C models: loading the original-format checkpoints is supported via the `from_single_file` method in the `StableCascadeUNet`.

The model is intended for research purposes for now. Possible research areas and tasks include:

- Research on generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.

The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.

### Limitations

- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.

Check out https://github.com/Stability-AI/StableCascade
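The three-stage setup above can be sketched with diffusers' Stable Cascade pipelines: Stage C behind `StableCascadePriorPipeline`, Stages B and A behind `StableCascadeDecoderPipeline`. The repository ids and the two-stage call follow standard diffusers usage; treat the guidance and step values as illustrative. A small helper spells out the compression arithmetic from the card, and the heavy imports live inside `generate()` so the helper runs without downloading weights.

```python
def latent_size(image_size: int, compression_factor: int = 42) -> int:
    """Stable Cascade compresses each spatial dimension by a factor of 42."""
    return image_size // compression_factor

def generate(prompt: str):
    # Heavy dependencies imported lazily; calling this downloads both stages' weights.
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    prior = StableCascadePriorPipeline.from_pretrained(
        "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
    ).to("cuda")
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        "stabilityai/stable-cascade", torch_dtype=torch.float16
    ).to("cuda")

    # Stage C generates the small 24x24 latents from the text prompt...
    prior_out = prior(prompt=prompt, guidance_scale=4.0, num_inference_steps=20)
    # ...and Stages B and A decode them back into a full-resolution image.
    return decoder(
        image_embeddings=prior_out.image_embeddings.to(torch.float16),
        prompt=prompt, guidance_scale=0.0, num_inference_steps=10,
    ).images[0]

# A 1024x1024 image is encoded to 24x24 latents.
print(latent_size(1024))  # -> 24
```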

12,985
1,321

sd-x2-latent-upscaler

This model card focuses on the latent diffusion-based upscaler developed by Katherine Crowson in collaboration with Stability AI. This model was trained on a high-resolution subset of the LAION-2B dataset. It is a diffusion model that operates in the same latent space as the Stable Diffusion model, which is decoded into a full-resolution image. To use it with Stable Diffusion, you can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE. Or you can take any image, encode it into the latent space, use the upscaler, and decode it.

Note: This upscaling model is designed explicitly for Stable Diffusion, as it can upscale Stable Diffusion's latent denoised image embeddings. This allows for very fast text-to-image + upscaling pipelines, as all intermediate states can be kept on the GPU. For more information, see the example below. This model works on all Stable Diffusion checkpoints.

(Figure: Image by Tanishq Abraham from Stability AI, originating from this tweet — original output image vs. 2x upscaled output image.)

### Model Details

- Developed by: Katherine Crowson
- Model type: Diffusion-based latent upscaler
- Language(s): English
- License: CreativeML Open RAIL++-M License

Use the 🤗 Diffusers library to run the latent upscaler on top of any `StableDiffusionUpscalePipeline` checkpoint to enhance its output image resolution by a factor of 2.

Notes:

- Despite not being a dependency, we highly recommend you install xformers for memory-efficient attention (better performance).
- If you have low GPU RAM available, make sure to add `pipe.enable_attention_slicing()` after sending the pipeline to `cuda` for less VRAM usage (at the cost of speed).

### Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.

### Misuse, Malicious Use, and Out-of-Scope Use

Note: This section is originally taken from the DALLE-MINI model card, was used for Stable Diffusion v1, but applies in the same way to Stable Diffusion v2.

The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.

#### Out-of-Scope Use

The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.

#### Misuse and Malicious Use

Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

- Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc.
- Intentionally promoting or propagating discriminatory content or harmful stereotypes.
- Impersonating individuals without their consent.
- Sexual content without consent of the people who might see it.
- Mis- and disinformation
- Representations of egregious violence and gore
- Sharing of copyrighted or licensed material in violation of its terms of use.
- Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use.
### Limitations

- The model does not achieve perfect photorealism
- The model cannot render legible text
- The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”
- Faces and people in general may not be generated properly.
- The model was trained mainly with English captions and will not work as well in other languages.
- The autoencoding part of the model is lossy
- The model was trained on a subset of the large-scale dataset LAION-5B, which contains adult, violent and sexual content. To partially mitigate this, we have filtered the dataset using LAION's NSFW detector (see Training section).

### Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v2 was primarily trained on subsets of LAION-2B(en), which consists of images that are limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts. Stable Diffusion v2 mirrors and exacerbates biases to such a degree that viewer discretion must be advised irrespective of the input or its intent.
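The latent hand-off this card describes (generate with `output_type="latent"`, upscale, then let the upscaler's VAE decode) can be sketched with diffusers. This is a hedged sketch: the base checkpoint chosen here is arbitrary, since the card notes the upscaler works on all Stable Diffusion checkpoints, and the heavy imports live inside the function so the size helper runs without downloading weights.

```python
def upscaled_size(width: int, height: int) -> tuple:
    """The latent upscaler doubles each spatial dimension."""
    return (width * 2, height * 2)

def generate_and_upscale(prompt: str):
    # Heavy dependencies imported lazily; calling this downloads two sets of weights.
    import torch
    from diffusers import (StableDiffusionLatentUpscalePipeline,
                           StableDiffusionPipeline)

    base = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
    ).to("cuda")
    upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
        "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
    ).to("cuda")

    # Keep the denoised latents on the GPU instead of decoding them...
    low_res_latents = base(prompt, output_type="latent").images
    # ...and feed them straight to the upscaler before the final VAE decode.
    return upscaler(
        prompt=prompt, image=low_res_latents,
        num_inference_steps=20, guidance_scale=0,
    ).images[0]

# A 512x512 generation decodes to 1024x1024 after upscaling.
print(upscaled_size(512, 512))  # -> (1024, 1024)
```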

12,261
187

stable-diffusion-3-medium

12,027
4,866

stablelm-zephyr-3b

11,356
258

stable-code-3b

dataset:bigcode/commitpackft
8,018
657

stable-diffusion-3.5-large-tensorrt

---
pipeline_tag: text-to-image
inference: false
library_name: tensorrt
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- tensorrt
- sd3.5-large
- text-to-image
- onnx
- model-optimizer
- fp8
- quantization
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md) and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy).
extra

5,571
43

stablelm-2-1_6b

Please note: for commercial use, please refer to https://stability.ai/license. `Stable LM 2 1.6B` is a 1.6 billion parameter decoder-only language model pre-trained on 2 trillion tokens of diverse multilingual and code datasets for two epochs. Get started generating text with `Stable LM 2 1.6B` via the Hugging Face `transformers` library. Developed by: Stability AI Model type: `Stable LM 2 1.6B` models are auto-regressive language models based on the transformer decoder architecture. Language(s): English Paper: Stable LM 2 1.6B Technical Report Library: GPT-NeoX License: Stability AI Community License. Commercial License: to use this model commercially, please refer to https://stability.ai/license Contact: For questions and comments about the model, please email `[email protected]` The model is a decoder-only transformer similar to the LLaMA (Touvron et al., 2023) architecture with the following modifications:

| Parameters | Hidden Size | Layers | Heads | Sequence Length |
|----------------|-------------|--------|-------|-----------------|
| 1,644,417,024 | 2048 | 24 | 32 | 4096 |

Position Embeddings: Rotary Position Embeddings (Su et al., 2021) applied to the first 25% of head embedding dimensions for improved throughput following Black et al. (2022). Normalization: LayerNorm (Ba et al., 2016) with learned bias terms, as opposed to RMSNorm (Zhang & Sennrich, 2019). Biases: We remove all bias terms from the feed-forward networks and multi-head self-attention layers, except for the biases of the query, key, and value projections (Bai et al., 2023). Tokenizer: We use Arcade100k, a BPE tokenizer extended from OpenAI's `tiktoken.cl100k_base`. We split digits into individual tokens following findings by Liu & Low (2023).
The dataset is comprised of a filtered mixture of open-source large-scale datasets available on the HuggingFace Hub: Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023) and The Pile (Gao et al., 2020), both without the Books3 subset, and StarCoder (Li et al., 2023). We further supplement our training with multi-lingual data from CulturaX (Nguyen et al., 2023) and, in particular, from its OSCAR corpora, as well as restructured data in the style of Yuan & Liu (2022). Given the large amount of web data, we recommend fine-tuning the base `Stable LM 2 1.6B` for your downstream tasks. The model is pre-trained on the aforementioned datasets in `bfloat16` precision, optimized with AdamW, and trained using the Arcade100k tokenizer with a vocabulary size of 100,352. We outline the complete hyperparameter choices in the project's GitHub repository config. The final checkpoint of pre-training, before cooldown, is provided in the `global_step420000` branch. Hardware: `Stable LM 2 1.6B` was trained on the Stability AI cluster across 512 NVIDIA A100 40GB GPUs (AWS P4d instances). Software: We use a fork of `gpt-neox` (EleutherAI, 2021), train under 2D parallelism (Data and Tensor Parallel) with ZeRO-1 (Rajbhandari et al., 2019), and rely on flash-attention as well as SwiGLU and Rotary Embedding kernels from FlashAttention-2 (Dao et al., 2023). The model is intended to be used as a foundational base model for application-specific fine-tuning. Developers must evaluate and fine-tune the model for safe performance in downstream applications. For commercial use, please refer to https://stability.ai/membership. Limitations and Bias: As a base model, this model may exhibit unreliable, unsafe, or other undesirable behaviors that must be corrected through evaluation and fine-tuning prior to deployment.
The pre-training dataset may have contained offensive or inappropriate content, even after applying data cleansing filters, which can be reflected in the model-generated text. We recommend that users exercise caution when using these models in production systems. Do not use the models if they are unsuitable for your application, or for any applications that may cause deliberate or unintentional harm to others.
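The card above mentions getting started with a code snippet, but the snippet itself did not survive extraction. The following is a minimal sketch of base-model text generation with `transformers`; the sampling parameters are illustrative assumptions, and the imports are deferred into the function so the sketch can be loaded without `torch` installed.

```python
# Hedged sketch: text generation with Stable LM 2 1.6B via Hugging Face
# transformers. Running generate() requires torch + transformers (and
# ideally a GPU); the imports are deferred so the module itself is light.

MODEL_ID = "stabilityai/stablelm-2-1_6b"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    # Sampling settings are illustrative, not the card's official values.
    tokens = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
    )
    return tokenizer.decode(tokens[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("The weather is always wonderful in"))
```

As a base model without instruction tuning, it continues the given text rather than answering questions; fine-tune it for downstream tasks as the card recommends.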

4,610
192

stablelm-2-zephyr-1_6b

`Stable LM 2 Zephyr 1.6B` is a 1.6 billion parameter instruction tuned language model inspired by HuggingFaceH4's Zephyr 7B training pipeline. The model is trained on a mix of publicly available datasets and synthetic datasets, utilizing Direct Preference Optimization (DPO). `StableLM 2 Zephyr 1.6B` uses a ChatML-style instruction format, which is also available through the tokenizer's `apply_chat_template` method. Developed by: Stability AI Model type: `StableLM 2 Zephyr 1.6B` model is an auto-regressive language model based on the transformer decoder architecture. Language(s): English Paper: Stable LM 2 1.6B Technical Report Library: Alignment Handbook Finetuned from model: https://huggingface.co/stabilityai/stablelm-2-1_6b License: StabilityAI Non-Commercial Research Community License. If you want to use this model for your commercial products or purposes, please contact us here to learn more. Contact: For questions and comments about the model, please email `[email protected]` The dataset is comprised of a mixture of open large-scale datasets available on the HuggingFace Hub:

1. SFT Datasets
- HuggingFaceH4/ultrachat_200k
- meta-math/MetaMathQA
- WizardLM/WizardLM_evol_instruct_V2_196k
- Open-Orca/SlimOrca
- openchat/openchat_sharegpt4_dataset
- LDJnr/Capybara
- hkust-nlp/deita-10k-v0

2. Preference Datasets
- allenai/ultrafeedback_binarized_cleaned
- Intel/orca_dpo_pairs

| Model | Size | MT-Bench |
|-------------------------|------|----------|
| Mistral-7B-Instruct-v0.2| 7B | 7.61 |
| Llama2-Chat | 70B | 6.86 |
| stablelm-zephyr-3b | 3B | 6.64 |
| MPT-30B-Chat | 30B | 6.39 |
| stablelm-2-zephyr-1.6b | 1.6B | 5.42 |
| Falcon-40B-Instruct | 40B | 5.17 |
| Qwen-1.8B-Chat | 1.8B | 4.95 |
| dolphin-2.6-phi-2 | 2.7B | 4.93 |
| phi-2 | 2.7B | 4.29 |
| TinyLlama-1.1B-Chat-v1.0| 1.1B | 3.46 |

| Model | Size | Average | ARC Challenge (acc_norm) | HellaSwag (acc_norm) | MMLU (acc_norm) | TruthfulQA (mc2) | Winogrande (acc) | GSM8K (acc) |
|----------------------------------------|------|---------|--------------------------|----------------------|-----------------|------------------|------------------|-------------|
| microsoft/phi-2 | 2.7B | 61.32% | 61.09% | 75.11% | 58.11% | 44.47% | 74.35% | 54.81% |
| stabilityai/stablelm-2-zephyr-1_6b | 1.6B | 49.89% | 43.69% | 69.34% | 41.85% | 45.21% | 64.09% | 35.18% |
| microsoft/phi-1_5 | 1.3B | 47.69% | 52.90% | 63.79% | 43.89% | 40.89% | 72.22% | 12.43% |
| stabilityai/stablelm-2-1_6b | 1.6B | 45.54% | 43.43% | 70.49% | 38.93% | 36.65% | 65.90% | 17.82% |
| mosaicml/mpt-7b | 7B | 44.28% | 47.70% | 77.57% | 30.80% | 33.40% | 72.14% | 4.02% |
| KnutJaegersberg/Qwen-1_8B-Llamaified | 1.8B | 44.75% | 37.71% | 58.87% | 46.37% | 39.41% | 61.72% | 24.41% |
| openlm-research/open_llama_3b_v2 | 3B | 40.28% | 40.27% | 71.60% | 27.12% | 34.78% | 67.01% | 0.91% |
| tiiuae/falcon-rw-1b | 1B | 37.07% | 35.07% | 63.56% | 25.28% | 35.96% | 62.04% | 0.53% |
| TinyLlama/TinyLlama-1.1B-3T | 1.1B | 36.40% | 33.79% | 60.31% | 26.04% | 37.32% | 59.51% | 1.44% |

Hardware: `StableLM 2 Zephyr 1.6B` was trained on the Stability AI cluster across 8 nodes with 8 NVIDIA A100 80GB GPUs per node. Code Base: We used our internal scripts for the SFT steps and the HuggingFace Alignment Handbook scripts for DPO training.
The model is intended to be used in chat-like applications. Developers must evaluate the model for safety performance in their specific use case. Read more about safety and limitations below. Limitations and Bias: This model is not trained against adversarial inputs. We strongly recommend pairing this model with an input and output classifier to prevent harmful responses. Through our internal red teaming, we discovered that while the model will not output harmful information if not prompted to do so, it will hallucinate many facts. It is also willing to output potentially harmful content or misinformation when the user requests it. Using this model will require guardrails around your inputs and outputs to ensure that any outputs returned are not misinformation or harmful. Additionally, as each use case is unique, we recommend running your own suite of tests to ensure proper performance of this model. Finally, do not use the models if they are unsuitable for your application, or for any applications that may cause deliberate or unintentional harm to others.
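The ChatML-style turn layout this card describes can be sketched in pure Python. In practice, prefer the tokenizer's `apply_chat_template`, which is authoritative; this offline version only illustrates the documented `<|user|>` / `<|assistant|>` / `<|endoftext|>` layout.

```python
# Minimal sketch of the StableLM 2 Zephyr chat format described above.
# For real inference, use tokenizer.apply_chat_template instead; this
# pure-Python version is for illustration only.

def format_zephyr_prompt(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts as Zephyr-style ChatML."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|endoftext|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model completes it.
        parts.append("<|assistant|>\n")
    return "".join(parts)

if __name__ == "__main__":
    print(format_zephyr_prompt(
        [{"role": "user", "content": "Which famous math number begins with 1.6?"}]
    ))
```

Passing the rendered string through the model's tokenizer then yields the input IDs for generation.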

4,503
186

sdxl-turbo-ryzen-ai

4,438
10

stable-fast-3d

3,615
704

stablelm-tuned-alpha-3b

license:cc-by-nc-sa-4.0
2,838
110

stable-diffusion-2-1-unclip

2,793
291

stable-code-instruct-3b

2,710
177

stable-diffusion-2-depth

2,218
391

stablelm-base-alpha-7b-v2

license:cc-by-sa-4.0
1,970
45

stable-cascade-prior

1,928
30

stablelm-tuned-alpha-7b

`StableLM-Tuned-Alpha` is a suite of 3B and 7B parameter decoder-only language models built on top of the `StableLM-Base-Alpha` models and further fine-tuned on various chat and instruction-following datasets. Get started chatting with `StableLM-Tuned-Alpha` using the Hugging Face `transformers` library. StableLM Tuned should be used with prompts formatted as `<|SYSTEM|>...<|USER|>...<|ASSISTANT|>...`, where the system prompt is the `# StableLM Tuned (Alpha version)` preamble describing the assistant as helpful and harmless. Developed by: Stability AI Model type: StableLM-Tuned-Alpha models are auto-regressive language models based on the NeoX transformer architecture. Language(s): English Library: HuggingFace Transformers License: Fine-tuned checkpoints (`StableLM-Tuned-Alpha`) are licensed under the Non-Commercial Creative Commons license (CC BY-NC-SA-4.0), in line with the original non-commercial license specified by Stanford Alpaca. Contact: For questions and comments about the model, please email `[email protected]`

| Parameters | Hidden Size | Layers | Heads | Sequence Length |
|------------|-------------|--------|-------|-----------------|
| 3B | 4096 | 16 | 32 | 4096 |
| 7B | 6144 | 16 | 48 | 4096 |

`StableLM-Tuned-Alpha` models are fine-tuned on a combination of five datasets: Alpaca, a dataset of 52,000 instructions and demonstrations generated by OpenAI's `text-davinci-003` engine; GPT4All Prompt Generations, which consists of 400k prompts and responses generated by GPT-4; Anthropic HH, made up of preferences about AI assistant helpfulness and harmlessness; DataBricks Dolly, comprising 15k instructions/responses generated by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization; and ShareGPT Vicuna (English subset), a dataset of conversations retrieved from ShareGPT. Models are learned via supervised fine-tuning on the aforementioned datasets, trained in mixed precision (FP16), and optimized with AdamW.
We outline the following hyperparameters:

| Parameters | Batch Size | Learning Rate | Warm-up | Weight Decay | Betas |
|------------|------------|---------------|---------|--------------|-------------|
| 3B | 256 | 2e-5 | 50 | 0.01 | (0.9, 0.99) |
| 7B | 128 | 2e-5 | 100 | 0.01 | (0.9, 0.99) |

These models are intended to be used by the open-source community in chat-like applications, in adherence with the CC BY-NC-SA-4.0 license. Although the aforementioned datasets help to steer the base language models into "safer" distributions of text, not all biases and toxicity can be mitigated through fine-tuning. We ask that users be mindful of such potential issues that can arise in generated responses. Do not treat model outputs as substitutes for human judgment or as sources of truth. Please use responsibly. This work would not have been possible without the helpful hand of Dakota Mahan (@dmayhem93).
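The `<|SYSTEM|>` / `<|USER|>` / `<|ASSISTANT|>` prompt layout mentioned in the card can be sketched as a small builder. The system prompt text below is reproduced from the published StableLM-Tuned-Alpha model card; verify it against the card before relying on the exact wording.

```python
# Sketch of the StableLM-Tuned-Alpha prompt format. The SYSTEM_PROMPT text
# is quoted from the published model card; check it against the source.

SYSTEM_PROMPT = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

def build_prompt(user_message: str) -> str:
    """Wrap a user message in the tuned model's special-token format,
    leaving the assistant turn open for generation."""
    return f"{SYSTEM_PROMPT}<|USER|>{user_message}<|ASSISTANT|>"

if __name__ == "__main__":
    print(build_prompt("What's your mood today?"))
```

During generation, decoding is typically stopped when the model emits one of the special tokens again (the original snippet used a custom `StoppingCriteria` for this).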

license:cc-by-nc-sa-4.0
1,914
360

StableBeluga-13B

llama
1,734
113

japanese-stablelm-base-alpha-7b

license:apache-2.0
1,671
121

stable-audio-open-small

1,492
233

StableBeluga-7B

llama
1,428
128

stablelm-base-alpha-3b

license:cc-by-sa-4.0
1,201
82

japanese-stablelm-instruct-gamma-7b

This is a 7B-parameter decoder-only Japanese language model fine-tuned on instruction-following datasets, built on top of the base model Japanese Stable LM Base Gamma 7B. If you are in search of a smaller model, please check Japanese StableLM-3B-4E1T Instruct. Developed by: Stability AI Model type: `Japanese Stable LM Instruct Gamma 7B` model is an auto-regressive language model based on the transformer decoder architecture. Language(s): Japanese License: This model is licensed under Apache License, Version 2.0. Contact: For questions and comments about the model, please join Stable Community Japan. For future announcements / information about Stability AI models, research, and events, please follow https://twitter.com/StabilityAIJP. For details, please see Mistral AI's paper and release blog post. - Japanese translation of the Databricks Dolly-15k dataset - Japanese translation of the subset of the Anthropic HH dataset - Wikinews subset of the izumi-lab/llm-japanese-dataset The model is intended to be used by all individuals as a foundational model for application-specific fine-tuning without strict limitations on commercial use. The pre-training dataset may have contained offensive or inappropriate content even after applying data cleansing filters which can be reflected in the model-generated text. We recommend users exercise reasonable caution when using these models in production systems. Do not use the model for any applications that may cause harm or distress to individuals or groups. The fine-tuning was carried out by Fujiki Nakamura. Other aspects, including data preparation and evaluation, were handled by the Language Team of Stability AI Japan, notably Meng Lee, Makoto Shing, Paul McCann, Naoki Orii, and Takuya Akiba. This model is based on Mistral-7B-v0.1 released by the Mistral AI team. We are grateful to the Mistral AI team for providing such an excellent base model. 
We are grateful for the contributions of the EleutherAI Polyglot-JA team in helping us to collect a large amount of pre-training data in Japanese. Polyglot-JA members include Hyunwoong Ko (Project Lead), Fujiki Nakamura (who originally started this project when he committed to the Polyglot team), Yunho Mo, Minji Jung, KeunSeok Im, and Su-Kyeong Jang. We are also appreciative of AI Novelist/Sta (Bit192, Inc.) and the numerous contributors from Stable Community Japan for assisting us in gathering a large amount of high-quality Japanese textual data for model training.
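Since this card lists instruction datasets (the Japanese Dolly translation and others) but the usage example did not survive extraction, here is a hypothetical prompt builder. The field labels (指示/入力/応答) follow the common Japanese Alpaca-style convention; treat the exact wording as an assumption and check the official usage example.

```python
# Hypothetical Alpaca-style Japanese instruction prompt builder.
# Assumption: the model follows the common 指示 (instruction) / 入力 (input) /
# 応答 (response) convention; verify against the official usage example.

def build_japanese_prompt(instruction: str, user_input: str = "") -> str:
    prompt = (
        "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。"
        "要求を適切に満たす応答を書きなさい。\n\n"
        f"### 指示:\n{instruction}\n\n"
    )
    if user_input:
        # The input section is optional and only added when context exists.
        prompt += f"### 入力:\n{user_input}\n\n"
    prompt += "### 応答:\n"
    return prompt
```

The model then completes the text after `### 応答:` with its answer.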

license:apache-2.0
1,196
53

japanese-stablelm-base-beta-70b

llama
1,181
17

japanese-stablelm-instruct-beta-70b

llama
1,178
26

japanese-stablelm-base-gamma-7b

license:apache-2.0
1,131
25

stablelm-base-alpha-7b

license:cc-by-sa-4.0
1,098
209

sv4d2.0

1,058
61

StableBeluga1-Delta

llama
853
57

StableBeluga2

Use Stable Chat (Research Preview) to test Stability AI's best language models for free. `Stable Beluga 2` is a Llama2 70B model fine-tuned on an Orca-style dataset. Start chatting with `Stable Beluga 2` using the Hugging Face `transformers` library. Stable Beluga 2 should be used with the `### System:` / `### User:` / `### Assistant:` prompt format. Related models: StableBeluga 1 - Delta, StableBeluga 13B, StableBeluga 7B. Developed by: Stability AI Model type: Stable Beluga 2 is an auto-regressive language model fine-tuned on Llama2 70B. Language(s): English Library: HuggingFace Transformers License: Fine-tuned checkpoints (`Stable Beluga 2`) are licensed under the STABLE BELUGA NON-COMMERCIAL COMMUNITY LICENSE AGREEMENT Contact: For questions and comments about the model, please email `[email protected]` `Stable Beluga 2` is trained on our internal Orca-style dataset. Models are learned via supervised fine-tuning on the aforementioned datasets, trained in mixed precision (BF16), and optimized with AdamW. We outline the following hyperparameters:

| Dataset | Batch Size | Learning Rate | Learning Rate Decay | Warm-up | Weight Decay | Betas |
|-------------------|------------|---------------|---------------------|---------|--------------|-------------|
| Orca pt1 packed | 256 | 3e-5 | Cosine to 3e-6 | 100 | 1e-6 | (0.9, 0.95) |
| Orca pt2 unpacked | 512 | 3e-5 | Cosine to 3e-6 | 100 | 1e-6 | (0.9, 0.95) |

Beluga is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Beluga's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts. Therefore, before deploying any applications of Beluga, developers should perform safety testing and tuning tailored to their specific applications of the model.
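The Beluga prompt format referenced above can be sketched as a small builder. The `### System:` / `### User:` / `### Assistant:` layout matches the published Stable Beluga card; the example system message is an illustrative assumption.

```python
# Sketch of the Stable Beluga prompt format described on the card.
# The section headers are from the published card; the sample system
# message below is only an illustration.

def build_beluga_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Beluga prompt, leaving the assistant
    section open for the model to complete."""
    return f"### System:\n{system}\n\n### User:\n{user}\n\n### Assistant:\n"

if __name__ == "__main__":
    print(build_beluga_prompt(
        "You are Stable Beluga, an AI that follows instructions well.",
        "Write me a poem about machine learning.",
    ))
```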

llama
840
883

stablecode-completion-alpha-3b-4k

license:apache-2.0
773
279

stable-diffusion-3.5-controlnets-tensorrt

766
3

stable-point-aware-3d

690
315

tiny-random-stablelm-2

628
3

Japanese Stable Clip Vit L 16

605
27

japanese-stablelm-3b-4e1t-instruct

license:apache-2.0
543
29

japanese-stablelm-3b-4e1t-base

license:apache-2.0
512
17

japanese-stablelm-instruct-alpha-7b

425
96

japanese-stablelm-instruct-alpha-7b-v2

license:apache-2.0
401
22

stable-diffusion-xl-base-0.9

389
1,409

japanese-stablelm-instruct-beta-7b

llama
389
17

ar-stablelm-2-chat

388
18

stablelm-2-1_6b-chat

`Stable LM 2 Chat 1.6B` is a 1.6 billion parameter instruction tuned language model inspired by HuggingFaceH4's Zephyr 7B training pipeline. The model is trained on a mix of publicly available datasets and synthetic datasets, utilizing Direct Preference Optimization (DPO). `StableLM 2 1.6B Chat` uses the ChatML format, available through the tokenizer's `apply_chat_template` method. Developed by: Stability AI Model type: `StableLM 2 Chat 1.6B` model is an auto-regressive language model based on the transformer decoder architecture. Language(s): English Paper: Stable LM 2 1.6B Technical Report Library: Alignment Handbook Finetuned from model: https://huggingface.co/stabilityai/stablelm-2-1_6b License: StabilityAI Non-Commercial Research Community License. If you want to use this model for your commercial products or purposes, please contact us here to learn more. Contact: For questions and comments about the model, please email `[email protected]` The dataset is comprised of a mixture of open large-scale datasets available on the HuggingFace Hub:

1. SFT Datasets
- HuggingFaceH4/ultrachat_200k
- meta-math/MetaMathQA
- WizardLM/WizardLM_evol_instruct_V2_196k
- Open-Orca/SlimOrca
- openchat/openchat_sharegpt4_dataset
- LDJnr/Capybara
- hkust-nlp/deita-10k-v0
- teknium/OpenHermes-2.5

2. Preference Datasets
- allenai/ultrafeedback_binarized_cleaned
- Intel/orca_dpo_pairs
- argilla/dpo-mix-7k

| Model | Size | MT-Bench |
|-------------------------|------|----------|
| Mistral-7B-Instruct-v0.2| 7B | 7.61 |
| Llama2-Chat | 70B | 6.86 |
| stablelm-zephyr-3b | 3B | 6.64 |
| MPT-30B-Chat | 30B | 6.39 |
| stablelm-2-1_6b-chat | 1.6B | 5.83 |
| stablelm-2-zephyr-1.6b | 1.6B | 5.42 |
| Falcon-40B-Instruct | 40B | 5.17 |
| Qwen-1.8B-Chat | 1.8B | 4.95 |
| dolphin-2.6-phi-2 | 2.7B | 4.93 |
| phi-2 | 2.7B | 4.29 |
| TinyLlama-1.1B-Chat-v1.0| 1.1B | 3.46 |

| Model | Size | Average | ARC Challenge (acc_norm) | HellaSwag (acc_norm) | MMLU (acc_norm) | TruthfulQA (mc2) | Winogrande (acc) | GSM8K (acc) |
|----------------------------------------|------|---------|--------------------------|----------------------|-----------------|------------------|------------------|-------------|
| microsoft/phi-2 | 2.7B | 61.32% | 61.09% | 75.11% | 58.11% | 44.47% | 74.35% | 54.81% |
| stabilityai/stablelm-2-1_6b-chat | 1.6B | 50.80% | 43.94% | 69.22% | 41.59% | 46.52% | 64.56% | 38.96% |
| stabilityai/stablelm-2-zephyr-1_6b | 1.6B | 49.89% | 43.69% | 69.34% | 41.85% | 45.21% | 64.09% | 35.18% |
| microsoft/phi-1_5 | 1.3B | 47.69% | 52.90% | 63.79% | 43.89% | 40.89% | 72.22% | 12.43% |
| stabilityai/stablelm-2-1_6b | 1.6B | 45.54% | 43.43% | 70.49% | 38.93% | 36.65% | 65.90% | 17.82% |
| mosaicml/mpt-7b | 7B | 44.28% | 47.70% | 77.57% | 30.80% | 33.40% | 72.14% | 4.02% |
| KnutJaegersberg/Qwen-1_8B-Llamaified | 1.8B | 44.75% | 37.71% | 58.87% | 46.37% | 39.41% | 61.72% | 24.41% |
| openlm-research/open_llama_3b_v2 | 3B | 40.28% | 40.27% | 71.60% | 27.12% | 34.78% | 67.01% | 0.91% |
| tiiuae/falcon-rw-1b | 1B | 37.07% | 35.07% | 63.56% | 25.28% | 35.96% | 62.04% | 0.53% |
| TinyLlama/TinyLlama-1.1B-3T | 1.1B | 36.40% | 33.79% | 60.31% | 26.04% | 37.32% | 59.51% | 1.44% |

The model is intended to be used in chat-like applications.
Developers must evaluate the model for safety performance in their specific use case. Read more about safety and limitations below. Limitations and Bias: This model is not trained against adversarial inputs. We strongly recommend pairing this model with an input and output classifier to prevent harmful responses. Through our internal red teaming, we discovered that while the model will not output harmful information if not prompted to do so, it will hallucinate many facts. It is also willing to output potentially harmful content or misinformation when the user requests it. Using this model will require guardrails around your inputs and outputs to ensure that any outputs returned are not misinformation or harmful. Additionally, as each use case is unique, we recommend running your own suite of tests to ensure proper performance of this model. Finally, do not use the models if they are unsuitable for your application, or for any applications that may cause deliberate or unintentional harm to others.

387
33

japanese-stablelm-instruct-ja_vocab-beta-7b

llama
364
11

japanese-stablelm-base-beta-7b

llama
356
14

stable-diffusion-2-1-unclip-small

350
35

japanese-stable-vlm

310
51

japanese-stablelm-base-ja_vocab-beta-7b

llama
310
7

Stable Diffusion Xl 1.0 Tensorrt

This repository hosts the TensorRT versions (sdxl, sdxl-lcm, sdxl-lcmlora) of Stable Diffusion XL 1.0, created in collaboration with NVIDIA. The optimized versions give substantial improvements in speed and efficiency. See the usage instructions for how to run the SDXL pipeline with the ONNX files hosted in this repository. - Developed by: Stability AI - Model type: Diffusion-based text-to-image generative model - License: CreativeML Open RAIL++-M License - Model Description: This is a conversion of the SDXL base 1.0 and SDXL refiner 1.0 models for NVIDIA TensorRT optimized inference

Latency:

| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
|-------------|--------------------------|-----------------------------|------------------------|
| A10 | 9399 ms | 8160 ms | ~13% |
| A100 | 3704 ms | 2742 ms | ~26% |
| H100 | 2496 ms | 1471 ms | ~41% |

Throughput:

| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
|-------------|--------------------------|-----------------------------|------------------------|
| A10 | 0.10 images/sec | 0.12 images/sec | ~20% |
| A100 | 0.27 images/sec | 0.36 images/sec | ~33% |
| H100 | 0.40 images/sec | 0.68 images/sec | ~70% |

Timings for the Latent Consistency Model (LCM) version, 4 steps at 1024x1024:

| Accelerator | CLIP | Unet | VAE | Total |
|-------------|---------|-----------|-----------|-----------|
| A100 | 1.08 ms | 192.02 ms | 228.34 ms | 426.16 ms |
| H100 | 0.78 ms | 102.8 ms | 126.95 ms | 234.22 ms |

1. Follow the setup instructions on launching a TensorRT NGC container. The first invocation produces plan files in `engine_xl_base` and `engine_xl_refiner` specific to the accelerator being run on; these are reused for later invocations. For the LCM variants, the first invocation produces plan files in `--engine-dir` specific to the accelerator being run on, which are likewise reused for later invocations.

289
151

stablelm-2-12b-chat-GGUF

281
1

stablelm-base-alpha-3b-v2

license:cc-by-sa-4.0
227
27

japanese-stable-diffusion-xl

171
101

stablelm-2-12b-chat

`Stable LM 2 12B Chat` is a 12 billion parameter instruction tuned language model trained on a mix of publicly available datasets and synthetic datasets, utilizing Direct Preference Optimization (DPO). `StableLM 2 12B Chat` uses the ChatML instruction format, which is also available through the tokenizer's `apply_chat_template` method. StableLM 2 12B Chat also supports function calling. Developed by: Stability AI Model type: `StableLM 2 12B Chat` model is an auto-regressive language model based on the transformer decoder architecture. Language(s): English Paper: Stable LM 2 Chat Technical Report Library: Alignment Handbook Finetuned from model: License: StabilityAI Non-Commercial Research Community License. If you want to use this model for your commercial products or purposes, please contact us here to learn more. Contact: For questions and comments about the model, please email `[email protected]`. The dataset is comprised of a mixture of open large-scale datasets available on the HuggingFace Hub as well as an internal safety dataset:

1. SFT Datasets
- HuggingFaceH4/ultrachat_200k
- meta-math/MetaMathQA
- WizardLM/WizardLM_evol_instruct_V2_196k
- Open-Orca/SlimOrca
- openchat/openchat_sharegpt4_dataset
- LDJnr/Capybara
- hkust-nlp/deita-10k-v0
- teknium/OpenHermes-2.5
- glaiveai/glaive-function-calling-v2

2. Safety Datasets
- Anthropic/hh-rlhf
- Internal Safety Dataset

| Model | Parameters | MT Bench (Inflection-corrected) |
|---------------------------------------|------------|---------------------------------|
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 13B/47B | 8.48 ± 0.06 |
| stabilityai/stablelm-2-12b-chat | 12B | 8.15 ± 0.08 |
| Qwen/Qwen1.5-14B-Chat | 14B | 7.95 ± 0.10 |
| HuggingFaceH4/zephyr-7b-gemma-v0.1 | 8.5B | 7.82 ± 0.03 |
| mistralai/Mistral-7B-Instruct-v0.2 | 7B | 7.48 ± 0.02 |
| meta-llama/Llama-2-70b-chat-hf | 70B | 7.29 ± 0.05 |

| Model | Parameters | Average | ARC Challenge (25-shot) | HellaSwag (10-shot) | MMLU (5-shot) | TruthfulQA (0-shot) | Winogrande (5-shot) | GSM8K (5-shot) |
| -------------------------------------- | ---------- | ------- | ----------------------- | ------------------- | ------------- | ------------------- | ------------------- | -------------- |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 13B/47B | 72.71 | 70.14 | 87.55 | 71.40 | 64.98 | 81.06 | 61.11 |
| stabilityai/stablelm-2-12b-chat | 12B | 68.45 | 65.02 | 86.06 | 61.14 | 62.00 | 78.77 | 57.70 |
| Qwen/Qwen1.5-14B | 14B | 66.70 | 56.57 | 81.08 | 69.36 | 52.06 | 73.48 | 67.63 |
| mistralai/Mistral-7B-Instruct-v0.2 | 7B | 65.71 | 63.14 | 84.88 | 60.78 | 60.26 | 77.19 | 40.03 |
| HuggingFaceH4/zephyr-7b-gemma-v0.1 | 8.5B | 62.41 | 58.45 | 83.48 | 60.68 | 52.07 | 74.19 | 45.56 |
| Qwen/Qwen1.5-14B-Chat | 14B | 62.37 | 58.79 | 82.33 | 68.52 | 60.38 | 73.32 | 30.86 |
| google/gemma-7b | 8.5B | 63.75 | 61.09 | 82.20 | 64.56 | 44.79 | 79.01 | 50.87 |
| stabilityai/stablelm-2-12b | 12B | 63.53 | 58.45 | 84.33 | 62.09 | 48.16 | 78.10 | 56.03 |
| mistralai/Mistral-7B-v0.1 | 7B | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |
| meta-llama/Llama-2-13b-hf | 13B | 55.69 | 59.39 | 82.13 | 55.77 | 37.38 | 76.64 | 22.82 |
| meta-llama/Llama-2-13b-chat-hf | 13B | 54.92 | 59.04 | 81.94 | 54.64 | 41.12 | 74.51 | 15.24 |

The model is intended to be used in chat-like applications.
Developers must evaluate the model for safety performance in their specific use case. Read more about safety and limitations below. We strongly recommend pairing this model with an input and output classifier to prevent harmful responses. Using this model will require guardrails around your inputs and outputs to ensure that any outputs returned are not hallucinations. Additionally, as each use case is unique, we recommend running your own suite of tests to ensure proper performance of this model. Finally, do not use the models if they are unsuitable for your application, or for any applications that may cause deliberate or unintentional harm to others.
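The chat usage described above can be sketched with the real `apply_chat_template` API. The sampling parameters are illustrative assumptions, imports are deferred so the sketch loads without `torch`, and the function-calling message schema is not reproduced here (consult the model card for it).

```python
# Hedged sketch: chat inference with StableLM 2 12B Chat via
# tokenizer.apply_chat_template. Running chat() requires torch +
# transformers and substantial GPU memory (12B parameters).

MODEL_ID = "stabilityai/stablelm-2-12b-chat"

def build_messages(user_prompt: str):
    # ChatML-style messages consumed by apply_chat_template.
    return [{"role": "user", "content": user_prompt}]

def chat(user_prompt: str, max_new_tokens: int = 128) -> str:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt), add_generation_prompt=True, return_tensors="pt"
    )
    # Sampling settings are illustrative, not the card's official values.
    output = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=True)
    # Strip the prompt tokens, keeping only the assistant's reply.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```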

167
87

stablelm-2-12b

`Stable LM 2 12B` is a 12.1 billion parameter decoder-only language model pre-trained on 2 trillion tokens of diverse multilingual and code datasets for two epochs. Please note: for commercial use, please refer to https://stability.ai/license. Get started generating text with `Stable LM 2 12B` via the Hugging Face `transformers` library. Developed by: Stability AI Model type: `Stable LM 2 12B` models are auto-regressive language models based on the transformer decoder architecture. Language(s): English Paper: Stable LM 2 Technical Report Library: GPT-NeoX License: Stability AI Community License. Commercial License: to use this model commercially, please refer to https://stability.ai/license Contact: For questions and comments about the model, please email `[email protected]` The model is a decoder-only transformer with the following architecture:

| Parameters | Hidden Size | Layers | Heads | KV Heads | Sequence Length |
|----------------|-------------|--------|-------|----------|-----------------|
| 12,143,605,760 | 5120 | 40 | 32 | 8 | 4096 |

Position Embeddings: Rotary Position Embeddings (Su et al., 2021) applied to the first 25% of head embedding dimensions for improved throughput following Black et al. (2022). Parallel Layers: Parallel attention and feed-forward residual layers with a single input LayerNorm (Wang, 2021). Normalization: LayerNorm (Ba et al., 2016) without biases. Furthermore, we apply per-head QK normalization (Dehghani et al., 2023, Wortsman et al., 2023). Biases: We remove all bias terms from the feed-forward networks and grouped-query self-attention layers. Tokenizer: We use Arcade100k, a BPE tokenizer extended from OpenAI's `tiktoken.cl100k_base`. We split digits into individual tokens following findings by Liu & Low (2023).
The dataset is comprised of a filtered mixture of open-source large-scale datasets available on the HuggingFace Hub: Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023) and The Pile (Gao et al., 2020), both without the Books3 subset, and StarCoder (Li et al., 2023). We further supplement our training with multi-lingual data from CulturaX (Nguyen et al., 2023) and, in particular, from its OSCAR corpora, as well as restructured data in the style of Yuan & Liu (2022). Given the large amount of web data, we recommend fine-tuning the base `Stable LM 2 12B` for your downstream tasks. The model is pre-trained on the aforementioned datasets in `bfloat16` precision, optimized with AdamW, and trained using the Arcade100k tokenizer with a vocabulary size of 100,352. We outline the complete hyperparameter choices in the project's GitHub repository config. Hardware: `Stable LM 2 12B` was trained on the Stability AI cluster across 384 NVIDIA H100 GPUs (AWS P5 instances). Software: We use a fork of `gpt-neox` (EleutherAI, 2021), train under 2D parallelism (Data and Tensor Parallel) with ZeRO-1 (Rajbhandari et al., 2019), and rely on flash-attention as well as SwiGLU and Rotary Embedding kernels from FlashAttention-2 (Dao et al., 2023). The model is intended to be used as a foundational base model for application-specific fine-tuning. Developers must evaluate and fine-tune the model for safe performance in downstream applications. For commercial use, please refer to https://stability.ai/membership. Limitations and Bias: As a base model, this model may exhibit unreliable, unsafe, or other undesirable behaviors that must be corrected through evaluation and fine-tuning prior to deployment. The pre-training dataset may have contained offensive or inappropriate content, even after applying data cleansing filters, which can be reflected in the model-generated text.
We recommend that users exercise caution when using these models in production systems. Do not use the models if they are unsuitable for your application, or for any applications that may cause deliberate or unintentional harm to others.

163
120

stablecode-completion-alpha-3b

license:apache-2.0
155
117

stable-diffusion-3.5-medium-tensorrt

147
7

sv4d

143
305

stable-codec-speech-16k

99
22

japanese-instructblip-alpha

92
53

stable-diffusion-3.5-large-controlnet-blur

80
12

codellama13b_instruct_260k_synthesis

llama
79
1

stable-diffusion-xl-refiner-0.9

63
332

sdxl-turbo-tensorrt

61
37

stablecode-instruct-alpha-3b

59
303

stable-diffusion-3.5-controlnets

28
170

stable-diffusion-3-medium-tensorrt

22
150

stable-video-diffusion-img2vid-xt-1-1-tensorrt

This repository hosts the TensorRT version of the Stable Video Diffusion (SVD) 1.1 Image-to-Video model. Please see Stable Video Diffusion (SVD) 1.1 Image-to-Video for the full model details. This model is intended for research purposes only and should not be used in any way that violates Stability AI's Acceptable Use Policy.

| Stage | A100 80GB PCI | A100 80GB SXM | H100 80GB PCI |
|-------------|---------------|---------------|---------------|
| VAE Encoder | 66.70 ms | 65.68 ms | 49.07 ms |
| CLIP | 105.41 ms | 53.20 ms | 91.32 ms |
| UNet x 25 | 30,367.73 ms | 27,489.88 ms | 19,102.98 ms |
| VAE Decoder | 4,663.63 ms | 4,544.12 ms | 3,382.62 ms |
| Total E2E | 35,258.38 ms | 32,166.41 ms | 22,644.73 ms |

1. Clone TensorRT and this repo, then launch the NGC container.
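The latency table above is dominated by the 25-step UNet loop; a quick sanity check with the A100 80GB PCI column (numbers taken directly from the table) makes the breakdown explicit:

```python
# Per-stage latencies (ms) for A100 80GB PCI, from the table above.
stages = {
    "VAE Encoder": 66.70,
    "CLIP": 105.41,
    "UNet x 25": 30367.73,
    "VAE Decoder": 4663.63,
}
total_e2e = 35258.38  # reported end-to-end latency (ms)

unet_share = stages["UNet x 25"] / total_e2e
overhead = total_e2e - sum(stages.values())  # E2E minus listed stages

print(f"UNet share of E2E: {unet_share:.1%}")
print(f"Unaccounted overhead: {overhead:.2f} ms")
```

The UNet accounts for roughly 86% of end-to-end time, which is why reducing denoising steps (as in the Turbo models) matters far more than optimizing the VAE or CLIP stages.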

19
25

sp4d

16
6

japanese-stablelm-2-instruct-1_6b

12
27

ar-stablelm-2-base

6
6

sd-vae-ft-mse-original

Utilizing

These weights are intended to be used with the original CompVis Stable Diffusion codebase. If you are looking for the model to use with the 🧨 diffusers library, come here.

Decoder Finetuning

We publish two kl-f8 autoencoder versions, finetuned from the original kl-f8 autoencoder on a 1:1 ratio of LAION-Aesthetics and LAION-Humans, an unreleased subset containing only SFW images of humans. The intent was to fine-tune on the Stable Diffusion training set (the autoencoder was originally trained on OpenImages) but also to enrich the dataset with images of humans to improve the reconstruction of faces. The first, ft-EMA, was resumed from the original checkpoint, trained for 313,198 steps, and uses EMA weights. It uses the same loss configuration as the original checkpoint (L1 + LPIPS). The second, ft-MSE, was resumed from ft-EMA, also uses EMA weights, and was trained for another 280k steps using a different loss with more emphasis on MSE reconstruction (MSE + 0.1 * LPIPS). It produces somewhat "smoother" outputs. The batch size for both versions was 192 (16 A100s, batch size 12 per GPU). To keep compatibility with existing models, only the decoder part was finetuned; the checkpoints can be used as a drop-in replacement for the existing autoencoder.
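The ft-MSE loss configuration (MSE + 0.1 * LPIPS) can be sketched as a weighted sum of a pixel-space term and a perceptual term. This is a minimal illustrative sketch: `perceptual_fn` below is a stand-in callable, not the actual LPIPS network used in training:

```python
import numpy as np

def reconstruction_loss(x, x_hat, perceptual_fn, lpips_weight=0.1):
    """ft-MSE-style reconstruction loss sketch: pixel MSE plus a
    down-weighted perceptual term. `perceptual_fn(x, x_hat)` is a
    stand-in for a real LPIPS distance."""
    mse = np.mean((x - x_hat) ** 2)
    return mse + lpips_weight * perceptual_fn(x, x_hat)
```

Weighting the perceptual term at 0.1 biases the decoder toward pixel-accurate (hence "smoother") reconstructions, which matches the card's description of ft-MSE relative to the L1 + LPIPS configuration of ft-EMA.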
Evaluation

COCO 2017 (256x256, val, 5000 images)

| Model | Train Steps | rFID | PSNR | SSIM | PSIM | Link | Comments |
|----------|---------|------|--------------|---------------|---------------|------|----------|
| original | 246803 | 4.99 | 23.4 +/- 3.8 | 0.69 +/- 0.14 | 1.01 +/- 0.28 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | as used in SD |
| ft-EMA | 560001 | 4.42 | 23.8 +/- 3.9 | 0.69 +/- 0.13 | 0.96 +/- 0.27 | https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt | slightly better overall, with EMA |
| ft-MSE | 840001 | 4.70 | 24.5 +/- 3.7 | 0.71 +/- 0.13 | 0.92 +/- 0.27 | https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 LPIPS), smoother outputs |

LAION-Aesthetics 5+ (256x256, subset, 10000 images)

| Model | Train Steps | rFID | PSNR | SSIM | PSIM | Link | Comments |
|----------|-----------|------|--------------|---------------|---------------|------|----------|
| original | 246803 | 2.61 | 26.0 +/- 4.4 | 0.81 +/- 0.12 | 0.75 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | as used in SD |
| ft-EMA | 560001 | 1.77 | 26.7 +/- 4.8 | 0.82 +/- 0.12 | 0.67 +/- 0.34 | https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt | slightly better overall, with EMA |
| ft-MSE | 840001 | 1.88 | 27.3 +/- 4.7 | 0.83 +/- 0.11 | 0.65 +/- 0.34 | https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 LPIPS), smoother outputs |

Visual

Visualization of reconstructions on 256x256 images from the COCO2017 validation dataset. 256x256: ft-EMA (left), ft-MSE (middle), original (right)

license:mit
2
1,388

stable-codec-speech-16k-base

2
2

ar-stablelm-2-chat-gguf

1
1

control-lora

Introduction

By adding low-rank parameter-efficient fine-tuning to ControlNet, we introduce Control-LoRAs. This approach offers a more efficient and compact method to bring model control to a wider variety of consumer GPUs.

- Rank 256 files (reducing the original `4.7GB` ControlNet models down to `~738MB` Control-LoRA models)
- Experimental Rank 128 files (reducing the model down to `~377MB`)

Each Control-LoRA has been trained on a diverse range of image concepts and aspect ratios.

Depth: This Control-LoRA utilizes a grayscale depth map for guided generation. Depth estimation is an image processing technique that determines the distance of objects in a scene, providing a depth map that highlights variations in proximity. The model was trained on the depth results of MiDaS `dpt_beit_large_512`. It was further finetuned on the `Portrait Depth Estimation` model available in the ClipDrop API by Stability AI.

Canny: Canny Edge Detection is an image processing technique that identifies abrupt changes in intensity to highlight edges in an image. This Control-LoRA uses the edges from an image to generate the final image.

Recolor and Sketch: These two Control-LoRAs can be used to colorize images. Recolor is designed to colorize black and white photographs. Sketch is designed to color in drawings input as a white-on-black image (either hand-drawn, or created with a `pidi` edge model).

Revision: Revision is a novel approach of using images to prompt SDXL. It uses pooled CLIP embeddings to produce images conceptually similar to the input. It can be used either in addition to, or as a replacement for, text prompts. Revision also includes a blending function for combining multiple image or text concepts, as either positive or negative prompts.

Control-LoRAs have been implemented in ComfyUI and StableSwarmUI. Basic ComfyUI workflows (using the base model only) are available in this HF repo. Custom nodes from Stability are available here.
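The size reductions quoted above follow directly from low-rank factorization: a LoRA replaces a full weight update with two thin matrices whose parameter count scales with the rank. A small illustrative calculation (the layer dimensions here are hypothetical, not the actual SDXL ControlNet shapes):

```python
def lora_params(d_in, d_out, rank):
    """Parameter count of a LoRA pair (A: d_in x rank, B: rank x d_out)
    and its ratio to the full d_in x d_out matrix it replaces."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return lora, lora / full

# Hypothetical 1280x1280 projection layer at the two released ranks.
for rank in (128, 256):
    n, ratio = lora_params(1280, 1280, rank)
    print(f"rank {rank}: {n:,} params ({ratio:.1%} of full matrix)")
```

At rank 128 such a layer stores 20% of the full matrix's parameters, and 40% at rank 256, which is consistent with the ~4.7GB ControlNet shrinking to ~377MB and ~738MB respectively.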

0
953

Stable Zero123

Please note: for commercial use, please refer to https://stability.ai/license

Stable Zero123 is a model for view-conditioned image generation based on Zero123. With improved data rendering and model conditioning strategies, our model demonstrates improved performance when compared to the original Zero123 and its subsequent iteration, Zero123-XL. By using Score Distillation Sampling (SDS) along with the Stable Zero123 model, we can produce high-quality 3D models from any input image. The process can also extend to text-to-3D generation by first generating a single image using SDXL and then using SDS on Stable Zero123 to generate the 3D object.

To enable open research in 3D object generation, we've improved the open-source code of threestudio by supporting Zero123 and Stable Zero123. To use Stable Zero123 for object 3D mesh generation in threestudio, you can follow these steps:

1. Install threestudio using their instructions.
2. Download the Stable Zero123 checkpoint `stable_zero123.ckpt` into the `load/zero123/` directory.
3. Take an image of your choice, or generate it from text using your favourite AI image generator such as Stable Assistant (https://stability.ai/stable-assistant), e.g. "A simple 3D render of a friendly dog".
4. Remove its background using Stable Assistant (https://stability.ai/stable-assistant).
5. Save it to `load/images/`, preferably with `rgba.png` as the suffix.
6. Run Zero-1-to-3 with the Stable Zero123 checkpoint.

- Developed by: Stability AI
- Model type: latent diffusion model.
- Finetuned from model: lambdalabs/sd-image-variations-diffusers
- License: we released 2 versions of Stable Zero123. Stable Zero123 included some CC-BY-NC 3D objects, so it cannot be used commercially, but can be used for research purposes. It is released under the Stability AI Non-Commercial Research Community License. Stable Zero123C ("C" for "Commercially-available") was only trained on CC-BY and CC0 3D objects. It is released under the Stability AI Community License.

You can read more about the license here. According to our internal tests, both models perform similarly in terms of prediction visual quality.

We use renders from the Objaverse dataset, utilizing our enhanced rendering method.

- Hardware: `Stable Zero123` was trained on the Stability AI cluster on a single node with 8 A100 80GB GPUs.
- Code Base: we use our modified version of the original zero123 repository.

The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive, or content that propagates historical or current stereotypes.
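The Score Distillation Sampling step mentioned above optimizes scene parameters by following the denoising direction of a frozen diffusion model. The core identity is that the gradient is proportional to `eps_pred - eps`: the teacher's predicted noise minus the noise actually added. This is a deliberately tiny toy sketch of that loop (the "teacher" here is a hand-written stand-in, not a diffusion UNet):

```python
import numpy as np

# Toy SDS sketch: optimize parameters `theta` (a stand-in for a 3D scene's
# rendered image) so a frozen "teacher" stops correcting it.
rng = np.random.default_rng(0)
target = np.full(16, 0.5)   # the image the toy teacher prefers
theta = np.zeros(16)        # parameters being optimized

def teacher_eps_pred(x_noisy, eps_true, x):
    # Stand-in for a diffusion model's noise prediction: it reproduces the
    # injected noise plus a correction that pulls x toward `target`.
    return eps_true + (x - target)

for step in range(200):
    eps = rng.standard_normal(16)
    x_noisy = theta + eps                                 # toy forward diffusion
    grad = teacher_eps_pred(x_noisy, eps, theta) - eps    # SDS gradient, w(t)=1
    theta -= 0.05 * grad                                  # gradient descent step
```

Because the injected noise cancels in `eps_pred - eps`, only the teacher's correction term drives `theta`, which converges to the teacher's preferred image; in real SDS the same cancellation lets a 2D diffusion prior supervise 3D parameters through a differentiable renderer.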

0
747

sv3d

0
720

cosxl

0
240

sd-vae-ft-ema-original

license:mit
0
158

japanese-stablelm-2-base-1_6b

0
13

stable-diffusion-3.5-large-turbo_amdgpu

0
10

stable-diffusion-3-medium_amdgpu

0
9

arcade100k

0
3

sdxl-turbo_amdgpu

0
3

stable-diffusion-3.5-medium_amdgpu

0
3

stable-diffusion-3.5-large_amdgpu

0
2