CLIP-GmP-ViT-L-14

Name: CLIP-GmP-ViT-L-14
Author: zer0int

Tensor type: float32
License: MIT

Quick Summary

🔥 Update SUMMER 2025: 🔥 🤖 New and greatly improved version of the model, check out: - 🌑 https://huggingface.

</details>

-------
- Want to feed it yourself? All code for fine-tuning and much more is on [my GitHub](https://github.com/zer0int).
-----
## Update 23/SEP/2024:
- Hugging Face Transformers / Diffusers pipeline support is now implemented.
- See here for an example script: [Integrating my CLIP-L with Flux.1](https://github.com/zer0int/CLIP-txt2img-diffusers-scripts)
- Otherwise, load it like any other HF model.
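
A minimal sketch of what "like any other HF model" could look like, assuming the repo loads through the standard `CLIPModel` / `CLIPProcessor` classes in `transformers`. The helper names `load_clip` and `rank_captions` are made up for illustration and are not part of this repo:

```python
MODEL_ID = "zer0int/CLIP-GmP-ViT-L-14"


def load_clip(model_id=MODEL_ID):
    """Load model + processor (assumes standard HF CLIP classes work for this repo)."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_id)
    processor = CLIPProcessor.from_pretrained(model_id)
    return model.eval(), processor


def rank_captions(image, captions, model, processor):
    """Return (caption, probability) pairs for one PIL image, best match first."""
    import torch

    inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, len(captions))
    probs = logits.softmax(dim=-1)[0].tolist()
    return sorted(zip(captions, probs), key=lambda pair: pair[1], reverse=True)
```

Typical use would be `model, processor = load_clip()` followed by `rank_captions(Image.open("cat.png"), ["a cat", "a dog"], model, processor)`; the file name and captions here are placeholders.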

## Update 03/SEP/2024 / edit 05/AUG:

## 👋 Looking for a Text Encoder for Flux.1 (or SD3, SDXL, SD, ...) to replace CLIP-L? 👀
You'll generally want the "TE-only" .safetensors:

- 👉 The "TEXT" model has superior prompt following, especially for text, but also for other details. [DOWNLOAD](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF.safetensors)
- 👉 The "SMOOTH" model can sometimes** have better details (when there's no text in the image). [DOWNLOAD](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-BEST-smooth-GmP-TE-only-HF-format.safetensors)
- The "GmP" initial fine-tune is deprecated / inferior to the above models. Still, you can [DOWNLOAD](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-GmP-ft-TE-only-HF-format.safetensors) it.

**: The "TEXT" model is the best for text. Full stop. But whether the "SMOOTH" model is better for your (text-free) scenario than the "TEXT" model really depends on the specific prompt. It might also be the case that the "TEXT" model leads to images that you prefer over "SMOOTH"; the only way to know is to experiment with both.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/y-B-FimzahYqskNr2MV1C.png)

## 🤓👨‍💻 In general (because we're not limited to text-to-image generative AI), I provide four versions / downloads:

- Text encoder only .safetensors.
- Full model .safetensors.
- State_dict pickle.
- Full model pickle (can be used as-is with "import clip" -> clip.load() after bypassing SHA checksum verification).

## The TEXT model has a modality gap of 0.80 (OpenAI pre-trained: 0.82).
- Trained with a high temperature of 0.1, plus some tinkering.
- ImageNet/ObjectNet accuracy ~0.91 for both "SMOOTH" and "TEXT" models (pre-trained: ~0.84).
- The models (this plot = "TEXT" model on MSCOCO) are also golden retrievers: 🥰🐕
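
For readers who want to reproduce a number like the modality gap above: the card does not say how it was measured, so the sketch below assumes one common definition, the Euclidean distance between the centroids of the L2-normalized image embeddings and text embeddings. Function names are illustrative, and the inputs are plain lists of floats so the example runs without any ML library:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def centroid(vectors):
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def modality_gap(image_embs, text_embs):
    """Distance between the image centroid and the text centroid
    (assumed definition; the card's own metric may differ)."""
    image_center = centroid([l2_normalize(v) for v in image_embs])
    text_center = centroid([l2_normalize(v) for v in text_embs])
    return math.dist(image_center, text_center)
```

In practice the embeddings would come from the model's image and text encoders on a paired dataset; a smaller value means the two modalities sit closer together in the shared space.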

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/WiyuZLZVyjBTdPwHaVG_6.png)

----
## Update 11/AUG/2024:

New Best-Performing CLIP ViT-L/14 'GmP-smooth' model added (simply download the files named *BEST*!):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/qb5hYNxSTMB5z7rSs7N9k.png)

Or just create a fine-tune yourself: [https://github.com/zer0int/CLIP-fine-tune](https://github.com/zer0int/CLIP-fine-tune)

How?
- Geometric Parametrization (GmP) (same as before)
- Activation Value manipulation for 'adverb neuron' (same as before)
- NEW: Custom loss function with label smoothing!
- For in-depth details, see my GitHub. 🤗
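
To give a rough idea of what a contrastive loss with label smoothing can look like, here is a generic sketch. It is not the custom loss from the GitHub repo: the symmetric cross-entropy form is the usual CLIP-style objective, the default temperature of 0.1 is borrowed from the note on the "TEXT" model, and the smoothing value is an arbitrary placeholder.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def smoothed_cross_entropy(logits, smoothing):
    """Cross-entropy where row i's target is column i, softened by `smoothing`."""
    n = len(logits)
    if n == 1:
        smoothing = 0.0  # nothing to spread the mass over
    on_value = 1.0 - smoothing
    off_value = smoothing / (n - 1) if n > 1 else 0.0
    total = 0.0
    for i, row in enumerate(logits):
        log_probs = log_softmax(row)
        total -= sum(
            (on_value if j == i else off_value) * lp for j, lp in enumerate(log_probs)
        )
    return total / n


def contrastive_loss(similarity, temperature=0.1, smoothing=0.1):
    """Symmetric image->text / text->image loss on a square similarity matrix."""
    image_to_text = [[s / temperature for s in row] for row in similarity]
    text_to_image = [list(col) for col in zip(*image_to_text)]
    return 0.5 * (
        smoothed_cross_entropy(image_to_text, smoothing)
        + smoothed_cross_entropy(text_to_image, smoothing)
    )
```

With smoothing above zero the loss never reaches zero even for a perfect match, which discourages the model from becoming overconfident on noisy image-text pairs.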

----

## A fine-tune of OpenAI / CLIP ViT-L/14 that has an unprecedented ImageNet/ObjectNet accuracy of ~0.90 (original pre-trained model / OpenAI's CLIP: ~0.85)**.

Made possible with Geometric Parametrization (GmP).
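
The basic idea can be sketched as follows, under the assumption that GmP splits each row of a linear layer's weight matrix into a magnitude `r` and a direction `theta`, which are then trained as separate parameters. The helper names are illustrative and plain lists stand in for tensors; the actual implementation lives in the fine-tuning repo.

```python
import math


def to_gmp(weight):
    """Split each row into (r, theta): its norm and its unit direction.
    Assumes no row is all zeros."""
    radii, directions = [], []
    for row in weight:
        norm = math.sqrt(sum(w * w for w in row))
        radii.append(norm)
        directions.append([w / norm for w in row])
    return radii, directions


def from_gmp(radii, directions):
    """Rebuild the weight matrix as r * theta / ||theta|| for each row."""
    weight = []
    for radius, direction in zip(radii, directions):
        norm = math.sqrt(sum(t * t for t in direction))
        weight.append([radius * t / norm for t in direction])
    return weight
```

Because `theta` is re-normalized on the way back, only its direction matters, so gradient updates to magnitude and direction are decoupled.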
