jinaai

86 models

jina-embeddings-v3

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

`jina-embeddings-v3` is a multilingual, multi-task text embedding model designed for a variety of NLP applications. Based on the Jina-XLM-RoBERTa architecture, the model uses Rotary Position Embeddings (RoPE) to handle long input sequences of up to 8192 tokens. It also ships with 5 LoRA adapters that generate task-specific embeddings efficiently.

Key Features:
- Extended sequence length: supports up to 8192 tokens with RoPE.
- Task-specific embeddings: customize embeddings through the `task` argument with the following options:
  - `retrieval.query`: query embeddings in asymmetric retrieval tasks
  - `retrieval.passage`: passage embeddings in asymmetric retrieval tasks
  - `separation`: embeddings for clustering and re-ranking applications
  - `classification`: embeddings for classification tasks
  - `text-matching`: embeddings for tasks that quantify similarity between two texts, such as STS or symmetric retrieval
- Matryoshka embeddings: supports flexible embedding sizes (`32, 64, 128, 256, 512, 768, 1024`), allowing embeddings to be truncated to fit your application.

Supported Languages: While the foundation model supports 100 languages, we've focused our tuning efforts on the following 30 languages: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, Georgian, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu, and Vietnamese.

> ⚠️ Important Notice:
> We fixed a bug in the `encode` function (#60) where Matryoshka embedding truncation occurred after normalization, leading to non-normalized truncated embeddings. This issue has been resolved in the latest code revision.
>
> If you have encoded data using the previous version and wish to maintain consistency, please use that specific code revision when loading the model: `AutoModel.from_pretrained('jinaai/jina-embeddings-v3', code_revision='da863dd04a4e5dce6814c6625adfba87b83838aa', ...)`

Mean pooling takes all token embeddings from the model's output and averages them at the sentence or paragraph level. This approach has been shown to produce high-quality sentence embeddings. We provide an `encode` function that handles this for you automatically. However, if you're working with the model directly, outside of the `encode` function, you'll need to apply mean pooling manually; the ONNX example further below shows how.

The easiest way to start using `jina-embeddings-v3` is with the Jina Embedding API.
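For fully local use via the `transformers` package, the custom `encode` helper handles pooling and task selection. A minimal sketch, assuming the remote-code `encode` signature accepts `task` and a `truncate_dim` argument for the Matryoshka sizes listed above (`trust_remote_code=True` is required so the custom modeling code is loaded):

```python
from transformers import AutoModel

# Load with trust_remote_code so the custom encode() helper is available.
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)

texts = ["What is the weather like today?", "Il fait beau aujourd'hui."]

# 'task' selects a LoRA adapter; 'truncate_dim' (assumed here) picks a
# Matryoshka size from 32..1024. Drop it to keep the full 1024 dimensions.
embeddings = model.encode(texts, task="text-matching", truncate_dim=256)
print(embeddings.shape)  # e.g. (2, 256)
```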
Alternatively, you can run `jina-embeddings-v3` locally, for example with ONNX Runtime and the `transformers` tokenizer:

```python
import onnxruntime
import numpy as np
from transformers import AutoTokenizer, PretrainedConfig

# Mean pool function
def mean_pooling(model_output: np.ndarray, attention_mask: np.ndarray):
    token_embeddings = model_output
    input_mask_expanded = np.expand_dims(attention_mask, axis=-1)
    input_mask_expanded = np.broadcast_to(input_mask_expanded, token_embeddings.shape)
    sum_embeddings = np.sum(token_embeddings * input_mask_expanded, axis=1)
    sum_mask = np.clip(np.sum(input_mask_expanded, axis=1), a_min=1e-9, a_max=None)
    return sum_embeddings / sum_mask

# Load tokenizer and model config
tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v3')
config = PretrainedConfig.from_pretrained('jinaai/jina-embeddings-v3')

# Tokenize input
input_text = tokenizer('sample text', return_tensors='np')

# ONNX session
model_path = 'jina-embeddings-v3/onnx/model.onnx'
session = onnxruntime.InferenceSession(model_path)

# Prepare inputs for ONNX model
task_type = 'text-matching'
task_id = np.array(config.lora_adaptations.index(task_type), dtype=np.int64)
inputs = {
    'input_ids': input_text['input_ids'],
    'attention_mask': input_text['attention_mask'],
    'task_id': task_id
}

# Run the model
outputs = session.run(None, inputs)[0]

# Apply mean pooling and normalization to the model outputs
embeddings = mean_pooling(outputs, input_text['attention_mask'])
embeddings = embeddings / np.linalg.norm(embeddings, ord=2, axis=1, keepdims=True)
```

```bibtex
@misc{sturua2024jinaembeddingsv3multilingualembeddingstask,
      title={jina-embeddings-v3: Multilingual Embeddings With Task LoRA},
      author={Saba Sturua and Isabelle Mohr and Mohammad Kalim Akram and Michael Günther and Bo Wang and Markus Krimmel and Feng Wang and Georgios Mastrapas and Andreas Koukounas and Andreas Koukounas and Nan Wang and Han Xiao},
      year={2024},
      eprint={2409.10173},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.10173},
}
```

license:cc-by-nc-4.0
2,656,348
1,135

jina-bert-flash-implementation

To run the model on GPU, you need to install Flash Attention. You may either install from PyPI (which may not work with fused-dense) or from source. To install from source, clone the GitHub repository:

```console
git clone git@github.com:Dao-AILab/flash-attention.git
```

The code provided here should work with commit `43950dd`. Change to the cloned repo and install:

```console
cd flash-attention && python setup.py install
```

This will compile the flash-attention kernel, which will take some time.

644,126
5

jina-reranker-v2-base-multilingual

```yaml
---
pipeline_tag: text-ranking
tags:
  - transformers
  - reranker
  - cross-encoder
  - transformers.js
  - sentence-transformers
language:
  - multilingual
inference: false
license: cc-by-nc-4.0
library_name: transformers
---
```

license:cc-by-nc-4.0
595,211
323

jina-embeddings-v4

Jina Embeddings v4: Universal Embeddings for Multimodal Multilingual Retrieval Intended Usage & Model Info `jina-embeddings-v4` is a universal embedding model for multimodal and multilingual retrie...

277,376
497

jina-embeddings-v2-base-en

--- tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb datasets: - allenai/c4 language: en inference: false license: apache-2.0 model-index: - name: jina-embedding-b-en-v2 results: - task: type: Classification dataset: type: mteb/amazon_counterfactual name: MTEB AmazonCounterfactualClassification (en) config: en split: test revision: e8379541af4e31359cca9fbcf4b00f2671dba205 metrics: - type: accuracy value: 74.73134328358209 - type: ap value: 37.765427081831035 - type

license:apache-2.0
202,193
728

jina-embeddings-v5-text-small

license:cc-by-nc-4.0
200,323
141

jina-embeddings-v2-base-zh

--- tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb - transformers - transformers.js inference: false license: apache-2.0 language: - en - zh model-index: - name: jina-embeddings-v2-base-zh results: - task: type: STS dataset: type: C-MTEB/AFQMC name: MTEB AFQMC config: default split: validation revision: None metrics: - type: cos_sim_pearson value: 48.51403119231363 - type: cos_sim_spearman value: 50.5928547846445 - type: euclidean_pearson value: 48.75043631055907

license:apache-2.0
200,107
245

jina-reranker-m0

```yaml
---
pipeline_tag: text-classification
tags:
  - vidore
  - reranker
  - qwen2_vl
language:
  - multilingual
base_model:
  - Qwen/Qwen2-VL-2B-Instruct
inference: false
license: cc-by-nc-4.0
library_name: transformers
---
```

license:cc-by-nc-4.0
196,172
118

jina-embeddings-v5-text-small-retrieval

llama.cpp
136,958
9

jina-embeddings-v5-text-nano

license:cc-by-nc-4.0
136,618
57

jina-reranker-v3

license:cc-by-nc-4.0
121,873
111

jina-reranker-v1-turbo-en

license:apache-2.0
92,323
64

jina-embeddings-v2-base-code


license:apache-2.0
90,156
124

jina-clip-v1

license:apache-2.0
83,914
257

jina-embeddings-v2-base-de

The easiest way to start using `jina-embeddings-v2-base-de` is with Jina AI's Embedding API.

`jina-embeddings-v2-base-de` is a German/English bilingual text embedding model supporting 8192 sequence length. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of ALiBi to allow longer sequence lengths. We have designed it for high performance in monolingual and cross-lingual applications and trained it specifically to support mixed German-English input without bias. `jina-embeddings-v2-base-de` is a bilingual text embedding model for German and English that supports text input of up to 8192 tokens. It is based on the adapted JinaBERT model architecture, which uses a symmetric variant of ALiBi to allow longer input texts. We developed the model for high performance in monolingual and cross-lingual applications and trained it specifically to encode mixed German-English input without bias.

Additionally, we provide the following embedding models:

- `jina-embeddings-v2-small-en`: 33 million parameters.
- `jina-embeddings-v2-base-en`: 137 million parameters.
- `jina-embeddings-v2-base-zh`: 161 million parameters, Chinese-English bilingual embeddings.
- `jina-embeddings-v2-base-de`: 161 million parameters, German-English bilingual embeddings (you are here).
- `jina-embeddings-v2-base-es`: Spanish-English bilingual embeddings (coming soon).
- `jina-embeddings-v2-base-code`: 161 million parameters, code embeddings.

The data and training details are described in this technical report.

Please apply mean pooling when integrating the model. Mean pooling takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has been shown to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function that handles this for you. If you would like to do it without the default `encode` function, you can use Jina embedding models directly from the `transformers` package, applying mean pooling yourself (see the sketch after this description). If you only want to handle shorter sequences, such as 2k, pass the `max_length` parameter to the `encode` function.

As of its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged into Hugging Face as well):

1. Managed SaaS: get started with a free key on Jina AI's Embedding API.
2. Private and high-performance deployment: get started by picking from our suite of models and deploying them on AWS SageMaker.

We evaluated our bilingual model on all German and English evaluation tasks available on the MTEB benchmark. In addition, we evaluated the model against a couple of other German, English, and multilingual models on additional German evaluation tasks:

> In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.

Join our Discord community and chat with other community members about ideas. If you find Jina Embeddings useful in your research, please cite the following paper:
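A minimal sketch of manual mean pooling with `transformers` (loading with `trust_remote_code=True`, since the architecture ships custom modeling code; the final normalization step is optional):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(token_embeddings, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-de")
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-de", trust_remote_code=True)

sentences = ["Wie ist das Wetter heute?", "What is the weather like today?"]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    output = model(**encoded)

embeddings = mean_pooling(output.last_hidden_state, encoded["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)
```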

license:apache-2.0
69,847
81

jina-clip-v2

--- base_model: - jinaai/xlm-roberta-flash-implementation language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu - ha - he - hi - hr - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lo - lt - lv - mg - mk - ml - mn - mr - ms - my - ne - nl - 'no' - om - or - pa - pl - ps - pt - ro - ru - sa - sd - si - sk - sl - so - sq - sr - su - sv - sw - ta - te - th

license:cc-by-nc-4.0
49,617
330

jina-colbert-v2

--- license: cc-by-nc-4.0 language: - multilingual - af - am - ar - as - az - be - bg - bn - br - bs - ca - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fr - fy - ga - gd - gl - gu -...

license:cc-by-nc-4.0
43,013
135

jina-embeddings-v2-small-en

The easiest way to start using `jina-embeddings-v2-small-en` is with Jina AI's Embedding API.

`jina-embeddings-v2-small-en` is an English, monolingual embedding model supporting 8192 sequence length. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of ALiBi to allow longer sequence lengths. The backbone `jina-bert-v2-small-en` is pretrained on the C4 dataset. The model is further trained on Jina AI's collection of more than 400 million sentence pairs and hard negatives. These pairs were obtained from various domains and were carefully selected through a thorough cleaning process. The embedding model was trained with a 512 sequence length, but extrapolates to 8k sequence length (or even longer) thanks to ALiBi. This makes our model useful for a range of use cases, especially when processing long documents is needed, including long document retrieval, semantic textual similarity, text reranking, recommendation, RAG and LLM-based generative search, etc. This model has 33 million parameters, which enables lightning-fast and memory-efficient inference while still delivering impressive performance.

Additionally, we provide the following embedding models:

- `jina-embeddings-v2-small-en`: 33 million parameters (you are here).
- `jina-embeddings-v2-base-en`: 137 million parameters.
- `jina-embeddings-v2-base-zh`: 161 million parameters, Chinese-English bilingual embeddings.
- `jina-embeddings-v2-base-de`: 161 million parameters, German-English bilingual embeddings.
- `jina-embeddings-v2-base-es`: Spanish-English bilingual embeddings (coming soon).

Please apply mean pooling when integrating the model. Mean pooling takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has been shown to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function that handles this for you. If you would like to do it without the default `encode` function, you can use Jina embedding models directly from the `transformers` package. If you only want to handle shorter sequences, such as 2k, pass the `max_length` parameter to the `encode` function. The latest sentence-transformers also supports Jina embeddings (see the sketch after this description):

1. Managed SaaS: get started with a free key on Jina AI's Embedding API.
2. Private and high-performance deployment: get started by picking from our suite of models and deploying them on AWS SageMaker.

> In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.

Planned releases:

1. Bilingual embedding models supporting more European and Asian languages, including Spanish, French, Italian and Japanese.
2. Multimodal embedding models enabling multimodal RAG applications.
3. High-performance rerankers.

If you forgot to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or initializing the model via the `SentenceTransformer` class, you will receive an error that the model weights could not be initialized. This is caused by `transformers` falling back to creating a default BERT model instead of a jina-embeddings model.

Join our Discord community and chat with other community members about ideas. If you find Jina Embeddings useful in your research, please cite the following paper:
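A minimal sketch of the sentence-transformers path (the query strings are placeholders; `trust_remote_code=True` is required for the reason described above):

```python
from sentence_transformers import SentenceTransformer, util

# trust_remote_code is required so the custom JinaBERT code is loaded;
# without it, transformers falls back to a plain BERT and weight loading fails.
model = SentenceTransformer("jinaai/jina-embeddings-v2-small-en", trust_remote_code=True)

embeddings = model.encode([
    "How is the weather today?",
    "What is the current weather like today?",
])
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity
```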

license:apache-2.0
27,362
140

jina-embeddings-v2-base-es

The easiest way to start using `jina-embeddings-v2-base-es` is with Jina AI's Embedding API.

`jina-embeddings-v2-base-es` is a Spanish/English bilingual text embedding model supporting 8192 sequence length. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of ALiBi to allow longer sequence lengths. We have designed it for high performance in monolingual and cross-lingual applications and trained it specifically to support mixed Spanish-English input without bias. `jina-embeddings-v2-base-es` is a bilingual English/Spanish text embedding model that supports a sequence length of 8192. It is based on the BERT architecture (JinaBERT), which incorporates the symmetric bidirectional variant of ALiBi to allow a longer sequence length. We have designed this model for high performance in monolingual and bilingual applications, and it is trained specifically to support mixed Spanish and English input without bias.

Additionally, we provide the following embedding models:

- `jina-embeddings-v2-small-en`: 33 million parameters.
- `jina-embeddings-v2-base-en`: 137 million parameters.
- `jina-embeddings-v2-base-zh`: Chinese-English bilingual embeddings.
- `jina-embeddings-v2-base-de`: German-English bilingual embeddings.
- `jina-embeddings-v2-base-es`: Spanish-English bilingual embeddings (you are here).

The data and training details are described in this technical report.

Please apply mean pooling when integrating the model. Mean pooling takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has been shown to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function that handles this for you. If you would like to do it without the default `encode` function, you can use Jina embedding models directly from the `transformers` package. If you only want to handle shorter sequences, such as 2k, pass the `max_length` parameter to the `encode` function. Or you can use the model with the `sentence-transformers` package; in that case, to handle shorter sequences such as 2k, set `model.max_seq_length` (see the sketch after this description).

Alternatives to Transformers and Sentence Transformers:

1. Managed SaaS: get started with a free key on Jina AI's Embedding API.
2. Private and high-performance deployment: get started by picking from our suite of models and deploying them on AWS SageMaker.

> In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.

Planned releases:

1. Bilingual embedding models supporting more European and Asian languages, including French, Italian and Japanese.
2. Multimodal embedding models enabling multimodal RAG applications.
3. High-performance rerankers.

Join our Discord community and chat with other community members about ideas. If you find Jina Embeddings useful in your research, please cite the following paper:
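A minimal sketch of the sentence-transformers path with a reduced sequence length (assuming `trust_remote_code=True` loading, as with the other v2 models):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-es", trust_remote_code=True)

# Cap inputs at 2048 tokens instead of the full 8192 to save memory and latency.
model.max_seq_length = 2048

embeddings = model.encode(["¿Cómo está el clima hoy?", "How is the weather today?"])
```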

license:apache-2.0
24,180
34

jina-embeddings-v5-text-nano-retrieval

llama.cpp
19,681
6

jina-embeddings-v4-vllm-retrieval

Jina Embeddings v4: Universal Embeddings for Multimodal Multilingual Retrieval This repository hosts a vLLM-compatible version of `jina-embeddings-v4` with the retrieval adapter merged into the base `Qwen2.5-VL` weights. This architecture modification enables native compatibility with vLLM without requiring custom adapter-handling code.

15,252
32

jina-embeddings-v5-text-nano-text-matching

llama.cpp
15,149
2

jina-embeddings-v5-text-nano-classification

llama.cpp
14,917
4

jina-embeddings-v5-text-nano-clustering

llama.cpp
14,264
3

jina-code-embeddings-0.5b

license:cc-by-nc-4.0
13,526
16

jina-embeddings-v5-text-small-text-matching

llama.cpp
13,436
4

jina-colbert-v2-64

license:cc-by-nc-4.0
12,574
9

jina-embeddings-v5-text-small-clustering

llama.cpp
12,521
3

jina-embeddings-v5-text-small-classification

llama.cpp
11,180
2

ReaderLM-v2

license:cc-by-nc-4.0
8,455
724

jina-embeddings-v4-text-retrieval-GGUF

6,957
22

jina-embeddings-v5-text-nano-retrieval-GGUF

llama.cpp
6,322
0

jina-reranker-v1-tiny-en

license:apache-2.0
5,998
22

jina-embeddings-v5-text-nano-text-matching-GGUF

llama.cpp
5,450
0

jina-embeddings-v5-text-nano-classification-GGUF

llama.cpp
5,090
0

jina-embeddings-v5-text-small-classification-GGUF

llama.cpp
5,015
0

jina-embeddings-v5-text-small-clustering-GGUF

llama.cpp
4,966
0

jina-embeddings-v5-text-nano-clustering-GGUF

llama.cpp
4,895
0

jina-embeddings-v5-text-small-retrieval-GGUF

llama.cpp
4,889
3

jina-code-embeddings-1.5b

Jina Code Embeddings: A Small but Performant Code Embedding Model

Intended Usage & Model Info

`jina-code-embeddings` is an embedding model for code retrieval. The model supports various types of code retrieval (text-to-code, code-to-code, code-to-text, code-to-completion) and technical question answering across 15+ programming languages. Built on Qwen/Qwen2.5-Coder-1.5B, `jina-code-embeddings-1.5b` features:

- Multilingual support (15+ programming languages) and compatibility with a wide range of domains, including web development, software development, machine learning, data science, and educational coding problems.
- Task-specific instruction prefixes for NL2Code, Code2Code, Code2NL, Code2Completion, and Technical QA, which can be selected at inference time.
- Flexible embedding size: dense embeddings are 1536-dimensional by default but can be truncated to as low as 128 with minimal performance loss (see the sketch after this description).

| Feature | Jina Code Embeddings 1.5B |
|---|---|
| Base Model | Qwen2.5-Coder-1.5B |
| Supported Tasks | `nl2code`, `code2code`, `code2nl`, `code2completion`, `qa` |
| Model DType | BFloat16 |
| Max Sequence Length | 32768 |
| Embedding Vector Dimension | 1536 |
| Matryoshka Dimensions | 128, 256, 512, 1024, 1536 |
| Pooling Strategy | Last-token pooling |
| Attention Mechanism | FlashAttention2 |

Optional / Recommended:

- flash-attention: installing flash-attention is recommended for improved inference speed and efficiency, but is not mandatory.
- sentence-transformers: if you want to use the model via the `sentence-transformers` interface, install this package as well.

Please refer to the jina-code-embeddings technical report for training details and benchmarks. If you find it useful in your research, please cite the paper. Join our Discord community and chat with other community members about ideas.
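Because the embeddings are trained with Matryoshka representation learning, they can be truncated client-side. A minimal sketch in plain NumPy (the 1536-dimensional vectors are assumed to come from whichever inference path you use; the random array here is only a placeholder):

```python
import numpy as np

# Suppose `embeddings` has shape (n, 1536), one row per input snippet.
embeddings = np.random.rand(4, 1536).astype(np.float32)  # placeholder output

truncate_dim = 256  # any of the trained Matryoshka sizes: 128, 256, 512, 1024, 1536
truncated = embeddings[:, :truncate_dim]

# Re-normalize after truncation so cosine similarity remains meaningful.
truncated /= np.linalg.norm(truncated, axis=1, keepdims=True)
```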

license:cc-by-nc-4.0
4,866
27

jina-embeddings-v5-text-small-text-matching-GGUF

llama.cpp
4,827
1

jina-embedding-b-en-v1

`jina-embedding-b-en-v1` is a language model that has been trained using Jina AI's Linnaeus-Clean dataset. This dataset consists of 380 million sentence pairs, including query-document pairs. These pairs were obtained from various domains and were carefully selected through a thorough cleaning process. The Linnaeus-Full dataset, from which the Linnaeus-Clean dataset is derived, originally contained 1.6 billion sentence pairs. The model has a range of use cases, including information retrieval, semantic textual similarity, text reranking, and more. With a standard size of 110 million parameters, the model enables fast inference while delivering better performance than our small model. It is recommended to use a single GPU for inference.

Additionally, we provide the following options:

- `jina-embedding-t-en-v1`: 14 million parameters.
- `jina-embedding-s-en-v1`: 35 million parameters.
- `jina-embedding-b-en-v1`: 110 million parameters (you are here).
- `jina-embedding-l-en-v1`: 330 million parameters.
- `jina-embedding-1b-en-v1`: 1.2 billion parameters, 10 times bert-base (soon).
- `jina-embedding-6b-en-v1`: 6 billion parameters, 30 times bert-base (soon).

We compared the model against `all-minilm-l6-v2`/`all-mpnet-base-v2` from SBERT and `text-embedding-ada-002` from OpenAI:

| Name | param | dimension |
|---|---|---|
| all-minilm-l6-v2 | 23m | 384 |
| all-mpnet-base-v2 | 110m | 768 |
| ada-embedding-002 | Unknown/OpenAI API | 1536 |
| jina-embedding-t-en-v1 | 14m | 312 |
| jina-embedding-s-en-v1 | 35m | 512 |
| jina-embedding-b-en-v1 | 110m | 768 |
| jina-embedding-l-en-v1 | 330m | 1024 |

| Name | STS12 | STS13 | STS14 | STS15 | STS16 | STS17 | TRECOVID | Quora | SciFact |
|---|---|---|---|---|---|---|---|---|---|
| all-minilm-l6-v2 | 0.724 | 0.806 | 0.756 | 0.854 | 0.79 | 0.876 | 0.473 | 0.876 | 0.645 |
| all-mpnet-base-v2 | 0.726 | 0.835 | 0.78 | 0.857 | 0.8 | 0.906 | 0.513 | 0.875 | 0.656 |
| ada-embedding-002 | 0.698 | 0.833 | 0.761 | 0.861 | 0.86 | 0.903 | 0.685 | 0.876 | 0.726 |
| jina-embedding-t-en-v1 | 0.717 | 0.773 | 0.731 | 0.829 | 0.777 | 0.860 | 0.482 | 0.840 | 0.522 |
| jina-embedding-s-en-v1 | 0.743 | 0.786 | 0.738 | 0.837 | 0.80 | 0.875 | 0.523 | 0.857 | 0.524 |
| jina-embedding-b-en-v1 | 0.751 | 0.809 | 0.761 | 0.856 | 0.812 | 0.890 | 0.606 | 0.876 | 0.594 |
| jina-embedding-l-en-v1 | 0.745 | 0.832 | 0.781 | 0.869 | 0.837 | 0.902 | 0.573 | 0.881 | 0.598 |

What's next:

1. The development of `jina-embedding-s-en-v2` is currently underway, with two main objectives: improving performance and increasing the maximum sequence length.
2. We are currently working on a bilingual embedding model that combines English and another language. The upcoming models will be called `jina-embedding-s/b/l-de-v1`.

Join our Discord community and chat with other community members about ideas. If you find Jina Embeddings useful in your research, please cite the following paper:

license:apache-2.0
4,720
8

jina-embeddings-v4-text-code-GGUF

A collection of GGUF and quantizations for `jina-embeddings-v4`.

> [!IMPORTANT]
> We highly recommend first reading this blog post for more technical details and the customized llama.cpp build.

> [!TIP]
> Multimodal v4-GGUF is now available; check out this blog post for the walkthrough.

`jina-embeddings-v4` is a cutting-edge universal embedding model for multimodal multilingual retrieval. It's based on `qwen2.5-vl-3b-instruct` with three LoRA adapters: `retrieval` (optimized for retrieval tasks), `text-matching` (optimized for sentence similarity tasks), and `code` (optimized for code retrieval tasks). It is also heavily trained for visual document retrieval and late-interaction-style multi-vector output.

We removed the visual components of `qwen2.5-vl` and merged all LoRA adapters back into the base language model. This results in three task-specific v4 models with 3.09B parameters, downsized from the original jina-embeddings-v4's 3.75B parameters:

| HuggingFace Repo | Task |
|---|---|
| `jinaai/jina-embeddings-v4-text-retrieval-GGUF` | Text retrieval |
| `jinaai/jina-embeddings-v4-text-code-GGUF` | Code retrieval |
| `jinaai/jina-embeddings-v4-text-matching-GGUF` | Sentence similarity |

All models above provide F16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M and dynamic quantizations such as IQ1_S and IQ2_XXS.

Limitations vs the original v4 model:
- They cannot handle image input.
- They cannot output multi-vector embeddings.
- You must add `Query: ` or `Passage: ` in front of the input. Check the table below for details.

We forked llama.cpp and made it work with image input and embedding output; check out this new blog post for the walkthrough. Run `llama-server` to host the embedding model as an OpenAI-API-compatible HTTP server, for example using `text-matching` with `F16` (see the sketch after this description). Remarks:

- `--pooling mean` is required, as v4 produces mean-pooled embeddings.
- Setting `--pooling none` is not the same as the multi-vector embeddings of v4. The original v4 has a trained MLP on top of the last hidden states to output multi-vector embeddings of 128 dimensions each. In GGUF, this MLP was chopped off.

Note: when using the `retrieval` and `code` models, add `Query: ` or `Passage: ` in front of your input. You can also use `llama-embedding` for one-shot embedding.

To get fully consistent results as if you were using `AutoModel.from_pretrained("jinaai/jina-embeddings-v4")`, you need to be very careful about the prefixes and manually add them to your GGUF model inputs. Here's a reference table:

| Input Type | Task | `prompt_name` (Role) | Actual Input Processed by Model |
|---|---|---|---|
| Text | `retrieval` | `query` (default) | `Query: {original_text}` |
| Text | `retrieval` | `passage` | `Passage: {original_text}` |
| Text | `text-matching` | `query` (default) | `Query: {original_text}` |
| Text | `text-matching` | `passage` | `Query: {original_text}` ⚠️ |
| Text | `code` | `query` (default) | `Query: {original_text}` |
| Text | `code` | `passage` | `Passage: {original_text}` |
| Image | Any task | N/A | chat-template-wrapped image prompt ("Describe the image.") |

To some users, ⚠️ indicates a somewhat surprising behavior where `prompt_name='passage'` gets overridden to `"Query: "` when using `text-matching` in the original `AutoModel.from_pretrained("jinaai/jina-embeddings-v4")`. However, this is reasonable since `text-matching` is a sentence similarity task with no left/right roles; the inputs are symmetric.
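A minimal sketch of calling such a locally running `llama-server` through its OpenAI-compatible embeddings endpoint (the host, port, and model filename are assumptions; prefix handling follows the table above):

```python
import requests

# Assumes a server started along the lines of:
#   llama-server -m jina-embeddings-v4-text-matching-F16.gguf --embedding --pooling mean
# listening on the default localhost:8080.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={
        "input": [
            "Query: A beautiful sunset over the beach",
            "Query: Un beau coucher de soleil sur la plage",
        ]
    },
)
embeddings = [item["embedding"] for item in resp.json()["data"]]
print(len(embeddings), len(embeddings[0]))
```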
Note: v4 is trained with Matryoshka embeddings, and converting to GGUF doesn't break the Matryoshka feature. If you get embeddings with shape `NxD`, you can simply use `embeddings[:, :truncate_dim]` to get smaller truncated embeddings. Note that not every dimension is trained, though. For v4, you can set `truncate_dim` to any of these values: `[128, 256, 512, 1024, 2048]`.

We use `llama-quantize` with `imatrix` to quantize models from float16. `imatrix` is generated by `llama-imatrix -m jina-embeddings-v4-text-retrieval-F16.gguf -f calibration_data_v5_rc.txt -ngl 99 --no-ppl -o imatrix-retrieval-512.dat`. `calibration_data_v5_rc.txt` can be found here and is recommended by the Unsloth docs.

Here's the speed and quality evaluation on the nano benchmarks. The higher the better. `IQ3_S` seems to be a good balance between size and speed.

Table 1: Tokens per Second on NanoHotpotQA `Documents`

| Quantization | BPW | File Size (GB) | Peak VRAM (GB) | Token/s w FA | Token/s w/o FA |
|---|---|---|---|---|---|
| IQ1_S | 2.04 | 0.73 | 4.04 | 3625 | 2050 |
| IQ1_M | 2.19 | 0.79 | 4.09 | 3349 | 1997 |
| IQ2_XXS | 2.44 | 0.88 | 4.19 | 3701 | 2071 |
| IQ2_M | 2.94 | 1.06 | 4.37 | 3407 | 1989 |
| Q2_K | 3.29 | 1.18 | 4.49 | 3173 | 1905 |
| IQ3_XXS | 3.31 | 1.19 | 4.50 | 3668 | 2067 |
| IQ3_XS | 3.59 | 1.29 | 4.60 | 3604 | 2053 |
| IQ3_S | 3.76 | 1.35 | 4.66 | 3599 | 2049 |
| IQ3_M | 3.84 | 1.38 | 4.69 | 3603 | 2053 |
| Q3_K_M | 4.11 | 1.48 | 4.78 | 3450 | 2008 |
| IQ4_NL | 4.72 | 1.69 | 5.00 | 3571 | 2039 |
| IQ4_XS | 4.49 | 1.61 | 4.92 | 3585 | 2046 |
| Q4_K_M | 4.99 | 1.79 | 5.10 | 3558 | 2045 |
| Q5_K_S | 5.61 | 2.02 | 5.32 | 3567 | 2044 |
| Q5_K_M | 5.75 | 2.07 | 5.38 | 3528 | 2034 |
| Q6_K | 6.56 | 2.36 | 5.66 | 3334 | 1981 |
| Q8_0 | 8.50 | 3.05 | 6.36 | 3767 | 2101 |
| F16 | 16.00 | 5.75 | 9.70 | 3399 | 2023 |
| v3 (Transformers) | 16.00 | 1.10 | 2.82 | | 16505 |
| v4 (Transformers) | 16.00 | 7.40 | 14.45 | | 1865 |

Table 2: NDCG@5

| Quantization | NanoHotpotQA | NanoFiQA2018 | NanoArguAna | NanoNFCorpus | NanoSciFact | Δ to v3 (HotpotQA) | Δ to v4 (HotpotQA) | Δ to v3 (FiQA2018) | Δ to v4 (FiQA2018) | Δ to v3 (ArguAna) | Δ to v4 (ArguAna) | Δ to v3 (NFCorpus) | Δ to v4 (NFCorpus) | Δ to v3 (SciFact) | Δ to v4 (SciFact) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IQ1_S | 0.6369 | 0.3178 | 0.3798 | 0.2933 | 0.5934 | -14% | -20% | -38% | -43% | -17% | -22% | -28% | -33% | -24% | -25% |
| IQ1_M | 0.6316 | 0.3313 | 0.5167 | 0.3256 | 0.6114 | -15% | -21% | -36% | -41% | +12% | +7% | -20% | -25% | -22% | -23% |
| IQ2_XXS | 0.7236 | 0.4582 | 0.4584 | 0.4067 | 0.7392 | -2% | -9% | -11% | -18% | -0% | -5% | -0% | -7% | -5% | -7% |
| IQ2_M | 0.7427 | 0.5869 | 0.5090 | 0.4468 | 0.7880 | +0% | -7% | +14% | +5% | +11% | +5% | +10% | +3% | +1% | -1% |
| Q2_K | 0.7683 | 0.5744 | 0.5168 | 0.4183 | 0.7546 | +4% | -4% | +12% | +3% | +12% | +7% | +3% | -4% | -4% | -5% |
| IQ3_XXS | 0.7780 | 0.5991 | 0.4811 | 0.4267 | 0.7610 | +5% | -2% | +16% | +8% | +5% | -1% | +5% | -2% | -3% | -4% |
| IQ3_XS | 0.7727 | 0.5615 | 0.5195 | 0.4439 | 0.7726 | +5% | -3% | +9% | +1% | +13% | +7% | +9% | +2% | -1% | -3% |
| IQ3_S | 0.8002 | 0.5505 | 0.4886 | 0.4381 | 0.7690 | +8% | +0% | +7% | -1% | +6% | +1% | +8% | +1% | -2% | -3% |
| IQ3_M | 0.8106 | 0.5387 | 0.5091 | 0.4462 | 0.7760 | +10% | +2% | +5% | -3% | +11% | +5% | +10% | +3% | -1% | -3% |
| Q3_K_M | 0.7567 | 0.5267 | 0.4486 | 0.4092 | 0.7775 | +2% | -5% | +2% | -5% | -2% | -7% | +1% | -6% | -1% | -2% |
| IQ4_NL | 0.7930 | 0.5598 | 0.4911 | 0.4285 | 0.7794 | +7% | -1% | +9% | +0% | +7% | +1% | +5% | -2% | -0% | -2% |
| IQ4_XS | 0.7979 | 0.5627 | 0.4947 | 0.4258 | 0.7789 | +8% | +0% | +9% | +1% | +8% | +2% | +5% | -2% | -0% | -2% |
| Q4_K_M | 0.8029 | 0.5569 | 0.4883 | 0.4226 | 0.7877 | +9% | +1% | +8% | +0% | +6% | +1% | +4% | -3% | +1% | -1% |
| Q5_K_S | 0.7969 | 0.5581 | 0.4721 | 0.4288 | 0.7842 | +8% | +0% | +8% | +0% | +3% | -3% | +5% | -1% | +0% | -2% |
| Q5_K_M | 0.7927 | 0.5601 | 0.4745 | 0.4247 | 0.7873 | +7% | -1% | +9% | +1% | +3% | -2% | +4% | -2% | +1% | -1% |
| Q6_K | 0.7951 | 0.5636 | 0.4822 | 0.4337 | 0.7846 | +8% | +0% | +10% | +1% | +5% | -0% | +7% | -0% | +0% | -1% |
| Q8_0 | 0.7938 | 0.5687 | 0.4784 | 0.4335 | 0.7851 | +7% | +0% | +11% | +2% | +4% | -1% | +7% | -0% | +0% | -1% |
| F16 | 0.7940 | 0.5610 | 0.4931 | 0.4343 | 0.7963 | +7% | +0% | +9% | +1% | +7% | +2% | +7% | -0% | +2% | +0% |
| v3 (Transformers) | 0.7393 | 0.5144 | 0.4600 | 0.4068 | 0.7820 | +0% | -7% | +0% | -8% | +0% | -5% | +0% | -6% | +0% | -2% |
| v4 (Transformers) | 0.7977 | 0.5571 | 0.4844 | 0.4351 | 0.7963 | +8% | +0% | +8% | +0% | +5% | +0% | +7% | +0% | +2% | +0% |

2,817
7

jina-embeddings-v4-text-matching-GGUF

The model card for this repository is identical to that of `jinaai/jina-embeddings-v4-text-code-GGUF` above: a collection of GGUF quantizations of `jina-embeddings-v4` with the `text-matching` LoRA adapter merged into the base model. See that entry for usage, prefix handling, quantization details, and benchmark tables.

2,640
5

jina-embeddings-v3-hf

license:cc-by-nc-4.0
2,301
3

jina-embedding-s-en-v1

license:apache-2.0
1,543
26

jina-reranker-v3-GGUF

GGUF quantizations of jina-reranker-v3 using llama.cpp: a 0.6B-parameter multilingual listwise reranker quantized for efficient inference.

Requirements:
- Python 3.8+
- llama.cpp binaries (`llama-embedding` and `llama-tokenize`); Hanxiao's llama.cpp fork is recommended: https://github.com/hanxiao/llama.cpp

Files:
- `jina-reranker-v3-BF16.gguf` - quantized model weights (BF16, 1.1GB)
- `projector.safetensors` - MLP projector weights (3MB)
- `rerank.py` - reranker implementation

API: `GGUFReranker.rerank(query, documents, top_n=None, return_embeddings=False, instruction=None)`

Arguments:
- `query` (str): search query
- `documents` (List[str]): documents to rerank
- `top_n` (int, optional): return only the top N results
- `return_embeddings` (bool): include embeddings in the output
- `instruction` (str, optional): custom ranking instruction

Returns: a list of dicts with keys `index`, `relevance_score`, `document`, and optionally `embedding`. A usage sketch follows this description.

If you find `jina-reranker-v3` useful in your research, please cite the original paper. This GGUF implementation follows the same CC BY-NC 4.0 license as the original model. For commercial usage inquiries, please contact Jina AI.
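A minimal usage sketch of the bundled `rerank.py`; the constructor arguments shown are assumptions, so check `rerank.py` in the repository for the exact interface (the `rerank` signature itself is documented above):

```python
from rerank import GGUFReranker  # rerank.py ships with this repository

# Paths are assumptions; point them at your downloaded files.
reranker = GGUFReranker(
    model_path="jina-reranker-v3-BF16.gguf",
    projector_path="projector.safetensors",
)

results = reranker.rerank(
    query="What is the capital of France?",
    documents=["Berlin is the capital of Germany.", "Paris is the capital of France."],
    top_n=1,
)
print(results[0]["relevance_score"], results[0]["document"])
```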

llama.cpp
1,489
7

jina-code-embeddings-0.5b-GGUF

The GGUF version of the code embedding model trained by Jina AI.

Jina Code Embeddings: A Small but Performant Code Embedding Model

`jina-code-embeddings-0.5b-GGUF` is the GGUF export of our jina-code-embeddings-0.5b, built on Qwen/Qwen2.5-Coder-0.5B. The model supports code retrieval and technical QA across 15+ programming languages and multiple domains, including web development, software development, machine learning, data science, and educational coding problems.

Key Features:

| Feature | Jina Code Embeddings 0.5B GGUF |
|---|---|
| Base Model | Qwen2.5-Coder-0.5B |
| Supported Tasks | `nl2code`, `code2code`, `code2nl`, `code2completion`, `qa` |
| Max Sequence Length | 32768 (recommended ≤ 8192) |
| Embedding Vector Dim | 896 |
| Matryoshka Dimensions | 64, 128, 256, 512, 896 (client-side slice) |
| Pooling Strategy | MUST use `--pooling last` (EOS) |

> Matryoshka note: `llama.cpp` always returns 896-d embeddings for this model. To use 64/128/256/512, slice client-side (e.g., take the first k elements).

Use the appropriate prefix for queries and passages at inference time. Follow the official instructions: https://github.com/ggml-org/llama.cpp

Hugging Face repo (GGUF): https://huggingface.co/jinaai/jina-code-embeddings-0.5b-GGUF

Pick a file (e.g., `jina-code-embeddings-0.5b-F16.gguf`). You can either auto-download by passing the repo and file directly to `llama.cpp`, or use a local path with `-m`.

> Tips: use `-ngl` to offload layers to the GPU. Max context is 32768, but stick to `--ubatch-size` ≤ 8192 for best results.

See our technical report: https://arxiv.org/abs/2508.21290

license:cc-by-nc-4.0
1,428
3

jina-code-embeddings-1.5b-GGUF

license:cc-by-nc-4.0
1,096
13

jina-vlm

license:cc-by-nc-4.0
876
114

jina-reranker-m0-GGUF

jina-reranker-m0-GGUF

`jina-reranker-m0` is a cutting-edge multimodal, multilingual reranker for text, code, image and visual document reranking. Check out its features and benchmarks here. We offer `jinaai/jina-reranker-m0-GGUF` with various quantization levels on HuggingFace. This repo covers how to use them and how they're built.

Using GGUFs is a bit different from the `AutoModel.from_pretrained` API. In short, three steps: (1) write the prompt, (2) get the last embedding, (3) map the embedding to the score.

1. Construct your `(QUERY, DOCUMENT)` pair as a prompt. Refer to `test.txt` for the correct batch construction; note that `\n` is NOT the instance separator (see `test.txt` for the separator actually used).
2. Get the last embedding for this prompt using `llama.cpp` or `llama-embedding` (you can change `F16` to other quantizations). Due to jina-reranker-m0's design, you must use `--pooling last --embd-normalize -1`. Also set `--embd-separator` to something other than the default: `llama-embedding` defaults to `\n` as the separator, which breaks multi-line docs and queries.
3. Feed the `last_embeddings` into a predefined MLP to get the relevance score (see the sketch after this description).

How the GGUF was built: `jina-reranker-m0` builds on `Qwen/Qwen2-VL-2B`, but two quirks make it trickier.

First, the model uses `token_id=100` as a scoring token at the end of each (query, document) pair to trigger "the scoring state". This token was arbitrarily picked during m0's training, which complicates things for GGUF users familiar with string-level inputs, as it doesn't play nicely with BPE tokenizers. Our fix is that we swapped `100` with token id `151649` in the tokenizer before building the GGUFs, so you need to append that scoring token to each (`QUERY`, `DOCUMENT`) pair.

Second, the scoring MLP isn't included in the GGUF because `llama.cpp` doesn't support it well. Instead, we dump the MLP into a separate `mlp_weights.npz` file. This MLP is a simple two-layer setup with ReLU activation, mapping the last hidden state of the scoring token from 1536 dimensions to a single score. The MLP is lightweight and can easily be moved to a GPU if needed. Final rerank scores from `jina-reranker-m0-GGUF` are calculated as `mlp(last_embeddings)`.
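The model card's Python snippet for reconstructing the scoring MLP was not captured above. A minimal sketch of the idea in NumPy; the key names inside `mlp_weights.npz` are assumptions, so inspect the archive first to find the real ones:

```python
import numpy as np

weights = np.load("mlp_weights.npz")
print(weights.files)  # inspect the actual key names before relying on them

# Hypothetical key names for a two-layer ReLU MLP mapping 1536 dims -> 1 score.
w1, b1 = weights["w1"], weights["b1"]
w2, b2 = weights["w2"], weights["b2"]

def mlp_score(last_embedding: np.ndarray) -> float:
    # last_embedding: the 1536-d last-token embedding produced by
    # llama-embedding with --pooling last --embd-normalize -1.
    hidden = np.maximum(last_embedding @ w1.T + b1, 0.0)  # ReLU
    return float(hidden @ w2.T + b2)
```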

license:cc-by-nc-4.0
860
13

jina-embeddings-v4-vllm-code

785
3

jina-embedding-t-en-v1

license:apache-2.0
762
30

text-seg-lm-qwen2-0.5b-cot-topic-chunking

675
11

reader-lm-1.5b

A new version of this model has been released: ReaderLM-v2!

Jina Reader-LM is a series of models that convert HTML content to Markdown content, which is useful for content conversion tasks. The models are trained on a curated collection of HTML content and its corresponding Markdown content.

| Name | Context Length | Download |
|---|---|---|
| reader-lm-0.5b | 256K | 🤗 Hugging Face |
| reader-lm-1.5b | 256K | 🤗 Hugging Face |

On Google Colab: the easiest way to experience reader-lm is by running our Colab notebook, where we demonstrate how to use reader-lm-1.5b to convert the HackerNews website into markdown. The notebook is optimized to run smoothly on Google Colab's free T4 GPU tier. You can also load reader-lm-0.5b or change the URL to any website and explore the output. Note that the input (i.e., the prompt) to the model is the raw HTML; no prefix instruction is required.

To use this model locally, you need to install `transformers`; a sketch follows this description.

AWS SageMaker & Azure Marketplace: AWS 0.5b, AWS 1.5b, Azure 0.5b, Azure 1.5b.
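A minimal local-usage sketch with `transformers`; the chat-template call and generation settings are assumptions (the model card's own snippet was not captured above), while the "raw HTML as prompt" convention comes from the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "jinaai/reader-lm-1.5b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

html = "<html><body><h1>Hello</h1><p>World!</p></body></html>"

# The raw HTML is the prompt; no extra instruction is needed.
messages = [{"role": "user", "content": html}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(input_ids, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```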

license:cc-by-nc-4.0
535
608

jina-colbert-v1-en

Jina-ColBERT is a ColBERT-style model based on JinaBERT, so it supports an 8k context length together with fast and accurate retrieval. JinaBERT is a BERT architecture that supports the symmetric bidirectional variant of ALiBi to allow longer sequence lengths. The Jina-ColBERT model is trained on the MS MARCO passage ranking dataset, following a very similar training procedure to ColBERTv2. The only difference is that we use `jina-bert-v2-base-en` as the backbone instead of `bert-base-uncased`. For more information about ColBERT, please refer to the ColBERTv1 and ColBERTv2 papers and the original code.

To use this model, you will need to install the latest version of the ColBERT repository.

TL;DR: Jina-ColBERT achieves competitive retrieval performance with ColBERTv2 on all benchmarks, and outperforms ColBERTv2 on datasets where documents have longer context length.

We evaluate the in-domain performance on the dev subset of the MS MARCO passage ranking dataset. We follow the same evaluation settings as in the ColBERTv2 paper and rerun the results of ColBERTv2 using the released checkpoint.

| Model | MRR@10 | Recall@50 | Recall@1k |
| --- | :---: | :---: | :---: |
| ColBERTv2 | 39.7 | 86.8 | 97.6 |
| Jina-ColBERT-v1 | 39.0 | 85.6 | 96.2 |

Following ColBERTv2, we evaluate the out-of-domain performance on 13 public BEIR datasets and use NDCG@10 as the main metric. We follow the same evaluation settings as in the ColBERTv2 paper and rerun the results of ColBERTv2 using the released checkpoint. Note that both ColBERTv2 and Jina-ColBERT-v1 only employ the MS MARCO passage ranking dataset for training, so the results below are fully zero-shot.

| dataset | ColBERTv2 | Jina-ColBERT-v1 |
| --- | :---: | :---: |
| ArguAna | 46.5 | 49.4 |
| ClimateFEVER | 18.1 | 19.6 |
| DBPedia | 45.2 | 41.3 |
| FEVER | 78.8 | 79.5 |
| FiQA | 35.4 | 36.8 |
| HotPotQA | 67.5 | 65.6 |
| NFCorpus | 33.7 | 33.8 |
| NQ | 56.1 | 54.9 |
| Quora | 85.5 | 82.3 |
| SCIDOCS | 15.4 | 16.9 |
| SciFact | 68.9 | 70.1 |
| TREC-COVID | 72.6 | 75.0 |
| Webis-touché2020 | 26.0 | 27.0 |
| Average | 50.0 | 50.2 |

We also evaluate the zero-shot performance on datasets where documents have longer context length and compare with some long-context embedding models. Here we use the LoCo benchmark, which contains 5 datasets with long context length.

| Model | Used context length | Model max context length | Avg. NDCG@10 |
| --- | :---: | :---: | :---: |
| ColBERTv2 | 512 | 512 | 74.3 |
| Jina-ColBERT-v1 (truncated*) | 512 | 8192 | 75.5 |
| Jina-ColBERT-v1 | 8192 | 8192 | 83.7 |
| Jina-embeddings-v2-base-en | 8192 | 8192 | 85.4 |

\* denotes that we truncate the context length to 512 for documents. The context length of queries is always 512.

To summarize, Jina-ColBERT achieves retrieval performance comparable to ColBERTv2 on all benchmarks, and outperforms ColBERTv2 on datasets where documents have longer context length.

We evaluate the reranking performance of ColBERTv2 and Jina-ColBERT on BEIR. We use BM25 as the first-stage retrieval model. The full evaluation code can be found in this repo. In summary, Jina-ColBERT outperforms ColBERTv2, even achieving comparable performance with some cross-encoders. The best model, jina-reranker, will be open-sourced soon!

| Dataset | BM25 | ColBERTv2 | Jina-ColBERT | MiniLM-L-6-v2 | BGE-reranker-base-v1 | BGE-reranker-large-v1 | Jina-reranker-base-v1 |
| --- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ArguAna | 29.99 | 33.42 | 33.95 | 30.67 | 23.26 | 25.42 | 42.59 |
| ClimateFEVER | 16.51 | 20.66 | 21.87 | 24.70 | 31.60 | 31.98 | 25.49 |
| DBPedia | 31.80 | 42.16 | 41.43 | 43.90 | 41.56 | 43.79 | 43.68 |
| FEVER | 65.13 | 81.07 | 83.49 | 80.77 | 87.07 | 89.11 | 86.10 |
| FiQA | 23.61 | 35.60 | 36.68 | 34.87 | 33.17 | 37.70 | 41.38 |
| HotpotQA | 63.30 | 68.84 | 68.62 | 72.65 | 79.04 | 79.98 | 75.61 |
| NFCorpus | 33.75 | 36.69 | 36.38 | 36.48 | 32.71 | 36.57 | 37.73 |
| NQ | 30.55 | 51.27 | 51.01 | 52.01 | 53.55 | 56.81 | 56.82 |
| Quora | 78.86 | 85.18 | 82.75 | 82.45 | 78.44 | 81.06 | 87.31 |
| SCIDOCS | 14.90 | 15.39 | 16.67 | 16.28 | 15.06 | 16.84 | 19.56 |
| SciFact | 67.89 | 70.23 | 70.95 | 69.53 | 70.62 | 74.14 | 75.01 |
| TREC-COVID | 59.47 | 75.00 | 76.89 | 74.45 | 67.46 | 74.32 | 82.09 |
| Webis-touché2020 | 44.22 | 32.12 | 32.56 | 28.40 | 34.37 | 35.66 | 31.62 |
| Average | 43.08 | 49.82 | 50.25 | 49.78 | 49.84 | 52.57 | 54.23 |

We are planning to improve the performance of Jina-ColBERT by fine-tuning on more datasets in the future.

Additionally, we provide the following embedding models, which you can also use for retrieval:

- `jina-embeddings-v2-base-en`: 137 million parameters.
- `jina-embeddings-v2-base-zh`: 161 million parameters, Chinese-English bilingual model.
- `jina-embeddings-v2-base-de`: 161 million parameters, German-English bilingual model.
- `jina-embeddings-v2-base-es`: 161 million parameters, Spanish-English bilingual model.

Join our Discord community and chat with other community members about ideas.

license:apache-2.0
468
100

jina-embeddings-v5-text-small-retrieval-mlx

license:cc-by-nc-4.0
365
2

jina-embeddings-v5-text-small-clustering-mlx

license:cc-by-nc-4.0
239
0

jina-embeddings-v4-vllm-text-matching

224
6

jina-embeddings-v5-text-small-text-matching-mlx

license:cc-by-nc-4.0
224
1

reader-lm-0.5b

license:cc-by-nc-4.0
218
145

jina-embedding-l-en-v1

license:apache-2.0
215
25

jina-embeddings-v5-text-small-classification-mlx

license:cc-by-nc-4.0
193
0

jina-vlm-mlx

license:cc-by-nc-4.0
96
1

jina-vlm-mlx-4bit

license:cc-by-nc-4.0
96
1

jina-embeddings-v5-text-nano-clustering-mlx

license:cc-by-nc-4.0
96
0

jina-reranker-v3-mlx

MLX port of jina-reranker-v3, a 0.6B-parameter multilingual listwise reranker optimized for Apple Silicon. Features a native MLX implementation whose rank scores and embeddings match the original implementation exactly. No transformers library required.

Returns: a list of dicts with keys:
- `document`: original document text
- `relevance_score`: float score (higher = more relevant)
- `index`: position in the input documents list
- `embedding`: document embedding (if `return_embeddings=True`)

If you find `jina-reranker-v3` useful in your research, please cite the original paper. This MLX implementation follows the same CC BY-NC 4.0 license as the original model. For commercial usage inquiries, please contact Jina AI.

license:cc-by-nc-4.0
96
0

jina-embeddings-v5-text-nano-retrieval-mlx

license:cc-by-nc-4.0
79
0

jina-embeddings-v5-text-nano-classification-mlx

license:cc-by-nc-4.0
76
0

jina-embeddings-v5-text-nano-text-matching-mlx

license:cc-by-nc-4.0
73
0

xlm-roberta-flash-implementation

license:cc-by-nc-4.0
38
34

flat-2d-animerge

16
9

starcoder-1b-textbook

license:cc-by-nc-sa-4.0
13
7

text-seg-lm-qwen2-0.5b

7
10

text-seg-lm-qwen2-0.5b-summary-chunking

5
7

jina-bert-v2-qk-post-norm

5
1

jina-bert-v2-qk-devlin-norm-1e-2

license:apache-2.0
5
0

jina-embeddings-v3-small-ci

4
0

jina-bert-implementation

license:apache-2.0
0
10

jina-clip-implementation

license:cc-by-nc-4.0
0
9

clip-models

license:apache-2.0
0
4

falcon-7b-code-alpaca-lora

license:cc-by-nc-4.0
0
3

falcon-7b-code-alpaca

license:cc-by-nc-4.0
0
3

falcon-40b-code-alpaca

license:cc-by-nc-4.0
0
3

falcon-40b-code-alpaca-lora

license:cc-by-nc-4.0
0
2

jina-reranker-m0-debug

license:cc-by-nc-4.0
0
2