Felladrin
gguf-smollm-360M-instruct-add-basics
gguf-MXFP4-gpt-oss-20b-Derestricted
Llama-68M-Chat-v1
gguf-jina-reranker-v1-tiny-en
Model creator: Jina AI
Original model: jina-reranker-v1-tiny-en
GGUF quantization: based on llama.cpp release f4d2b

This model is designed for blazing-fast reranking while maintaining competitive performance. What's more, it leverages the power of our JinaBERT model as its foundation. `JinaBERT` itself is a unique variant of the BERT architecture that supports the symmetric bidirectional variant of ALiBi. This allows `jina-reranker-v1-tiny-en` to process significantly longer sequences of text compared to other reranking models, up to an impressive 8,192 tokens.

To achieve this remarkable speed, `jina-reranker-v1-tiny-en` employs a technique called knowledge distillation, in which a complex but slower model (like our original jina-reranker-v1-base-en) acts as a teacher, condensing its knowledge into a smaller, faster student model. This student retains most of the teacher's knowledge, allowing it to deliver similar accuracy in a fraction of the time.

Here's a breakdown of the reranker models we provide:

| Model Name                | Layers | Hidden Size | Parameters (Millions) |
| ------------------------- | ------ | ----------- | --------------------- |
| jina-reranker-v1-base-en  | 12     | 768         | 137.0                 |
| jina-reranker-v1-turbo-en | 6      | 384         | 37.8                  |
| jina-reranker-v1-tiny-en  | 4      | 384         | 33.0                  |

> Currently, the `jina-reranker-v1-base-en` model is not available on Hugging Face. You can access it via the Jina AI Reranker API.

As you can see, `jina-reranker-v1-turbo-en` offers a balanced approach with 6 layers and 37.8 million parameters, which translates to fast search and reranking while preserving a high degree of accuracy. `jina-reranker-v1-tiny-en` prioritizes speed even further, achieving the fastest inference with its 4-layer, 33.0-million-parameter architecture. This makes it ideal for scenarios where absolute top accuracy is less crucial.

1. The easiest way to start using `jina-reranker-v1-tiny-en` is through Jina AI's Reranker API.
2. Alternatively, you can use the latest version of the `sentence-transformers` library. You can install it via pip and then score query-document pairs in a few lines of code (see the sketch after this list).
3. You can also use the `transformers` library to interact with the model programmatically.
4. You can also use the `transformers.js` library to run the model directly in JavaScript (in-browser, Node.js, Deno, etc.). If you haven't already, install the Transformers.js JavaScript library from NPM.

That's it! You can now use the `jina-reranker-v1-tiny-en` model in your projects.
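A minimal sketch of the `sentence-transformers` route from item 2, assuming a recent version of the library; the query and documents are illustrative placeholders, not from the original card:

```python
# Minimal sketch: reranking with jina-reranker-v1-tiny-en via sentence-transformers.
# trust_remote_code=True is needed because JinaBERT ships custom modeling code.
from sentence_transformers import CrossEncoder

model = CrossEncoder("jinaai/jina-reranker-v1-tiny-en", trust_remote_code=True)

query = "How long can the input sequences be?"
documents = [
    "jina-reranker-v1-tiny-en supports sequences of up to 8,192 tokens.",
    "The model has 4 layers and 33.0 million parameters.",
    "Berlin is the capital of Germany.",
]

# Score each (query, document) pair; higher scores mean higher relevance.
scores = model.predict([(query, doc) for doc in documents])

# Sort documents by descending score to obtain the reranked order.
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.4f}  {doc}")
```

For the `transformers` route (item 3), the same checkpoint loads as a sequence-classification model with `trust_remote_code=True`; its single output logit per query-document pair serves as the relevance score.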
We evaluated Jina Reranker on three key benchmarks to ensure top-tier performance and search relevance.

| Model Name                                 | NDCG@10 (17 BEIR datasets) | NDCG@10 (5 LoCo datasets) | Hit Rate (LlamaIndex RAG) |
| ------------------------------------------ | -------------------------- | ------------------------- | ------------------------- |
| `jina-reranker-v1-base-en`                 | 52.45                      | 87.31                     | 85.53                     |
| `jina-reranker-v1-turbo-en`                | 49.60                      | 69.21                     | 85.13                     |
| `jina-reranker-v1-tiny-en` (you are here)  | 48.54                      | 70.29                     | 85.00                     |
| `mxbai-rerank-base-v1`                     | 49.19                      | -                         | 82.50                     |
| `mxbai-rerank-xsmall-v1`                   | 48.80                      | -                         | 83.69                     |
| `ms-marco-MiniLM-L-6-v2`                   | 48.64                      | -                         | 82.63                     |
| `ms-marco-MiniLM-L-4-v2`                   | 47.81                      | -                         | 83.82                     |
| `bge-reranker-base`                        | 47.89                      | -                         | 83.03                     |

- `NDCG@10` is a measure of ranking quality, with higher scores indicating better search results.
- `Hit Rate` measures the percentage of relevant documents that appear in the top 10 search results.
- LoCo results are not available for the other models, since they do not support documents longer than 512 tokens.

For more details, please refer to our benchmarking sheets. Join our Discord community and chat with other community members about ideas.
TinyMistral-248M-Chat-v4
Minueza-32M-Base
gguf-sharded-Qwen2-0.5B-Instruct
gguf-Qwen1.5-0.5B-Chat
gguf-flan-t5-small
gguf-gemma-2b-orpo
gguf-flan-t5-large
gguf-Aira-2-355M
gguf-Q2_K_S-Mixed-AutoRound-MiniMax-M2.1
gguf-TinyMistral-248M-Chat-v2
gguf-pythia-1.4b-sft-full
gguf-multi-qa-MiniLM-L6-cos-v1
gguf-Smol-Llama-101M-Chat-v1
gguf-Qwen2-0.5B-Instruct
gguf-Q5_K_M-Qwen2.5-0.5B-Instruct
gguf-MobileLLaMA-1.4B-Chat
gguf-Phi-3-mini-4k-instruct
gguf-openhermes-tinyllama-sft-qlora
gguf-LaMini-Flan-T5-248M
gguf-flan-t5-base
gguf-sharded-LaMini-Flan-T5-783M
gguf-q5_k_m-granite-3.0-2b-instruct
gguf-WizardVicuna-pythia-410m-deduped
gguf-sharded-Aira-2-355M
gguf-sharded-WizardVicuna-pythia-410m-deduped
gguf-Qwen1.5-0.5B-Chat_llamafy
gguf-sharded-Qwen2-1.5B-Instruct
gguf-sharded-Qwen1.5-0.5B-Chat_llamafy
gguf-Qwen2-1.5B-Instruct
gguf-SmolLM-135M-Instruct
gguf-Qwen2-0.5B-Instruct-llamafy
gguf-sharded-Qwen2-0.5B-Instruct-llamafy
gguf-sharded-gemma-2b-orpo
gguf-sharded-UD-Q4_K_XL-Qwen3-0.6B
gguf-sharded-Phi-3-mini-4k-instruct
Llama-160M-Chat-v1
Language: en License: apache-2.0
gguf-zephyr-220m-dpo-full
gguf-Llama-160M-Chat-v1
gguf-h2o-danube3-500m-chat
gguf-spin_gpt2_medium_alpaca_e2
Smol-Llama-101M-Chat-v1
gguf-TinyMistral-248M-SFT-v4
gguf-LaMini-Flan-T5-77M
gguf-Q8_0-bge-reranker-v2-m3
gguf-gemma-2-2b-it-abliterated
gguf-sharded-TinyMistral-248M-Chat-v2
gguf-1.5-Pints-2K-v0.1
gguf-sharded-openhermes-1b-olmo-sft-qlora
gguf-internlm2-chat-1_8b
gguf-sharded-internlm2-chat-1_8b
gguf-t5-base-grammar-correction
gguf-Pythia-Chat-Base-7B
gguf-MicroLlama
gguf-TinySolar-248m-4k-code-instruct
gguf-Q5_K_M-Fox-1-1.6B-Instruct-v0.1
gguf-sharded-flan-t5-large
gguf-openhermes-1b-olmo-sft-qlora
gguf-zephyr-1b-olmo-sft-qlora
gguf-sharded-TinySolar-248m-4k-code-instruct
gguf-Lite-Mistral-150M-v2-Instruct
gguf-prem-1B-chat
gguf-q8_0-madlad400-3b-mt
gguf-sharded-Llama-160M-Chat-v1
gguf-sharded-h2o-danube2-1.8b-chat
gguf-llama-160m
gguf-Lite-Oute-1-65M-Instruct
gguf-Q8_0-Qwen2.5-0.5B-Instruct
gguf-sharded-Qwen1.5-0.5B-Chat
gguf-Q5_K_M-smollm-360M-instruct-add-basics
gguf-sharded-falcon-mamba-7b-instruct
gguf-Hare-1.1B-Chat
gguf-OLMoE-1B-7B-0924-Instruct
gguf-q5_k_m-madlad400-3b-mt
gguf-sharded-prem-1B-chat
gguf-h2o-danube2-1.8b-chat
gguf-NuExtract-tiny
candle-quantized-LaMini-Flan-T5-248M
gguf-flan-t5-base-instruct-dolly_hhrlhf
gguf-Q4_K_M-Yi-1.5-6B-Chat
gguf-TinyMistral-248M-Chat-v1
gguf-smol_llama-220M-openhermes
gguf-sharded-MobileLLaMA-1.4B-Chat
gguf-DopeyTinyLlama-1.1B-v1
gguf-flan-alpaca-base
gguf-gpt2-chatbot
gguf-sharded-zephyr-1b-olmo-sft-qlora
gguf-Sheared-Pythia-160m-Platypus
gguf-t5-address-standardizer
gguf-sharded-Q5_K_L-Llama-3.2-3B-Instruct
gguf-Q8_0-SmolLM2-360M-Instruct
GGUF version of HuggingFaceTB/SmolLM2-360M-Instruct.
gguf-Pythia-31M-Chat-v1
gguf-pythia-3b-deduped-sft
gguf-MaxMini-Instruct-248M
gguf-SmolLM-360M-Instruct
gguf-sharded-Qwen2-1.5B-Instruct-imat
gguf-Q4_K_S-MiniCPM4-0.5B-QAT-Int4-unquantized
Felladrin/MiniCPM4-0.5B-QAT-Int4-unquantized-Q4_K_S-GGUF

This model was converted to GGUF format from `openbmb/MiniCPM4-0.5B-QAT-Int4-unquantized` using llama.cpp, via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux), or build it from source. Step 1: clone llama.cpp from GitHub. Step 2: move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g., `LLAMA_CUDA=1` for Nvidia GPUs on Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
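Since the card stops before showing an actual invocation, here is a minimal sketch that fetches and runs the checkpoint through the llama-cpp-python bindings instead of the llama.cpp CLI; the filename glob is an assumption based on the repo's naming convention:

```python
# Minimal sketch: running this GGUF checkpoint with llama-cpp-python
# (`pip install llama-cpp-python huggingface_hub`).
from llama_cpp import Llama

# from_pretrained downloads the first file in the repo matching the glob;
# the pattern below is an assumption about how the GGUF file is named.
llm = Llama.from_pretrained(
    repo_id="Felladrin/MiniCPM4-0.5B-QAT-Int4-unquantized-Q4_K_S-GGUF",
    filename="*q4_k_s.gguf",
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what MiniCPM4 is."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```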
Pythia-31M-Chat-v1
gguf-TinyMistral-248M-v2.5-Instruct-orpo
gguf-q5_k_l-imat-arcee-lite
gguf-Q3_K_XL-falcon-mamba-7b
gguf-vicuna-68m
gguf-TinyLlama-1.1B-1T-OpenOrca
gguf-sharded-spin_gpt2_medium_alpaca_e2
gguf-IPythia-410m
gguf-LaMini-Flan-T5-783M
gguf-TinySolar-248m-4k
gguf-sharded-Q5_K_L-Llama-3.2-1B-Instruct
gguf-Q5_K_M-NanoLM-1B-Instruct-v2
gguf-sharded-pythia-3b-deduped-sft
gguf-Aira-2-124M
gguf-Aira-2-124M-DPO
gguf-MiniMA-2-1B
gguf-q5_k_m-phi-3.5-mini-instruct
gguf-sharded-openhermes-tinyllama-sft-qlora
gguf-sharded-Aira-2-124M-DPO
gguf-sharded-IPythia-410m
gguf-sharded-TinyMistral-248M-v2.5-Instruct-orpo
gguf-sharded-pythia-1.4b-sft-full
gguf-sharded-Aira-2-124M
gguf-falcon-mamba-7b-instruct
Minueza-2-96M-Instruct-Variant-10
Minueza-32M-UltraChat
Language model with Apache 2.0 license.
gguf-sharded-LaMini-Flan-T5-248M
gguf-774M-03_09_2024
gguf-gpt2-alpaca-gpt4
gguf-flan-t5-small-finetuned-openai-summarize_from_feedback
gguf-sharded-gemma-2-2b-it-abliterated
gguf-Q5_K_M-Qwen3-4B-Merge-Variant-01
Felladrin/gguf-Q5_K_M-Qwen3-4B-Merge-Variant-01

This model was converted to GGUF format from `Felladrin/Qwen3-4B-Merge-Variant-01` using llama.cpp, via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux), or build it from source. Step 1: clone llama.cpp from GitHub. Step 2: move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g., `LLAMA_CUDA=1` for Nvidia GPUs on Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
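As an alternative to the CLI route above, a minimal sketch using the llama-cpp-python bindings, assuming the Q5_K_M file has already been downloaded; the local path, context size, and prompt are illustrative:

```python
# Minimal sketch: loading a local GGUF file with llama-cpp-python
# (`pip install llama-cpp-python`) instead of the llama.cpp CLI.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-4b-merge-variant-01-q5_k_m.gguf",  # hypothetical local path
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers when built with GPU support
)

output = llm(
    "Explain in one sentence what Q5_K_M quantization trades off.",
    max_tokens=96,
)
print(output["choices"][0]["text"])
```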
gguf-sharded-Q4_K_S-AFM-4.5B
Sharded GGUF version of bartowski/arcee-ai_AFM-4.5B-GGUF.
gguf-mamba-130m-hf
gguf-Q5_K_M-NanoLM-0.3B-Instruct-v2
Minueza-2-96M
gguf-Lite-Oute-1-300M-Instruct
gguf-sharded-Q4_K_S-gemma-3-270m-it
Sharded GGUF version of bartowski/google_gemma-3-270m-it-GGUF.
gguf-q5_k_m-tinydolphin-2.8.2-1.1b-laser
gguf-Q5_K_M-MagpieLM-4B-Chat-v0.1
gguf-sharded-UD-Q4_K_XL-Phi-4-mini-reasoning
Sharded GGUF version of unsloth/Phi-4-mini-reasoning-GGUF.
gguf-sharded-Q4_K_S-MiniCPM4-0.5B-QAT
Sharded GGUF version of Felladrin/MiniCPM4-0.5B-QAT-Int4-unquantized-Q4_K_S-GGUF.
gguf-Minueza-32M-Chat
gguf-Minueza-32Mx2-Chat
gguf-IQ3_XXS-OLMo-7B-0424-Instruct-hf
gguf-sharded-UD-Q4_K_XL-Qwen3-1.7B
gguf-sharded-Q4_K_S-Polaris-4B-Preview
gguf-sharded-Q4_K_S-cogito-v1-preview-llama-3B
gguf-Tinyllama-616M-Cinder
gguf-q5_k_m-imat-qwen2-0.5b-instruct
gguf-Q5_K_M-Nemotron-Mini-4B-Instruct
gguf-Q8_0-smollm-135M-instruct-v0.2
gguf-vicuna-160m
gguf-phi-1_5
gguf-flan-t5-small-cnndm
gguf-Q8_0-all-MiniLM-L6-v2
gguf-Mixnueza-6x32M-MoE
gguf-Minueza-32M-Base
gguf-q5_k_m-h2o-danube2-1.8b-chat
gguf-Q5_K_M-LlamaCorn-1.1B-Chat
gguf-Q8_0-Qwen2.5-Coder-1.5B-Instruct
mlx-5bit-Qwen3-4B-Merge-Variant-01
Qwen2-96M
gguf-Minueza-32M-UltraChat
gguf-sharded-TinyLlama-1.1B-1T-OpenOrca
gguf-sharded-h2o-danube3-500m-chat
gguf-Q3_K_L-Yi-1.5-6B-Chat
gguf-sharded-Q4_K_S-DeepSeek-R1-Distill-Qwen-1.5B
gguf-sharded-Q4_K_S-LFM2-350M
gguf-sharded-Q4_K_S-LFM2-700M
gguf-sharded-OLMo-7B-Instruct
gguf-Q5_K_L-Nemotron-Mini-4B-Instruct
gguf-Q5_K_M-SmolLM2-1.7B-Instruct
GGUF version of HuggingFaceTB/SmolLM2-1.7B-Instruct.
gguf-sharded-Q4_K_S-SmolLM3-3B
Sharded GGUF version of bartowski/HuggingFaceTB_SmolLM3-3B-GGUF.
gguf-sharded-Q4_K_S-granite-3.3-2b-instruct
Minueza-32M-Chat
gguf-1.5-Pints-16K-v0.1
onnx-gpt2-medium-chat
onnx-gpt2-conversational-retrain
gguf-sharded-vicuna-160m
gguf-Q5_K_M-Phi-1_5-Instruct-v0.1
gguf-sharded-Q4_K_S-OLMoE-1B-7B-0924-Instruct
gguf-sharded-Q4_K_S-LFM2-1.2B
Minueza-2-96M-Instruct-Variant-02
gguf-sharded-Q3_K_L-OLMoE-1B-7B-0924-Instruct
gguf-sharded-Q4_K_S-Llama-3.1-Nemotron-Nano-4B-v1.1
Sharded GGUF version of bartowski/nvidia_Llama-3.1-Nemotron-Nano-4B-v1.1-GGUF.
gguf-sharded-stablelm-2-1_6b-chat
gguf-sharded-phi-2-orange-v2
gguf-q5_k_m-h2o-danube3-500m-chat
gguf-Q2_K_L-Llama-3.1-SuperNova-Lite
gguf-sharded-Q5_K_L-granite-3.0-3b-a800m-instruct
Sharded GGUF version of bartowski/granite-3.0-3b-a800m-instruct-GGUF.
gguf-sharded-Q8_0-Qwen2.5-Coder-0.5B-Instruct
gguf-Q4_0-Qwen2.5-Coder-32B-Instruct-abliterated
gguf-sharded-UD-Q4_K_XL-OLMo-2-0425-1B-Instruct
Qwen3-4B-Merge-Variant-01
llama2_xs_460M_experimental_evol_instruct
onnx-megatron-gpt2-345m-evol_instruct_v2
onnx-Pythia-31M-Chat-v1
onnx-Minueza-32M-Chat
gguf-mamba-370m-hf
gguf-stablelm-2-1_6b-chat
gguf-sharded-zephyr-220m-dpo-full
gguf-Q4_K_S-OLMo-7B-0424-Instruct-hf
gguf-Q5_K_M-NanoLM-70M-Instruct-v1
gguf-sharded-Q5_K_L-Replete-LLM-V2.5-Qwen-3b
gguf-sharded-Q5_K_L-Replete-LLM-V2.5-Qwen-0.5b
gguf-Q8_0-SmolLM2-135M-Instruct
GGUF version of HuggingFaceTB/SmolLM2-135M-Instruct.
gguf-Q4_K_M-MiniCPM3-4B
gguf-Q8_0-LaMini-Flan-T5-248M
gguf-sharded-Q4_K_S-gemma-3-4b-it
Minueza-2-96M-Instruct-Variant-03
gguf-sharded-Q4_K_S-Apriel-5B-Instruct-llamafied
gguf-sharded-Q4_K_S-h2o-danube3.1-4b-chat
gguf-sharded-Q4_K_S-gemma-3n-E2B-it
Sharded GGUF version of bartowski/google_gemma-3n-E2B-it-GGUF.
gguf-sharded-Q4_K_S-Falcon-H1-0.5B-Instruct
Sharded GGUF version of mradermacher/Falcon-H1-0.5B-Instruct-i1-GGUF.
Minueza-32Mx2-Chat
Sheared-Pythia-160m-Platypus
onnx-gpt2-large-conversational-retrain
onnx-GPT2-Medium-Alpaca-355m
onnx-llama2_xs_460M_experimental_evol_instruct
onnx-gpt2-alpaca
gguf-sharded-Phi-3-mini-4k-instruct-iMat
gguf-sharded-smashed-WizardLM-2-7B
gguf-sharded-Mistral-7B-OpenOrca
gguf-sharded-wavecoder-ultra-6.7b
mlc-q4f16-Phi-3.5-mini-instruct
gguf-Q5_K_M-TinyJensen-1.1B-Chat
gguf-Q5_K_M-fastchat-t5-3b-v1.0
gguf-Q5_K_M-Sheared-LLaMA-1.3B-ShareGPT
gguf-Q5_K_M-OLMo-1B-SFT-hf
gguf-sharded-q3_k_m-jais-adapted-7b-chat
Sharded GGUF version of QuantFactory/jais-adapted-7b-chat-GGUF.
gguf-q8_0-h2o-danube3-500m-chat
gguf-sharded-Q4_K_S-Qwen2.5-0.5B-Instruct
gguf-sharded-q5_k_m-internlm2_5-1_8b-chat
Sharded GGUF version of internlm/internlm2_5-1_8b-chat-gguf.
gguf-Q4_0-Qwen2.5-Coder-32B-Instruct
gguf-sharded-Q4_K_S-SmolLM2-135M-Instruct
gguf-sharded-Q4_K_S-TAID-LLM-1.5B
gguf-sharded-Q3_K_M-OLMoE-1B-7B-0125-Instruct
Minueza-2-96M-Instruct-Variant-06
Minueza-2-96M-Instruct-Variant-08
onnx-TinyMistral-248M-v2
LaMini-Neo-125M-Evol-Instruct
onnx-bloomz-560m-sft-chat
llama2_xs_460M_experimental_platypus
onnx-flan-alpaca-base
onnx-Cerebras-GPT-111M-instruction
onnx-flan-t5-base-samsum
onnx-Evol-Orca-LaMini-flan-t5-small
onnx-Smol-Llama-101M-Chat-v1
onnx-tinyllama-15M
onnx-tinyllama-42M
onnx-Gerbil-A-32m
Minueza-32M-Deita
onnx-TinyMistral-248M-Chat-v1
gguf-TinyLlama-1.1B-Chat-v1.0
mlc-q4f16_1-gemma-2-2b-it
gguf-sharded-Q3_K_XL-OLMoE-1B-7B-0924-Instruct
gguf-Q5_K_M-TinyLlama-1.1B-Chat-v1.0
gguf-sharded-Q5_K_M-TinyLlama-1.1B-Chat-v1.0
gguf-sharded-Q5_K_M-LlamaCorn-1.1B-Chat
gguf-sharded-F16-1.5-Pints-2K-v0.1
gguf-sharded-BF16-1.5-Pints-16K-v0.1
gguf-sharded-Q5_K-1.5-Pints-2K-v0.1
gguf-sharded-q5_k_l-granite-3.0-1b-a400m-instruct
gguf-Q5_K_L-AMD-OLMo-1B-SFT-DPO
Sharded GGUF version of bartowski/AMD-OLMo-1B-SFT-DPO-GGUF.
gguf-sharded-Q5_K_L-h2o-danube3-500m-chat
gguf-sharded-Q5_K_M-EXAONE-3.5-2.4B-Instruct
gguf-sharded-Q4_K_S-granite-3.1-1b-a400m-instruct
gguf-sharded-Q4_K_S-SmolLM2-360M-Instruct
Sharded GGUF version of bartowski/SmolLM2-360M-Instruct-GGUF.
gguf-sharded-Q4_K_S-AMD-OLMo-1B-SFT-DPO
gguf-sharded-Q4_K_S-h2o-danube3-500m-chat
gguf-sharded-Q4_K_S-MiniCPM3-4B
gguf-sharded-Q4_K_S-Phi-3.5-mini-instruct
gguf-sharded-Q4_K_S-Falcon3-1B-Instruct
gguf-sharded-Q4_K_S-granite-3.1-3b-a800m-instruct
gguf-sharded-Q4_K_S-pythia-1.4b-sft-full
Sharded GGUF version of Felladrin/gguf-pythia-1.4b-sft-full.
gguf-Q4_K_S-1.5-Pints-16K-v0.1
gguf-sharded-Q4_K_S-internlm2_5-1_8b-chat
gguf-sharded-Q4_K_S-EXAONE-3.5-2.4B-Instruct
gguf-sharded-Q4_K_S-MagpieLM-4B-Chat-v0.1
gguf-sharded-Q4_K_S-Nemotron-Mini-4B-Instruct
gguf-sharded-Q4_K_S-stablelm-2-zephyr-1.6b
Sharded GGUF version of second-state/stablelm-2-zephyr-1.6b-GGUF.