zeroentropy


zerank-1-small

In search engines, rerankers are crucial for improving the accuracy of your retrieval system. This 1.7B reranker is the smaller version of our flagship model zeroentropy/zerank-1. Though the model is over 2x smaller, it maintains nearly the same standard of performance, continuing to outperform other popular rerankers and displaying massive accuracy gains over traditional vector search.

We release this model under the open-source Apache 2.0 license to support the open-source community and push the frontier of what's possible with open-source models. The model can also be inferenced using ZeroEntropy's /models/rerank endpoint.

NDCG@10 scores for `zerank-1-small` and competing closed-source proprietary rerankers. Since we are evaluating rerankers, OpenAI's `text-embedding-3-small` is used as the initial retriever for the top 100 candidate documents.

| Task           | Embedding | cohere-rerank-v3.5 | Salesforce/Llama-rank-v1 | zerank-1-small | zerank-1 |
|----------------|-----------|--------------------|--------------------------|----------------|----------|
| Code           | 0.678     | 0.724              | 0.694                    | 0.730          | 0.754    |
| Conversational | 0.250     | 0.571              | 0.484                    | 0.556          | 0.596    |
| Finance        | 0.839     | 0.824              | 0.828                    | 0.861          | 0.894    |
| Legal          | 0.703     | 0.804              | 0.767                    | 0.817          | 0.821    |
| Medical        | 0.619     | 0.750              | 0.719                    | 0.773          | 0.796    |
| STEM           | 0.401     | 0.510              | 0.595                    | 0.680          | 0.694    |

Comparing BM25 and Hybrid Search without and with `zerank-1-small`:
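The scores above use NDCG@10, which rewards rankings that place relevant documents near the top and discounts them logarithmically by position. A minimal sketch of the standard metric (not ZeroEntropy's evaluation harness):

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: relevance at rank i is divided by log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (descending-relevance) ordering,
    # so a perfect ranking scores 1.0.
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# A ranking that puts the only relevant document first is perfect:
print(ndcg_at_k([1, 0, 0]))  # 1.0
# Pushing it down to rank 3 halves the score:
print(ndcg_at_k([0, 0, 1]))  # 0.5
```

Reranking the top 100 candidates from `text-embedding-3-small` improves NDCG@10 precisely because it moves relevant documents into these heavily weighted early positions.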

license:apache-2.0

zerank-2

license:cc-by-nc-4.0

zerank-1

In search engines, rerankers are crucial for improving the accuracy of your retrieval system. However, SOTA rerankers are closed-source and proprietary. At ZeroEntropy, we've trained a SOTA reranker that outperforms its closed-source competitors, and we're launching it here on Hugging Face. This reranker outperforms proprietary rerankers such as `cohere-rerank-v3.5` and `Salesforce/LlamaRank-v1` across a wide variety of domains, including finance, legal, code, STEM, medical, and conversational data.

At ZeroEntropy we've developed an innovative multi-stage pipeline that models query-document relevance scores as adjusted Elo ratings. See our Technical Report (coming soon!) for more details.

Since we're a small company, this model is released under a non-commercial license. If you'd like a commercial license, please contact us at [email protected] and we'll get you a license ASAP. For this model's smaller twin, see zerank-1-small, which we've fully open-sourced under an Apache 2.0 license. The model can also be inferenced using ZeroEntropy's /models/rerank endpoint.

NDCG@10 scores for `zerank-1` and competing closed-source proprietary rerankers. Since we are evaluating rerankers, OpenAI's `text-embedding-3-small` is used as the initial retriever for the top 100 candidate documents.

| Task           | Embedding | cohere-rerank-v3.5 | Salesforce/Llama-rank-v1 | zerank-1-small | zerank-1 |
|----------------|-----------|--------------------|--------------------------|----------------|----------|
| Code           | 0.678     | 0.724              | 0.694                    | 0.730          | 0.754    |
| Conversational | 0.250     | 0.571              | 0.484                    | 0.556          | 0.596    |
| Finance        | 0.839     | 0.824              | 0.828                    | 0.861          | 0.894    |
| Legal          | 0.703     | 0.804              | 0.767                    | 0.817          | 0.821    |
| Medical        | 0.619     | 0.750              | 0.719                    | 0.773          | 0.796    |
| STEM           | 0.401     | 0.510              | 0.595                    | 0.680          | 0.694    |

Comparing BM25 and Hybrid Search without and with `zerank-1`:
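The card describes modeling query-document relevance as adjusted Elo ratings; the details of the "adjusted" variant await the technical report, but as background, here is a minimal sketch of the standard Elo expected-score and update rule that such a pipeline builds on (function names are illustrative, not ZeroEntropy's API):

```python
def elo_expected(r_a: float, r_b: float) -> float:
    # Probability that document A beats document B in a pairwise
    # relevance comparison, under the standard Elo logistic model.
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    # outcome: 1.0 if A is judged more relevant, 0.0 if B is, 0.5 for a tie.
    # Each rating moves toward the observed result, scaled by K.
    e_a = elo_expected(r_a, r_b)
    new_a = r_a + k * (outcome - e_a)
    new_b = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return new_a, new_b

# Two documents start at equal ratings, so each is expected to win 50%:
print(elo_expected(1000.0, 1000.0))  # 0.5
# After A wins one comparison, its rating rises and B's falls symmetrically:
print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Repeating such pairwise updates over many query-document comparisons yields a per-query ranking of documents by rating, which is one natural way to turn pairwise relevance judgments into reranking scores.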

license:cc-by-nc-4.0

zembed-1

license:cc-by-nc-4.0

zegen-1

license:cc-by-nc-4.0