sentence-transformers

125 models • 15 total models in database

all-MiniLM-L6-v2

This is a sentence-transformers model: it maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.

Usage (Sentence-Transformers): using this model is easy when you have sentence-transformers installed.

Usage (HuggingFace Transformers): without sentence-transformers, you can use the model by passing your input through the transformer model and then applying the right pooling operation on top of the contextualized word embeddings.

The project aims to train sentence embedding models on very large sentence-level datasets using a self-supervised contrastive learning objective. We used the pretrained `nreimers/MiniLM-L6-H384-uncased` model and fine-tuned it on a dataset of 1B sentence pairs. We use a contrastive learning objective: given a sentence from the pair, the model should predict which one, out of a set of randomly sampled other sentences, was actually paired with it in our dataset.

We developed this model during the Community Week using JAX/Flax for NLP & CV, organized by Hugging Face, as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs. We benefited from efficient hardware infrastructure to run the project (7 TPU v3-8s), as well as guidance from Google's Flax, JAX, and Cloud team members on efficient deep learning frameworks.

Our model is intended to be used as a sentence and short-paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering, or sentence similarity tasks. By default, input text longer than 256 word pieces is truncated.

We use the pretrained `nreimers/MiniLM-L6-H384-uncased` model; please refer to its model card for more detailed information about the pre-training procedure. We fine-tune the model using a contrastive objective: formally, we compute the cosine similarity for each possible sentence pair in the batch, then apply the cross-entropy loss by comparing with the true pairs.

We trained our model on a TPU v3-8 for 100k steps using a batch size of 1024 (128 per TPU core), a learning-rate warm-up of 500 steps, a sequence length limited to 128 tokens, and the AdamW optimizer with a 2e-5 learning rate. The full training script is accessible in this repository: `train_script.py`.

We use the concatenation of multiple datasets to fine-tune our model; the total number of sentence pairs is above 1 billion. We sampled each dataset with a weighted probability whose configuration is detailed in the `data_config.json` file.

| Dataset | Paper | Number of training tuples |
|---|:---:|:---:|
| Reddit comments (2015-2018) | paper | 726,484,430 |
| S2ORC Citation pairs (Abstracts) | paper | 116,288,806 |
| WikiAnswers Duplicate question pairs | paper | 77,427,422 |
| PAQ (Question, Answer) pairs | paper | 64,371,441 |
| S2ORC Citation pairs (Titles) | paper | 52,603,982 |
| S2ORC (Title, Abstract) | paper | 41,769,185 |
| Stack Exchange (Title, Body) pairs | - | 25,316,456 |
| Stack Exchange (Title+Body, Answer) pairs | - | 21,396,559 |
| Stack Exchange (Title, Answer) pairs | - | 21,396,559 |
| MS MARCO triplets | paper | 9,144,553 |
| GOOAQ: Open Question Answering with Diverse Answer Types | paper | 3,012,496 |
| Yahoo Answers (Title, Answer) | paper | 1,198,260 |
| Code Search | - | 1,151,414 |
| COCO Image captions | paper | 828,395 |
| SPECTER citation triplets | paper | 684,100 |
| Yahoo Answers (Question, Answer) | paper | 681,164 |
| Yahoo Answers (Title, Question) | paper | 659,896 |
| SearchQA | paper | 582,261 |
| Eli5 | paper | 325,475 |
| Flickr 30k | paper | 317,695 |
| Stack Exchange Duplicate questions (titles) | - | 304,525 |
| AllNLI (SNLI and MultiNLI) | paper SNLI, paper MultiNLI | 277,230 |
| Stack Exchange Duplicate questions (bodies) | - | 250,519 |
| Stack Exchange Duplicate questions (titles+bodies) | - | 250,460 |
| Sentence Compression | paper | 180,000 |
| Wikihow | paper | 128,542 |
| Altlex | paper | 112,696 |
| Quora Question Triplets | - | 103,663 |
| Simple Wikipedia | paper | 102,225 |
| Natural Questions (NQ) | paper | 100,231 |
| SQuAD2.0 | paper | 87,599 |
| TriviaQA | - | 73,346 |
| Total | | 1,170,060,424 |
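The HuggingFace Transformers usage path above boils down to mean pooling over the contextualized word embeddings, masked so padding tokens don't contribute. A minimal NumPy sketch of that pooling step (the function name and toy shapes are illustrative, not the model card's actual snippet):

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) contextualized word embeddings
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(float)   # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # (batch, 1), avoid /0
    return summed / counts                            # (batch, dim)

# Toy example: 1 sentence, 3 token positions (last is padding), dim 2;
# the padded position is ignored by the mask
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pooling(emb, mask))  # [[2. 3.]]
```

The same masked average is what sentence-transformers applies on top of the transformer outputs before (optionally) normalizing the sentence vector.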

145,767,733
4,130

all-mpnet-base-v2

This is a sentence-transformers model: it maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.

Usage (Sentence-Transformers): using this model is easy when you have sentence-transformers installed.

Usage (HuggingFace Transformers): without sentence-transformers, you can use the model by passing your input through the transformer model and then applying the right pooling operation on top of the contextualized word embeddings.

Text Embeddings Inference (TEI) is a blazing-fast inference solution for text embedding models. Send a request to `/v1/embeddings` to generate embeddings via the OpenAI Embeddings API, or check the Text Embeddings Inference API specification.

The project aims to train sentence embedding models on very large sentence-level datasets using a self-supervised contrastive learning objective. We used the pretrained `microsoft/mpnet-base` model and fine-tuned it on a dataset of 1B sentence pairs. We use a contrastive learning objective: given a sentence from the pair, the model should predict which one, out of a set of randomly sampled other sentences, was actually paired with it in our dataset.

We developed this model during the Community Week using JAX/Flax for NLP & CV, organized by Hugging Face, as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs. We benefited from efficient hardware infrastructure to run the project (7 TPU v3-8s), as well as guidance from Google's Flax, JAX, and Cloud team members on efficient deep learning frameworks.

Our model is intended to be used as a sentence and short-paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering, or sentence similarity tasks. By default, input text longer than 384 word pieces is truncated.

We use the pretrained `microsoft/mpnet-base` model; please refer to its model card for more detailed information about the pre-training procedure. We fine-tune the model using a contrastive objective: formally, we compute the cosine similarity for each possible sentence pair in the batch, then apply the cross-entropy loss by comparing with the true pairs.

We trained our model on a TPU v3-8 for 100k steps using a batch size of 1024 (128 per TPU core), a learning-rate warm-up of 500 steps, a sequence length limited to 128 tokens, and the AdamW optimizer with a 2e-5 learning rate. The full training script is accessible in this repository: `train_script.py`.

We use the concatenation of multiple datasets to fine-tune our model; the total number of sentence pairs is above 1 billion. We sampled each dataset with a weighted probability whose configuration is detailed in the `data_config.json` file.

| Dataset | Paper | Number of training tuples |
|---|:---:|:---:|
| Reddit comments (2015-2018) | paper | 726,484,430 |
| S2ORC Citation pairs (Abstracts) | paper | 116,288,806 |
| WikiAnswers Duplicate question pairs | paper | 77,427,422 |
| PAQ (Question, Answer) pairs | paper | 64,371,441 |
| S2ORC Citation pairs (Titles) | paper | 52,603,982 |
| S2ORC (Title, Abstract) | paper | 41,769,185 |
| Stack Exchange (Title, Body) pairs | - | 25,316,456 |
| Stack Exchange (Title+Body, Answer) pairs | - | 21,396,559 |
| Stack Exchange (Title, Answer) pairs | - | 21,396,559 |
| MS MARCO triplets | paper | 9,144,553 |
| GOOAQ: Open Question Answering with Diverse Answer Types | paper | 3,012,496 |
| Yahoo Answers (Title, Answer) | paper | 1,198,260 |
| Code Search | - | 1,151,414 |
| COCO Image captions | paper | 828,395 |
| SPECTER citation triplets | paper | 684,100 |
| Yahoo Answers (Question, Answer) | paper | 681,164 |
| Yahoo Answers (Title, Question) | paper | 659,896 |
| SearchQA | paper | 582,261 |
| Eli5 | paper | 325,475 |
| Flickr 30k | paper | 317,695 |
| Stack Exchange Duplicate questions (titles) | - | 304,525 |
| AllNLI (SNLI and MultiNLI) | paper SNLI, paper MultiNLI | 277,230 |
| Stack Exchange Duplicate questions (bodies) | - | 250,519 |
| Stack Exchange Duplicate questions (titles+bodies) | - | 250,460 |
| Sentence Compression | paper | 180,000 |
| Wikihow | paper | 128,542 |
| Altlex | paper | 112,696 |
| Quora Question Triplets | - | 103,663 |
| Simple Wikipedia | paper | 102,225 |
| Natural Questions (NQ) | paper | 100,231 |
| SQuAD2.0 | paper | 87,599 |
| TriviaQA | - | 73,346 |
| Total | | 1,170,060,424 |
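The contrastive objective described above can be sketched as cross-entropy over the batch's cosine-similarity matrix, where each sentence's true partner sits on the diagonal (in-batch negatives). This is an illustrative NumPy sketch; the `scale` value is an assumption for the example, not the training run's actual hyperparameter:

```python
import numpy as np

def in_batch_contrastive_loss(a, b, scale=20.0):
    """Cross-entropy over cosine similarities; true pairs lie on the diagonal.

    a, b: (batch, dim) embeddings of the two sides of each sentence pair
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = scale * (a @ b.T)                    # (batch, batch) cosine sims
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # true pair = diagonal entry

# Identical pairs should yield a near-zero loss at this scale
x = np.eye(4)
print(in_batch_contrastive_loss(x, x) < 0.01)  # True
```

Each row of the similarity matrix is treated as a classification problem over the batch: the model is rewarded for scoring the true partner above the randomly co-occurring sentences.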

17,967,275
1,186

paraphrase-multilingual-MiniLM-L12-v2

sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

17,696,945
1,108

paraphrase-multilingual-mpnet-base-v2

sentence-transformers/paraphrase-multilingual-mpnet-base-v2

This is a sentence-transformers model: it maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.

Usage (Sentence-Transformers): using this model is easy when you have sentence-transformers installed.

Usage (HuggingFace Transformers): without sentence-transformers, you can use the model by passing your input through the transformer model and then applying the right pooling operation on top of the contextualized word embeddings.

Text Embeddings Inference (TEI) is a blazing-fast inference solution for text embedding models. Send a request to `/v1/embeddings` to generate embeddings via the OpenAI Embeddings API, or check the Text Embeddings Inference API specification.

If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

5,690,341
422

multi-qa-mpnet-base-dot-v1

---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- text-embeddings-inference
datasets:
- flax-sentence-embeddings/stackexchange_xml
- ms_marco
- gooaq
- yahoo_answers_topics
- search_qa
- eli5
- natural_questions
- trivia_qa
- embedding-data/QQP
- embedding-data/PAQ_pairs
- embedding-data/Amazon-QA
- embedding-data/WikiAnswers
pipeline_tag: sentence-similarity
---

4,359,828
183

all-MiniLM-L12-v2

This is a sentence-transformers model: it maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.

3,946,067
299

paraphrase-MiniLM-L6-v2

This is a sentence-transformers model: it maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.

Usage (Sentence-Transformers): using this model is easy when you have sentence-transformers installed.

Usage (HuggingFace Transformers): without sentence-transformers, you can use the model by passing your input through the transformer model and then applying the right pooling operation on top of the contextualized word embeddings.

If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

3,570,990
145

gtr-t5-base

---
language: en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
pipeline_tag: sentence-similarity
---

3,294,475
26

multi-qa-MiniLM-L6-cos-v1

---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
datasets:
- flax-sentence-embeddings/stackexchange_xml
- ms_marco
- gooaq
- yahoo_answers_topics
- search_qa
- eli5
- natural_questions
- trivia_qa
- embedding-data/QQP
- embedding-data/PAQ_pairs
- embedding-data/Amazon-QA
- embedding-data/WikiAnswers
pipeline_tag: sentence-similarity
---

1,724,737
135

distiluse-base-multilingual-cased-v2

sentence-transformers/distiluse-base-multilingual-cased-v2

This is a sentence-transformers model: it maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for tasks like clustering or semantic search. Using this model is easy when you have sentence-transformers installed. If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

license:apache-2.0
1,533,111
195

msmarco-distilbert-base-tas-b

---
language: en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
datasets:
- ms_marco
pipeline_tag: sentence-similarity
---

license:apache-2.0
1,412,856
43

paraphrase-mpnet-base-v2

---
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- text-embeddings-inference
pipeline_tag: sentence-similarity
---

license:apache-2.0
1,386,605
45

stsb-xlm-r-multilingual

---
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
pipeline_tag: sentence-similarity
---

license:apache-2.0
1,131,814
53

distiluse-base-multilingual-cased-v1

---
language:
- multilingual
- ar
- zh
- nl
- en
- fr
- de
- it
- ko
- pl
- pt
- ru
- es
- tr
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
pipeline_tag: sentence-similarity
---

license:apache-2.0
809,583
124

LaBSE

--- language: - multilingual - af - sq - am - ar - hy - as - az - eu - be - bn - bs - bg - my - ca - ceb - zh - co - hr - cs - da - nl - en - eo - et - fi - fr - fy - gl - ka - de - el - gu - ht - ha - haw - he - hi - hmn - hu - is - ig - id - ga - it - ja - jv - kn - kk - km - rw - ko - ku - ky - lo - la - lv - lt - lb - mk - mg - ms - ml - mt - mi - mr - mn - ne - no - ny - or - fa - pl - pt - pa - ro - ru - sm - gd - sr - st - sn - si - sk - sl - so - es - su - sw - sv - tl - tg - ta - tt - t

license:apache-2.0
754,770
319

all-roberta-large-v1

---
language: en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
pipeline_tag: sentence-similarity
---

license:apache-2.0
697,278
65

bert-base-nli-mean-tokens

---
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
pipeline_tag: sentence-similarity
---

**⚠️ This model is deprecated. Please don't use it as it produces sentence embeddings of low quality. You can find recommended sentence embedding models here: [SBERT.net - Pretrained Models](https://www.sbert.net/docs/pretrained_models.html)**

license:apache-2.0
673,734
40

paraphrase-MiniLM-L3-v2

---
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
datasets:
- flax-sentence-embeddings/stackexchange_xml
- s2orc
- ms_marco
- wiki_atomic_edits
- snli
- multi_nli
- embedding-data/altlex
- embedding-data/simple-wiki
- embedding-data/flickr30k-captions
- embedding-data/coco_captions
- embedding-data/sentence-compression
- embedding-data/QQP
- yahoo_answers_topics
pipeline_tag: sentence-similarity
---

license:apache-2.0
507,823
23

msmarco-bert-base-dot-v5

---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
pipeline_tag: sentence-similarity
---

357,215
18

all-distilroberta-v1

--- language: en license: apache-2.0 library_name: sentence-transformers tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers datasets: - s2orc - flax-sentence-embeddings/stackexchange_xml - ms_marco - gooaq - yahoo_answers_topics - code_search_net - search_qa - eli5 - snli - multi_nli - wikihow - natural_questions - trivia_qa - embedding-data/sentence-compression - embedding-data/flickr30k-captions - embedding-data/altlex - embedding-data/simple-wiki - embeddi

license:apache-2.0
333,324
41

stsb-roberta-base

---
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
pipeline_tag: sentence-similarity
---

license:apache-2.0
327,585
1

distilbert-base-nli-mean-tokens

---
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
pipeline_tag: feature-extraction
---

license:apache-2.0
290,989
12

paraphrase-xlm-r-multilingual-v1

---
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
pipeline_tag: sentence-similarity
---

license:apache-2.0
247,598
69

multi-qa-mpnet-base-cos-v1

---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- text-embeddings-inference
pipeline_tag: sentence-similarity
---

219,284
42

multi-qa-distilbert-cos-v1

---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
datasets:
- flax-sentence-embeddings/stackexchange_xml
- ms_marco
- gooaq
- yahoo_answers_topics
- search_qa
- eli5
- natural_questions
- trivia_qa
- embedding-data/QQP
- embedding-data/PAQ_pairs
- embedding-data/Amazon-QA
- embedding-data/WikiAnswers
pipeline_tag: sentence-similarity
---

208,637
24

msmarco-distilbert-base-v4

---
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
pipeline_tag: sentence-similarity
---

license:apache-2.0
203,189
11

paraphrase-albert-small-v2

---
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
datasets:
- flax-sentence-embeddings/stackexchange_xml
- s2orc
- ms_marco
- wiki_atomic_edits
- snli
- multi_nli
- embedding-data/altlex
- embedding-data/simple-wiki
- embedding-data/flickr30k-captions
- embedding-data/coco_captions
- embedding-data/sentence-compression
- embedding-data/QQP
- yahoo_answers_topics
pipeline_tag: sentence-similarity
---

license:apache-2.0
199,553
10

distiluse-base-multilingual-cased

license:apache-2.0
173,186
18

msmarco-MiniLM-L12-cos-v5

145,954
9

nli-mpnet-base-v2

license:apache-2.0
121,871
15

clip-ViT-B-32-multilingual-v1

sentence-transformers/clip-ViT-B-32-multilingual-v1

This is a multilingual version of the OpenAI CLIP-ViT-B32 model. You can map text (in 50+ languages) and images to a common dense vector space such that images and their matching texts are close. This model can be used for image search (users searching through a large collection of images) and for multilingual zero-shot image classification (image labels are defined as text).

Using this model is easy when you have sentence-transformers installed.

Multilingual Image Search - Demo: for a demo of multilingual image search, have a look at ImageSearch-multilingual.ipynb (Colab version). For more details on image search and zero-shot image classification, have a look at the documentation on SBERT.net.

Training: this model has been created using Multilingual Knowledge Distillation. As the teacher model, we used the original `clip-ViT-B-32` and then trained a multilingual DistilBERT model as the student. Using parallel data, the multilingual student model learns to align with the teacher's vector space across many languages. As a result, you get a text embedding model that works for 50+ languages. The image encoder from CLIP is unchanged, i.e. you can use the original CLIP image encoder to encode images. Have a look at the SBERT.net - Multilingual-Models documentation for more details and for the training code.

We used the following 50+ languages to align the vector spaces: ar, bg, ca, cs, da, de, el, es, et, fa, fi, fr, fr-ca, gl, gu, he, hi, hr, hu, hy, id, it, ja, ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, pt-br, ro, ru, sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh-cn, zh-tw. The original multilingual DistilBERT supports 100+ languages; the model also works for these languages, but might not yield the best results.

If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
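The knowledge-distillation setup described above trains the student so its embeddings of a sentence (and of its translations) land close to the teacher's embedding of the source. A minimal sketch, assuming a plain mean-squared-error criterion between student and teacher embeddings (the function name and dimensions are illustrative):

```python
import numpy as np

def distillation_loss(student_embs, teacher_embs):
    """MSE between student and teacher sentence embeddings.

    student_embs: (batch, dim) student outputs for source sentences or translations
    teacher_embs: (batch, dim) teacher (CLIP text encoder) outputs for the sources
    """
    return np.mean((student_embs - teacher_embs) ** 2)

# A student that already reproduces the teacher's vectors has zero loss
t = np.random.randn(8, 512)
print(distillation_loss(t, t))  # 0.0
```

Minimizing this over parallel data pulls the student's embedding of each translation toward the teacher's embedding of the English source, which is what aligns the vector spaces across languages.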

license:apache-2.0
93,325
180

stsb-roberta-large

⚠️ This model is deprecated. Please don't use it, as it produces sentence embeddings of low quality. You can find recommended sentence embedding models here: SBERT.net - Pretrained Models.

This is a sentence-transformers model: it maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for tasks like clustering or semantic search.

Usage (Sentence-Transformers): using this model is easy when you have sentence-transformers installed.

Usage (HuggingFace Transformers): without sentence-transformers, you can use the model by passing your input through the transformer model and then applying the right pooling operation on top of the contextualized word embeddings.

If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

license:apache-2.0
93,259
4

paraphrase-MiniLM-L12-v2

license:apache-2.0
89,405
6

bert-large-nli-max-tokens

⚠️ This model is deprecated. Please don't use it, as it produces sentence embeddings of low quality. You can find recommended sentence embedding models here: SBERT.net - Pretrained Models.

This is a sentence-transformers model: it maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for tasks like clustering or semantic search.

Usage (Sentence-Transformers): using this model is easy when you have sentence-transformers installed.

Usage (HuggingFace Transformers): without sentence-transformers, you can use the model by passing your input through the transformer model and then applying the right pooling operation on top of the contextualized word embeddings.

If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

license:apache-2.0
73,132
0

multi-qa-distilbert-dot-v1

63,878
1

msmarco-distilbert-cos-v5

59,707
10

sentence-t5-base

license:apache-2.0
59,699
51

msmarco-MiniLM-L6-v3

license:apache-2.0
49,058
14

multi-qa-MiniLM-L6-dot-v1

This is a sentence-transformers model: it maps sentences & paragraphs to a 384-dimensional dense vector space and was designed for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources. For an introduction to semantic search, have a look at: SBERT.net - Semantic Search.

Usage (Sentence-Transformers): using this model is easy when you have sentence-transformers installed.

Usage (HuggingFace Transformers): without sentence-transformers, you can use the model by passing your input through the transformer model and then applying the correct pooling operation on top of the contextualized word embeddings.

Some technical details on how this model must be used:

| Setting | Value |
| --- | :---: |
| Dimensions | 384 |
| Produces normalized embeddings | No |
| Pooling-Method | CLS pooling |
| Suitable score functions | dot-product (e.g. `util.dot_score`) |

The project aims to train sentence embedding models on very large sentence-level datasets using a self-supervised contrastive learning objective. We use a contrastive learning objective: given a sentence from the pair, the model should predict which one, out of a set of randomly sampled other sentences, was actually paired with it in our dataset.

We developed this model during the Community Week using JAX/Flax for NLP & CV, organized by Hugging Face, as part of the project: Train the Best Sentence Embedding Model Ever with 1B Training Pairs. We benefited from efficient hardware infrastructure to run the project (7 TPU v3-8s), as well as guidance from Google's Flax, JAX, and Cloud team members on efficient deep learning frameworks.

Our model is intended to be used for semantic search: it encodes queries/questions and text paragraphs in a dense vector space and finds relevant documents for the given passages. Note that there is a limit of 512 word pieces: text longer than that will be truncated. Further note that the model was trained only on input text up to 250 word pieces; it might not work well for longer text.

The full training script is accessible in this repository: `train_script.py`. We use the pretrained `nreimers/MiniLM-L6-H384-uncased` model; please refer to its model card for more detailed information about the pre-training procedure.

We use the concatenation of multiple datasets to fine-tune our model; in total we have about 215M (question, answer) pairs. We sampled each dataset with a weighted probability whose configuration is detailed in the `data_config.json` file. The model was trained with MultipleNegativesRankingLoss using CLS pooling, dot-product as the similarity function, and a scale of 1.

| Dataset | Number of training tuples |
|---|:---:|
| WikiAnswers Duplicate question pairs from WikiAnswers | 77,427,422 |
| PAQ Automatically generated (Question, Paragraph) pairs for each paragraph in Wikipedia | 64,371,441 |
| Stack Exchange (Title, Body) pairs from all StackExchanges | 25,316,456 |
| Stack Exchange (Title, Answer) pairs from all StackExchanges | 21,396,559 |
| MS MARCO Triplets (query, answer, hard_negative) for 500k queries from Bing search engine | 17,579,773 |
| GOOAQ: Open Question Answering with Diverse Answer Types (query, answer) pairs for 3M Google queries and Google featured snippet | 3,012,496 |
| Amazon-QA (Question, Answer) pairs from Amazon product pages | 2,448,839 |
| Yahoo Answers (Title, Answer) pairs from Yahoo Answers | 1,198,260 |
| Yahoo Answers (Question, Answer) pairs from Yahoo Answers | 681,164 |
| Yahoo Answers (Title, Question) pairs from Yahoo Answers | 659,896 |
| SearchQA (Question, Answer) pairs for 140k questions, each with Top5 Google snippets on that question | 582,261 |
| ELI5 (Question, Answer) pairs from Reddit ELI5 (explainlikeimfive) | 325,475 |
| Stack Exchange Duplicate questions pairs (titles) | 304,525 |
| Quora Question Triplets (Question, Duplicate_Question, Hard_Negative) triplets for Quora Questions Pairs dataset | 103,663 |
| Natural Questions (NQ) (Question, Paragraph) pairs for 100k real Google queries with relevant Wikipedia paragraph | 100,231 |
| SQuAD2.0 (Question, Paragraph) pairs from SQuAD2.0 dataset | 87,599 |
| TriviaQA (Question, Evidence) pairs | 73,346 |
| Total | 214,988,242 |
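Because this model produces unnormalized embeddings meant to be scored with dot-product (`util.dot_score` in sentence-transformers), retrieval amounts to ranking passages by their dot product with the query embedding. A NumPy sketch of that ranking step, with toy vectors rather than actual model output:

```python
import numpy as np

def dot_score_rank(query_emb, passage_embs):
    """Return passage indices sorted by descending dot-product score.

    query_emb:    (dim,) query embedding
    passage_embs: (n_passages, dim) passage embeddings
    """
    scores = passage_embs @ query_emb   # (n_passages,) dot-product scores
    return np.argsort(-scores)          # best-scoring passage first

# Toy embeddings: passage 1 points the same way as the query, so it ranks first
query = np.array([1.0, 0.0])
passages = np.array([[0.0, 1.0], [2.0, 0.1], [0.5, 0.5]])
print(dot_score_rank(query, passages))  # [1 2 0]
```

Note that with unnormalized embeddings, dot-product and cosine similarity can produce different rankings, which is why the card pins down the suitable score function.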

43,721
17

stsb-roberta-base-v2

license:apache-2.0
40,465
5

msmarco-distilbert-dot-v5

This is a sentence-transformers model: it maps sentences & paragraphs to a 768-dimensional dense vector space and was designed for semantic search. It has been trained on 500K (query, answer) pairs from the MS MARCO dataset. For an introduction to semantic search, have a look at: SBERT.net - Semantic Search.

Usage (Sentence-Transformers): using this model is easy when you have sentence-transformers installed.

Usage (HuggingFace Transformers): without sentence-transformers, you can use the model by passing your input through the transformer model and then applying the correct pooling operation on top of the contextualized word embeddings.

Some technical details on how this model must be used:

| Setting | Value |
| --- | :---: |
| Dimensions | 768 |
| Max Sequence Length | 512 |
| Produces normalized embeddings | No |
| Pooling-Method | Mean pooling |
| Suitable score functions | dot-product (e.g. `util.dot_score`) |

See `train_script.py` in this repository for the training script used: a `torch.utils.data.dataloader.DataLoader` of length 7858, with `sentence_transformers.losses.MarginMSELoss.MarginMSELoss`.

If you find this model helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

This model is released under the Apache 2 license. However, note that it was trained on the MS MARCO dataset, which has its own license restrictions: MS MARCO - Terms and Conditions.
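MarginMSELoss, mentioned above, trains the bi-encoder to reproduce a teacher's score *margin* between a positive and a hard-negative passage, rather than the absolute scores. A minimal sketch, assuming dot-product student scores; the teacher scores in the example are placeholder numbers, not real cross-encoder output:

```python
import numpy as np

def margin_mse_loss(q, pos, neg, teacher_pos, teacher_neg):
    """MSE between student and teacher (positive - negative) score margins.

    q, pos, neg:              (batch, dim) query, positive, hard-negative embeddings
    teacher_pos, teacher_neg: (batch,) teacher relevance scores for the same triples
    """
    student_margin = np.sum(q * pos, axis=1) - np.sum(q * neg, axis=1)
    teacher_margin = teacher_pos - teacher_neg
    return np.mean((student_margin - teacher_margin) ** 2)

q = np.array([[1.0, 0.0]])
pos = np.array([[2.0, 0.0]])
neg = np.array([[1.0, 0.0]])
# Student margin (2 - 1 = 1) matches the teacher margin (1 - 0 = 1): zero loss
print(margin_mse_loss(q, pos, neg, np.array([1.0]), np.array([0.0])))  # 0.0
```

Matching margins instead of raw scores lets the student learn the teacher's relative ordering without having to reproduce its score scale.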

license:apache-2.0
34,640
15

paraphrase-distilroberta-base-v1

license:apache-2.0
31,423
7

sentence-t5-xl

This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space. The model works well for sentence similarity tasks, but doesn't perform that well for semantic search tasks. This model was converted from the Tensorflow model st5-3b-1 to PyTorch. When using this model, have a look at the publication: Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models. The tfhub model and this PyTorch model can produce slightly different embeddings, however, when run on the same benchmarks, they produce identical results. The model uses only the encoder from a T5-3B model. The weights are stored in FP16. Using this model becomes easy when you have sentence-transformers installed: The model requires sentence-transformers version 2.2.0 or newer. If you find this model helpful, please cite the respective publication: Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models

license:apache-2.0
31,120
6

paraphrase-TinyBERT-L6-v2

license:apache-2.0
27,783
5

all-mpnet-base-v1

license:apache-2.0
26,411
12

distilbert-base-nli-stsb-mean-tokens

license:apache-2.0
22,417
11

allenai-specter

license:apache-2.0
17,147
27

nli-distilroberta-base-v2

license:apache-2.0
16,597
1

roberta-large-nli-stsb-mean-tokens

license:apache-2.0
16,362
1

embeddinggemma-300m-medical

EmbeddingGemma-300m finetuned on the Medical Instruction and RetrIeval Dataset (MIRIAD)

This is a sentence-transformers model finetuned from google/embeddinggemma-300m on the miriad/miriad-4.4M dataset (specifically the first 100,000 question-passage pairs from tomaarsen/miriad-4.4M-split). It maps sentences & documents to a 768-dimensional dense vector space and can be used for medical information retrieval, specifically for searching passages (up to 1k tokens) of scientific medical papers using detailed medical questions. This model was trained using code from our EmbeddingGemma blogpost to showcase how the EmbeddingGemma model can be finetuned on specific domains/tasks for even stronger performance. It is not affiliated with Google.

Model Description

- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 1024 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: miriad-4.4m-split (the first 100,000 samples of the `default` subset)
- Language: en
- License: apache-2.0
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

| Metric              | miriad-eval-1kq-31kd | miriad-test-1kq-31kd |
|:--------------------|:---------------------|:---------------------|
| cosine_accuracy@1   | 0.822                | 0.802                |
| cosine_accuracy@3   | 0.926                | 0.907                |
| cosine_accuracy@5   | 0.945                | 0.942                |
| cosine_accuracy@10  | 0.976                | 0.963                |
| cosine_precision@1  | 0.822                | 0.802                |
| cosine_precision@3  | 0.3087               | 0.3023               |
| cosine_precision@5  | 0.189                | 0.1884               |
| cosine_precision@10 | 0.0976               | 0.0963               |
| cosine_recall@1     | 0.822                | 0.802                |
| cosine_recall@3     | 0.926                | 0.907                |
| cosine_recall@5     | 0.945                | 0.942                |
| cosine_recall@10    | 0.976                | 0.963                |
| cosine_ndcg@10      | 0.9026               | 0.8862               |
| cosine_mrr@10       | 0.8788               | 0.8611               |
| cosine_map@100      | 0.8797               | 0.863                |

Training Dataset

- Dataset: miriad-4.4m-split at 596b9ab
- Size: 100,000 training samples
- Columns: question and passage_text
- Approximate statistics based on the first 1000 samples:

|         | question                                          | passage_text                                          |
|:--------|:--------------------------------------------------|:------------------------------------------------------|
| type    | string                                            | string                                                |
| details | min: 7 tokens, mean: 20.79 tokens, max: 60 tokens | min: 481 tokens, mean: 945.6 tokens, max: 1024 tokens |

Samples (passages truncated):

- Question: What factors may contribute to increased pulmonary conduit durability in patients who undergo the Ross operation compared to those with right ventricular outflow tract obstruction?
  Passage: In 1966, Ross and Somerville 1 reported the first use of an aortic homograft to establish right ventricle-to-pulmonary artery continuity in a patient with tetralogy of Fallot and pulmonary atresia. Since that time, pulmonary position homografts have been used in a variety of right-sided congenital heart lesions. Actuarial 5-year homograft survivals for cryopreserved homografts are reported to range between 55% and 94%, with the shortest durability noted in patients less than 2 years of age. 4 Pulmonary position homografts also are used to replace pulmonary autografts explanted to repair left-sided outflow disease (the Ross operation). Several factors may be likely to favor increased pulmonary conduit durability in Ross patients compared with those with right ventricular outflow tract obstruction, including later age at operation (allowing for larger homografts), more normal pulmonary artery architecture, absence of severe right ventricular hypertrophy, and more natural positioning of ...

- Question: How does MCAM expression in hMSC affect the growth and maintenance of hematopoietic progenitors?
  Passage: After culture in a 3-dimensional hydrogel-based matrix, which constitutes hypoxic conditions, MCAM expression is lost. Concordantly, Tormin et al. demonstrated that MCAM is down-regulated under hypoxic conditions. 10 Furthermore, it was shown by others and our group that oxygen tension causes selective modification of hematopoietic cell and mesenchymal stromal cell interactions in co-culture systems as well as influences HSPC metabolism. [44] [45] [46] Thus, the observed differences between Sharma et al. and our data in the HSPC-supporting capacity of hMSC are likely due to the different culture conditions used. Further studies are required to clarify the influence of hypoxia in our model system. Altogether these findings provide further evidence for the importance of MCAM in supporting HSPC. Furthermore, previous reports have shown that MCAM is down-regulated in MSC after several passages as well as during aging and differentiation. 19, 47 Interestingly, MCAM overexpression in hMSC enhance...

- Question: What is the relationship between Fanconi anemia and breast and ovarian cancer susceptibility genes?
  Passage: (31), of which 5%-10% may be caused by genetic factors (32), up to half a million of these patients may be at risk of secondary hereditary neoplasms. The historic observation of twofold to fivefold increased risks of cancers of the ovary, thyroid, and connective tissue after breast cancer (33) presaged the later syndromic association of these tumors with inherited mutations of BRCA1, BRCA2, PTEN, and p53 (16). By far the largest cumulative risk of a secondary cancer in BRCA mutation carriers is associated with cancer in the contralateral breast, which may reach a risk of 29.5% at 10 years (34). The Breast Cancer Linkage Consortium (35, 36) also documented threefold to fivefold increased risks of subsequent cancers of prostate, pancreas, gallbladder, stomach, skin (melanoma), and uterus in BRCA2 mutation carriers and twofold increased risks of prostate and pancreas cancer in BRCA1 mutation carriers; these results are based largely on self-reported family history inf...

Loss: CachedMultipleNegativesRankingLoss

Evaluation Dataset

- Dataset: miriad-4.4m-split at 596b9ab
- Size: 1,000 evaluation samples
- Columns: question and passage_text
- Approximate statistics based on the first 1000 samples:

|         | question                                          | passage_text                                          |
|:--------|:--------------------------------------------------|:------------------------------------------------------|
| type    | string                                            | string                                                |
| details | min: 7 tokens, mean: 20.91 tokens, max: 61 tokens | min: 465 tokens, mean: 943.1 tokens, max: 1024 tokens |

Samples (passages truncated):

- Question: What are some hereditary cancer syndromes that can result in various forms of cancer?
  Passage: Hereditary Cancer Syndromes, including Hereditary Breast and Ovarian Cancer (HBOC) and Lynch Syndrome (LS), can result in various forms of cancer due to germline mutations in cancer predisposition genes. While the major contributory genes for these syndromes have been identified and well-studied (BRCA1/BRCA2 for HBOC and MSH2/MSH6/MLH1/PMS2/EPCAM for LS), there remains a large percentage of associated cancer cases that are negative for germline mutations in these genes, including 80% of women with a personal or family history of breast cancer who are negative for BRCA1/2 mutations [1]. Similarly, between 30 and 50% of families fulfill stringent criteria for LS and test negative for germline mismatch repair gene mutations [2]. Adding complexity to these disorders is the significant overlap in the spectrum of cancers observed between various hereditary cancer syndromes, including many cancer susceptibility syndromes. Some that contribute to elevated breast cancer risk include Li-Frau...

- Question: How do MAK-4 and MAK-5 exert their antioxidant properties?
  Passage: Hybrid F1 mice were injected with urethane (300 mg/kg) at 8 days of age. A group was then put on a MAK-supplemented diet; another group was fed a standard pellet diet. At 36 weeks of age the mice were sacrificed and the livers examined for the presence of tumors (Panel A) and for the number of nodules per mouse (Panel B). We then measured the influence of the MAK-4+5 combination on the expression of the three liver-specific connexins (cx26, cx32, and cx43). The level of cx26 expression was similar in all the groups of mice treated with the MAK-supplemented diet and in the control (Figure 4, Panel A). A significant, time-dependent increase in cx32 was observed in the liver of all the groups of MAK-treated mice compared to the normal diet-fed controls. Cx32 expression increased 2-fold after 1 week of treatment, and 3- to 4-fold at 3 months (Figure 4, Pane...

- Question: What are the primary indications for a decompressive craniectomy, and what role does neurocritical care play in determining the suitability of a patient for this procedure?
  Passage: Decompressive craniectomy is a valid neurosurgical strategy nowadays as an alternative to control elevated intracranial pressure (ICP) and the risk of uncal and/or subfalcine herniation in cases refractory to postural, ventilatory, and pharmacological measures. Neurocritical care and ICP monitoring are key determinants in identifying the inclusion criteria that make a patient a candidate for this procedure, as it is always considered a rescue surgical technique. Head trauma and ischemic or hemorrhagic cerebrovascular disease with progressive deterioration due to mass effect are some of the cases that may require a decompressive craniectomy with its different variants. However, this procedure per se can have complications, described as the postcraniectomy syndrome, which may occur in the short, medium, or even long term. 1,2 The paradoxical herniation is a condition in which there is a deviation of the midline with mass effect, even t...
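The CachedMultipleNegativesRankingLoss named above trains with in-batch negatives: for each question in a batch, its own passage is the positive and every other passage in the batch serves as a negative (the Cached variant computes the same objective, adding gradient caching so large batches fit in memory). A minimal NumPy sketch of the underlying loss, assuming the sentence-transformers default cosine similarity with a scale of 20:

```python
import numpy as np

def mnr_loss(questions, passages, scale=20.0):
    """In-batch-negatives ranking loss over (question, passage) pairs.

    questions, passages: (batch, dim) embedding matrices where row i of
    `passages` is the positive for row i of `questions`.
    """
    q = questions / np.linalg.norm(questions, axis=1, keepdims=True)
    p = passages / np.linalg.norm(passages, axis=1, keepdims=True)
    scores = scale * (q @ p.T)  # temperature-scaled cosine similarities
    # Cross-entropy with the diagonal (the true pairs) as the labels:
    log_z = np.log(np.exp(scores).sum(axis=1))
    return float(np.mean(log_z - np.diag(scores)))

# Perfectly separated toy batch: each question matches only its own passage,
# so the loss is essentially zero.
e = np.eye(4)
print(round(mnr_loss(e, e), 6))  # 0.0
```

With mismatched pairs (e.g. shuffled passages), the diagonal scores drop below the off-diagonal ones and the loss grows large, which is exactly what pushes paired embeddings together during finetuning.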
Loss: CachedMultipleNegativesRankingLoss

Training Hyperparameters

Non-Default Hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `prompts`: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
- `batch_sampler`: no_duplicates

All Hyperparameters:

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: {'question': 'task: search result | query: ', 'passage_text': 'title: none | text: '}
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch  | Step | Training Loss | Validation Loss | miriad-eval-1kq-31kd_cosine_ndcg@10 | miriad-test-1kq-31kd_cosine_ndcg@10 |
|:------:|:----:|:-------------:|:---------------:|:-----------------------------------:|:-----------------------------------:|
| -1     | -1   | -             | -               | 0.8474                              | 0.8340                              |
| 0.0256 | 20   | 0.1019        | -               | -                                   | -                                   |
| 0.0512 | 40   | 0.0444        | -               | -                                   | -                                   |
| 0.0767 | 60   | 0.0408        | -               | -                                   | -                                   |
| 0.1023 | 80   | 0.0462        | -               | -                                   | -                                   |
| 0.1279 | 100  | 0.0542        | 0.0525          | 0.8616                              | -                                   |
| 0.1535 | 120  | 0.0454        | -               | -                                   | -                                   |
| 0.1790 | 140  | 0.0403        | -               | -                                   | -                                   |
| 0.2046 | 160  | 0.0463        | -               | -                                   | -                                   |
| 0.2302 | 180  | 0.0508        | -               | -                                   | -                                   |
| 0.2558 | 200  | 0.0497        | 0.0449          | 0.8643                              | -                                   |
| 0.2813 | 220  | 0.0451        | -               | -                                   | -                                   |
| 0.3069 | 240  | 0.0445        | -               | -                                   | -                                   |
| 0.3325 | 260  | 0.0489        | -               | -                                   | -                                   |
| 0.3581 | 280  | 0.0452        | -               | -                                   | -                                   |
| 0.3836 | 300  | 0.0461        | 0.0406          | 0.8832                              | -                                   |
| 0.4092 | 320  | 0.0415        | -               | -                                   | -                                   |
| 0.4348 | 340  | 0.04          | -               | -                                   | -                                   |
| 0.4604 | 360  | 0.0399        | -               | -                                   | -                                   |
| 0.4859 | 380  | 0.0423        | -               | -                                   | -                                   |
| 0.5115 | 400  | 0.0352        | 0.0316          | 0.8823                              | -                                   |
| 0.5371 | 420  | 0.0408        | -               | -                                   | -                                   |
| 0.5627 | 440  | 0.0356        | -               | -                                   | -                                   |
| 0.5882 | 460  | 0.0371        | -               | -                                   | -                                   |
| 0.6138 | 480  | 0.0276        | -               | -                                   | -                                   |
| 0.6394 | 500  | 0.028         | 0.0280          | 0.8807                              | -                                   |
| 0.6650 | 520  | 0.0302        | -               | -                                   | -                                   |
| 0.6905 | 540  | 0.0345        | -               | -                                   | -                                   |
| 0.7161 | 560  | 0.0325        | -               | -                                   | -                                   |
| 0.7417 | 580  | 0.033         | -               | -                                   | -                                   |
| 0.7673 | 600  | 0.0314        | 0.0264          | 0.8910                              | -                                   |
| 0.7928 | 620  | 0.033         | -               | -                                   | -                                   |
| 0.8184 | 640  | 0.029         | -               | -                                   | -                                   |
| 0.8440 | 660  | 0.0396        | -               | -                                   | -                                   |
| 0.8696 | 680  | 0.0266        | -               | -                                   | -                                   |
| 0.8951 | 700  | 0.0262        | 0.0240          | 0.8968                              | -                                   |
| 0.9207 | 720  | 0.0262        | -               | -                                   | -                                   |
| 0.9463 | 740  | 0.0327        | -               | -                                   | -                                   |
| 0.9719 | 760  | 0.0293        | -               | -                                   | -                                   |
| 0.9974 | 780  | 0.0304        | -               | -                                   | -                                   |
| -1     | -1   | -             | -               | 0.9026                              | 0.8862                              |

Environmental Impact

Carbon emissions were measured using CodeCarbon.
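The hyperparameters above specify `learning_rate` 2e-05 with a `linear` scheduler and `warmup_ratio` 0.1, over roughly 780 steps of one epoch. A small sketch of what that schedule looks like: linear warmup to the base learning rate, then linear decay to zero. The function name is illustrative, and the warmup-step rounding may differ slightly from the exact trainer implementation.

```python
def linear_schedule_lr(step, base_lr=2e-5, total_steps=780, warmup_ratio=0.1):
    """Learning rate at a given step under linear warmup + linear decay."""
    warmup_steps = int(total_steps * warmup_ratio)  # 78 steps here
    if step < warmup_steps:
        return base_lr * step / warmup_steps          # ramp up
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)  # decay

print(linear_schedule_lr(39))   # halfway through warmup -> 1e-05
print(linear_schedule_lr(780))  # end of training -> 0.0
```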
- Energy Consumed: 0.828 kWh
- Carbon Emitted: 0.331 kg of CO2
- Hours Used: 5.520 hours

Training Hardware

- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB

Framework Versions

- Python: 3.11.6
- Sentence Transformers: 5.2.0.dev0
- Transformers: 4.56.0.dev0
- PyTorch: 2.7.1+cu126
- Accelerate: 1.6.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
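In the evaluation table for this model, recall@k equals accuracy@k and precision@k is accuracy@k divided by k, which indicates each evaluation question has exactly one relevant passage. Under that assumption, NDCG@10 and MRR@10 reduce to simple functions of the rank at which the correct passage is retrieved; a sketch (the ranks below are hypothetical, not taken from the actual evaluation):

```python
import math

def ndcg_at_k(rank, k=10):
    # One relevant passage per query: DCG = 1/log2(rank + 1), ideal DCG = 1.
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

def mrr_at_k(rank, k=10):
    # Reciprocal rank, cut off at k.
    return 1.0 / rank if rank <= k else 0.0

# Ranks of the correct passage for three hypothetical queries; rank 12 falls
# outside the @10 cutoff and contributes 0 to both metrics.
ranks = [1, 2, 12]
print(sum(ndcg_at_k(r) for r in ranks) / len(ranks))
print(sum(mrr_at_k(r) for r in ranks) / len(ranks))  # 0.5
```

Averaging these per-query values over all 1,000 evaluation questions yields the cosine_ndcg@10 and cosine_mrr@10 figures reported above.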

license:apache-2.0
15,658
32

sentence-t5-large

license:apache-2.0
14,517
25

stsb-mpnet-base-v2

license:apache-2.0
9,977
13

msmarco-MiniLM-L6-cos-v5

8,898
10

use-cmlm-multilingual

use-cmlm-multilingual This is a PyTorch version of the universal-sentence-encoder-cmlm/multilingual-base-br model. It can be used to map 109 languages to a shared vector space. As the model is based on LaBSE, it performs quite comparably on downstream tasks. Using this model is easy once you have sentence-transformers installed. Have a look at universal-sentence-encoder-cmlm/multilingual-base-br for the publication that describes this model.

license:apache-2.0
8,716
20

xlm-r-bert-base-nli-stsb-mean-tokens

license:apache-2.0
8,078
0

paraphrase-distilroberta-base-v2

license:apache-2.0
8,013
11

roberta-base-nli-stsb-mean-tokens

license:apache-2.0
7,813
0

xlm-r-distilroberta-base-paraphrase-v1

license:apache-2.0
7,550
1

gtr-t5-large

license:apache-2.0
6,980
39

all-MiniLM-L12-v1

license:apache-2.0
4,808
13

msmarco-roberta-base-v2

license:apache-2.0
4,569
1

xlm-r-100langs-bert-base-nli-stsb-mean-tokens

license:apache-2.0
4,272
8

all-MiniLM-L6-v1

license:apache-2.0
3,871
17

msmarco-distilbert-base-dot-prod-v3

license:apache-2.0
3,229
3

msmarco-distilbert-multilingual-en-de-v2-tmp-lng-aligned

license:apache-2.0
3,077
5

bert-large-nli-stsb-mean-tokens

license:apache-2.0
2,861
3

distilbert-multilingual-nli-stsb-quora-ranking

license:apache-2.0
2,614
9

quora-distilbert-multilingual

license:apache-2.0
2,515
7

facebook-dpr-ctx_encoder-multiset-base

license:apache-2.0
2,363
5

msmarco-distilbert-base-v3

license:apache-2.0
2,064
4

nli-roberta-base-v2

license:apache-2.0
2,012
1

msmarco-roberta-base-v3

license:apache-2.0
1,644
0

msmarco-MiniLM-L12-v3

license:apache-2.0
1,572
25

bert-base-nli-stsb-mean-tokens

license:apache-2.0
1,323
2

sentence-t5-xxl

license:apache-2.0
1,272
34

msmarco-roberta-base-ance-firstp

license:apache-2.0
1,091
4

stsb-bert-base

license:apache-2.0
1,063
1

nli-roberta-large

license:apache-2.0
1,054
0

msmarco-distilroberta-base-v2

license:apache-2.0
867
3

facebook-dpr-question_encoder-multiset-base

license:apache-2.0
780
1

roberta-large-nli-mean-tokens

license:apache-2.0
738
0

stsb-distilbert-base

license:apache-2.0
662
6

gtr-t5-xl

This is a sentence-transformers model: It maps sentences & paragraphs to a 768-dimensional dense vector space. The model was specifically trained for the task of semantic search. It was converted from the TensorFlow model gtr-xl-1 to PyTorch. When using this model, have a look at the publication: Large Dual Encoders Are Generalizable Retrievers. The tfhub model and this PyTorch model can produce slightly different embeddings; however, when run on the same benchmarks, they produce identical results. The model uses only the encoder from a T5-3B model, and the weights are stored in FP16. Using this model is easy once you have sentence-transformers installed; the model requires sentence-transformers version 2.2.0 or newer. If you find this model helpful, please cite the respective publication: Large Dual Encoders Are Generalizable Retrievers.
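GTR models are dual encoders: queries and documents are embedded independently, so retrieval reduces to a nearest-neighbor search over precomputed document embeddings. A self-contained sketch of that final scoring step with toy vectors (the embeddings here are made up; a real pipeline would produce them with the model's `encode(...)` call):

```python
import numpy as np

# Toy 4-dimensional "embeddings" standing in for encoder output.
doc_embeddings = np.array([
    [0.9, 0.1, 0.0, 0.0],   # doc 0
    [0.0, 0.8, 0.2, 0.0],   # doc 1
    [0.1, 0.0, 0.9, 0.1],   # doc 2
])
query = np.array([0.0, 0.1, 1.0, 0.0])

# Cosine similarity between the query and every document.
sims = (doc_embeddings @ query) / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query)
)
best = int(np.argmax(sims))
print(best)  # doc 2 is closest to the query
```

Because the document side can be embedded and indexed offline, only the query needs to be encoded at search time; this is what makes dual encoders practical for large-scale retrieval.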

license:apache-2.0
655
17

nq-distilbert-base-v1

license:apache-2.0
599
1

stsb-distilroberta-base-v2

license:apache-2.0
561
2

bert-large-nli-mean-tokens

license:apache-2.0
489
0

nli-bert-large

license:apache-2.0
457
1

roberta-base-nli-mean-tokens

license:apache-2.0
429
0

bert-base-nli-max-tokens

license:apache-2.0
410
3

msmarco-distilbert-base-v2

license:apache-2.0
309
1

facebook-dpr-question_encoder-single-nq-base

license:apache-2.0
301
2

nli-bert-base-cls-pooling

license:apache-2.0
277
0

distilroberta-base-msmarco-v1

license:apache-2.0
269
1

msmarco-bert-co-condensor

license:apache-2.0
253
3

distilbert-base-nli-stsb-quora-ranking

license:apache-2.0
243
0

facebook-dpr-ctx_encoder-single-nq-base

license:apache-2.0
165
0

stsb-bert-large

license:apache-2.0
164
1

bert-base-nli-cls-token

license:apache-2.0
161
2

gtr-t5-xxl

license:apache-2.0
147
27

paraphrase-albert-base-v2

license:apache-2.0
137
5

nli-roberta-base

license:apache-2.0
136
1

bert-large-nli-cls-token

license:apache-2.0
117
0

msmarco-distilbert-multilingual-en-de-v2-tmp-trained-scratch

license:apache-2.0
82
2

distilroberta-base-paraphrase-v1

license:apache-2.0
75
0

quora-distilbert-base

license:apache-2.0
66
1

nli-bert-base

license:apache-2.0
60
1

xlm-r-large-en-ko-nli-ststb

license:apache-2.0
56
0

nli-distilbert-base

license:apache-2.0
48
0

distilroberta-base-msmarco-v2

license:apache-2.0
36
0

xlm-r-base-en-ko-nli-ststb

license:apache-2.0
30
1

nli-bert-base-max-pooling

license:apache-2.0
29
0

nli-distilbert-base-max-pooling

license:apache-2.0
26
0

nli-bert-large-cls-pooling

license:apache-2.0
25
0

xlm-r-100langs-bert-base-nli-mean-tokens

license:apache-2.0
22
0

nli-bert-large-max-pooling

license:apache-2.0
21
1

distilbert-base-nli-max-tokens

license:apache-2.0
17
0

bert-base-wikipedia-sections-mean-tokens

license:apache-2.0
6
0

xlm-r-bert-base-nli-mean-tokens

license:apache-2.0
6
0

clip-ViT-B-32

0
138

clip-ViT-L-14

0
81

static-similarity-mrl-multilingual-v1

license:apache-2.0
0
70

static-retrieval-mrl-en-v1

license:apache-2.0
0
51

average_word_embeddings_glove.6B.300d

license:apache-2.0
0
11

clip-ViT-B-16

0
6

average_word_embeddings_komninos

license:apache-2.0
0
4