NeuML
pubmedbert-base-embeddings
---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
base_model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext
language: en
license: apache-2.0
---
glove-6B
This model is an export of these GloVe-6B English Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. Given that pre-trained embeddings models can get quite large, there is also a SQLite version that lazily loads vectors.
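The SQLite variant mentioned above keeps vectors on disk and loads them one at a time instead of holding the full matrix in memory. Below is a minimal sketch of that lazy-loading pattern; the table schema here is hypothetical, not the published file's actual layout.

```python
import sqlite3

import numpy as np

def build(path, vectors):
    # Store each token's vector as raw float32 bytes (hypothetical schema)
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS vectors (token TEXT PRIMARY KEY, data BLOB)")
    db.executemany("INSERT OR REPLACE INTO vectors VALUES (?, ?)",
                   [(t, v.astype(np.float32).tobytes()) for t, v in vectors.items()])
    db.commit()
    return db

def lookup(db, token, dims=300):
    # Lazily load a single 300d vector on demand; zeros for unknown tokens
    row = db.execute("SELECT data FROM vectors WHERE token = ?", (token,)).fetchone()
    return np.frombuffer(row[0], dtype=np.float32) if row else np.zeros(dims, dtype=np.float32)

db = build(":memory:", {"hello": np.ones(300), "world": np.arange(300, dtype=np.float32)})
vec = lookup(db, "hello")
```

Only the rows actually queried are read from disk, which is what keeps memory flat for large pre-trained vocabularies.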
bioclinical-modernbert-base-embeddings
pubmedbert-base-embeddings-matryoshka
glove-6B-quantized
ljspeech-jets-onnx
colbert-bert-tiny
This is a ColBERT model finetuned from google/bert_uncased_L-2_H-128_A-2 on the msmarco-bm25 dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is primarily designed for unit tests in limited compute environments such as GitHub Actions, but it does work to an extent for basic use cases.
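The MaxSim operator mentioned above scores a query against a document by taking, for each query token vector, the maximum similarity over all document token vectors, then summing across query tokens. A small numpy sketch of just the scoring step (toy vectors, not real ColBERT output):

```python
import numpy as np

def maxsim(query, doc):
    """MaxSim late-interaction score: for each query token embedding,
    take the maximum cosine similarity over all document token
    embeddings, then sum across query tokens."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    d = doc / np.linalg.norm(doc, axis=1, keepdims=True)
    sim = q @ d.T                    # (query tokens, doc tokens)
    return float(sim.max(axis=1).sum())

q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.7, 0.7]])
score = maxsim(q, d)
```

Because each query token is matched independently, MaxSim rewards documents that cover all parts of the query rather than just one.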
language-id-quantized
This model is an export of this FastText Language Identification model for `staticvectors`. `staticvectors` enables running inference in Python with NumPy, which helps it maintain solid runtime performance. Language detection is an important task, and identification with n-gram models is an efficient and highly accurate way to do it. This model is a quantized version of the base language id model. It uses 2x256 Product Quantization like the original quantized model from FastText, which shrinks the model down to 4MB with only a minor hit to accuracy.
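The "2x256" Product Quantization above means each vector is split into 2 subvectors, and each subvector is replaced by the index of its nearest centroid in a 256-entry codebook, so a vector costs 2 bytes instead of a full float32 array. A toy numpy sketch of the idea (random codebooks, not the model's trained ones):

```python
import numpy as np

rng = np.random.default_rng(0)
dims, subvectors, centroids = 16, 2, 256   # "2x256" -> 2 codes of 256 entries each

# Toy codebooks: one (256 x dims/2) centroid table per subvector
codebooks = rng.normal(size=(subvectors, centroids, dims // subvectors)).astype(np.float32)

def encode(vector):
    # Each half of the vector becomes the index of its nearest centroid
    halves = vector.reshape(subvectors, dims // subvectors)
    return np.array([np.argmin(((codebooks[i] - halves[i]) ** 2).sum(axis=1))
                     for i in range(subvectors)], dtype=np.uint8)

def decode(codes):
    # Reconstruction concatenates the selected centroids
    return np.concatenate([codebooks[i][codes[i]] for i in range(subvectors)])

v = rng.normal(size=dims).astype(np.float32)
codes = encode(v)            # 2 bytes instead of 16 float32s (64 bytes)
approx = decode(codes)       # lossy reconstruction
```

In practice the codebooks are learned with k-means so the reconstruction error, and therefore the accuracy hit, stays small.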
colbert-muvera-micro
pubmedbert-base-embeddings-2M
Bert Hash Nano
pubmedbert-base-colbert
word2vec-quantized
t5-small-txtsql
glove-2024-dolma
This model is an export of the new GloVe 2024 Dolma Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. Given that pre-trained embeddings models can get quite large, there is also a SQLite version that lazily loads vectors.
pylate-bert-tiny
This is a PyLate model finetuned from google/bert_uncased_L-2_H-128_A-2 on the msmarco-bm25 dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is primarily designed for unit tests in limited compute environments such as GitHub Actions, but it does work to an extent for basic use cases.
pubmedbert-base-splade
txtai-wikipedia
gliner-bert-tiny
GLiNER model using BERT Tiny as the base model with urchade/synthetic-pii-ner-mistral-v1 as the training dataset. This model is primarily designed for unit tests in limited compute environments such as GitHub Actions, but it does work to an extent for basic use cases.
tiny-random-qwen2vl
bert-hash-femto
This is a set of 3 Nano BERT models with a modified embeddings layer. The embeddings layer is the same BERT vocabulary (30,522 tokens) projected to a smaller dimensional space, then re-encoded to the hidden size. This method is inspired by MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.

The number of projections is like a hash. Setting the projections parameter to 5 is like generating a 160-bit hash (5 x float32) for each token. That hash is then projected to the hidden size. This significantly reduces the number of parameters necessary for token embeddings.

Standard token embeddings:

- 30,522 (vocab size) x 768 (hidden size) = 23,440,896 parameters
- 23,440,896 x 4 (float32) = 93,763,584 bytes

Hash token embeddings:

- 30,522 (vocab size) x 5 (hash buckets) + 5 x 768 (projection matrix) = 156,450 parameters
- 156,450 x 4 (float32) = 625,800 bytes

These models are pre-trained on the same training corpus as BERT (with a copy of Wikipedia from 2025) as recommended in the paper Well-Read Students Learn Better: On the Importance of Pre-training Compact Models.

Below is a subset of GLUE scores on the dev set using the script provided by Hugging Face Transformers with the following parameters.

| Model | Parameters | MNLI (acc m/mm) | MRPC (f1/acc) | SST-2 (acc) |
| ----- | ---------- | --------------- | ------------- | ----------- |
| baseline (bert-tiny) | 4.4M | 0.7114 / 0.7161 | 0.8318 / 0.7353 | 0.8222 |
| bert-hash-femto | 0.243M | 0.5697 / 0.5750 | 0.8122 / 0.6838 | 0.7821 |
| bert-hash-pico | 0.448M | 0.6228 / 0.6363 | 0.8205 / 0.7083 | 0.7878 |
| bert-hash-nano | 0.969M | 0.6565 / 0.6670 | 0.8172 / 0.7083 | 0.8131 |

These models can be loaded using Hugging Face Transformers as follows. Note that given this is a custom architecture, `trust_remote_code` needs to be set. Training your own Nano model is simple. All you need is a Hugging Face dataset and the code below using txtai.
This model demonstrates that smaller models can still be productive. The hope is that this work opens the door for many to build small encoder models that pack a punch. Models can be trained in a matter of hours on consumer GPUs. Imagine more specialized models like this for medical, legal, scientific and other domains. Read more about this model and how it was built in this article.
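The hash-embedding arithmetic in the card above can be checked, and the layer itself sketched, in a few lines of numpy. This is a simplified illustration under assumed shapes, not the model's actual (trained) implementation:

```python
import numpy as np

vocab, hidden, projections = 30522, 768, 5

# Standard embeddings: one hidden-size vector per vocabulary entry
standard = vocab * hidden                              # 23,440,896 parameters

# Hash embeddings: a 5-dim code per token plus a 5 x 768 projection
hashed = vocab * projections + projections * hidden    # 156,450 parameters

# Sketch of the lookup: token id -> 5-dim "hash" -> hidden-size vector
rng = np.random.default_rng(0)
codes = rng.normal(size=(vocab, projections)).astype(np.float32)
projection = rng.normal(size=(projections, hidden)).astype(np.float32)

def embed(token_ids):
    # Gather the small per-token codes, then expand to the hidden size
    return codes[token_ids] @ projection               # (tokens, 768)

out = embed(np.array([101, 2023, 102]))
```

The savings come entirely from the first factor: the vocabulary table shrinks from 768 columns to 5, while the 5 x 768 projection is shared across all tokens.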
bert-small-cord19-squad2
language-id
colbert-muvera-femto
This is a PyLate model finetuned from neuml/bert-hash-femto on the msmarco-en-bge-gemma unnormalized split dataset. It maps sentences & paragraphs to sequences of 50-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is trained with un-normalized scores, making it compatible with MUVERA fixed-dimensional encoding.

This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG). Note: txtai 9.0+ is required for late interaction model support. Late interaction models excel as reranker pipelines. Alternatively, the model can be loaded with PyLate.

The following tables show a subset of BEIR scored with the txtai benchmarks script. Scores reported are `ndcg@10` and grouped into the following three categories.

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |

MUVERA encoding + maxsim re-ranking of the top 100 results per the MUVERA paper:

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |

Note: The scores reported don't match the scores reported in the respective papers due to different default settings in the txtai benchmark scripts. As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.

This model is only 250K parameters with a file size of 950K. Keeping that in mind, it's surprising how decent the scores are!

Nano BEIR

Dataset: `NanoBEIR_mean`

Evaluated with `pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator`

| Metric | Value |
|:-------|:------|
| MaxSim_accuracy@1 | 0.4318 |
| MaxSim_accuracy@3 | 0.5753 |
| MaxSim_accuracy@5 | 0.64 |
| MaxSim_accuracy@10 | 0.7062 |
| MaxSim_precision@1 | 0.4318 |
| MaxSim_precision@3 | 0.2655 |
| MaxSim_precision@5 | 0.215 |
| MaxSim_precision@10 | 0.149 |
| MaxSim_recall@1 | 0.2379 |
| MaxSim_recall@3 | 0.3485 |
| MaxSim_recall@5 | 0.4115 |
| MaxSim_recall@10 | 0.4745 |
| MaxSim_ndcg@10 | 0.4495 |
| MaxSim_mrr@10 | 0.5194 |
| MaxSim_map@100 | 0.3725 |

Non-default training hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `learning_rate`: 0.0003
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True

Framework Versions

- Python: 3.10.18
- Sentence Transformers: 4.0.2
- PyLate: 1.3.2
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1
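MUVERA's fixed-dimensional encoding turns a variable-length set of token vectors into one fixed-size vector: tokens are hashed into buckets via the signs of random hyperplane projections, vectors are aggregated per bucket, and the buckets are concatenated, so a single dot product approximates MaxSim. A simplified single-repetition sketch (the real method uses multiple repetitions and further refinements):

```python
import numpy as np

rng = np.random.default_rng(0)
dims, planes = 50, 3                          # 3 hyperplanes -> 2^3 = 8 buckets
hyperplanes = rng.normal(size=(planes, dims))

def fde(tokens, aggregate):
    """Simplified fixed-dimensional encoding (FDE): bucket each token
    vector by the signs of its projections onto random hyperplanes,
    aggregate per bucket, then concatenate the buckets."""
    bits = ((tokens @ hyperplanes.T) > 0).astype(int)       # (tokens, planes)
    buckets = bits @ (1 << np.arange(planes))               # bucket id per token
    out = np.zeros((2 ** planes, dims))
    for b in range(2 ** planes):
        members = tokens[buckets == b]
        if len(members):
            out[b] = aggregate(members, axis=0)
    return out.ravel()                                      # always 8 * 50 = 400 dims

# Per the MUVERA paper, one side sums vectors per bucket, the other averages
queryvec = fde(rng.normal(size=(7, dims)), np.sum)
docvec = fde(rng.normal(size=(20, dims)), np.mean)
score = float(queryvec @ docvec)   # one dot product approximating MaxSim
```

The fixed output size is what lets MUVERA-encoded documents live in an ordinary single-vector index, with exact maxsim reserved for re-ranking the top hits.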
word2vec
txtai-intro
colbert-muvera-nano
This is a PyLate model finetuned from neuml/bert-hash-nano on the msmarco-en-bge-gemma unnormalized split dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is trained with un-normalized scores, making it compatible with MUVERA fixed-dimensional encoding.

This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG). Note: txtai 9.0+ is required for late interaction model support. Late interaction models excel as reranker pipelines. Alternatively, the model can be loaded with PyLate.

The following tables show a subset of BEIR scored with the txtai benchmarks script. Scores reported are `ndcg@10` and grouped into the following three categories.

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |

MUVERA encoding + maxsim re-ranking of the top 100 results per the MUVERA paper:

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |

Note: The scores reported don't match the scores reported in the respective papers due to different default settings in the txtai benchmark scripts. As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.

This model packs a punch into just 950K parameters. It's the same architecture as the 4M parameter model, with the modified embeddings layer taking the parameter count down. It even beats the original ColBERT v2 model on a couple of the benchmarks.

Nano BEIR

Dataset: `NanoBEIR_mean`

Evaluated with `pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator`

| Metric | Value |
|:-------|:------|
| MaxSim_accuracy@1 | 0.5272 |
| MaxSim_accuracy@3 | 0.6722 |
| MaxSim_accuracy@5 | 0.7446 |
| MaxSim_accuracy@10 | 0.8046 |
| MaxSim_precision@1 | 0.5272 |
| MaxSim_precision@3 | 0.317 |
| MaxSim_precision@5 | 0.2509 |
| MaxSim_precision@10 | 0.1745 |
| MaxSim_recall@1 | 0.3102 |
| MaxSim_recall@3 | 0.4296 |
| MaxSim_recall@5 | 0.4991 |
| MaxSim_recall@10 | 0.5698 |
| MaxSim_ndcg@10 | 0.5479 |
| MaxSim_mrr@10 | 0.6191 |
| MaxSim_map@100 | 0.4704 |

Non-default training hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `learning_rate`: 0.0003
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True

Framework Versions

- Python: 3.10.18
- Sentence Transformers: 4.0.2
- PyLate: 1.3.2
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1
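The reranker-pipeline pattern noted in the card above is simple: a cheap first stage returns candidates, and the late interaction model re-orders them by MaxSim score. A minimal sketch with plain numpy standing in for the real encoders (token-embedding matrices stand in for encoded text):

```python
import numpy as np

def maxsim(query, doc):
    # Late-interaction score: per-query-token max cosine similarity, summed
    qn = query / np.linalg.norm(query, axis=1, keepdims=True)
    dn = doc / np.linalg.norm(doc, axis=1, keepdims=True)
    return float((qn @ dn.T).max(axis=1).sum())

def rerank(query, candidates, topn=2):
    """Re-order first-stage candidates (uid, token matrix) by MaxSim."""
    scores = [(uid, maxsim(query, doc)) for uid, doc in candidates]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:topn]

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 8))
candidates = [(uid, rng.normal(size=(12, 8))) for uid in range(5)]
# Make candidate 3 contain the query tokens verbatim so it should rank first
candidates[3] = (3, np.vstack([query, rng.normal(size=(8, 8))]))
results = rerank(query, candidates)
```

Running MaxSim only over a shortlist keeps the expensive token-level scoring off the full corpus, which is exactly where small late interaction models shine.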
bert-small-cord19qa
pubmedbert-base-embeddings-8M
fasttext-quantized
colbert-muvera-small
pubmedbert-base-embeddings-1M
txtai-wikipedia-slim
pubmedbert-base-embeddings-100K
glove-2024-wikigiga-quantized
This model is an export of the new GloVe 2024 WikiGiga Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. This model is a quantized version of the base model. It's using 10x256 Product Quantization.
bert-hash-pico
colbert-muvera-pico
This is a PyLate model finetuned from neuml/bert-hash-pico on the msmarco-en-bge-gemma unnormalized split dataset. It maps sentences & paragraphs to sequences of 80-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is trained with un-normalized scores, making it compatible with MUVERA fixed-dimensional encoding.

This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG). Note: txtai 9.0+ is required for late interaction model support. Late interaction models excel as reranker pipelines. Alternatively, the model can be loaded with PyLate.

The following tables show a subset of BEIR scored with the txtai benchmarks script. Scores reported are `ndcg@10` and grouped into the following three categories.

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |

MUVERA encoding + maxsim re-ranking of the top 100 results per the MUVERA paper:

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |

Note: The scores reported don't match the scores reported in the respective papers due to different default settings in the txtai benchmark scripts. As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.

At 450K parameters, this model does shockingly well! It's not too far off from the baseline 4M parameter model at 1/10th the size. It's also not too far off from the original ColBERT v2 model, which has 110M parameters.

Nano BEIR

Dataset: `NanoBEIR_mean`

Evaluated with `pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator`

| Metric | Value |
|:-------|:------|
| MaxSim_accuracy@1 | 0.4826 |
| MaxSim_accuracy@3 | 0.6368 |
| MaxSim_accuracy@5 | 0.7015 |
| MaxSim_accuracy@10 | 0.7585 |
| MaxSim_precision@1 | 0.4826 |
| MaxSim_precision@3 | 0.2979 |
| MaxSim_precision@5 | 0.2345 |
| MaxSim_precision@10 | 0.1649 |
| MaxSim_recall@1 | 0.2728 |
| MaxSim_recall@3 | 0.4051 |
| MaxSim_recall@5 | 0.4649 |
| MaxSim_recall@10 | 0.532 |
| MaxSim_ndcg@10 | 0.5069 |
| MaxSim_mrr@10 | 0.5733 |
| MaxSim_map@100 | 0.4287 |

Non-default training hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `learning_rate`: 0.0003
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True

Framework Versions

- Python: 3.10.18
- Sentence Transformers: 4.0.2
- PyLate: 1.3.2
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1
txtai-arxiv
ljspeech-vits-onnx
biomedbert-hash-nano-colbert
biomedbert-hash-nano-embeddings
glove-2024-wikigiga
This model is an export of the new GloVe 2024 WikiGiga Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. Given that pre-trained embeddings models can get quite large, there is also a SQLite version that lazily loads vectors.
pubmedbert-base-embeddings-500K
vctk-vits-onnx
bert-small-cord19
glove-2024-dolma-quantized
This model is an export of the new GloVe 2024 Dolma Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. This model is a quantized version of the base model. It's using 10x256 Product Quantization.
Llama-3.1_OpenScholar-8B-AWQ
This is Llama-3.1_OpenScholar-8B with AWQ quantization applied using the following code.
txtai-hfposts
txtai-neuml-linkedin
txtai-astronomy
t5-small-bashsql
fasttext
kokoro-int8-onnx
txtchat-personas
kokoro-fp16-onnx
kokoro-base-onnx
Txtai Speecht5 Onnx
Fine-tuned version of SpeechT5 TTS exported to ONNX. This model was exported to ONNX using the Optimum library. txtai has a built-in Text to Speech (TTS) pipeline that makes using this model easy. This model was fine-tuned using the code in this Hugging Face article and a custom set of WAV files. The ONNX export uses the following code, which requires installing `optimum`. When no speaker argument is passed in, the default speaker embeddings are used. The default speaker is David Mezzetti, the primary developer of txtai. It's possible to build custom speaker embeddings as shown below. Fine-tuning the model with a new voice leads to the best results, but zero-shot speaker embeddings are OK in some cases. The following code requires installing `torchaudio` and `speechbrain`. Speaker embeddings from the original SpeechT5 TTS training set are supported. See the README for more.
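The custom speaker embedding step described above boils down to extracting one speaker vector per recorded utterance and combining them into a single embedding. The extraction itself needs `speechbrain` (a speaker model such as an x-vector encoder, not shown here); the combining step can be sketched with numpy. The 512-dim size and the mean-then-normalize recipe are assumptions for illustration, not the article's exact code:

```python
import numpy as np

def speaker_embedding(utterance_embeddings):
    """Combine per-utterance speaker vectors (e.g. 512-dim x-vectors,
    assumed to come from an upstream speechbrain speaker model) into a
    single embedding by averaging and L2-normalizing."""
    mean = np.mean(utterance_embeddings, axis=0)
    return mean / np.linalg.norm(mean)

# Stand-in for embeddings extracted from a set of WAV files
utterances = np.random.default_rng(0).normal(size=(10, 512)).astype(np.float32)
speaker = speaker_embedding(utterances)
```

Averaging over several utterances smooths out per-recording noise, which is why zero-shot embeddings built from a handful of clips can be passable even without fine-tuning.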