NeuML

67 models

pubmedbert-base-embeddings

pipeline: sentence-similarity · tags: sentence-transformers, feature-extraction, sentence-similarity, transformers · base model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext · language: en · license: apache-2.0

license:apache-2.0
167,291
153

glove-6B

This model is an export of these GloVe-6B English Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. Given that pre-trained embeddings models can get quite large, there is also a SQLite version that lazily loads vectors.
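The lazy-loading SQLite variant can be sketched with the standard library. The `vectors` table schema and helper functions below are illustrative assumptions, not the actual `staticvectors` storage format:

```python
import sqlite3
import struct

def vector_to_blob(vector):
    # Pack a float vector into float32 bytes for storage
    return struct.pack(f"{len(vector)}f", *vector)

def blob_to_vector(blob):
    # Unpack float32 bytes back into a list of floats
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

# In-memory database standing in for the on-disk vectors file
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vectors (token TEXT PRIMARY KEY, data BLOB)")
db.execute("INSERT INTO vectors VALUES (?, ?)",
           ("hello", vector_to_blob([0.1, 0.2, 0.3])))

def lookup(token):
    # Only the requested row is read - the full table never loads into memory
    row = db.execute("SELECT data FROM vectors WHERE token = ?",
                     (token,)).fetchone()
    return blob_to_vector(row[0]) if row else None
```

Because each lookup reads a single row, memory usage stays flat no matter how large the vocabulary grows.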

4,382
2

bioclinical-modernbert-base-embeddings

license:apache-2.0
3,803
9

pubmedbert-base-embeddings-matryoshka

license:apache-2.0
2,696
23

glove-6B-quantized

2,116
3

ljspeech-jets-onnx

license:apache-2.0
663
25

colbert-bert-tiny

This is a ColBERT model finetuned from google/bert_uncased_L-2_H-128_A-2 on the msmarco-bm25 dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is primarily designed for unit tests in limited compute environments such as GitHub Actions. But it does work to an extent for basic use cases.
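The MaxSim operator mentioned above can be sketched in a few lines of plain Python: each query token keeps only its best-matching document token score, and those maxima are summed:

```python
def maxsim(query_vectors, document_vectors):
    """Late interaction score: for every query token vector, take the maximum
    dot product over all document token vectors, then sum those maxima."""
    score = 0.0
    for q in query_vectors:
        best = max(sum(qi * di for qi, di in zip(q, d))
                   for d in document_vectors)
        score += best
    return score

# Toy 2d token vectors
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.5, 0.5]]
score = maxsim(query, doc)  # 1.0 + 0.5 = 1.5
```

Unlike single-vector similarity, every query token gets to match its own best document token, which is what makes late interaction models strong rerankers.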

license:apache-2.0
562
2

language-id-quantized

This model is an export of this FastText Language Identification model for `staticvectors`. `staticvectors` enables running inference in Python with NumPy, which helps it maintain solid runtime performance. Language identification is an important task, and n-gram models are an efficient and highly accurate way to do it. This model is a quantized version of the base language-id model. It uses 2x256 Product Quantization, like the original quantized model from FastText, which shrinks the model down to 4MB with only a minor hit to accuracy.
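Product Quantization of the kind described here replaces each subvector with the index of its nearest codebook centroid. The sketch below uses 2 codebooks with 4 centroids each (a 2x256 scheme would use 256, so each code fits in one byte and a vector is stored in just 2 bytes plus the shared codebooks):

```python
def nearest(codebook, subvector):
    # Index of the closest centroid by squared euclidean distance
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2
                                 for c, x in zip(codebook[i], subvector)))

def pq_encode(vector, codebooks):
    # Split the vector into len(codebooks) subvectors, store one code each
    size = len(vector) // len(codebooks)
    return [nearest(cb, vector[i * size:(i + 1) * size])
            for i, cb in enumerate(codebooks)]

def pq_decode(codes, codebooks):
    # Reconstruct an approximation by concatenating the chosen centroids
    out = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out

# Two codebooks (the "2x" in 2x256), 4 centroids each instead of 256
codebooks = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [0.2, 0.8]],
]
codes = pq_encode([0.9, 0.1, 0.45, 0.55], codebooks)  # [1, 1]
approx = pq_decode(codes, codebooks)  # [1.0, 0.0, 0.5, 0.5]
```

The reconstruction is lossy, which is the "minor hit on accuracy" the card mentions; the payoff is that storage per vector drops from hundreds of float32 bytes to a handful of codes.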

license:cc-by-sa-3.0
420
1

colbert-muvera-micro

license:apache-2.0
337
25

pubmedbert-base-embeddings-2M

license:apache-2.0
248
3

bert-hash-nano

license:apache-2.0
221
11

pubmedbert-base-colbert

license:apache-2.0
210
6

word2vec-quantized

license:apache-2.0
209
1

t5-small-txtsql

license:apache-2.0
187
8

glove-2024-dolma

This model is an export of the new GloVe 2024 Dolma Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. Given that pre-trained embeddings models can get quite large, there is also a SQLite version that lazily loads vectors.

152
3

pylate-bert-tiny

This is a PyLate model finetuned from google/bert_uncased_L-2_H-128_A-2 on the msmarco-bm25 dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is primarily designed for unit tests in limited compute environments such as GitHub Actions. But it does work to an extent for basic use cases.

license:apache-2.0
148
3

pubmedbert-base-splade

license:apache-2.0
145
5

txtai-wikipedia

license:cc-by-sa-3.0
110
75

gliner-bert-tiny

GLiNER model using BERT Tiny as the base model with urchade/synthetic-pii-ner-mistral-v1 as the training dataset. This model is primarily designed for unit tests in limited compute environments such as GitHub Actions. But it does work to an extent for basic use cases.

license:apache-2.0
107
1

tiny-random-qwen2vl

107
1

bert-hash-femto

This is a set of 3 Nano BERT models with a modified embeddings layer. The embeddings layer is the same BERT vocabulary (30,522 tokens) projected to a smaller dimensional space, then re-encoded to the hidden size. This method is inspired by MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.

The number of projections acts like a hash. Setting the projections parameter to 5 is like generating a 160-bit hash (5 x float32) for each token. That hash is then projected to the hidden size. This significantly reduces the number of parameters necessary for token embeddings.

Standard token embeddings:
- 30,522 (vocab size) x 768 (hidden size) = 23,440,896 parameters
- 23,440,896 x 4 (float32) = 93,763,584 bytes

Hash token embeddings:
- 30,522 (vocab size) x 5 (hash buckets) + 5 x 768 (projection matrix) = 156,450 parameters
- 156,450 x 4 (float32) = 625,800 bytes

These models are pre-trained on the same training corpus as BERT (with a copy of Wikipedia from 2025), as recommended in the paper Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Below is a subset of GLUE scores on the dev set using the script provided by Hugging Face Transformers.

| Model | Parameters | MNLI (acc m/mm) | MRPC (f1/acc) | SST-2 (acc) |
| ----- | ---------- | --------------- | ------------- | ----------- |
| baseline (bert-tiny) | 4.4M | 0.7114 / 0.7161 | 0.8318 / 0.7353 | 0.8222 |
| bert-hash-femto | 0.243M | 0.5697 / 0.5750 | 0.8122 / 0.6838 | 0.7821 |
| bert-hash-pico | 0.448M | 0.6228 / 0.6363 | 0.8205 / 0.7083 | 0.7878 |
| bert-hash-nano | 0.969M | 0.6565 / 0.6670 | 0.8172 / 0.7083 | 0.8131 |

These models can be loaded using Hugging Face Transformers. Note that since this is a custom architecture, `trust_remote_code` needs to be set. Training your own Nano model is simple: all you need is a Hugging Face dataset and txtai.

This model demonstrates that smaller models can still be productive. The hope is that this work opens the door for many more small encoder models that pack a punch. Models can be trained in a matter of hours on consumer GPUs. Imagine more specialized models like this for medical, legal, scientific and other domains. Read more about this model and how it was built in this article.
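The embeddings-layer savings quoted above can be verified with a few lines of arithmetic:

```python
vocab, hidden, projections = 30522, 768, 5

# Standard BERT token embeddings: one hidden-size vector per vocabulary entry
standard = vocab * hidden          # 23,440,896 parameters
standard_bytes = standard * 4      # 93,763,584 bytes as float32

# Hash token embeddings: a tiny per-token code plus one shared projection matrix
hashed = vocab * projections + projections * hidden  # 156,450 parameters
hashed_bytes = hashed * 4                            # 625,800 bytes as float32
```

That is roughly a 150x reduction in embedding parameters, which is why the femto model fits in well under a megabyte.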

license:apache-2.0
99
10

bert-small-cord19-squad2

91
1

language-id

license:cc-by-sa-3.0
65
1

colbert-muvera-femto

This is a PyLate model finetuned from neuml/bert-hash-femto on the msmarco-en-bge-gemma unnormalized split dataset. It maps sentences & paragraphs to sequences of 50-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is trained with un-normalized scores, making it compatible with MUVERA fixed-dimensional encoding.

This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG). Note: txtai 9.0+ is required for late interaction model support. Late interaction models excel as reranker pipelines. Alternatively, the model can be loaded with PyLate.

The following tables show a subset of BEIR scored with the txtai benchmarks script. Scores reported are `ndcg@10`, grouped into three categories.

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:---|:---|:---|:---|:---|:---|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |

MUVERA encoding + maxsim re-ranking of the top 100 results per the MUVERA paper:

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:---|:---|:---|:---|:---|:---|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:---|:---|:---|:---|:---|:---|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |

Note: The scores reported don't match the scores reported in the respective papers due to different default settings in the txtai benchmark scripts. As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.

This model is only 250K parameters with a file size of 950K. Keeping that in mind, it's surprising how decent the scores are!

Nano BEIR — Dataset: `NanoBEIR_mean`, evaluated with pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator

| Metric | Value |
|:---|:---|
| MaxSim_accuracy@1 | 0.4318 |
| MaxSim_accuracy@3 | 0.5753 |
| MaxSim_accuracy@5 | 0.64 |
| MaxSim_accuracy@10 | 0.7062 |
| MaxSim_precision@1 | 0.4318 |
| MaxSim_precision@3 | 0.2655 |
| MaxSim_precision@5 | 0.215 |
| MaxSim_precision@10 | 0.149 |
| MaxSim_recall@1 | 0.2379 |
| MaxSim_recall@3 | 0.3485 |
| MaxSim_recall@5 | 0.4115 |
| MaxSim_recall@10 | 0.4745 |
| MaxSim_ndcg@10 | 0.4495 |
| MaxSim_mrr@10 | 0.5194 |
| MaxSim_map@100 | 0.3725 |

Non-default training hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `learning_rate`: 0.0003
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True

Framework Versions
- Python: 3.10.18
- Sentence Transformers: 4.0.2
- PyLate: 1.3.2
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1
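As a rough illustration of fixed-dimensional encoding in the spirit of MUVERA (a simplified sketch, not the paper's exact algorithm): bucket token vectors by the sign pattern of random projections, sum each bucket, and concatenate, yielding one vector whose size is independent of document length:

```python
import random

def fde(token_vectors, planes):
    """Simplified fixed-dimensional encoding: bucket tokens by the sign
    pattern of random hyperplane projections, sum each bucket, concatenate."""
    dim = len(token_vectors[0])
    buckets = [[0.0] * dim for _ in range(2 ** len(planes))]
    for vector in token_vectors:
        # Sign bits of the projections form the bucket index
        index = 0
        for plane in planes:
            dot = sum(p * v for p, v in zip(plane, vector))
            index = (index << 1) | (1 if dot > 0 else 0)
        buckets[index] = [b + v for b, v in zip(buckets[index], vector)]
    # Flatten the buckets into a single fixed-length vector
    return [x for bucket in buckets for x in bucket]

random.seed(42)
dim, bits = 4, 2
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
tokens = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], [-0.5, 0.1, 0.0, 0.2]]
encoding = fde(tokens, planes)  # length (2 ** bits) * dim = 16
```

The point is that a variable-length bag of token vectors collapses to a single fixed-size vector, so multi-vector documents can be indexed and compared with ordinary single-vector machinery.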

license:apache-2.0
64
20

word2vec

license:apache-2.0
62
1

txtai-intro

license:apache-2.0
58
2

colbert-muvera-nano

This is a PyLate model finetuned from neuml/bert-hash-nano on the msmarco-en-bge-gemma unnormalized split dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is trained with un-normalized scores, making it compatible with MUVERA fixed-dimensional encoding.

This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG). Note: txtai 9.0+ is required for late interaction model support. Late interaction models excel as reranker pipelines. Alternatively, the model can be loaded with PyLate.

The following tables show a subset of BEIR scored with the txtai benchmarks script. Scores reported are `ndcg@10`, grouped into three categories.

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:---|:---|:---|:---|:---|:---|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |

MUVERA encoding + maxsim re-ranking of the top 100 results per the MUVERA paper:

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:---|:---|:---|:---|:---|:---|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:---|:---|:---|:---|:---|:---|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |

Note: The scores reported don't match the scores reported in the respective papers due to different default settings in the txtai benchmark scripts. As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.

This model packs a punch into 950K parameters. It's the same architecture as the 4M parameter model, with the modified embeddings layer taking the parameter count down. It even beats the original ColBERT v2 model on a couple of the benchmarks.

Nano BEIR — Dataset: `NanoBEIR_mean`, evaluated with pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator

| Metric | Value |
|:---|:---|
| MaxSim_accuracy@1 | 0.5272 |
| MaxSim_accuracy@3 | 0.6722 |
| MaxSim_accuracy@5 | 0.7446 |
| MaxSim_accuracy@10 | 0.8046 |
| MaxSim_precision@1 | 0.5272 |
| MaxSim_precision@3 | 0.317 |
| MaxSim_precision@5 | 0.2509 |
| MaxSim_precision@10 | 0.1745 |
| MaxSim_recall@1 | 0.3102 |
| MaxSim_recall@3 | 0.4296 |
| MaxSim_recall@5 | 0.4991 |
| MaxSim_recall@10 | 0.5698 |
| MaxSim_ndcg@10 | 0.5479 |
| MaxSim_mrr@10 | 0.6191 |
| MaxSim_map@100 | 0.4704 |

Non-default training hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `learning_rate`: 0.0003
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True

Framework Versions
- Python: 3.10.18
- Sentence Transformers: 4.0.2
- PyLate: 1.3.2
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1

license:apache-2.0
43
1

bert-small-cord19qa

42
2

pubmedbert-base-embeddings-8M

license:apache-2.0
28
8

fasttext-quantized

license:cc-by-sa-3.0
26
2

colbert-muvera-small

license:apache-2.0
25
10

pubmedbert-base-embeddings-1M

license:apache-2.0
25
2

txtai-wikipedia-slim

license:cc-by-sa-3.0
23
5

pubmedbert-base-embeddings-100K

license:apache-2.0
21
2

glove-2024-wikigiga-quantized

This model is an export of the new GloVe 2024 WikiGiga Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. This model is a quantized version of the base model. It's using 10x256 Product Quantization.

21
1

bert-hash-pico

This is a set of 3 Nano BERT models with a modified embeddings layer. The embeddings layer is the same BERT vocabulary (30,522 tokens) projected to a smaller dimensional space, then re-encoded to the hidden size. This method is inspired by MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.

The number of projections acts like a hash. Setting the projections parameter to 5 is like generating a 160-bit hash (5 x float32) for each token. That hash is then projected to the hidden size. This significantly reduces the number of parameters necessary for token embeddings.

Standard token embeddings:
- 30,522 (vocab size) x 768 (hidden size) = 23,440,896 parameters
- 23,440,896 x 4 (float32) = 93,763,584 bytes

Hash token embeddings:
- 30,522 (vocab size) x 5 (hash buckets) + 5 x 768 (projection matrix) = 156,450 parameters
- 156,450 x 4 (float32) = 625,800 bytes

These models are pre-trained on the same training corpus as BERT (with a copy of Wikipedia from 2025), as recommended in the paper Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Below is a subset of GLUE scores on the dev set using the script provided by Hugging Face Transformers.

| Model | Parameters | MNLI (acc m/mm) | MRPC (f1/acc) | SST-2 (acc) |
| ----- | ---------- | --------------- | ------------- | ----------- |
| baseline (bert-tiny) | 4.4M | 0.7114 / 0.7161 | 0.8318 / 0.7353 | 0.8222 |
| bert-hash-femto | 0.243M | 0.5697 / 0.5750 | 0.8122 / 0.6838 | 0.7821 |
| bert-hash-pico | 0.448M | 0.6228 / 0.6363 | 0.8205 / 0.7083 | 0.7878 |
| bert-hash-nano | 0.969M | 0.6565 / 0.6670 | 0.8172 / 0.7083 | 0.8131 |

These models can be loaded using Hugging Face Transformers. Note that since this is a custom architecture, `trust_remote_code` needs to be set. Training your own Nano model is simple: all you need is a Hugging Face dataset and txtai.

This model demonstrates that smaller models can still be productive. The hope is that this work opens the door for many more small encoder models that pack a punch. Models can be trained in a matter of hours on consumer GPUs. Imagine more specialized models like this for medical, legal, scientific and other domains. Read more about this model and how it was built in this article.

license:apache-2.0
17
2

colbert-muvera-pico

This is a PyLate model finetuned from neuml/bert-hash-pico on the msmarco-en-bge-gemma unnormalized split dataset. It maps sentences & paragraphs to sequences of 80-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is trained with un-normalized scores, making it compatible with MUVERA fixed-dimensional encoding.

This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG). Note: txtai 9.0+ is required for late interaction model support. Late interaction models excel as reranker pipelines. Alternatively, the model can be loaded with PyLate.

The following tables show a subset of BEIR scored with the txtai benchmarks script. Scores reported are `ndcg@10`, grouped into three categories.

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:---|:---|:---|:---|:---|:---|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |

MUVERA encoding + maxsim re-ranking of the top 100 results per the MUVERA paper:

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:---|:---|:---|:---|:---|:---|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:---|:---|:---|:---|:---|:---|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |

Note: The scores reported don't match the scores reported in the respective papers due to different default settings in the txtai benchmark scripts. As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.

At 450K parameters, this model does shockingly well! It's not too far off from the baseline 4M parameter model at 1/10th the size. It's also not too far off from the original ColBERT v2 model, which has 110M parameters.

Nano BEIR — Dataset: `NanoBEIR_mean`, evaluated with pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator

| Metric | Value |
|:---|:---|
| MaxSim_accuracy@1 | 0.4826 |
| MaxSim_accuracy@3 | 0.6368 |
| MaxSim_accuracy@5 | 0.7015 |
| MaxSim_accuracy@10 | 0.7585 |
| MaxSim_precision@1 | 0.4826 |
| MaxSim_precision@3 | 0.2979 |
| MaxSim_precision@5 | 0.2345 |
| MaxSim_precision@10 | 0.1649 |
| MaxSim_recall@1 | 0.2728 |
| MaxSim_recall@3 | 0.4051 |
| MaxSim_recall@5 | 0.4649 |
| MaxSim_recall@10 | 0.532 |
| MaxSim_ndcg@10 | 0.5069 |
| MaxSim_mrr@10 | 0.5733 |
| MaxSim_map@100 | 0.4287 |

Non-default training hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `learning_rate`: 0.0003
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True

Framework Versions
- Python: 3.10.18
- Sentence Transformers: 4.0.2
- PyLate: 1.3.2
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1

license:apache-2.0
13
1

txtai-arxiv

11
19

ljspeech-vits-onnx

license:apache-2.0
10
11

biomedbert-hash-nano-colbert

license:apache-2.0
10
1

biomedbert-hash-nano-embeddings

license:apache-2.0
9
2

glove-2024-wikigiga

This model is an export of the new GloVe 2024 WikiGiga Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. Given that pre-trained embeddings models can get quite large, there is also a SQLite version that lazily loads vectors.

9
1

pubmedbert-base-embeddings-500K

license:apache-2.0
8
2

vctk-vits-onnx

license:apache-2.0
6
3

bert-small-cord19

6
1

glove-2024-dolma-quantized

This model is an export of the new GloVe 2024 Dolma Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. This model is a quantized version of the base model. It's using 10x256 Product Quantization.

5
1

Llama-3.1_OpenScholar-8B-AWQ

This is Llama-3.1_OpenScholar-8B with AWQ quantization applied using the following code.
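The quantization code itself isn't reproduced on this page; a typical AutoAWQ `quant_config` looks like the fragment below (illustrative values, not necessarily the settings used for this export):

```python
# Hypothetical AutoAWQ settings - the actual values used for this export
# are not shown on this page
quant_config = {
    "zero_point": True,   # asymmetric quantization with zero points
    "q_group_size": 128,  # weights quantized in groups of 128
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # kernel variant used at inference time
}
```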

llama
4
3

txtai-hfposts

license:apache-2.0
3
3

txtai-neuml-linkedin

license:apache-2.0
2
1

txtai-astronomy

license:cc-by-sa-3.0
2
1

t5-small-bashsql

license:apache-2.0
1
1

fasttext

license:cc-by-sa-3.0
1
1

kokoro-int8-onnx

license:apache-2.0
0
10

txtchat-personas

license:apache-2.0
0
6

kokoro-fp16-onnx

license:apache-2.0
0
4

kokoro-base-onnx

license:apache-2.0
0
3

txtai-speecht5-onnx

Fine-tuned version of SpeechT5 TTS exported to ONNX using the Optimum library. txtai has a built-in Text to Speech (TTS) pipeline that makes using this model easy. This model was fine-tuned using the code in this Hugging Face article and a custom set of WAV files. The ONNX export uses the following code, which requires installing `optimum`. When no speaker argument is passed in, the default speaker embeddings are used. The default speaker is David Mezzetti, the primary developer of txtai. It's possible to build custom speaker embeddings as shown below; the code requires installing `torchaudio` and `speechbrain`. Fine-tuning the model with a new voice leads to the best results, but zero-shot speaker embeddings are OK in some cases. Speaker embeddings from the original SpeechT5 TTS training set are also supported. See the README for more.

license:apache-2.0
0
2

domain-labeler

license:apache-2.0
0
1

bert-tiny-sts-last-pooling

license:apache-2.0
0
1

bert-hash-nano-embeddings

license:apache-2.0
0
1

bert-hash-pico-embeddings

license:apache-2.0
0
1

bert-hash-femto-embeddings

license:apache-2.0
0
1

bert-tiny-prompts

0
1

biomedbert-base-reranker

license:apache-2.0
0
1

biomedbert-hash-nano

license:apache-2.0
0
1

biomedbert-base-colbert

license:apache-2.0
0
1

txtai-apps

license:apache-2.0
0
1