NeuML
pubmedbert-base-embeddings
---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
base_model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext
language: en
license: apache-2.0
---
glove-6B
This model is an export of these GloVe-6B English Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. Given that pre-trained embeddings models can get quite large, there is also a SQLite version that lazily loads vectors.
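The SQLite variant mentioned above keeps vectors on disk and loads them one at a time instead of holding the full matrix in memory. Below is a minimal sketch of that lazy-loading pattern; the table schema here is hypothetical, not the published file's actual layout.

```python
import sqlite3

import numpy as np

def build(path, vectors):
    # Store each token's vector as raw float32 bytes (hypothetical schema)
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS vectors (token TEXT PRIMARY KEY, data BLOB)")
    db.executemany("INSERT OR REPLACE INTO vectors VALUES (?, ?)",
                   [(t, v.astype(np.float32).tobytes()) for t, v in vectors.items()])
    db.commit()
    return db

def lookup(db, token, dims=300):
    # Lazily load a single 300d vector on demand; zeros for unknown tokens
    row = db.execute("SELECT data FROM vectors WHERE token = ?", (token,)).fetchone()
    return np.frombuffer(row[0], dtype=np.float32) if row else np.zeros(dims, dtype=np.float32)

db = build(":memory:", {"hello": np.ones(300), "world": np.arange(300, dtype=np.float32)})
vec = lookup(db, "hello")
```

Only the rows actually queried are read from disk, which is what keeps memory flat for large pre-trained vocabularies.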
bioclinical-modernbert-base-embeddings
pubmedbert-base-embeddings-matryoshka
glove-6B-quantized
ljspeech-jets-onnx
colbert-bert-tiny
This is a ColBERT model finetuned from google/bert_uncased_L-2_H-128_A-2 on the msmarco-bm25 dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is primarily designed for unit tests in limited compute environments such as GitHub Actions, but it does work to an extent for basic use cases.
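The MaxSim operator mentioned above scores a query against a document by taking, for each query token vector, the maximum similarity over all document token vectors, then summing across query tokens. A small numpy sketch of just the scoring step (toy vectors, not real ColBERT output):

```python
import numpy as np

def maxsim(query, doc):
    """MaxSim late-interaction score: for each query token embedding,
    take the maximum cosine similarity over all document token
    embeddings, then sum across query tokens."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    d = doc / np.linalg.norm(doc, axis=1, keepdims=True)
    sim = q @ d.T                    # (query tokens, doc tokens)
    return float(sim.max(axis=1).sum())

q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.7, 0.7]])
score = maxsim(q, d)
```

Because each query token is matched independently, MaxSim rewards documents that cover all parts of the query rather than just one.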
language-id-quantized
This model is an export of this FastText Language Identification model for `staticvectors`. `staticvectors` enables running inference in Python with NumPy, which helps it maintain solid runtime performance. Language detection is an important task, and identification with n-gram models is an efficient and highly accurate way to do it. This model is a quantized version of the base language id model. It uses 2x256 Product Quantization like the original quantized model from FastText, which shrinks the model down to 4MB with only a minor hit to accuracy.
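The "2x256" Product Quantization above means each vector is split into 2 subvectors, and each subvector is replaced by the index of its nearest centroid in a 256-entry codebook, so a vector costs 2 bytes instead of a full float32 array. A toy numpy sketch of the idea (random codebooks, not the model's trained ones):

```python
import numpy as np

rng = np.random.default_rng(0)
dims, subvectors, centroids = 16, 2, 256   # "2x256" -> 2 codes of 256 entries each

# Toy codebooks: one (256 x dims/2) centroid table per subvector
codebooks = rng.normal(size=(subvectors, centroids, dims // subvectors)).astype(np.float32)

def encode(vector):
    # Each half of the vector becomes the index of its nearest centroid
    halves = vector.reshape(subvectors, dims // subvectors)
    return np.array([np.argmin(((codebooks[i] - halves[i]) ** 2).sum(axis=1))
                     for i in range(subvectors)], dtype=np.uint8)

def decode(codes):
    # Reconstruction concatenates the selected centroids
    return np.concatenate([codebooks[i][codes[i]] for i in range(subvectors)])

v = rng.normal(size=dims).astype(np.float32)
codes = encode(v)            # 2 bytes instead of 16 float32s (64 bytes)
approx = decode(codes)       # lossy reconstruction
```

In practice the codebooks are learned with k-means so the reconstruction error, and therefore the accuracy hit, stays small.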
colbert-muvera-micro
pubmedbert-base-embeddings-2M
Bert Hash Nano
pubmedbert-base-colbert
word2vec-quantized
t5-small-txtsql
glove-2024-dolma
This model is an export of the new GloVe 2024 Dolma Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. Given that pre-trained embeddings models can get quite large, there is also a SQLite version that lazily loads vectors.
pylate-bert-tiny
This is a PyLate model finetuned from google/bert_uncased_L-2_H-128_A-2 on the msmarco-bm25 dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is primarily designed for unit tests in limited compute environments such as GitHub Actions, but it does work to an extent for basic use cases.
pubmedbert-base-splade
txtai-wikipedia
gliner-bert-tiny
GLiNER model using BERT Tiny as the base model with urchade/synthetic-pii-ner-mistral-v1 as the training dataset. This model is primarily designed for unit tests in limited compute environments such as GitHub Actions, but it does work to an extent for basic use cases.
tiny-random-qwen2vl
bert-hash-femto
This is a set of 3 Nano BERT models with a modified embeddings layer. The embeddings layer is the same BERT vocabulary (30,522 tokens) projected to a smaller dimensional space, then re-encoded to the hidden size. This method is inspired by MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.

The number of projections is like a hash. Setting the projections parameter to 5 is like generating a 160-bit hash (5 x float32) for each token. That hash is then projected to the hidden size. This significantly reduces the number of parameters necessary for token embeddings.

Standard token embeddings:

- 30,522 (vocab size) x 768 (hidden size) = 23,440,896 parameters
- 23,440,896 x 4 (float32) = 93,763,584 bytes

Hash token embeddings:

- 30,522 (vocab size) x 5 (hash buckets) + 5 x 768 (projection matrix) = 156,450 parameters
- 156,450 x 4 (float32) = 625,800 bytes

These models are pre-trained on the same training corpus as BERT (with a copy of Wikipedia from 2025) as recommended in the paper Well-Read Students Learn Better: On the Importance of Pre-training Compact Models.

Below is a subset of GLUE scores on the dev set using the script provided by Hugging Face Transformers with the following parameters.

| Model | Parameters | MNLI (acc m/mm) | MRPC (f1/acc) | SST-2 (acc) |
| ----- | ---------- | --------------- | ------------- | ----------- |
| baseline (bert-tiny) | 4.4M | 0.7114 / 0.7161 | 0.8318 / 0.7353 | 0.8222 |
| bert-hash-femto | 0.243M | 0.5697 / 0.5750 | 0.8122 / 0.6838 | 0.7821 |
| bert-hash-pico | 0.448M | 0.6228 / 0.6363 | 0.8205 / 0.7083 | 0.7878 |
| bert-hash-nano | 0.969M | 0.6565 / 0.6670 | 0.8172 / 0.7083 | 0.8131 |

These models can be loaded using Hugging Face Transformers as follows. Note that given this is a custom architecture, `trust_remote_code` needs to be set. Training your own Nano model is simple. All you need is a Hugging Face dataset and the code below using txtai.
This model demonstrates that smaller models can still be productive. The hope is that this work opens the door for many to build small encoder models that pack a punch. Models can be trained in a matter of hours on consumer GPUs. Imagine more specialized models like this for medical, legal, scientific and other domains. Read more about this model and how it was built in this article.
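The hash-embedding arithmetic in the card above can be checked, and the layer itself sketched, in a few lines of numpy. This is a simplified illustration under assumed shapes, not the model's actual (trained) implementation:

```python
import numpy as np

vocab, hidden, projections = 30522, 768, 5

# Standard embeddings: one hidden-size vector per vocabulary entry
standard = vocab * hidden                              # 23,440,896 parameters

# Hash embeddings: a 5-dim code per token plus a 5 x 768 projection
hashed = vocab * projections + projections * hidden    # 156,450 parameters

# Sketch of the lookup: token id -> 5-dim "hash" -> hidden-size vector
rng = np.random.default_rng(0)
codes = rng.normal(size=(vocab, projections)).astype(np.float32)
projection = rng.normal(size=(projections, hidden)).astype(np.float32)

def embed(token_ids):
    # Gather the small per-token codes, then expand to the hidden size
    return codes[token_ids] @ projection               # (tokens, 768)

out = embed(np.array([101, 2023, 102]))
```

The savings come entirely from the first factor: the vocabulary table shrinks from 768 columns to 5, while the 5 x 768 projection is shared across all tokens.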
bert-small-cord19-squad2
language-id
colbert-muvera-femto
This is a PyLate model finetuned from neuml/bert-hash-femto on the msmarco-en-bge-gemma unnormalized split dataset. It maps sentences & paragraphs to sequences of 50-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is trained with un-normalized scores, making it compatible with MUVERA fixed-dimensional encoding.

This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG). Note: txtai 9.0+ is required for late interaction model support. Late interaction models excel as reranker pipelines. Alternatively, the model can be loaded with PyLate.

The following tables show a subset of BEIR scored with the txtai benchmarks script. Scores reported are `ndcg@10` and grouped into the following three categories.

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |

MUVERA encoding + maxsim re-ranking of the top 100 results per the MUVERA paper:

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |

Note: The scores reported don't match the scores reported in the respective papers due to different default settings in the txtai benchmark scripts. As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.

This model is only 250K parameters with a file size of 950K. Keeping that in mind, it's surprising how decent the scores are!

Nano BEIR

Dataset: `NanoBEIR_mean`

Evaluated with `pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator`

| Metric | Value |
|:-------|:------|
| MaxSim_accuracy@1 | 0.4318 |
| MaxSim_accuracy@3 | 0.5753 |
| MaxSim_accuracy@5 | 0.64 |
| MaxSim_accuracy@10 | 0.7062 |
| MaxSim_precision@1 | 0.4318 |
| MaxSim_precision@3 | 0.2655 |
| MaxSim_precision@5 | 0.215 |
| MaxSim_precision@10 | 0.149 |
| MaxSim_recall@1 | 0.2379 |
| MaxSim_recall@3 | 0.3485 |
| MaxSim_recall@5 | 0.4115 |
| MaxSim_recall@10 | 0.4745 |
| MaxSim_ndcg@10 | 0.4495 |
| MaxSim_mrr@10 | 0.5194 |
| MaxSim_map@100 | 0.3725 |

Non-default training hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `learning_rate`: 0.0003
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True

Framework Versions

- Python: 3.10.18
- Sentence Transformers: 4.0.2
- PyLate: 1.3.2
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1
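MUVERA's fixed-dimensional encoding turns a variable-length set of token vectors into one fixed-size vector: tokens are hashed into buckets via the signs of random hyperplane projections, vectors are aggregated per bucket, and the buckets are concatenated, so a single dot product approximates MaxSim. A simplified single-repetition sketch (the real method uses multiple repetitions and further refinements):

```python
import numpy as np

rng = np.random.default_rng(0)
dims, planes = 50, 3                          # 3 hyperplanes -> 2^3 = 8 buckets
hyperplanes = rng.normal(size=(planes, dims))

def fde(tokens, aggregate):
    """Simplified fixed-dimensional encoding (FDE): bucket each token
    vector by the signs of its projections onto random hyperplanes,
    aggregate per bucket, then concatenate the buckets."""
    bits = ((tokens @ hyperplanes.T) > 0).astype(int)       # (tokens, planes)
    buckets = bits @ (1 << np.arange(planes))               # bucket id per token
    out = np.zeros((2 ** planes, dims))
    for b in range(2 ** planes):
        members = tokens[buckets == b]
        if len(members):
            out[b] = aggregate(members, axis=0)
    return out.ravel()                                      # always 8 * 50 = 400 dims

# Per the MUVERA paper, one side sums vectors per bucket, the other averages
queryvec = fde(rng.normal(size=(7, dims)), np.sum)
docvec = fde(rng.normal(size=(20, dims)), np.mean)
score = float(queryvec @ docvec)   # one dot product approximating MaxSim
```

The fixed output size is what lets MUVERA-encoded documents live in an ordinary single-vector index, with exact maxsim reserved for re-ranking the top hits.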
word2vec
txtai-intro
colbert-muvera-nano
This is a PyLate model finetuned from neuml/bert-hash-nano on the msmarco-en-bge-gemma unnormalized split dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is trained with un-normalized scores, making it compatible with MUVERA fixed-dimensional encoding.

This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG). Note: txtai 9.0+ is required for late interaction model support. Late interaction models excel as reranker pipelines. Alternatively, the model can be loaded with PyLate.

The following tables show a subset of BEIR scored with the txtai benchmarks script. Scores reported are `ndcg@10` and grouped into the following three categories.

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |

MUVERA encoding + maxsim re-ranking of the top 100 results per the MUVERA paper:

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |

Note: The scores reported don't match the scores reported in the respective papers due to different default settings in the txtai benchmark scripts. As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.

This model packs a punch into just 950K parameters. It's the same architecture as the 4M parameter model, with the modified embeddings layer taking the parameter count down. It even beats the original ColBERT v2 model on a couple of the benchmarks.

Nano BEIR

Dataset: `NanoBEIR_mean`

Evaluated with `pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator`

| Metric | Value |
|:-------|:------|
| MaxSim_accuracy@1 | 0.5272 |
| MaxSim_accuracy@3 | 0.6722 |
| MaxSim_accuracy@5 | 0.7446 |
| MaxSim_accuracy@10 | 0.8046 |
| MaxSim_precision@1 | 0.5272 |
| MaxSim_precision@3 | 0.317 |
| MaxSim_precision@5 | 0.2509 |
| MaxSim_precision@10 | 0.1745 |
| MaxSim_recall@1 | 0.3102 |
| MaxSim_recall@3 | 0.4296 |
| MaxSim_recall@5 | 0.4991 |
| MaxSim_recall@10 | 0.5698 |
| MaxSim_ndcg@10 | 0.5479 |
| MaxSim_mrr@10 | 0.6191 |
| MaxSim_map@100 | 0.4704 |

Non-default training hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `learning_rate`: 0.0003
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True

Framework Versions

- Python: 3.10.18
- Sentence Transformers: 4.0.2
- PyLate: 1.3.2
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1
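The reranker-pipeline pattern noted in the card above is simple: a cheap first stage returns candidates, and the late interaction model re-orders them by MaxSim score. A minimal sketch with plain numpy standing in for the real encoders (token-embedding matrices stand in for encoded text):

```python
import numpy as np

def maxsim(query, doc):
    # Late-interaction score: per-query-token max cosine similarity, summed
    qn = query / np.linalg.norm(query, axis=1, keepdims=True)
    dn = doc / np.linalg.norm(doc, axis=1, keepdims=True)
    return float((qn @ dn.T).max(axis=1).sum())

def rerank(query, candidates, topn=2):
    """Re-order first-stage candidates (uid, token matrix) by MaxSim."""
    scores = [(uid, maxsim(query, doc)) for uid, doc in candidates]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:topn]

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 8))
candidates = [(uid, rng.normal(size=(12, 8))) for uid in range(5)]
# Make candidate 3 contain the query tokens verbatim so it should rank first
candidates[3] = (3, np.vstack([query, rng.normal(size=(8, 8))]))
results = rerank(query, candidates)
```

Running MaxSim only over a shortlist keeps the expensive token-level scoring off the full corpus, which is exactly where small late interaction models shine.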
bert-small-cord19qa
pubmedbert-base-embeddings-8M
fasttext-quantized
colbert-muvera-small
pubmedbert-base-embeddings-1M
txtai-wikipedia-slim
pubmedbert-base-embeddings-100K
glove-2024-wikigiga-quantized
This model is an export of the new GloVe 2024 WikiGiga Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. This model is a quantized version of the base model. It's using 10x256 Product Quantization.
bert-hash-pico
colbert-muvera-pico
This is a PyLate model finetuned from neuml/bert-hash-pico on the msmarco-en-bge-gemma unnormalized split dataset. It maps sentences & paragraphs to sequences of 80-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator. This model is trained with un-normalized scores, making it compatible with MUVERA fixed-dimensional encoding.

This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG). Note: txtai 9.0+ is required for late interaction model support. Late interaction models excel as reranker pipelines. Alternatively, the model can be loaded with PyLate.

The following tables show a subset of BEIR scored with the txtai benchmarks script. Scores reported are `ndcg@10` and grouped into the following three categories.

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |

MUVERA encoding + maxsim re-ranking of the top 100 results per the MUVERA paper:

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|:------|:-----------|:---------|:--------|:--------|:--------|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |

Note: The scores reported don't match the scores reported in the respective papers due to different default settings in the txtai benchmark scripts. As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.

At 450K parameters, this model does shockingly well! It's not too far off from the baseline 4M parameter model at 1/10th the size. It's also not too far off from the original ColBERT v2 model, which has 110M parameters.

Nano BEIR

Dataset: `NanoBEIR_mean`

Evaluated with `pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator`

| Metric | Value |
|:-------|:------|
| MaxSim_accuracy@1 | 0.4826 |
| MaxSim_accuracy@3 | 0.6368 |
| MaxSim_accuracy@5 | 0.7015 |
| MaxSim_accuracy@10 | 0.7585 |
| MaxSim_precision@1 | 0.4826 |
| MaxSim_precision@3 | 0.2979 |
| MaxSim_precision@5 | 0.2345 |
| MaxSim_precision@10 | 0.1649 |
| MaxSim_recall@1 | 0.2728 |
| MaxSim_recall@3 | 0.4051 |
| MaxSim_recall@5 | 0.4649 |
| MaxSim_recall@10 | 0.532 |
| MaxSim_ndcg@10 | 0.5069 |
| MaxSim_mrr@10 | 0.5733 |
| MaxSim_map@100 | 0.4287 |

Non-default training hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `learning_rate`: 0.0003
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True

Framework Versions

- Python: 3.10.18
- Sentence Transformers: 4.0.2
- PyLate: 1.3.2
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1
txtai-arxiv
ljspeech-vits-onnx
biomedbert-hash-nano-colbert
biomedbert-hash-nano-embeddings
glove-2024-wikigiga
This model is an export of the new GloVe 2024 WikiGiga Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. Given that pre-trained embeddings models can get quite large, there is also a SQLite version that lazily loads vectors.
pubmedbert-base-embeddings-500K
vctk-vits-onnx
bert-small-cord19
glove-2024-dolma-quantized
This model is an export of the new GloVe 2024 Dolma Vectors (300d) for `staticvectors`. `staticvectors` enables running inference in Python with NumPy. This helps it maintain solid runtime performance. This model is a quantized version of the base model. It's using 10x256 Product Quantization.
Llama-3.1_OpenScholar-8B-AWQ
This is Llama-3.1_OpenScholar-8B with AWQ quantization applied using the following code.
txtai-hfposts
txtai-neuml-linkedin
txtai-astronomy
t5-small-bashsql
fasttext
kokoro-int8-onnx
txtchat-personas
kokoro-fp16-onnx
kokoro-base-onnx
Txtai Speecht5 Onnx
Fine-tuned version of SpeechT5 TTS exported to ONNX. This model was exported to ONNX using the Optimum library. txtai has a built-in Text to Speech (TTS) pipeline that makes using this model easy. This model was fine-tuned using the code in this Hugging Face article and a custom set of WAV files. The ONNX export uses the following code, which requires installing `optimum`. When no speaker argument is passed in, the default speaker embeddings are used. The default speaker is David Mezzetti, the primary developer of txtai. It's possible to build custom speaker embeddings as shown below. Fine-tuning the model with a new voice leads to the best results, but zero-shot speaker embeddings are OK in some cases. The following code requires installing `torchaudio` and `speechbrain`. Speaker embeddings from the original SpeechT5 TTS training set are supported. See the README for more.
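The custom speaker embedding step described above boils down to extracting one speaker vector per recorded utterance and combining them into a single embedding. The extraction itself needs `speechbrain` (a speaker model such as an x-vector encoder, not shown here); the combining step can be sketched with numpy. The 512-dim size and the mean-then-normalize recipe are assumptions for illustration, not the article's exact code:

```python
import numpy as np

def speaker_embedding(utterance_embeddings):
    """Combine per-utterance speaker vectors (e.g. 512-dim x-vectors,
    assumed to come from an upstream speechbrain speaker model) into a
    single embedding by averaging and L2-normalizing."""
    mean = np.mean(utterance_embeddings, axis=0)
    return mean / np.linalg.norm(mean)

# Stand-in for embeddings extracted from a set of WAV files
utterances = np.random.default_rng(0).normal(size=(10, 512)).astype(np.float32)
speaker = speaker_embedding(utterances)
```

Averaging over several utterances smooths out per-recording noise, which is why zero-shot embeddings built from a handful of clips can be passable even without fine-tuning.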