Serafim 900m Portuguese Pt Sentence Encoder Ir
Serafim 900m Portuguese (PT) Sentence Transformer tuned for Information Retrieval (IR)

This is a sentence-transformers model: it maps sentences and paragraphs to a 1536-dimensional dense vector space and can be used for tasks like clustering or semantic search. The model is easiest to use with the sentence-transformers library installed.

Usage (HuggingFace Transformers)

Without sentence-transformers, you first pass your input through the transformer model and then apply the right pooling operation on top of the contextualized word embeddings.

For an automated evaluation of this model, see the Sentence Embeddings Benchmark: https://seb.sbert.net

Training

The model was trained with a `sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader` of length 1989040 and the loss `sentence_transformers.losses.GISTEmbedLoss.GISTEmbedLoss`.

Citation

The article was presented at the EPIA 2024 conference and published by Springer:

@InProceedings{epia2024serafim,
  title={Open Sentence Embeddings for Portuguese with the Serafim PT encoders family},
  author={Luís Gomes and António Branco and João Silva and João Rodrigues and Rodrigo Santos},
  editor={Manuel Filipe Santos and José Machado and Paulo Novais and Paulo Cortez and Pedro Miguel Moreira},
  booktitle={Progress in Artificial Intelligence},
  doi={doi.org/10.1007/978-3-031-73503-5_22},
  year={2024},
  publisher={Springer Nature Switzerland},
  address={Cham},
  pages={267--279},
  isbn={978-3-031-73503-5}
}

Before publication by Springer, the pre-print was available on arXiv:

@misc{gomes2024opensentenceembeddingsportuguese,
  title={Open Sentence Embeddings for Portuguese with the Serafim PT encoders family},
  author={Luís Gomes and António Branco and João Silva and João Rodrigues and Rodrigo Santos},
  year={2024},
  eprint={2407.19527},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.19527}
}
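Usage sketch

The two usage paths described in the usage section (through sentence-transformers, or a plain transformer forward pass followed by pooling) can be sketched roughly as follows. This is an illustration under stated assumptions, not the model card's original snippet: the Hub identifier is inferred from the page title and should be verified, mean pooling is assumed as the pooling operation (the model's own pooling configuration should be checked before relying on it), and the helper names `mean_pool`, `cosine_scores`, and `encode` are hypothetical.

```python
import numpy as np

# Assumption: Hub identifier inferred from the page title; verify before use.
MODEL_ID = "PORTULAN/serafim-900m-portuguese-pt-sentence-encoder-ir"


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: array of shape (batch, seq_len, dim)
    attention_mask:   array of shape (batch, seq_len), 1 = real token, 0 = padding
    Assumption: mean pooling is the appropriate pooling for this model.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for all-padding rows.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between one query vector and each document vector."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return d @ q


def encode(sentences, model_id=MODEL_ID):
    """Embed sentences with sentence-transformers (downloads the model)."""
    # Imported lazily so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(sentences)
```

With sentence-transformers installed and the model reachable, `encode(["Isto é um exemplo", "Isto é outro exemplo"])` would return one vector per sentence, and `cosine_scores` could then rank candidate passages against a query vector for semantic search.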