huyydangg
DEk21_hcmute_embedding
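The Matryoshka training noted under Efficiency in this card means an embedding can be shortened by keeping only its leading dimensions and re-normalizing before comparison. The following is a minimal sketch of that idea in plain Python, not the model's own code: the vectors are made-up 8-dimensional toy values standing in for real 768-dimensional outputs, and the helper names `truncate_and_normalize` and `cosine_similarity` as well as the cut to 4 dimensions are illustrative assumptions.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, not produced by the model).
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.02, 0.01, 0.03]
passage = [0.8, 0.2, 0.4, 0.1, 0.01, 0.04, 0.02, 0.01]

# Score with the full vectors, then with only the first 4 dimensions.
full_score = cosine_similarity(query, passage)
short_score = cosine_similarity(
    truncate_and_normalize(query, 4),
    truncate_and_normalize(passage, 4),
)
print(f"full: {full_score:.4f}  truncated: {short_score:.4f}")
```

With the Sentence Transformers library, the same truncation is typically requested through the `truncate_dim` argument of `SentenceTransformer`; which truncated sizes work well for this model is not stated here and should be checked against the model's training configuration.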
DEk21_hcmute_embedding is a Vietnamese text embedding model focused on RAG and production efficiency:

- Training dataset: The model was trained on an in-house dataset of approximately 100,000 examples of legal questions and their related contexts.
- Efficiency: The model was trained with a Matryoshka loss, so embeddings can be truncated with minimal performance loss. Truncated embeddings are smaller and faster to compare, which makes the model efficient for real-world production use.

## Model Description

- Model Type: Sentence Transformer
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Language: Vietnamese
- License: apache-2.0
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

| model | type | ndcg@3 | ndcg@5 | ndcg@10 | mrr@3 | mrr@5 | mrr@10 |
|:---------------------------------------------|:-------|---------:|---------:|----------:|---------:|---------:|---------:|
| huyydangg/DEk21_hcmute_embedding_wseg | dense | 0.908405 | 0.914792 | 0.917742 | 0.889583 | 0.893099 | 0.894266 |
| AITeamVN/Vietnamese_Embedding | dense | 0.842687 | 0.854993 | 0.865006 | 0.822135 | 0.82901 | 0.833389 |
| bkai-foundation-models/vietnamese-bi-encoder | hybrid | 0.827247 | 0.844781 | 0.846937 | 0.799219 | 0.809505 | 0.806771 |
| bkai-foundation-models/vietnamese-bi-encoder | dense | 0.814116 | 0.82965 | 0.839567 | 0.796615 | 0.805286 | 0.809572 |
| AITeamVN/Vietnamese_Embedding | hybrid | 0.788724 | 0.810062 | 0.820797 | 0.758333 | 0.77224 | 0.776461 |
| BAAI/bge-m3 | dense | 0.784056 | 0.80665 | 0.817016 | 0.763281 | 0.775859 | 0.780293 |
| BAAI/bge-m3 | hybrid | 0.775239 | 0.797382 | 0.811962 | 0.747656 | 0.763333 | 0.77128 |
| huyydangg/DEk21_hcmute_embedding | dense | 0.752173 | 0.769259 | 0.785101 | 0.72474 | 0.734427 | 0.741076 |
| hiieu/halong_embedding | hybrid | 0.73627 | 0.757183 | 0.779169 | 0.710417 | 0.721901 | 0.731976 |
| bm25 | bm25 | 0.728122 | 0.74974 | 0.761612 | 0.699479 | 0.711198 | 0.715738 |
| dangvantuan/vietnamese-embedding | dense | 0.718971 | 0.746521 | 0.763416 | 0.696354 | 0.711953 | 0.718854 |
| dangvantuan/vietnamese-embedding | hybrid | 0.71711 | 0.743537 | 0.758315 | 0.690104 | 0.704792 | 0.712261 |
| VoVanPhuc/sup-SimCSE-VietNamese-phobert-base | hybrid | 0.688483 | 0.713829 | 0.733894 | 0.660156 | 0.671198 | 0.676961 |
| hiieu/halong_embedding | dense | 0.656377 | 0.675881 | 0.701368 | 0.630469 | 0.641406 | 0.652057 |
| VoVanPhuc/sup-SimCSE-VietNamese-phobert-base | dense | 0.558852 | 0.584799 | 0.611329 | 0.536979 | 0.55112 | 0.562218 |

## Citation