# upskyy/bge-m3-korean

This model is [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) fine-tuned on KorSTS and KorNLI. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Description

- Model Type: Sentence Transformer
- Base model: BAAI/bge-m3
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity

## Usage

Without sentence-transformers, you can use the model like this: first pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.

## Evaluation

### Semantic Similarity

- Dataset: `sts-dev`
- Evaluated with `EmbeddingSimilarityEvaluator`

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.874  |
| spearman_cosine    | 0.8724 |
| pearson_manhattan  | 0.8593 |
| spearman_manhattan | 0.8688 |
| pearson_euclidean  | 0.8598 |
| spearman_euclidean | 0.8694 |
| pearson_dot        | 0.8684 |
| spearman_dot       | 0.8666 |
| pearson_max        | 0.874  |
| spearman_max       | 0.8724 |

## Framework Versions

- Python: 3.10.13
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.3.0+cu121
- Accelerate: 0.30.1
- Datasets: 2.16.1
- Tokenizers: 0.19.1
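The card above describes the usage path without sentence-transformers: run the input through the transformer, then apply a pooling operation on top of the contextualized word embeddings. A minimal sketch of that recipe, assuming the Hugging Face model id `upskyy/bge-m3-korean` and mean pooling as the pooling operation:

```python
# Sketch: use the model without sentence-transformers by running inputs
# through the transformer and mean-pooling the token embeddings.
# Assumes the model id "upskyy/bge-m3-korean" and mean pooling; check the
# model card's own snippet for the exact pooling it ships with.
import torch


def mean_pooling(model_output, attention_mask):
    # model_output[0] holds the contextualized token embeddings:
    # shape (batch, seq_len, hidden). Average them, ignoring padding.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * mask, dim=1)
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts


if __name__ == "__main__":
    from transformers import AutoTokenizer, AutoModel

    sentences = ["안녕하세요?", "한국어 문장 임베딩 모델입니다."]

    tokenizer = AutoTokenizer.from_pretrained("upskyy/bge-m3-korean")
    model = AutoModel.from_pretrained("upskyy/bge-m3-korean")

    encoded = tokenizer(sentences, padding=True, truncation=True,
                        return_tensors="pt")
    with torch.no_grad():
        output = model(**encoded)

    # Each sentence becomes one 1024-dimensional dense vector.
    embeddings = mean_pooling(output, encoded["attention_mask"])
    print(embeddings.shape)  # torch.Size([2, 1024])
```

Sentence similarity then reduces to cosine similarity between the pooled vectors, matching the card's stated similarity function.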