gte-multilingual-base

Name: gte-multilingual-base
Author: Alibaba-NLP

1.4M

329

GPT-3 class

277M

76 languages

license:apache-2.0

Alibaba-NLP

Embedding Model

OTHER

High

1.4M downloads

Battle-tested

Try on Hugging Face Add to Compare

Edge AI:

Mobile

Laptop

Server

1GB+ RAM

Mobile

Laptop

Server

Quick Summary

--- tags: - mteb - sentence-transformers - transformers - multilingual - sentence-similarity - text-embeddings-inference license: apache-2.

Device Compatibility

Mobile

4-6GB RAM

Laptop

16GB RAM

Server

GPU

Minimum Recommended

1GB+ RAM

Code Examples

Get Dense Embeddings with Transformerspythontransformers

# Requires transformers>=4.36.0

import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

input_texts = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "北京",
    "快排算法介绍"
]

model_name_or_path = 'Alibaba-NLP/gte-multilingual-base'
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModel.from_pretrained(model_name_or_path, trust_remote_code=True)

# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=8192, padding=True, truncation=True, return_tensors='pt')

outputs = model(**batch_dict)

dimension=768 # The output dimension of the output embedding, should be in [128, 768]
embeddings = outputs.last_hidden_state[:, 0][:dimension]

embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:1] @ embeddings[1:].T) * 100
print(scores.tolist())

# [[0.3016996383666992, 0.7503870129585266, 0.3203084468841553]]

Deploy This Model

Production-ready deployment in minutes

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Try Free API

Replicate

One-click model deployment

Easiest Setup

Run models in the cloud with simple API. No DevOps required.

Deploy Now

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.