jina-embeddings-v5-text-nano-retrieval
by jinaai

Embedding Model · llama.cpp · 38.6K downloads · Community-tested
Edge AI: Mobile · Laptop · Server
Quick Summary
Compact text embedding model from Jina AI, tuned for retrieval. It encodes queries and documents into 768-dimensional vectors (the dimensionality can be reduced via truncate_dim) and can be run through Sentence Transformers, ONNX Runtime, or llama.cpp.
Code Examples
Example 1: Sentence Transformers (PyTorch)
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-nano-retrieval",
    trust_remote_code=True,
    model_kwargs={"dtype": torch.bfloat16},  # Recommended for GPUs
    config_kwargs={"_attn_implementation": "flash_attention_2"},  # Recommended but optional
)

# Optional: set truncate_dim in encode() to control embedding size
query = "Which planet is known as the Red Planet?"
documents = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

# Encode query and documents
query_embeddings = model.encode(sentences=query, prompt_name="query")
document_embeddings = model.encode(sentences=documents, prompt_name="document")
print(query_embeddings.shape, document_embeddings.shape)
# (768,) (4, 768)

similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.5013, 0.7914, 0.6133, 0.5736]])

Example 2: ONNX Runtime (Transformers + Optimum)
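For reference, the scoring step above can be reproduced by hand. The sketch below uses dummy random vectors rather than real model outputs, and assumes cosine similarity (the Sentence Transformers default for model.similarity); it also shows the mechanics of truncating embeddings to a leading slice, which is what truncate_dim is for:

```python
import torch
import torch.nn.functional as F

# Dummy stand-ins for a (768,) query embedding and (4, 768) document embeddings
query_emb = torch.randn(768)
doc_embs = torch.randn(4, 768)

def cosine_sim(q: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    # Normalize to unit length, then score with dot products
    q = F.normalize(q.unsqueeze(0), dim=-1)
    d = F.normalize(d, dim=-1)
    return q @ d.T  # shape (1, num_documents)

full = cosine_sim(query_emb, doc_embs)

# Keeping only the leading 256 dimensions before scoring,
# analogous to encoding with truncate_dim=256
truncated = cosine_sim(query_emb[:256], doc_embs[:, :256])

print(full.shape, truncated.shape)  # torch.Size([1, 4]) torch.Size([1, 4])
```

With the real model, the truncated scores stay close to the full-dimensional ones because the embeddings are trained so that the leading dimensions carry most of the signal; with random vectors here, only the shapes and the unit-length normalization carry over.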
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import torch

model_id = "jinaai/jina-embeddings-v5-text-nano-retrieval"

# 1. Load tokenizer and ONNX model
# We specify the subfolder 'onnx' where the weights are located
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    subfolder="onnx",
    file_name="model.onnx",
    provider="CPUExecutionProvider",  # Or "CUDAExecutionProvider" for GPU
    trust_remote_code=True,
)

# 2. Prepare input
texts = [
    "Query: How do I use Jina ONNX models?",
    "Document: Information about semantic matching.",
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# 3. Inference
with torch.no_grad():
    outputs = model(**inputs)

# 4. Pooling (crucial for Jina-v5)
# Jina-v5 uses LAST-TOKEN pooling:
# we take the hidden state of the last non-padding token.
last_hidden_state = outputs.last_hidden_state
# Index of the last non-padding token in each sequence
sequence_lengths = inputs.attention_mask.sum(dim=1) - 1
embeddings = last_hidden_state[torch.arange(last_hidden_state.size(0)), sequence_lengths]

print('embeddings shape:', embeddings.shape)
print('embeddings:', embeddings)
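To see why the attention-mask indexing in the pooling step matters, here is a small self-contained check on dummy tensors (no model involved): last-token pooling must pick the hidden state of the final non-padding token, not the last position of the padded sequence.

```python
import torch

batch, seq_len, hidden = 2, 5, 4
# Fake hidden states with distinct values per position
last_hidden_state = torch.arange(
    batch * seq_len * hidden, dtype=torch.float32
).reshape(batch, seq_len, hidden)

# Second sequence is padded: only 3 real tokens out of 5
attention_mask = torch.tensor([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
])

# Same indexing as in the example above
sequence_lengths = attention_mask.sum(dim=1) - 1  # tensor([4, 2])
embeddings = last_hidden_state[torch.arange(batch), sequence_lengths]

# Row 0 takes position 4; row 1 takes position 2, its last real token
assert torch.equal(embeddings[0], last_hidden_state[0, 4])
assert torch.equal(embeddings[1], last_hidden_state[1, 2])
print(embeddings.shape)  # torch.Size([2, 4])
```

Naively taking `last_hidden_state[:, -1]` would return the hidden state of a padding token for every padded sequence, which degrades the embeddings.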
Deploy This Model
Production-ready deployment in minutes

Together.ai: Instant API access to this model. Production-ready inference API. Start free, scale to millions. (Try Free API)
Replicate: One-click model deployment. Run models in the cloud with a simple API. No DevOps required. (Deploy Now)

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.