jina-embeddings-v5-text-nano-clustering
by jinaai · Embedding Model · llama.cpp · OTHER
New · Early-stage · 2K downloads
Edge AI: Mobile · Laptop · Server
Quick Summary
jina-embeddings-v5-text-nano-clustering is a compact text embedding model from Jina AI, tuned for clustering tasks. It produces 768-dimensional embeddings (optionally truncated via truncate_dim) and is small enough for edge deployment.
Code Examples
Sentence Transformers (PyTorch)
from sentence_transformers import SentenceTransformer
import torch
model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-nano-clustering",
    trust_remote_code=True,
    model_kwargs={"dtype": torch.bfloat16},  # Recommended for GPUs
    config_kwargs={"_attn_implementation": "flash_attention_2"},  # Recommended but optional
)
# Optional: set truncate_dim in encode() to control embedding size
texts = [
    "We propose a novel neural network architecture for image segmentation.",
    "This paper analyzes the effects of monetary policy on inflation.",
    "Our method achieves state-of-the-art results on object detection benchmarks.",
    "We study the relationship between interest rates and housing prices.",
    "A new attention mechanism is introduced for visual recognition tasks.",
]
# Encode texts
embeddings = model.encode(texts)
print(embeddings.shape)
# (5, 768)
similarity = model.similarity(embeddings, embeddings)
print(similarity)
# tensor([[1.0000, 0.2933, 0.9304, 0.2928, 0.8635],
#         [0.2933, 1.0000, 0.3062, 0.8083, 0.3035],
#         [0.9304, 0.3062, 1.0000, 0.2943, 0.8651],
#         [0.2928, 0.8083, 0.2943, 1.0000, 0.2827],
#         [0.8635, 0.3035, 0.8651, 0.2827, 1.0000]])
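The example above notes that encode() accepts a truncate_dim argument to shrink the embeddings. The effect can be sketched without downloading the model: keep the leading dimensions and re-normalize each row. This is a numpy-only sketch; the 256-dimension target and the explicit re-normalization step are illustrative assumptions, not documented behavior of this model.

```python
import numpy as np

rng = np.random.default_rng(0)
full = rng.normal(size=(5, 768)).astype(np.float32)  # stand-in for model.encode(texts)

def truncate_and_normalize(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row, then L2-normalize."""
    cut = emb[:, :dim]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)

small = truncate_and_normalize(full, 256)
print(small.shape)                    # (5, 256)
print(np.linalg.norm(small, axis=1))  # each row has unit norm
```

Re-normalizing after truncation keeps dot products interpretable as cosine similarities.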
ONNX Runtime (Transformers)
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import torch
model_id = "jinaai/jina-embeddings-v5-text-nano-clustering"
# 1. Load tokenizer and ONNX model
# We specify the subfolder 'onnx' where the weights are located
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    subfolder="onnx",
    file_name="model.onnx",
    provider="CPUExecutionProvider",  # Or "CUDAExecutionProvider" for GPU
    trust_remote_code=True,
)
# 2. Prepare input
texts = ["Document: How do I use Jina ONNX models?", "Document: Information about semantic matching."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
# 3. Inference
with torch.no_grad():
    outputs = model(**inputs)
# 4. Pooling (crucial for Jina-v5)
# Jina-v5 uses LAST-TOKEN pooling:
# we take the hidden state of the last non-padding token.
last_hidden_state = outputs.last_hidden_state
# Find the indices of the last token (usually the end of the sequence)
sequence_lengths = inputs.attention_mask.sum(dim=1) - 1
embeddings = last_hidden_state[torch.arange(last_hidden_state.size(0)), sequence_lengths]
print('embeddings shape:', embeddings.shape)
print('embeddings:', embeddings)
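The ONNX path returns raw pooled vectors rather than similarity scores. To reproduce model.similarity() from the first example (which defaults to cosine similarity for SentenceTransformer models), the similarity matrix can be computed by hand. A numpy sketch with dummy vectors standing in for the pooled embeddings; the values are made up for illustration:

```python
import numpy as np

def cosine_similarity_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: L2-normalize rows, then take dot products."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return normed @ normed.T

emb = np.array([[1.0, 0.0, 1.0],
                [0.0, 2.0, 0.0],
                [2.0, 0.0, 2.0]])  # stand-in for the pooled `embeddings`
sim = cosine_similarity_matrix(emb)
print(np.round(sim, 2))
# [[1. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 1.]]
```

Rows 0 and 2 are parallel (similarity 1.0) and orthogonal to row 1 (similarity 0.0), matching the structure of the output above.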
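This checkpoint is the clustering variant of jina-embeddings-v5, so in practice the embeddings are handed to a clustering algorithm such as scikit-learn's KMeans. As a dependency-free illustration, here is a toy k-means over mock embeddings shaped like the five texts in the first example (three computer-science topics, two economics topics); both the data and the tiny implementation are assumptions for demonstration only:

```python
import numpy as np

def tiny_kmeans(emb: np.ndarray, k: int = 2, iters: int = 10) -> np.ndarray:
    """Toy Lloyd's k-means with deterministic farthest-point init.
    In practice, use sklearn.cluster.KMeans instead."""
    centers = [emb[0]]
    while len(centers) < k:
        # Seed the next center with the point farthest from all existing centers
        dists = np.min([np.linalg.norm(emb - c, axis=1) for c in centers], axis=0)
        centers.append(emb[dists.argmax()])
    centers = np.stack(centers)
    for _ in range(iters):
        # Assign each embedding to its nearest center, then recompute centers
        labels = np.linalg.norm(emb[:, None] - centers[None, :], axis=2).argmin(axis=1)
        centers = np.stack([emb[labels == j].mean(axis=0) for j in range(k)])
    return labels

# Mock embeddings: two well-separated groups standing in for CS vs. economics texts
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=0.1, size=(3, 8))
group_b = rng.normal(loc=5.0, scale=0.1, size=(2, 8))
emb = np.vstack([group_a, group_b])

labels = tiny_kmeans(emb, k=2)
print(labels)  # [0 0 0 1 1]
```

With real embeddings from this model, the same pipeline applies: encode the texts, then cluster the resulting vectors.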