jina-embeddings-v5-text-nano-classification

5.3K
6
llama.cpp
by
jinaai
Language Model
OTHER
New
5K downloads
Early-stage
Edge AI:
Mobile
Laptop
Server
Unknown
Mobile
Laptop
Server
Quick Summary

AI model with specialized capabilities.

Code Examples

Optional: set truncate_dim in encode() to control embedding sizepythonpytorch
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-nano-classification",
    trust_remote_code=True,
    model_kwargs={"dtype": torch.bfloat16},  # Recommended for GPUs
    config_kwargs={"_attn_implementation": "flash_attention_2"},  # Recommended but optional
)
# Optional: set truncate_dim in encode() to control embedding size

texts = [
    "My order hasn't arrived yet and it's been two weeks.",
    "How do I reset my password?",
    "I'd like a refund for my recent purchase.",
    "Your product exceeded my expectations. Great job!",
]

# Encode texts
embeddings = model.encode(texts)
print(embeddings.shape)
# (4, 768)

similarity = model.similarity(embeddings, embeddings)
print(similarity)
# tensor([[1.0000, 0.7152, 0.8378, 0.8101],
#         [0.7152, 1.0000, 0.7512, 0.6940],
#         [0.8378, 0.7512, 1.0000, 0.7741],
#         [0.8101, 0.6940, 0.7741, 1.0000]])
1. Load tokenizer and ONNX modelpythontransformers
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import torch

model_id = "jinaai/jina-embeddings-v5-text-nano-classification"

# 1. Load tokenizer and ONNX model
# We specify the subfolder 'onnx' where the weights are located
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    subfolder="onnx",
    file_name="model.onnx",
    provider="CPUExecutionProvider",  # Or "CUDAExecutionProvider" for GPU
    trust_remote_code=True,
)

# 2. Prepare input
texts = ["Document: How do I use Jina ONNX models?", "Document: Information about semantic matching."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")


# 4. Inference
with torch.no_grad():
    outputs = model(**inputs)

# 5. Pooling (Crucial for Jina-v5)
# Jina-v5 uses LAST-TOKEN pooling.
# We take the hidden state of the last non-padding token.
last_hidden_state = outputs.last_hidden_state
# Find the indices of the last token (usually the end of the sequence)
sequence_lengths = inputs.attention_mask.sum(dim=1) - 1
embeddings = last_hidden_state[torch.arange(last_hidden_state.size(0)), sequence_lengths]

print('embeddings shape:', embeddings.shape)
print('embeddings:', embeddings)

Deploy This Model

Production-ready deployment in minutes

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Try Free API

Replicate

One-click model deployment

Easiest Setup

Run models in the cloud with simple API. No DevOps required.

Deploy Now

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.