pplx-embed-v1-4b
license:mit
by perplexity-ai
Embedding Model
OTHER
4B params
Fair
16K downloads
Community-tested
Edge AI:
Mobile
Laptop
Server
9GB+ RAM
Quick Summary
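A 4B-parameter text embedding model from perplexity-ai, released under the MIT license. The code examples below produce 2560-dimensional embeddings with int8 or binary quantization.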
Device Compatibility
Mobile
4-6GB RAM
Laptop
16GB RAM
Server
GPU
Minimum Recommended
4GB+ RAM
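The RAM figures above are easier to interpret next to a rough estimate of weight memory. The sketch below assumes exactly 4 billion parameters (the listing only says "4B params") and counts weights only, ignoring activations, tokenizer, and runtime overhead. The function and constant names are illustrative, not part of any library.

```python
# Rough weight-memory estimate.
# Assumptions: 4e9 parameters, weights only, 1 GB = 1e9 bytes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

NUM_PARAMS = 4_000_000_000  # assumption: "4B params" taken literally

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.0f} GB")
# float32: ~16 GB
# float16: ~8 GB
# int8: ~4 GB
```

Under these assumptions, float16 weights alone take about 8 GB, which is in line with the 9GB+ figure in the header. The 4GB+ and 4-6GB figures would presumably require quantized weights.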
Code Examples
python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer(
"perplexity-ai/pplx-embed-v1-4B",
trust_remote_code=True
)
texts = [
"Scientists explore the universe driven by curiosity.",
"Children learn through curious exploration.",
"Historical discoveries began with curious questions.",
"Animals use curiosity to adapt and survive.",
"Philosophy examines the nature of curiosity.",
]
embeddings = model.encode(texts) # Shape: (5, 2560), quantized to int8
embeddings = model.encode(texts, quantization="binary") # Shape: (5, 2560), quantized to binary
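A common next step after encoding is to rank texts by similarity. The following is a minimal sketch using cosine similarity on small stand-in vectors. The values are hypothetical, not real model output, and the helper names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

# Stand-in int8-style vectors (hypothetical values, not real model output).
query = [12, -7, 30, 5]
candidates = [
    [11, -6, 28, 4],    # nearly parallel to the query
    [-12, 7, -30, -5],  # exactly opposite to the query
    [5, 30, 7, -12],    # orthogonal to the query (dot product is 0)
]
order = rank_by_similarity(query, candidates)
print(order)  # [0, 2, 1]
```

With real output, rows of `embeddings` could be passed in after conversion to Python lists. When working with NumPy int8 arrays directly, casting to a wider integer or float type first avoids overflow in the dot product.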
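python
# ONNX Runtime alternative to the sentence-transformers example above
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np
tokenizer = AutoTokenizer.from_pretrained("perplexity-ai/pplx-embed-v1-4b", trust_remote_code=True)
# Assumes the ONNX export has been downloaded locally to onnx/model.onnx
session = ort.InferenceSession("onnx/model.onnx")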
texts = [
"Scientists explore the universe driven by curiosity.",
"Children learn through curious exploration.",
"Historical discoveries began with curious questions.",
"Animals use curiosity to adapt and survive.",
"Philosophy examines the nature of curiosity.",
]
tokenized = tokenizer(
texts,
padding=True,
truncation=True,
return_tensors="np"
)
onnx_inputs = {
"input_ids": tokenized["input_ids"].astype(np.int64),
"attention_mask": tokenized["attention_mask"].astype(np.int64),
}
# Run inference
onnx_embeddings = session.run([out.name for out in session.get_outputs()], onnx_inputs)
# ONNX produces both int8 and binary precision embeddings:
int8_embeddings = onnx_embeddings[2]
binary_embeddings = onnx_embeddings[3]
packed_embeddings = np.packbits(binary_embeddings != -1, axis=-1)
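Packed binary embeddings are typically compared by Hamming distance, the number of differing bits. The sketch below is a pure-Python illustration under two assumptions: that binary values are +1 and -1 (as the `!= -1` comparison above implies), and that each packed row is available as `bytes`. `pack_signs` is a hypothetical stand-in that mimics the default `np.packbits` bit order so the example runs without NumPy.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length packed bit strings."""
    if len(a) != len(b):
        raise ValueError("packed embeddings must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def pack_signs(values):
    """Stand-in for np.packbits(values != -1): MSB-first, zero-padded."""
    out = bytearray()
    for start in range(0, len(values), 8):
        chunk = values[start:start + 8]
        byte = 0
        for v in chunk:
            byte = (byte << 1) | (1 if v != -1 else 0)
        byte <<= 8 - len(chunk)  # pad a short final byte with zero bits
        out.append(byte)
    return bytes(out)

# Hypothetical +1/-1 embeddings of dimension 8 (real ones have 2560 dimensions).
a = pack_signs([1, -1, 1, 1, -1, -1, 1, -1])
b = pack_signs([1, 1, 1, -1, -1, -1, 1, 1])
print(hamming_distance(a, b))  # 3 (positions 1, 3, and 7 differ)
```

With the ONNX output above, `packed_embeddings[i].tobytes()` would give the byte string for row `i`. A smaller distance indicates more similar embeddings.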
bash
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 --model-id perplexity-ai/pplx-embed-v1-4B --dtype float32
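Once the container is running, embeddings can be requested over HTTP. The following is a minimal client sketch using only the standard library. It assumes the server is reachable at `http://localhost:8080` (from the `-p 8080:80` mapping above) and uses the `/embed` route of text-embeddings-inference. The helper names are illustrative.

```python
import json
import urllib.request

def build_embed_request(texts, base_url="http://localhost:8080"):
    """Build a POST request for the server's /embed route."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(texts, base_url="http://localhost:8080"):
    """Send texts to the running container and return one vector per text."""
    with urllib.request.urlopen(build_embed_request(texts, base_url)) as resp:
        return json.loads(resp.read())

# Example call (requires the container to be running):
# vectors = embed(["Children learn through curious exploration."])
```

Building the request separately from sending it keeps the payload easy to inspect without a live server.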