pplx-embed-v1-4b

License: MIT
By perplexity-ai
Embedding Model, 4B params
Fair, Community-tested
16K downloads
Edge AI: Mobile, Laptop, Server (9GB+ RAM)
Quick Summary

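A ~4B-parameter text embedding model from perplexity-ai. In the examples below it maps each input text to a 2560-dimensional vector, returned as int8-quantized embeddings by default or as binary embeddings on request.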

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 4GB+ RAM
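The RAM figures above can be sanity-checked with simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below assumes exactly 4 x 10^9 parameters (the card rounds to "4B") and counts weights only, ignoring activations and runtime overhead, so treat the results as rough lower bounds rather than official requirements. `weight_memory_gb` is a hypothetical helper.

```python
# Weight-only memory estimate for a model with ~4B parameters.
# Assumptions: exactly 4e9 parameters; activations, tokenizer and
# runtime buffers are ignored, so real usage will be higher.
PARAMS = 4_000_000_000

BYTES_PER_PARAM = {
    "float32": 4,   # full precision, as requested by --dtype float32 in the TEI command
    "bfloat16": 2,  # 16-bit weights
}


def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {weight_memory_gb(PARAMS, nbytes):.0f} GB")
# prints:
# float32: 16 GB
# bfloat16: 8 GB
```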

Code Examples

```python
from sentence_transformers import SentenceTransformer

# trust_remote_code=True allows the custom code shipped in the model repository to run
model = SentenceTransformer(
    "perplexity-ai/pplx-embed-v1-4B",
    trust_remote_code=True
)

texts = [
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
    "Animals use curiosity to adapt and survive.",
    "Philosophy examines the nature of curiosity.",
]

embeddings = model.encode(texts)  # Shape: (5, 2560), quantized to int8
binary_embeddings = model.encode(texts, quantization="binary")  # Shape: (5, 2560), quantized to binary
```
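A common next step is ranking texts by cosine similarity. The minimal NumPy sketch below casts int8 vectors to float32 before taking dot products so the 8-bit range cannot overflow. Random int8 vectors with the documented shape (5, 2560) stand in for the output of `model.encode(texts)` so it runs without the model; the helper name `cosine_similarity_matrix` is hypothetical, and cosine is an assumed choice of metric, not one the card prescribes.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b.

    Inputs may be int8; they are cast to float32 first so the dot
    products are computed without 8-bit overflow.
    """
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


# Stand-in for model.encode(texts): random int8 vectors in [-127, 127]
rng = np.random.default_rng(0)
embeddings = rng.integers(-127, 128, size=(5, 2560)).astype(np.int8)

# Use the first text as the query and rank all texts by similarity to it
query = embeddings[:1]
scores = cosine_similarity_matrix(query, embeddings)[0]
ranking = np.argsort(-scores)
print(ranking)  # the query itself ranks first
```

With real embeddings, `query` would be the encoded search string and `embeddings` the encoded corpus.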
```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("perplexity-ai/pplx-embed-v1-4b", trust_remote_code=True)
# Local path to the ONNX export; assumes onnx/model.onnx from the model repo has been downloaded
session = ort.InferenceSession("onnx/model.onnx")

texts = [
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
    "Animals use curiosity to adapt and survive.",
    "Philosophy examines the nature of curiosity.",
]

tokenized = tokenizer(
    texts,
    padding=True,
    truncation=True,
    return_tensors="np"
)

onnx_inputs = {
    "input_ids": tokenized["input_ids"].astype(np.int64),
    "attention_mask": tokenized["attention_mask"].astype(np.int64),
}

# Run inference, requesting every output the graph defines
onnx_embeddings = session.run([out.name for out in session.get_outputs()], onnx_inputs)

# ONNX produces both int8 and binary precision embeddings:
int8_embeddings = onnx_embeddings[2]
binary_embeddings = onnx_embeddings[3]
# Values equal to -1 become bit 0, all others bit 1; 8 dimensions are packed per byte
packed_embeddings = np.packbits(binary_embeddings != -1, axis=-1)
```
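Packed binary embeddings are usually compared by Hamming distance: XOR two packed rows and count the set bits. For vectors of +1/-1 values with D dimensions, the dot product equals D - 2 x Hamming distance, so sorting by smallest Hamming distance gives the same order as sorting by largest dot product. The sketch below uses random +/-1 vectors with 2560 dimensions in place of `binary_embeddings` and packs them with the same `np.packbits` expression as above; `hamming_distances` is a hypothetical helper.

```python
import numpy as np


def hamming_distances(packed_a: np.ndarray, packed_b: np.ndarray) -> np.ndarray:
    """Pairwise Hamming distances between rows of two np.packbits-packed uint8 arrays."""
    # XOR marks the differing bits; unpackbits + sum counts them for each pair
    xor = np.bitwise_xor(packed_a[:, None, :], packed_b[None, :, :])
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


# Stand-in for binary_embeddings: random +1/-1 vectors with 2560 dimensions
rng = np.random.default_rng(0)
binary = rng.choice([-1, 1], size=(5, 2560)).astype(np.int8)

packed = np.packbits(binary != -1, axis=-1)  # 2560 bits -> 320 bytes per row
dist = hamming_distances(packed, packed)
print(packed.shape, dist.shape)  # (5, 320) (5, 5)
```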
```bash
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 --model-id perplexity-ai/pplx-embed-v1-4B --dtype float32
```
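Once the container is up, Text Embeddings Inference serves HTTP on host port 8080 (mapped from container port 80 by `-p 8080:80`). The following is a minimal standard-library client sketch, assuming TEI's `POST /embed` route that accepts `{"inputs": [...]}` and returns one vector per input; check the TEI documentation for the image version you deploy. `build_embed_request` and `embed` are hypothetical helper names.

```python
import json
import urllib.request


def build_embed_request(texts, base_url="http://localhost:8080"):
    """Build a POST /embed request with a JSON body of the form {"inputs": [...]}."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, base_url="http://localhost:8080"):
    """Send the request to a running TEI server and decode the JSON response."""
    req = build_embed_request(texts, base_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # assumed: a list of vectors, one per input


if __name__ == "__main__":
    vectors = embed(["Scientists explore the universe driven by curiosity."])
    print(len(vectors), len(vectors[0]))
```

Building the request separately from sending it keeps the payload easy to inspect without a live server.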
