octen-embedding-0.6b-onnx-int4

by cstr
License: apache-2.0
Embedding Model, 0.6B params, 98 downloads
Quick Summary

An INT4-quantized ONNX export of a 0.6B-parameter text-embedding model. It produces 1024-dimensional embeddings from inputs of up to 512 tokens and runs on CPU through ONNX Runtime's 4-bit MatMulNBits kernels.

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 1GB+ RAM

Code Examples

Inference (batch=1), Python + ONNX Runtime
import onnxruntime as ort
import numpy as np
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_truncation(max_length=512)

# CPUExecutionProvider supports MatMulNBits 4-bit
session = ort.InferenceSession("model.int4.onnx", providers=["CPUExecutionProvider"])

text = "semantic search example"
enc  = tokenizer.encode(text)
ids  = np.array([enc.ids],            dtype=np.int64)
mask = np.array([enc.attention_mask], dtype=np.int64)

# Last hidden state: [1, seq_len, 1024]
lhs = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]
emb = lhs[0, mask[0].sum() - 1]   # last-token pooling: last non-padding position
emb = emb / np.linalg.norm(emb)   # L2-normalize so cosine similarity is a dot product
print(emb.shape)  # (1024,)
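Because the embeddings above are L2-normalized, cosine similarity reduces to a plain dot product. The sketch below ranks a small corpus against a query; the `rank_by_similarity` helper is illustrative (not part of the model's files), and the random stand-in vectors would be replaced by embeddings produced by the session code above.

```python
import numpy as np

def rank_by_similarity(query_emb: np.ndarray, doc_embs: np.ndarray):
    """Rank documents by cosine similarity to the query.

    Assumes every embedding is already L2-normalized, so cosine
    similarity is just a dot product.
    """
    scores = doc_embs @ query_emb   # [n_docs]
    order = np.argsort(-scores)     # highest similarity first
    return order, scores[order]

# Stand-in vectors for demonstration; in practice each row of `docs`
# and `query` would come from the model's pooled, normalized output.
rng = np.random.default_rng(0)
docs = rng.normal(size=(3, 1024))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

query = docs[1] + 0.01 * rng.normal(size=1024)  # query near doc 1
query /= np.linalg.norm(query)

order, scores = rank_by_similarity(query, docs)
print(order[0])  # 1 (doc 1 is the nearest)
```

At this scale a brute-force dot product is fine; for large corpora the same normalized vectors can be handed to an approximate-nearest-neighbor index instead.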
