octen-embedding-0.6b-onnx-int4
Embedding Model · 0.6B params · License: apache-2.0 · by cstr · 98 downloads
Early-stage Edge AI: Mobile · Laptop · Server · 2GB+ RAM
Quick Summary
A 0.6B-parameter text embedding model in 4-bit-quantized (INT4) ONNX format, producing 1024-dimensional embeddings for semantic search and retrieval on CPU-class hardware.
Device Compatibility
Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 1GB+ RAM
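As a rough sanity check on these figures: INT4 quantization stores about half a byte per parameter, so the quantized weights of a 0.6B-parameter model take roughly 0.3GB on their own; activations, the tokenizer, and ONNX Runtime's memory arenas account for the rest of the recommended headroom. A back-of-envelope sketch:

```python
params = 0.6e9        # 0.6B parameters
bits_per_weight = 4   # INT4 quantization (MatMulNBits)

weight_bytes = params * bits_per_weight / 8
print(f"quantized weights: {weight_bytes / 1e9:.2f} GB")  # 0.30 GB
```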
Code Examples
Inference (batch=1) · Python · ONNX Runtime
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_truncation(max_length=512)

# CPUExecutionProvider supports the MatMulNBits 4-bit operator
session = ort.InferenceSession("model.int4.onnx", providers=["CPUExecutionProvider"])

text = "semantic search example"
enc = tokenizer.encode(text)
ids = np.array([enc.ids], dtype=np.int64)
mask = np.array([enc.attention_mask], dtype=np.int64)

# Last hidden state: [1, seq_len, 1024]
lhs = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]
emb = lhs[0, mask[0].sum() - 1]   # pool with the last non-padding token
emb = emb / np.linalg.norm(emb)   # L2-normalize for cosine similarity
print(emb.shape)  # (1024,)
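Because `emb` is L2-normalized, ranking documents against a query reduces to a dot product (which then equals cosine similarity). A minimal sketch using placeholder unit vectors in place of real model output — in practice each row would come from the inference snippet above:

```python
import numpy as np

def rank_by_similarity(query_emb: np.ndarray, doc_embs: np.ndarray):
    """Return document indices sorted by cosine similarity, best first.

    Assumes every embedding is already L2-normalized, so the dot
    product equals cosine similarity.
    """
    scores = doc_embs @ query_emb   # [n_docs]
    order = np.argsort(-scores)     # descending similarity
    return order, scores[order]

# Placeholder 1024-dim unit vectors standing in for real embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(3, 1024))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

query = docs[1].copy()  # make document 1 an exact match
order, scores = rank_by_similarity(query, docs)
print(order[0])  # index of the most similar document
```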