octen-embedding-0.6b-onnx-int4

by cstr
License: apache-2.0
Embedding Model, 0.6B params, 98 downloads
Quick Summary

An INT4-quantized ONNX export of a 0.6B-parameter text-embedding model. It produces 1024-dimensional embeddings from inputs of up to 512 tokens and runs on CPU through ONNX Runtime's 4-bit MatMulNBits kernels.

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 1GB+ RAM

Code Examples

Inference (batch=1), Python + ONNX Runtime
import onnxruntime as ort
import numpy as np
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_truncation(max_length=512)

# CPUExecutionProvider supports MatMulNBits 4-bit
session = ort.InferenceSession("model.int4.onnx", providers=["CPUExecutionProvider"])

text = "semantic search example"
enc  = tokenizer.encode(text)
ids  = np.array([enc.ids],            dtype=np.int64)
mask = np.array([enc.attention_mask], dtype=np.int64)

# Last hidden state: [1, seq_len, 1024]
lhs = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]
emb = lhs[0, mask[0].sum() - 1]   # last-token pooling: last non-padding position
emb = emb / np.linalg.norm(emb)   # L2-normalize so cosine similarity is a dot product
print(emb.shape)  # (1024,)
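Because the embeddings above are L2-normalized, cosine similarity reduces to a plain dot product. The sketch below ranks a small corpus against a query; the `rank_by_similarity` helper is illustrative (not part of the model's files), and the random stand-in vectors would be replaced by embeddings produced by the session code above.

```python
import numpy as np

def rank_by_similarity(query_emb: np.ndarray, doc_embs: np.ndarray):
    """Rank documents by cosine similarity to the query.

    Assumes every embedding is already L2-normalized, so cosine
    similarity is just a dot product.
    """
    scores = doc_embs @ query_emb   # [n_docs]
    order = np.argsort(-scores)     # highest similarity first
    return order, scores[order]

# Stand-in vectors for demonstration; in practice each row of `docs`
# and `query` would come from the model's pooled, normalized output.
rng = np.random.default_rng(0)
docs = rng.normal(size=(3, 1024))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

query = docs[1] + 0.01 * rng.normal(size=1024)  # query near doc 1
query /= np.linalg.norm(query)

order, scores = rank_by_similarity(query, docs)
print(order[0])  # 1 (doc 1 is the nearest)
```

At this scale a brute-force dot product is fine; for large corpora the same normalized vectors can be handed to an approximate-nearest-neighbor index instead.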
