Devstral-Small-2-24B-Instruct-SINQ-4bit
License: apache-2.0 · by maxence-bouvier
Language Model · 24B params · 66 downloads
Status: new, early-stage release
Edge AI: Mobile · Laptop · Server (54GB+ RAM)
Quick Summary
A 4-bit SINQ-quantized build of Devstral-Small-2-24B-Instruct, a 24B-parameter instruction-tuned language model, packaged to cut memory requirements for local and edge deployment.
Device Compatibility
- Mobile: 4-6GB RAM
- Laptop: 16GB RAM
- Server: GPU
Minimum recommended: 23GB+ RAM
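The RAM figures above track a simple back-of-envelope estimate: 24B parameters at 4 bits is roughly 12 GB of weights, versus ~48 GB at fp16, before activations and KV cache. A minimal sketch of that arithmetic (the byte-per-parameter figures are the quantization widths only; runtime overhead is not included):

```python
# Back-of-envelope weight-memory estimate for a 24B-parameter model.
params = 24e9
bytes_per_param_4bit = 0.5   # 4 bits per weight
bytes_per_param_fp16 = 2.0   # 16 bits per weight

weights_4bit_gb = params * bytes_per_param_4bit / 1e9   # ~12 GB
weights_fp16_gb = params * bytes_per_param_fp16 / 1e9   # ~48 GB
print(f"4-bit weights: ~{weights_4bit_gb:.0f} GB, fp16 weights: ~{weights_fp16_gb:.0f} GB")
```

This is why the 4-bit build fits a 16GB laptop while the unquantized model needs 50GB+ of RAM.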
Code Examples
Load the model and processor (the processor handles tokenization and chat templates):

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# Repository id inferred from this model card.
model_id = "maxence-bouvier/Devstral-Small-2-24B-Instruct-SINQ-4bit"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
)


def _build_unicode_to_bytes_map() -> dict[str, int]:
    """Build the inverse of GPT-2's bytes_to_unicode mapping."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {chr(c): b for b, c in zip(bs, cs)}


_UNICODE_TO_BYTE = _build_unicode_to_bytes_map()


def fix_byte_encoding(text: str) -> str:
    """Fix byte-level BPE encoding for proper emoji/unicode display.

    Example: "ðŁ¤Ĺ" -> "🤗"
    """
    try:
        byte_values = bytes([_UNICODE_TO_BYTE.get(c, ord(c)) for c in text])
        return byte_values.decode("utf-8")
    except (UnicodeDecodeError, KeyError, ValueError):
        return text


messages = [
    {"role": "user", "content": "Write a Python function to check if a number is prime."}
]
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = processor(text=text, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        pad_token_id=processor.tokenizer.eos_token_id,
    )
response = processor.decode(outputs[0], skip_special_tokens=True)
response = fix_byte_encoding(response)  # fix emoji/unicode display
print(response)
```

Quick Test

```python
# Minimal test to verify the model works
messages = [{"role": "user", "content": "Say hello"}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,
        pad_token_id=processor.tokenizer.eos_token_id,
    )
response = fix_byte_encoding(processor.decode(out[0], skip_special_tokens=True))
print(response)
```

Deploy This Model
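The byte-decoding fix can be sanity-checked without loading any model: encode an emoji to UTF-8 bytes, map each byte through GPT-2's byte-to-unicode table to reproduce the garbled form a byte-level tokenizer emits, then invert the mapping. A self-contained sketch (the helper is rebuilt inline so the snippet runs on its own):

```python
# Standalone round-trip check of the byte-level BPE decoding fix (no model needed).

def build_byte_to_unicode() -> dict[int, str]:
    """GPT-2-style bytes_to_unicode: printable bytes map to themselves,
    the rest are shifted into the 256+ codepoint range."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


byte_to_unicode = build_byte_to_unicode()
unicode_to_byte = {c: b for b, c in byte_to_unicode.items()}

# What a byte-level tokenizer surfaces for an emoji, and the recovered text.
garbled = "".join(byte_to_unicode[b] for b in "🤗".encode("utf-8"))
fixed = bytes(unicode_to_byte[c] for c in garbled).decode("utf-8")
print(garbled, "->", fixed)  # the garbled form decodes back to 🤗
```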
Production-ready deployment in minutes.

Together.ai — instant API access to this model. Production-ready inference API; start free, scale to millions. (Try Free API)

Replicate — one-click model deployment. Run models in the cloud with a simple API; no DevOps required. (Deploy Now)

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.