AI21-Jamba2-Mini
Language Model by ai21labs
License: apache-2.0
348 downloads
Early-stage
Edge AI: Mobile, Laptop, Server
Quick Summary
Jamba2 Mini is a compact language model from AI21 Labs with a hybrid Mamba-Transformer architecture, released under the Apache-2.0 license.
Code Examples
Quickstart (vLLM)

vllm serve "ai21labs/AI21-Jamba2-Mini" \
  --mamba-ssm-cache-dtype float32 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --enable-prefix-caching \
  --quantization experts_int8

Run with Transformers
pip install "transformers>=4.54.0"
pip install flash-attn --no-build-isolation
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/AI21-Jamba2-Mini",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ai21labs/AI21-Jamba2-Mini")
messages = [
    {
        "role": "system",
        "content": (
            "You are an HR Policy Assistant. "
            "Answer employee questions using only the provided policy documents. "
            "If the answer isn't in the documents, say so clearly. "
            "Be concise and cite the specific policy section when possible."
        ),
    },
    {
        "role": "user",
        "content": (
            "Context documents: {retrieved_chunks}. "
            "Employee question: {user_question}. "
            "Answer:"
        ),
    },
]
prompts = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompts, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, do_sample=True, temperature=0.6)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
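In the example above, {retrieved_chunks} and {user_question} are placeholders to fill at request time, before applying the chat template. A minimal sketch; the policy text, section numbers, and question below are hypothetical, invented purely for illustration:

```python
# Hypothetical retrieved policy snippets and employee question (illustration only).
retrieved_chunks = "\n".join([
    "Section 4.2: Employees accrue 1.5 vacation days per month.",
    "Section 4.3: Up to 10 unused vacation days may carry over per year.",
])
user_question = "How many vacation days can I carry over?"

# Fill the user turn's template; the result replaces the user message's
# "content" value before calling tokenizer.apply_chat_template.
user_content = (
    "Context documents: {retrieved_chunks}. "
    "Employee question: {user_question}. "
    "Answer:"
).format(retrieved_chunks=retrieved_chunks, user_question=user_question)

print(user_content)
```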
Deploy This Model

Production-ready deployment in minutes.

Together.ai: instant API access to this model. Production-ready inference API; start free, scale to millions.

Replicate: one-click model deployment. Run models in the cloud with a simple API, no DevOps required.

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.
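The vLLM quickstart above serves an OpenAI-compatible HTTP API; the host, port (8000), and route below are vLLM defaults, not stated on this page. A minimal stdlib client sketch:

```python
import json
import urllib.request

def ask_jamba(prompt, base_url="http://localhost:8000/v1"):
    """Query a locally served Jamba model via vLLM's OpenAI-compatible API."""
    payload = {
        "model": "ai21labs/AI21-Jamba2-Mini",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Parse the standard chat-completions response shape.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Usage (with the server from the quickstart running): `ask_jamba("Summarize our vacation policy in one sentence.")`.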