Omni-R1

by ModalityDance | Image Model | 1K downloads
Quick Summary

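Omni-R1 is a multimodal model from ModalityDance that loads through the Chameleon classes in transformers. The usage example below passes a text prompt together with an image and generates with multimodal_generation_mode="unrestricted".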

Code Examples

Usage (Python, transformers)
import torch
from PIL import Image
from transformers import ChameleonProcessor, ChameleonForConditionalGeneration

# 1) Import & load
model_id = "ModalityDance/Omni-R1"  # or "ModalityDance/Omni-R1-Zero"
processor = ChameleonProcessor.from_pretrained(model_id)
model = ChameleonForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

# 2) Prepare a single input (prompt contains <image>)
prompt = "What is the smiling man in the image wearing? <image>"
image = Image.open("image.png").convert("RGB")

inputs = processor(
    prompt,
    images=[image],
    padding=False,
    return_for_text_completion=True,
    return_tensors="pt",
).to(model.device)

# --- minimal image token preprocessing: replace <image> placeholder with image tokens ---
input_ids = inputs["input_ids"].long()
pixel_values = inputs["pixel_values"].to(model.dtype)  # match the bfloat16 weights before image tokenization

placeholder_id = processor.tokenizer.encode("<image>", add_special_tokens=False)[0]
image_tokens = model.get_image_tokens(pixel_values)  # shape: [1, N] (or compatible)

mask = (input_ids == placeholder_id)
input_ids = input_ids.clone()
input_ids[mask] = image_tokens.reshape(-1).to(dtype=torch.long, device=input_ids.device)

# 3) Call the model
outputs = model.generate(
    input_ids=input_ids,
    max_length=4096,
    do_sample=True,
    temperature=0.5,
    top_p=0.9,
    pad_token_id=1,
    multimodal_generation_mode="unrestricted",
)

# 4) Get results
text = processor.batch_decode(outputs, skip_special_tokens=False)[0]
print(text)
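
The placeholder replacement step above writes one image token into each `<image>` slot, so it only works when the number of slots equals the number of image tokens. The sketch below illustrates that logic on plain Python lists, with no model or tensors involved. The helper name `splice_image_tokens` and the toy token IDs are hypothetical and chosen only for illustration; they are not part of the Omni-R1 or transformers API.

```python
from typing import List


def splice_image_tokens(
    input_ids: List[int], image_tokens: List[int], placeholder_id: int
) -> List[int]:
    """Return a copy of input_ids with each placeholder replaced, in order, by an image token."""
    # Count the slots first so a mismatch fails loudly instead of silently misaligning.
    n_slots = sum(1 for t in input_ids if t == placeholder_id)
    if n_slots != len(image_tokens):
        raise ValueError(
            f"{n_slots} placeholder slots but {len(image_tokens)} image tokens"
        )
    replacements = iter(image_tokens)
    return [next(replacements) if t == placeholder_id else t for t in input_ids]


# Toy IDs, made up for illustration: 9 stands in for the placeholder ID.
toy_ids = [5, 6, 9, 9, 9, 7]
toy_image_tokens = [101, 102, 103]
spliced = splice_image_tokens(toy_ids, toy_image_tokens, placeholder_id=9)
print(spliced)  # [5, 6, 101, 102, 103, 7]
```

In the tensor version, comparing `int(mask.sum())` with `image_tokens.numel()` before the masked assignment would give the same early check.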
