MiMo-V2-Flash

477.4K
633
license:mit
by
XiaomiMiMo
Language Model
OTHER
Good
477K downloads
Production-ready
Edge AI:
Mobile
Laptop
Server
Unknown
Mobile
Laptop
Server
Quick Summary

AI model with specialized capabilities.

Code Examples

6. Inference & Deploymentbash
pip install sglang

# Launch server
python3 -m sglang.launch_server \
        --model-path XiaomiMiMo/MiMo-V2-Flash \
        --served-model-name mimo-v2-flash \
        --pp-size 1 \
        --dp-size 2 \
        --enable-dp-attention \
        --tp-size 8 \
        --moe-a2a-backend deepep \
        --page-size 1 \
        --host 0.0.0.0 \
        --port 9001 \
        --trust-remote-code \
        --mem-fraction-static 0.75 \
        --max-running-requests 128 \
        --chunked-prefill-size 16384 \
        --reasoning-parser qwen3 \
        --tool-call-parser mimo \
        --context-length 262144 \
        --attention-backend fa3 \
        --speculative-algorithm EAGLE \
        --speculative-num-steps 3 \
        --speculative-eagle-topk 1 \
        --speculative-num-draft-tokens 4 \
        --enable-mtp

# Send request
curl -i http://localhost:9001/v1/chat/completions \
    -H 'Content-Type:application/json' \
    -d  '{
            "messages" : [{
                "role": "user",
                "content": "Nice to meet you MiMo"
            }],
            "model": "mimo-v2-flash",
            "max_tokens": 4096,
            "temperature": 0.8,
            "top_p": 0.95,
            "stream": true,
            "chat_template_kwargs": {
                "enable_thinking": true
            }
        }'

Deploy This Model

Production-ready deployment in minutes

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Try Free API

Replicate

One-click model deployment

Easiest Setup

Run models in the cloud with simple API. No DevOps required.

Deploy Now

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.