MiniCPM4.1-8B-GGUF
by Mungert

8.0B params · 2 languages · BF16 · license: apache-2.0 · 308 downloads
Quick Summary

This GGUF model was generated using llama.cpp.

Device Compatibility

| Device | Requirements |
|--------|--------------|
| Mobile | 4-6GB RAM |
| Laptop | 16GB RAM |
| Server | GPU |

Minimum recommended: 8GB+ RAM

Code Examples

```bash
git clone -b feature_infer https://github.com/OpenBMB/infllmv2_cuda_impl.git
cd infllmv2_cuda_impl
git submodule update --init --recursive
pip install -e .  # or python setup.py install
```

Then add a `sparse_config` block to the model's `config.json`:

```json
{
    ...,
    "sparse_config": {
        "kernel_size": 32,
        "kernel_stride": 16,
        "init_blocks": 1,
        "block_size": 64,
        "window_size": 2048,
        "topk": 64,
        "use_nope": false,
        "dense_len": 8192
    }
}
```
2. Install EAGLE3-Compatible SGLang

```bash
git clone https://github.com/LDLINGLINGLING/sglang.git
cd sglang
pip install -e "python[all]"
```
3. Launch SGLang Server with Speculative Decoding

```bash
python -m sglang.launch_server \
  --model-path "openbmb/MiniCPM4.1-8B" \
  --host "127.0.0.1" \
  --port 30002 \
  --mem-fraction-static 0.9 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path "your/path/MiniCPM4_1-8B-Eagle3-bf16" \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 32 \
  --temperature 0.7
```
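Once the server is up, it can be queried over SGLang's OpenAI-compatible `/v1/chat/completions` endpoint. A sketch using only the standard library (the helper names and prompt are illustrative; the port matches the launch command above):

```python
import json
from urllib import request

def build_chat_request(prompt, host="127.0.0.1", port=30002):
    """Assemble the URL and JSON body for a chat completion call."""
    url = f"http://{host}:{port}/v1/chat/completions"
    payload = {
        "model": "openbmb/MiniCPM4.1-8B",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return url, payload

def chat(prompt):
    """Send the request; requires the server launched above to be running."""
    url, payload = build_chat_request(prompt)
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Speculative decoding is transparent to the client: the request looks identical with or without EAGLE3 enabled.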
Standard Inference (Without Speculative Decoding)

```bash
git clone -b openbmb https://github.com/OpenBMB/sglang.git
cd sglang

pip install --upgrade pip
pip install -e "python[all]"
```

Launch the server:

```bash
python -m sglang.launch_server --model openbmb/MiniCPM4.1-8B --trust-remote-code --port 30000 --chat-template chatml
```
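Because the server is launched with `--chat-template chatml`, raw prompts sent to its completion endpoint should use ChatML framing. A small sketch of that framing (the helper name is illustrative):

```python
def to_chatml(messages):
    """Wrap a list of {role, content} dicts in ChatML markers and
    leave the prompt open for the assistant's reply."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)
```

The chat endpoint applies this template server-side; manual framing is only needed when hitting the raw completion API.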
2. Install EAGLE3-Compatible vLLM

```bash
git clone https://github.com/LDLINGLINGLING/vllm.git
cd vllm
pip install -e .
```
Standard Inference (Without Speculative Decoding)

```bash
pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly
```
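With vLLM installed, the model can also be run offline through its Python API instead of a server. A sketch, assuming the checkpoint is fetched from the Hub on first use (the sampling values are illustrative, not from the model card):

```python
def build_sampling_kwargs():
    """Illustrative sampling settings; tune for your workload."""
    return {"temperature": 0.7, "top_p": 0.95, "max_tokens": 256}

def run_offline(prompt):
    """Offline generation with vLLM's Python API.
    vLLM is imported lazily so this sketch stays importable without it."""
    from vllm import LLM, SamplingParams
    llm = LLM(model="openbmb/MiniCPM4.1-8B", trust_remote_code=True)
    outputs = llm.generate([prompt], SamplingParams(**build_sampling_kwargs()))
    return outputs[0].outputs[0].text

# run_offline("Explain speculative decoding in one sentence.")
# (commented out: loads the full 8B weights)
```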
Ollama

```bash
ollama run openbmb/minicpm4.1
```
