# GLM-4.7-REAP-218B-A32B-W4A16
by 0xSero · Language Model · 218B params · license: apache-2.0
## Quick Summary

GLM-4.7-REAP-218B-A32B-W4A16 is a 4-bit (W4A16) AutoRound quantization of GLM-4.7-REAP-218B-A32B, itself a REAP expert-pruned variant of GLM-4.7. Together the two stages compress the original 700GB model to roughly 108GB.
Device Compatibility
Mobile
4-6GB RAM
Laptop
16GB RAM
Server
GPU
Minimum Recommended
204GB+ RAM
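For rough capacity planning: W4A16 stores weights as packed 4-bit integers plus one 16-bit scale per group of 128 weights (per the quantization config in this card). A back-of-the-envelope estimate of the weight footprint, using a hypothetical helper (real checkpoints differ slightly, e.g. unquantized embeddings and layer norms):

```python
def w4a16_weight_bytes(n_params, bits=4, group_size=128, scale_bytes=2):
    """Packed low-bit weights plus one fp16 scale per quantization group."""
    packed = n_params * bits / 8           # 0.5 bytes per parameter at 4-bit
    scales = n_params / group_size * scale_bytes
    return packed + scales

gb = w4a16_weight_bytes(218e9) / 1e9
print(f"{gb:.0f} GB")  # ~112 GB, in the ballpark of the ~108GB checkpoint
```

Actual runtime memory is higher once activations and KV cache are added, which is why the recommended figure above exceeds the raw weight size.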
## Code Examples
### Compression Pipeline

```text
GLM-4.7 (358B, 700GB)
   |
   v  REAP 40% pruning (keeps 96 of 160 experts)
   |
GLM-4.7-REAP-218B-A32B (218B, 407GB)
   |
   v  AutoRound W4A16 quantization
   |
GLM-4.7-REAP-218B-A32B-W4A16 (218B, 108GB)  <-- this model
```

Total: ~6.5x compression (700GB → 108GB).
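The pruning stage removes whole experts from each MoE layer based on router statistics. A toy sketch of the idea, not REAP's actual saliency criterion (`prune_experts` and the mean-router-probability saliency are illustrative stand-ins):

```python
import numpy as np

def prune_experts(router_probs, keep_ratio=0.6):
    """Keep the top-k experts ranked by a toy saliency: mean router probability."""
    saliency = router_probs.mean(axis=0)       # (n_experts,)
    k = int(len(saliency) * keep_ratio)
    return np.sort(np.argsort(saliency)[-k:])  # indices of surviving experts

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(160), size=1000)  # router distributions over 160 experts
keep = prune_experts(probs, keep_ratio=0.6)
print(len(keep))  # 96 of 160 experts survive, matching the 40% prune above
```

In a real pipeline the surviving experts' weights are then copied into a smaller MoE checkpoint and the router is re-indexed accordingly.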
### AutoRound Quantization Details

```yaml
bits: 4
group_size: 128
format: auto_round
nsamples: 64
seqlen: 512
dataset: NeelNanda/pile-10k
```
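W4A16 means weights are stored as 4-bit integers with a shared scale per group of 128, and dequantized to 16-bit for computation. A minimal numpy sketch of this group-wise scheme (illustrative only; AutoRound additionally tunes rounding via sign-gradient descent, which is omitted here):

```python
import numpy as np

def quantize_w4a16(w, group_size=128):
    """Symmetric 4-bit group-wise quantization: one fp scale per group of weights."""
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7   # map max magnitude to +/-7
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    """Reconstruct fp16 weights for the 16-bit activation matmul."""
    return (q * scale).reshape(shape).astype(np.float16)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128)).astype(np.float32)
q, s = quantize_w4a16(w)
w_hat = dequantize(q, s, w.shape)
err = float(np.abs(w - w_hat).max())
print(q.dtype, w_hat.dtype, err)  # int4 codes stored in int8; small per-weight error
```

The group-wise scales are why quantization error stays bounded: each group of 128 weights gets its own dynamic range instead of sharing one scale per tensor.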
### Reproduce This Model

```bash
# 1. Download the BF16 REAP model
huggingface-cli download 0xSero/GLM-4.7-REAP-218B-A32B --local-dir ./GLM-4.7-REAP-218B-A32B

# 2. Run AutoRound quantization
pip install auto-round
python -c "
from auto_round import AutoRound

ar = AutoRound(
    './GLM-4.7-REAP-218B-A32B',
    device='cuda',
    device_map='auto',
    nsamples=64,
    seqlen=512,
    batch_size=1,
)
ar.quantize_and_save('./GLM-4.7-REAP-218B-A32B-W4A16', format='auto_round')
"
# Takes ~2 hours on 8x H200
```
### Citation

```bibtex
@article{jones2025reap,
  title={REAP: Router-Experts Activation Pruning for Efficient Mixture-of-Experts},
  author={Jones, et al.},
  journal={arXiv preprint arXiv:2505.20877},
  year={2025}
}

@misc{autoround2024,
  title={AutoRound: Advanced Weight Quantization},
  author={Intel Corporation},
  year={2024},
  howpublished={\url{https://github.com/intel/auto-round}}
}
```