GLM-4.7-REAP-218B-A32B-W4A16

license: apache-2.0
by 0xSero
Language Model
218B params
477 downloads
Quick Summary

A 4-bit (W4A16) AutoRound quantization of GLM-4.7-REAP-218B-A32B, itself a REAP expert-pruned variant of GLM-4.7 (358B total parameters, 32B active). Pruning plus quantization shrink the checkpoint from roughly 700GB to about 108GB.

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 204GB+ RAM

Code Examples

Compression Pipeline
GLM-4.7 (358B, 700GB)
        |
        v  REAP 40% pruning (96/160 experts)
        |
GLM-4.7-REAP-218B-A32B (218B, 407GB)
        |
        v  AutoRound W4A16 quantization
        |
GLM-4.7-REAP-218B-A32B-W4A16 (218B, 108GB)  <-- This model

Total: 6.5x compression
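The quoted figures can be sanity-checked in a few lines (all numbers taken from the diagram above):

```python
# Sanity-check the figures quoted in the compression pipeline.
experts_before, experts_after = 160, 96      # "96/160 experts" retained by REAP
pruned_fraction = 1 - experts_after / experts_before
size_bf16_gb, size_w4a16_gb = 700, 108       # original vs final checkpoint size
total_compression = size_bf16_gb / size_w4a16_gb
print(f"{pruned_fraction:.0%} of experts pruned")     # 40% of experts pruned
print(f"{total_compression:.1f}x total compression")  # 6.5x total compression
```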
AutoRound Quantization Details
bits: 4
group_size: 128
format: auto_round
nsamples: 64
seqlen: 512
dataset: NeelNanda/pile-10k
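With `bits: 4` and `group_size: 128`, each group of 128 weights stores 4-bit values plus a shared 16-bit scale, so the effective cost is roughly 4 + 16/128 = 4.125 bits per weight. A back-of-envelope size estimate (ignoring zero-points and any layers left unquantized, so treat it as approximate):

```python
# Back-of-envelope checkpoint size for W4A16 with group_size=128.
# Assumes one fp16 scale per 128-weight group; zero-points and any
# unquantized layers (e.g. embeddings) are ignored, so this is rough.
params = 218e9
effective_bits = 4 + 16 / 128             # 4.125 bits per weight
size_gb = params * effective_bits / 8 / 1e9
print(f"~{size_gb:.0f} GB")               # ~112 GB, near the 108GB checkpoint
```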
Reproduce This Model
# 1. Download the BF16 REAP model
huggingface-cli download 0xSero/GLM-4.7-REAP-218B-A32B --local-dir ./GLM-4.7-REAP-218B-A32B

# 2. Run AutoRound quantization
pip install auto-round

python -c "
from auto_round import AutoRound
ar = AutoRound(
    './GLM-4.7-REAP-218B-A32B',
    bits=4,                          # match the W4A16 recipe above
    group_size=128,
    dataset='NeelNanda/pile-10k',    # calibration data from the recipe
    device='cuda',
    device_map='auto',
    nsamples=64,
    seqlen=512,
    batch_size=1
)
ar.quantize_and_save('./GLM-4.7-REAP-218B-A32B-W4A16', format='auto_round')
"

# Takes ~2 hours on 8x H200
Citation
@article{jones2025reap,
  title={REAP: Router-Experts Activation Pruning for Efficient Mixture-of-Experts},
  author={Jones and others},
  journal={arXiv preprint arXiv:2505.20877},
  year={2025}
}

@misc{autoround2024,
  title={AutoRound: Advanced Weight Quantization},
  author={Intel Corporation},
  year={2024},
  howpublished={\url{https://github.com/intel/auto-round}}
}
