GLM-4.6-REAP-218B-A32B-W4A16-AutoRound

Name: GLM-4.6-REAP-218B-A32B-W4A16-AutoRound
Author: 0xSero

282

license:mit

0xSero

Language Model

OTHER

218B params

New

282 downloads

Early-stage

Try on Hugging Face Add to Compare

Edge AI:

Mobile

Laptop

Server

488GB+ RAM

Mobile

Laptop

Server

Quick Summary

AI model with specialized capabilities.

Device Compatibility

Mobile

4-6GB RAM

Laptop

16GB RAM

Server

GPU

Minimum Recommended

204GB+ RAM

Code Examples

The Science Behind Dataset Selectiontext

REAP Algorithm:
1. Forward pass calibration samples through model
2. Record which experts activate and their magnitudes
3. Compute saliency = router_weight × activation_norm
4. Prune lowest-saliency experts

Key Insight: Experts are TASK-SPECIFIC
├── Some experts specialize in natural language
├── Some experts specialize in code syntax
├── Some experts specialize in JSON/structured output
└── Some experts specialize in multi-turn context

If calibration lacks code → code-specialized experts appear "unused" → get pruned → model loses coding ability

Combined Datasetbibtex

@article{lasby2025reap,
  title={REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
  author={Lasby, Mike and Lazarevich, Ivan and Sinnadurai, Nish and Lie, Sean and Ioannou, Yani and Thangarasa, Vithursan},
  journal={arXiv preprint arXiv:2510.13999},
  year={2025},
  url={https://arxiv.org/abs/2510.13999}
}

Deploy This Model

Production-ready deployment in minutes

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Try Free API

Replicate

One-click model deployment

Easiest Setup

Run models in the cloud with simple API. No DevOps required.

Deploy Now

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.