GLM-4.6-128GB-RAM-IK-GGUF
by Downtown-Case
ik_llama.cpp · Language Model
Quick Summary
Quantized for 128GB RAM + single GPU setups, using ik_llama.cpp `IQK` quants for better quality and performance at this size than mainline llama.cpp quantizations.
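The recipe below marks which tensors are meant to live on the GPU (attention, dense layers, shared experts) versus the CPU (most routed experts). At runtime that split is typically realized with a tensor-override flag. A hedged sketch of a launch command (the model filename and flag values are placeholders, and flag names follow common ik_llama.cpp usage; verify against your build's `--help`):

```shell
# Offload everything by default (-ngl 99), then force routed-expert
# tensors back onto the CPU so the model fits a single GPU + system RAM.
./bin/llama-server \
  -m GLM-4.6-IQK.gguf \
  -ngl 99 \
  -ot "ffn_(up|down|gate)_exps=CPU" \
  -c 32768
```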
Code Examples
V1 (Obsolete):

```text
# Attention (GPU)
blk\..*\.attn_q.*=iq5_ks
blk\..*\.attn_k.*=iq6_k
blk\..*\.attn_v.*=iq6_k
blk\..*\.attn_output.*=iq5_ks
# First 3 Dense Layers [0-2] (GPU)
blk\..*\.ffn_down\.weight=iq5_ks
blk\..*\.ffn_(gate|up)\.weight=iq5_ks
# Shared Expert Layers [3-92] (GPU)
blk\..*\.ffn_down_shexp\.weight=iq5_ks
blk\..*\.ffn_(gate|up)_shexp\.weight=iq5_ks
# Routed Experts Layers [3-6] (GPU)
blk\.[3-6]\.ffn_down_exps\.weight=iq3_kt
blk\.[3-6]\.ffn_(gate|up)_exps\.weight=iq3_kt
# Routed Experts Layers [7-19] (CPU)
blk\.[7-9]\.ffn_down_exps\.weight=iq3_ks
blk\.[7-9]\.ffn_(gate|up)_exps\.weight=iq3_ks
blk\.1[0-9]\.ffn_down_exps\.weight=iq3_ks
blk\.1[0-9]\.ffn_(gate|up)_exps\.weight=iq3_ks
# Routed Experts Layers [81-92] (CPU)
blk\.8[1-9]\.ffn_down_exps\.weight=iq3_ks
blk\.8[1-9]\.ffn_(gate|up)_exps\.weight=iq3_ks
blk\.9[0-2]\.ffn_down_exps\.weight=iq3_ks
blk\.9[0-2]\.ffn_(gate|up)_exps\.weight=iq3_ks
# Routed Experts Layers [20-80] (CPU)
blk\..*\.ffn_down_exps\.weight=iq2_kl
blk\..*\.ffn_(gate|up)_exps\.weight=iq2_kl
# NextN MTP Layer [92] (Unused, not loaded from disk)
blk\..*\.nextn\.embed_tokens\.weight=iq5_ks
blk\..*\.nextn\.shared_head_head\.weight=iq5_ks
blk\..*\.nextn\.eh_proj\.weight=q8_0
# Non-Repeating Layers
token_embd\.weight=iq4_k
output\.weight=iq6_k
```
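The override list above is order-sensitive: the narrow per-layer rules ([3-6], [7-19], [81-92]) must come before the catch-all `blk\..*` expert rules that sweep up layers 20-80. A minimal sketch of first-match-wins resolution over a subset of the rules (the full-match semantics and rule ordering are assumptions about how ik_llama.cpp's custom-quant matcher behaves; check the tool's documentation):

```python
import re

# A subset of the override rules above, in file order.
# First matching regex wins (assumed to mirror the tool's behaviour).
RULES = [
    (r"blk\..*\.attn_q.*", "iq5_ks"),
    (r"blk\.[3-6]\.ffn_down_exps\.weight", "iq3_kt"),
    (r"blk\.1[0-9]\.ffn_down_exps\.weight", "iq3_ks"),
    (r"blk\..*\.ffn_down_exps\.weight", "iq2_kl"),
]

def quant_for(tensor_name):
    """Return the quant type assigned to a tensor, or None for the default."""
    for pattern, qtype in RULES:
        if re.fullmatch(pattern, tensor_name):
            return qtype
    return None  # no override: fall back to the base ftype

print(quant_for("blk.5.ffn_down_exps.weight"))   # early routed expert, GPU tier
print(quant_for("blk.42.ffn_down_exps.weight"))  # bulk CPU range, catch-all rule
```

Because layer 42 only matches the final catch-all, it lands in the smallest `iq2_kl` tier, which is what keeps the bulk of the routed experts inside the 128GB RAM budget.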
====== Perplexity statistics ======
Mean PPL(Q) : 8.894389 ± 0.158902
Mean PPL(base) : 8.453608 ± 0.150165
Cor(ln(PPL(Q)), ln(PPL(base))): 98.45%
Mean ln(PPL(Q)/PPL(base)) : 0.050827 ± 0.003141
Mean PPL(Q)/PPL(base) : 1.052141 ± 0.003305
Mean PPL(Q)-PPL(base) : 0.440781 ± 0.028588
====== KL divergence statistics ======
Mean KLD: 0.080672 ± 0.001829
Maximum KLD: 18.875971
99.9% KLD: 3.780314
99.0% KLD: 1.146402
95.0% KLD: 0.295799
90.0% KLD: 0.153155
Median KLD: 0.020809
10.0% KLD: 0.000088
5.0% KLD: 0.000023
1.0% KLD: 0.000002
0.1% KLD: -0.000002
Minimum KLD: -0.000217
====== Token probability statistics ======
Mean Δp: -1.231 ± 0.059 %
Maximum Δp: 99.351%
99.9% Δp: 48.984%
99.0% Δp: 19.863%
95.0% Δp: 6.112%
90.0% Δp: 2.490%
75.0% Δp: 0.155%
Median Δp: -0.009%
25.0% Δp: -0.731%
10.0% Δp: -5.427%
5.0% Δp: -11.712%
1.0% Δp: -44.734%
0.1% Δp: -88.132%
Minimum Δp: -99.827%
RMS Δp : 9.421 ± 0.179 %
Same top p: 86.479 ± 0.216 %
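The headline perplexity figures are internally consistent: the quant runs about 5.2% above the unquantized baseline. A quick sanity check with the values copied from the table (note the reported mean ratio is averaged per evaluation chunk, so it only approximately equals the ratio of the two means):

```python
import math

ppl_q = 8.894389     # Mean PPL(Q) from the statistics above
ppl_base = 8.453608  # Mean PPL(base)

ratio = ppl_q / ppl_base      # ~1.052, i.e. a ~5.2% perplexity increase
log_ratio = math.log(ratio)   # ~0.0508, matching Mean ln(PPL(Q)/PPL(base))
diff = ppl_q - ppl_base       # ~0.4408, matching Mean PPL(Q)-PPL(base)
```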