ddh0
gemma-4-it-GGUF
Qwen3.5-GGUF
GLM-4.5-Air-GGUF
This repository contains several custom GGUF quantizations of GLM-4.5-Air, to be used with llama.cpp.

The naming scheme for these custom quantizations is as follows:

> `ModelName-DefaultType-FFN-UpType-GateType-DownType.gguf`

Where `DefaultType` refers to the default tensor type, and `UpType`, `GateType`, and `DownType` refer to the tensor types used for the `ffn_up_exps`, `ffn_gate_exps`, and `ffn_down_exps` tensors respectively.

These quantizations use Q8_0 for all tensors by default; only the dense FFN block and the conditional experts are downgraded. The shared expert is always kept in Q8_0. They were quantized using bartowski's imatrix.

| Filename | Size (GB) | Size (GiB) | Average BPW | Direct link |
| ------------------------------------------------ | --------- | ---------- | ----------- | ----------- |
| GLM-4.5-Air-Q8_0-FFN-IQ3_S-IQ3_S-Q5_0.gguf | 61.66 | 57.43 | 4.47 | Download |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf | 68.56 | 63.86 | 4.97 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q4_K-Q4_K-Q5_1.gguf | 72.82 | 67.82 | 5.27 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q4_K-Q4_K-Q8_0.gguf | 83.44 | 77.71 | 6.04 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q5_K-Q5_K-Q8_0.gguf | 91.94 | 85.63 | 6.66 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q6_K-Q6_K-Q8_0.gguf | 100.97 | 94.04 | 7.31 | Download |
| GLM-4.5-Air-Q8_0.gguf | 117.45 | 109.39 | 8.50 | Download |
| GLM-4.5-Air-bf16.gguf | 220.98 | 205.81 | 16.00 | Download |

These quantizations use Q8_0 for all tensors by default, including the dense FFN block. Only the conditional experts are downgraded. The shared expert is always kept in Q8_0. They were quantized using my own imatrix (the calibration text corpus can be found here).
| Filename | Size (GB) | Size (GiB) | Average BPW | Direct link |
| ---------------------------------------------------- | --------- | ---------- | ----------- | ----------- |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ3_S-IQ4_NL-v2.gguf | 60.94 | 56.76 | 4.41 | Download |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-IQ4_NL-v2.gguf | 64.39 | 59.97 | 4.66 | Download |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0-v2.gguf | 68.63 | 63.92 | 4.97 | Download |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q8_0-v2.gguf | 81.36 | 75.78 | 5.89 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q5_K-Q5_K-Q8_0-v2.gguf | 91.97 | 85.66 | 6.66 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q6_K-Q6_K-Q8_0-v2.gguf | 100.99 | 94.06 | 7.31 | Download |
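The size and average-BPW columns in the tables above are related by simple arithmetic: GiB = GB × 10⁹ / 2³⁰, and average BPW = (file size in bits) / (parameter count). The parameter count used below (~110.5 B) is not taken from the model card; it is inferred for illustration from the bf16 row, where 16.00 BPW over 220.98 GB implies the total weight count. A minimal sketch of the check:

```python
def gb_to_gib(size_gb: float) -> float:
    """Convert decimal gigabytes (10^9 bytes) to binary gibibytes (2^30 bytes)."""
    return size_gb * 1e9 / 2**30

def average_bpw(size_gb: float, n_params: float) -> float:
    """Average bits per weight: total file size in bits divided by weight count."""
    return size_gb * 1e9 * 8 / n_params

# Inferred (not official) effective parameter count, from the bf16 row:
# 220.98 GB at exactly 16.00 BPW.
n_params = 220.98e9 * 8 / 16.00  # ~110.5e9

print(round(gb_to_gib(61.66), 2))               # 57.43, matching the IQ3_S row
print(round(average_bpw(117.45, n_params), 2))  # 8.50, matching the Q8_0 row
```

Small rounding differences against the tables are expected, since the listed sizes are themselves rounded from exact byte counts.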
L3.3-Electra-R1-70b-GGUF
Q4_K_X.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-GGUF
GLM-4.5-Air-Derestricted-GGUF
Meta-Llama-3-8B-Instruct-bf16-GGUF
Tess-7B-v1.4-GGUF
GPT-2-GGUF
Thespis-13b-Alpha-v0.7-GGUF
Ling-flash-2.0-Q8_0.gguf
GLM-4.5-3.34bpw.gguf
openchat-3.5-0106-GGUF-fp16
gemma-3-it-GGUF
Mistral-Small-24B-Base-2501-GGUF
GLM-4.6-Derestricted-GGUF
Qwen3-4B
This repository provides Q8_0 GGUF quantizations of Qwen/Qwen3-4B and Qwen/Qwen3-4B-Base.
Phi-3-mini-4k-instruct-bf16-GGUF
Qwen2.5-72B-0.6x-Instruct-GGUF
llama-13b-Q8_0
Meta-Llama-3-70B-Instruct-bf16-GGUF
Qwen2.5-14B-All-Variants-q8_0-q6_K-GGUF
OpenHermes-2.5-Mistral-7B-GGUF-fp16
Mistral-7B-v0.1-GGUF-fp16
neural-chat-7b-v3-1-GGUF-fp16
Mistral-Large-Instruct-2407-q8_0-q8_0-GGUF
dots.llm1.inst-GGUF-Q4_0-EXPERIMENTAL
GPT-2-XL-GGUF
Cassiopeia-70B
Yi-6B-GGUF-fp16
una-cybertron-7b-v2-GGUF-fp16
Mixtral-8x7B-Instruct-v0.1-bf16-GGUF
Yi-6B-200K-GGUF-fp16
Mistral-7B-Instruct-v0.1-GGUF-fp16
rocket-3B-GGUF-fp16
StrawberryLemonade-L3-70B-v1.0-GGUF
GGUF quant(s) of sophosympatheia/StrawberryLemonade-L3-70B-v1.0.
Naberius-7B-GGUF-fp16
phi-2-GGUF-fp16
Mistral-Small-Instruct-2409-q8_0-q8_0-GGUF
AI21-Jamba-Mini-1.7-GGUF
dolphin-2.1-mistral-7b-GGUF-fp16
Mistral-10.7B-Instruct-v0.2
Tess-XS-v1.0-GGUF-fp16
OrcaMaidXL-17B-32k
openchat_3.5-GGUF-fp16
Mistral-7B-OpenOrca-GGUF-fp16
dolphin-2.2.1-mistral-7b-GGUF-fp16
Qwen2.5-72B-0.6x-Instruct
Andromeda-70B
Andromeda-70B is the result of an experimental SLERP merge of Cassiopeia-70B and Sao10K/Llama-3.3-70B-Vulpecula-r1. It is a coherent, unaligned model intended for creative tasks such as storywriting, brainstorming, and interactive roleplay.

After more thorough testing by myself and others, I don't think this model is very good. :( You should use Cassiopeia or Vulpecula instead.

Feedback on this merge is very welcome, good or bad! Please leave a comment in this discussion with your thoughts: Andromeda-70B/discussions/1