ddh0
gemma-4-it-GGUF
Qwen3.5-GGUF
GLM-4.5-Air-GGUF
This repository contains several custom GGUF quantizations of GLM-4.5-Air, to be used with llama.cpp.

The naming scheme for these custom quantizations is as follows:

> `ModelName-DefaultType-FFN-UpType-GateType-DownType.gguf`

Where `DefaultType` refers to the default tensor type, and `UpType`, `GateType`, and `DownType` refer to the tensor types used for the `ffn_up_exps`, `ffn_gate_exps`, and `ffn_down_exps` tensors respectively.

These quantizations use Q8_0 for all tensors by default; only the dense FFN block and the conditional experts are downgraded. The shared expert is always kept in Q8_0. They were quantized using bartowski's imatrix.

| Filename | Size (GB) | Size (GiB) | Average BPW | Direct link |
| ------------------------------------------------ | --------- | ---------- | ----------- | ----------- |
| GLM-4.5-Air-Q8_0-FFN-IQ3_S-IQ3_S-Q5_0.gguf | 61.66 | 57.43 | 4.47 | Download |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf | 68.56 | 63.86 | 4.97 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q4_K-Q4_K-Q5_1.gguf | 72.82 | 67.82 | 5.27 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q4_K-Q4_K-Q8_0.gguf | 83.44 | 77.71 | 6.04 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q5_K-Q5_K-Q8_0.gguf | 91.94 | 85.63 | 6.66 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q6_K-Q6_K-Q8_0.gguf | 100.97 | 94.04 | 7.31 | Download |
| GLM-4.5-Air-Q8_0.gguf | 117.45 | 109.39 | 8.50 | Download |
| GLM-4.5-Air-bf16.gguf | 220.98 | 205.81 | 16.00 | Download |

These quantizations use Q8_0 for all tensors by default, including the dense FFN block. Only the conditional experts are downgraded. The shared expert is always kept in Q8_0. They were quantized using my own imatrix (the calibration text corpus can be found here).
| Filename | Size (GB) | Size (GiB) | Average BPW | Direct link |
| ---------------------------------------------------- | --------- | ---------- | ----------- | ----------- |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ3_S-IQ4_NL-v2.gguf | 60.94 | 56.76 | 4.41 | Download |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-IQ4_NL-v2.gguf | 64.39 | 59.97 | 4.66 | Download |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0-v2.gguf | 68.63 | 63.92 | 4.97 | Download |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q8_0-v2.gguf | 81.36 | 75.78 | 5.89 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q5_K-Q5_K-Q8_0-v2.gguf | 91.97 | 85.66 | 6.66 | Download |
| GLM-4.5-Air-Q8_0-FFN-Q6_K-Q6_K-Q8_0-v2.gguf | 100.99 | 94.06 | 7.31 | Download |
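The size and average-BPW columns in the tables above are related by simple arithmetic: GiB = GB × 10⁹ / 2³⁰, and average BPW = (file size in bits) / (parameter count). The parameter count used below (~110.5 B) is not taken from the model card; it is inferred for illustration from the bf16 row, where 16.00 BPW over 220.98 GB implies the total weight count. A minimal sketch of the check:

```python
def gb_to_gib(size_gb: float) -> float:
    """Convert decimal gigabytes (10^9 bytes) to binary gibibytes (2^30 bytes)."""
    return size_gb * 1e9 / 2**30

def average_bpw(size_gb: float, n_params: float) -> float:
    """Average bits per weight: total file size in bits divided by weight count."""
    return size_gb * 1e9 * 8 / n_params

# Inferred (not official) effective parameter count, from the bf16 row:
# 220.98 GB at exactly 16.00 BPW.
n_params = 220.98e9 * 8 / 16.00  # ~110.5e9

print(round(gb_to_gib(61.66), 2))               # 57.43, matching the IQ3_S row
print(round(average_bpw(117.45, n_params), 2))  # 8.50, matching the Q8_0 row
```

Small rounding differences against the tables are expected, since the listed sizes are themselves rounded from exact byte counts.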
L3.3-Electra-R1-70b-GGUF
Q4_K_X.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-GGUF
GLM-4.5-Air-Derestricted-GGUF
Meta-Llama-3-8B-Instruct-bf16-GGUF
Tess-7B-v1.4-GGUF
GPT-2-GGUF
Thespis-13b-Alpha-v0.7-GGUF
Ling-flash-2.0-Q8_0.gguf
GLM-4.5-3.34bpw.gguf
openchat-3.5-0106-GGUF-fp16
gemma-3-it-GGUF
Mistral-Small-24B-Base-2501-GGUF
GLM-4.6-Derestricted-GGUF
Qwen3-4B
This repository provides Q8_0 GGUF quantizations of Qwen/Qwen3-4B and Qwen/Qwen3-4B-Base.
Phi-3-mini-4k-instruct-bf16-GGUF
Qwen2.5-72B-0.6x-Instruct-GGUF
llama-13b-Q8_0
Meta-Llama-3-70B-Instruct-bf16-GGUF
Qwen2.5-14B-All-Variants-q8_0-q6_K-GGUF
OpenHermes-2.5-Mistral-7B-GGUF-fp16
Mistral-7B-v0.1-GGUF-fp16
neural-chat-7b-v3-1-GGUF-fp16
Mistral-Large-Instruct-2407-q8_0-q8_0-GGUF
dots.llm1.inst-GGUF-Q4_0-EXPERIMENTAL
GPT-2-XL-GGUF
Cassiopeia-70B
Yi-6B-GGUF-fp16
una-cybertron-7b-v2-GGUF-fp16
Mixtral-8x7B-Instruct-v0.1-bf16-GGUF
Yi-6B-200K-GGUF-fp16
Mistral-7B-Instruct-v0.1-GGUF-fp16
rocket-3B-GGUF-fp16
StrawberryLemonade-L3-70B-v1.0-GGUF
GGUF quant(s) of sophosympatheia/StrawberryLemonade-L3-70B-v1.0.
Naberius-7B-GGUF-fp16
phi-2-GGUF-fp16
Mistral-Small-Instruct-2409-q8_0-q8_0-GGUF
AI21-Jamba-Mini-1.7-GGUF
dolphin-2.1-mistral-7b-GGUF-fp16
Mistral-10.7B-Instruct-v0.2
Tess-XS-v1.0-GGUF-fp16
OrcaMaidXL-17B-32k
openchat_3.5-GGUF-fp16
Mistral-7B-OpenOrca-GGUF-fp16
dolphin-2.2.1-mistral-7b-GGUF-fp16
Qwen2.5-72B-0.6x-Instruct
Andromeda-70B
Andromeda-70B is the result of an experimental SLERP merge of Cassiopeia-70B and Sao10K/Llama-3.3-70B-Vulpecula-r1. It is a coherent, unaligned model intended for creative tasks such as storywriting, brainstorming, and interactive roleplay.

After more thorough testing by myself and others, I don't think this model is very good. :( You should use Cassiopeia or Vulpecula instead.

Feedback on this merge is very welcome, good or bad! Please leave a comment in this discussion with your thoughts: Andromeda-70B/discussions/1