noctrex
Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF
gemma-4-26B-A4B-it-MXFP4_MOE-GGUF
Chandra-OCR-GGUF
Original model: https://huggingface.co/datalab-to/chandra
Try to use the best quality you can run. For the mmproj, use the F32 version, as it will produce the best results (F32 > BF16 > F16).
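As a concrete example, a vision/OCR quant from a repo like this can be run with llama.cpp's `llama-mtmd-cli`, pointing it at both the model and the mmproj file. A minimal sketch, with hypothetical file names standing in for whichever quant and mmproj you downloaded:

```
# Hypothetical file names; use the quant you downloaded and the F32 mmproj.
llama-mtmd-cli \
  -m Chandra-OCR-Q8_0.gguf \
  --mmproj mmproj-Chandra-OCR-F32.gguf \
  --image page.png \
  -p "Transcribe all text in this image."
```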
GLM-4.7-Flash-MXFP4_MOE-GGUF
Qwen3-Coder-Next-MXFP4_MOE-GGUF
LightOnOCR-1B-1025-i1-GGUF
These are the imatrix quantizations of the model LightOnOCR-1B-1025.
Original model: https://huggingface.co/lightonai/LightOnOCR-1B-1025
Try to use the best quality you can run. For the mmproj, use the F32 version, as it will produce the best results (F32 > BF16 > F16).
Huihui-Qwen3.5-35B-A3B-abliterated-MXFP4_MOE-GGUF
Qwen3.5-35B-A3B-MXFP4_MOE-GGUF
Huihui-Qwen3-VL-4B-Instruct-abliterated-GGUF
GLM-4.5-Air-REAP-82B-A12B-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model GLM-4.5-Air-REAP-82B-A12B.
Original model: https://huggingface.co/cerebras/GLM-4.5-Air-REAP-82B-A12B
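For reference, quants like this one can in principle be reproduced with llama.cpp's `llama-quantize` tool. A minimal sketch, assuming a recent llama.cpp build that lists MXFP4_MOE among its quantization types, with hypothetical file names:

```
# Hypothetical file names; requires a llama.cpp build that supports the
# MXFP4_MOE quantization type.
llama-quantize \
  GLM-4.5-Air-REAP-82B-A12B-BF16.gguf \
  GLM-4.5-Air-REAP-82B-A12B-MXFP4_MOE.gguf \
  MXFP4_MOE
```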
Qwen3 VL 32B Thinking GGUF
Qwen3.5-122B-A10B-MXFP4_MOE-GGUF
Huihui-Qwen3-VL-8B-Instruct-abliterated-GGUF
Pixtral-12B-Captioner-Relaxed-GGUF
Huihui Qwen3 VL 30B A3B Instruct Abliterated MXFP4 MOE GGUF
Huihui-Mistral-Small-3.2-24B-Instruct-2506-abliterated-v2-GGUF
These are quantizations of the model Huihui-Mistral-Small-3.2-24B-Instruct-2506-abliterated-v2.
Original model: https://huggingface.co/huihui-ai/Huihui-Mistral-Small-3.2-24B-Instruct-2506-abliterated-v2
Qwen3-Coder-REAP-25B-A3B-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Qwen3-Coder-REAP-25B-A3B. Added an imatrix version, based on the imatrix from bartowski. I also created my own experimental imatrix versions, marked as codetiny-exp and codemedium-exp. For these I took calibration data that is ONLY for coding and not for general knowledge: the codetiny and codemedium datasets from eaddario/imatrix-calibration. I thought this would be better suited for a coding-specific model, but further tests must be done. Please provide feedback.
Original model: https://huggingface.co/cerebras/Qwen3-Coder-REAP-25B-A3B
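For reference, importance matrices like the experimental ones above are computed with llama.cpp's `llama-imatrix` tool over a calibration text. A minimal sketch, with hypothetical file names for the full-precision model, the coding-only calibration file, and the output:

```
# Compute an importance matrix from a coding-only calibration set
# (hypothetical file names).
llama-imatrix \
  -m Qwen3-Coder-REAP-25B-A3B-BF16.gguf \
  -f codetiny.txt \
  -o codetiny-exp.imatrix
```

The resulting `.imatrix` file is then passed to `llama-quantize` via its `--imatrix` option when producing the imatrix quants.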
MiniMax-M2-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model MiniMax-M2.
Original model: https://huggingface.co/unsloth/MiniMax-M2
It seems the original model I quantized had chat template problems, so I re-quantized the unsloth version, which has template fixes. Please delete the old one and download the new quant.
Gelato-30B-A3B-i1-GGUF
Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE-GGUF
Gelato-30B-A3B-GGUF
These are quantizations of the model Gelato-30B-A3B. The imatrix used is from mradermacher. As most of the quants are already available from the great mradermacher team, I include here only the quants that are missing.
Usage Notes:
- Download the latest llama.cpp to use these quantizations.
- Try to use the best quality you can run.
- For the `mmproj` file, the F32 version is recommended for best results (F32 > BF16 > F16).
Huihui-gpt-oss-120b-abliterated-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Huihui-gpt-oss-120b-BF16-abliterated-v2.
Original model: https://huggingface.co/huihui-ai/Huihui-gpt-oss-120b-BF16-abliterated-v2
Qwen3-VL-235B-A22B-Instruct-1M-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Qwen3-VL-235B-A22B-Instruct.
Original model: https://huggingface.co/unsloth/Qwen3-VL-235B-A22B-Instruct
Qwen3-VL-235B-A22B-Thinking-1M-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Qwen3-VL-235B-A22B-Thinking.
Original model: https://huggingface.co/unsloth/Qwen3-VL-235B-A22B-Thinking-1M
This is the version from unsloth that expands the context size from 256k to 1M.
Qwen3-VL-30B-A3B-Thinking-abliterated-GGUF
LightOnOCR-1B-1025-GGUF
These are the quantizations of the model LightOnOCR-1B-1025.
Original model: https://huggingface.co/lightonai/LightOnOCR-1B-1025
Try to use the best quality you can run. For the mmproj, use the F32 version, as it will produce the best results (F32 > BF16 > F16).
Huihui-Qwen3-VL-8B-Thinking-abliterated-GGUF
Huihui-gpt-oss-20b-abliterated-v2-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Huihui-gpt-oss-20b-BF16-abliterated-v2.
Original model: https://huggingface.co/huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated-v2
InternVL3_5-30B-A3B-MXFP4_MOE-GGUF
Huihui-Qwen3-VL-2B-Instruct-abliterated-GGUF
These are quantizations of the model Huihui-Qwen3-VL-2B-Instruct-abliterated. These quantizations were created using an imatrix computed from the combined_all_large calibration set merged with harmful.txt (see the sketch after the notes below), to leverage the abliterated nature of the model.
Usage Notes:
- Download the latest llama.cpp to use these quantizations.
- Try to use the best quality you can run.
- For the `mmproj` file, the F32 version is recommended for best results (F32 > BF16 > F16).
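A merged calibration set like the one above can be produced by simply concatenating the source texts before computing the importance matrix. A minimal sketch with llama.cpp's `llama-imatrix`, assuming the calibration files are plain text with these hypothetical names:

```
# Merge the general and "harmful" calibration texts, then compute the
# importance matrix over the combined file (hypothetical file names).
cat combined_all_large.txt harmful.txt > merged-calibration.txt
llama-imatrix \
  -m Huihui-Qwen3-VL-2B-Instruct-abliterated-F16.gguf \
  -f merged-calibration.txt \
  -o merged.imatrix
```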
Nemotron-3-Nano-30B-A3B-MXFP4_MOE-GGUF
Qwen3-Next-80B-A3B-Thinking-1M-MXFP4_MOE-GGUF
Tongyi DeepResearch 30B A3B MXFP4 MOE GGUF
This is a MXFP4MOE quantization of the model Tongyi-DeepResearch-30B-A3B.
Original model: https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
Huihui-Qwen3-VL-4B-Thinking-abliterated-GGUF
cogito-v2-preview-llama-109B-MoE-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model cogito-v2-preview-llama-109B-MoE.
Model quantized with BF16 GGUFs from: https://huggingface.co/unsloth/cogito-v2-preview-llama-109B-MoE-GGUF
Original model: https://huggingface.co/deepcogito/cogito-v2-preview-llama-109B-MoE
Qwen3-Next-80B-A3B-Instruct-1M-MXFP4_MOE-GGUF
DeepSeek-MoE-16B-Chat-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model DeepSeek-MoE-16B-Chat.
Original model: https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat
Qwen3 30B A3B CoderThinking YOYO Linear MXFP4 MOE GGUF
This is a MXFP4MOE quantization of the model Qwen3-30B-A3B-CoderThinking-YOYO-linear.
Original model: https://huggingface.co/YOYO-AI/Qwen3-30B-A3B-CoderThinking-YOYO-linear
Ling-flash-2.0-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Ling-flash-2.0.
Original model: https://huggingface.co/inclusionAI/Ling-flash-2.0
Qwen3-30B-A3B-Deepseek-Distill-Instruct-2507-MXFP4_MOE-GGUF
GLM-4.7-Flash-REAP-23B-A3B-MXFP4_MOE-GGUF
Kimi-VL-A3B-Thinking-2506-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Kimi-VL-A3B-Thinking-2506.
Original model: https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking-2506
Qwen3.5-397B-A17B-MXFP4_MOE-GGUF
Qwen3-Next-80B-A3B-Instruct-MXFP4_MOE-GGUF
This is a MXFP4 quant of Qwen3-Next-80B-A3B-Instruct. The context has been extended from 256k to 1M with YaRN, as seen on the original repo. To enable it, run llama.cpp with options like: `--ctx-size 0 --rope-scaling yarn --rope-scale 4`. Setting `--ctx-size 0` uses the full 1M context; otherwise set a smaller number, like 524288 for 512k. You can also use the model as normal if you don't want the extended context.
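For example, a full llama-server invocation with the extended context might look like this (the model file name is hypothetical; use whichever quant you downloaded):

```
# Serve with the full 1M context (--ctx-size 0 takes the value from the GGUF metadata).
llama-server -m Qwen3-Next-80B-A3B-Instruct-MXFP4_MOE.gguf \
  --ctx-size 0 --rope-scaling yarn --rope-scale 4

# Or cap the context at 512k to save memory:
llama-server -m Qwen3-Next-80B-A3B-Instruct-MXFP4_MOE.gguf \
  --ctx-size 524288 --rope-scaling yarn --rope-scale 4
```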
Huihui-Ring-mini-2.0-abliterated-MXFP4_MOE-GGUF
Huihui-Tongyi-DeepResearch-30B-A3B-abliterated-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Huihui-Tongyi-DeepResearch-30B-A3B-abliterated.
Original model: https://huggingface.co/huihui-ai/Huihui-Tongyi-DeepResearch-30B-A3B-abliterated
Qwen3-Next-80B-A3B-Thinking-MXFP4_MOE-GGUF
This is a MXFP4 quant of Qwen3-Next-80B-A3B-Thinking. The context has been extended from 256k to 1M with YaRN, as seen on the original repo. To enable it, run llama.cpp with options like: `--ctx-size 0 --rope-scaling yarn --rope-scale 4`. Setting `--ctx-size 0` uses the full 1M context; otherwise set a smaller number, like 524288 for 512k. You can also use the model as normal if you don't want the extended context.
SmallThinker-21B-A3B-Instruct-MXFP4_MOE-GGUF
Llama-4-Scout-17B-16E-Instruct-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Llama-4-Scout-17B-16E-Instruct.
Model quantized with BF16 GGUFs from: https://huggingface.co/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
Original model: https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct
Qwen3 30B A3B YOYO V4 MXFP4 MOE GGUF
This is a MXFP4MOE quantization of the model Qwen3-30B-A3B-YOYO-V4.
Original model: https://huggingface.co/YOYO-AI/Qwen3-30B-A3B-YOYO-V4
Qwen3-Coder-30B-A3B-Instruct-1M-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Qwen3-Coder-30B-A3B-Instruct-1M.
Original model: https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-GGUF
Also added an imatrix version, based on the imatrix from unsloth. This is the version from unsloth that expands the context size from 256k to 1M.
Qwen3-Coder-30B-A3B-Instruct-MXFP4_MOE-GGUF
MiniMax-M2-REAP-139B-A10B-MXFP4_MOE-GGUF
DavidAU-Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B.
Original model: https://huggingface.co/DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B
INTELLECT-3-MXFP4_MOE-GGUF
Huihui-Qwen3-VL-2B-Thinking-abliterated-GGUF
The-Philosopher-Zephyr-7B-GGUF
These are quantizations of the model The-Philosopher-Zephyr-7B.
Original model: https://huggingface.co/Hypersniper/ThePhilosopherZephyr7B
It's an older model from back in 2023, based on the older Mistral 7B. Why quantize it now in 2025? It's so old! Well, I'm experimenting with importance matrices, and here I used the text_en_large set stitched together with various philosophical stuff. Turns out it's quite fun!
Huihui-Granite-4.0-H-Tiny-abliterated-MXFP4_MOE-GGUF
TheDrummer-GLM-Steam-106B-A12B-v1-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model GLM-Steam-106B-A12B-v1.
Original model: https://huggingface.co/TheDrummer/GLM-Steam-106B-A12B-v1
Qwen3-Coder-REAP-246B-A35B-MXFP4_MOE-GGUF
Huihui-Ling-mini-2.0-abliterated-MXFP4_MOE-GGUF
ERNIE-4.5-21B-A3B-PT-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model ERNIE-4.5-21B-A3B-PT.
Model quantized with BF16 GGUFs from: https://huggingface.co/unsloth/ERNIE-4.5-21B-A3B-PT-GGUF
Original model: https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-PT
P1-30B-A3B-MXFP4_MOE-GGUF
LFM2-8B-A1B-MXFP4_MOE-GGUF
Qwen3-Coder-Next-REAM-MXFP4_MOE-GGUF
LLaDA-MoE-7B-A1B-Instruct-TD-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model LLaDA-MoE-7B-A1B-Instruct-TD: a specialized instruction-tuned model, further optimized for accelerated inference using Trajectory Distillation. Also created a quant with an imatrix from mradermacher.
Original model: https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Instruct-TD
Ring-flash-2.0-MXFP4_MOE-GGUF
Qwen3-30B-A3B-Mixture-2507-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Qwen3-30B-A3B-Mixture-2507.
Original model: https://huggingface.co/YOYO-AI/Qwen3-30B-A3B-Mixture-2507
Huihui-Ling-mini-2.0-abliterated-i1-GGUF
Huihui-MoE-60B-A3B-abliterated-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Huihui-MoE-60B-A3B-abliterated.
Model quantized with F16 GGUFs from: https://huggingface.co/DevQuasar/huihui-ai.Huihui-MoE-60B-A3B-abliterated-GGUF
Original model: https://huggingface.co/huihui-ai/Huihui-MoE-60B-A3B-abliterated
Granite-4.0-H-Small-MXFP4_MOE-GGUF
Qwen3-VL-30B-A3B-Instruct-1M-MXFP4_MOE-GGUF
These are quantizations of the model Qwen3-VL-30B-A3B-Instruct.
Original model: https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Instruct
This is the 1M context length variant from unsloth, with their imatrix applied to it.
Mixtral-8x7B-Instruct-v0.1-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Mixtral-8x7B-Instruct-v0.1.
Original model: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
Huihui-Granite-4.0-H-Micro-abliterated-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Huihui-Granite-4.0-H-Micro-abliterated.
Original model: https://huggingface.co/huihui-ai/Huihui-granite-4.0-h-Micro-abliterated
Nemotron-Cascade-14B-Thinking-MXFP4-GGUF
AI21-Jamba-Mini-1.7-MXFP4_MOE-GGUF
Qwen3 Yoyo V4 42B A3B Thinking TOTAL RECAL MXFP4 MOE GGUF
This is a MXFP4MOE quantization of the model Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL.
Original model: https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL
Huihui-MoE-23B-A4B-abliterated-MXFP4_MOE-GGUF
PromptCoT-2.0-SelfPlay-30B-A3B-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model PromptCoT-2.0-SelfPlay-30B-A3B.
Original model: https://huggingface.co/xl-zhao/PromptCoT-2.0-SelfPlay-30B-A3B
Ring-mini-2.0-MXFP4_MOE-GGUF
Ling-Coder-lite-MXFP4_MOE-GGUF
Qwen3-VL-30B-A3B-Thinking-1M-MXFP4_MOE-GGUF
These are quantizations of the model Qwen3-VL-30B-A3B-Thinking.
Original model: https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Thinking
This is the 1M context length variant from unsloth, with their imatrix applied to it.
SimpleChat-30BA3B-V3-MXFP4_MOE-GGUF
Ling-mini-2.0-MXFP4_MOE-GGUF
dolphin-2.7-mixtral-8x7b-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model dolphin-2.7-mixtral-8x7b.
Original model: https://huggingface.co/dphn/dolphin-2.7-mixtral-8x7b
GroveMoE-Inst-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model GroveMoE-Inst.
Original model: https://huggingface.co/inclusionAI/GroveMoE-Inst
MiniMax-M2-REAP-172B-A10B-MXFP4_MOE-GGUF
Granite-4.0-H-Tiny-MXFP4_MOE-GGUF
Ling-Mini-2.0-Identity-GGUF
This is a MXFP4MOE quantization of the model Ling-Mini-2.0-Identity.
Original model: https://huggingface.co/qingy2024/Ling-Mini-2.0-Identity
aquif-3.5-A4B-Think-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model aquif-3.5-A4B-Think.
Original model: https://huggingface.co/aquif-ai/aquif-3.5-A4B-Think
Ling-lite-1.5-2507-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Ling-lite-1.5-2507.
Original model: https://huggingface.co/inclusionAI/Ling-lite-1.5-2507
SERA-32B-GGUF
Huihui-MiroThinker-v1.0-30B-abliterated-MXFP4_MOE-GGUF
MiniMax-M2-REAP-162B-A10B-MXFP4_MOE-GGUF
SmallThinker-4B-A0.6B-Instruct-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model SmallThinker-4BA0.6B-Instruct. A quantization with imatrix is also included.
Original model: https://huggingface.co/PowerInfer/SmallThinker-4BA0.6B-Instruct
Phi-mini-MoE-instruct-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Phi-mini-MoE-instruct.
Model quantized with F16 GGUFs from: https://huggingface.co/gabriellarson/Phi-mini-MoE-instruct-GGUF
Original model: https://huggingface.co/microsoft/Phi-mini-MoE-instruct
aquif-3-moe-17B-A2.8B-Think-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model aquif-3-moe-17B-A2.8B-Think.
Original model: https://huggingface.co/aquif-ai/aquif-3-moe-17B-A2.8B-Think
LLaDA-MoE-7B-A1B-Instruct-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model LLaDA-MoE-7B-A1B-Instruct.
Original model: https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Instruct
ERNIE-4.5-21B-A3B-Thinking-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model ERNIE-4.5-21B-A3B-Thinking.
Model quantized with BF16 GGUFs from: https://huggingface.co/unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF
Original model: https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking
Moonlight-16B-A3B-Instruct-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Moonlight-16B-A3B-Instruct.
Original model: https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct
Huihui-MoE-4.8B-A1.7B-abliterated-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Huihui-MoE-4.8B-A1.7B-abliterated.
Model quantized with F16 GGUFs from: https://huggingface.co/DevQuasar/huihui-ai.Huihui-MoE-4.8B-A1.7B-abliterated-GGUF
Original model: https://huggingface.co/huihui-ai/Huihui-MoE-4.8B-A1.7B-abliterated
Pristine-8B-A1B-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Pristine-8B-A1B. An imatrix quantization is also included.
Original model: https://huggingface.co/qingy2024/Pristine-8B-A1B
Huihui-MoE-12B-A4B-abliterated-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Huihui-MoE-12B-A4B-abliterated.
Original model: https://huggingface.co/huihui-ai/Huihui-MoE-12B-A4B-abliterated
grok-2-MXFP4_MOE-GGUF
OLMoE-1B-7B-0125-Instruct-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model OLMoE-1B-7B-0125-Instruct.
Model quantized with F16 GGUFs from: https://huggingface.co/DevQuasar/allenai.OLMoE-1B-7B-0125-Instruct-GGUF
Original model: https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct
ERNIE-4.5-300B-A47B-PT-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model ERNIE-4.5-300B-A47B-PT.
Original model: https://huggingface.co/baidu/ERNIE-4.5-300B-A47B-PT
This model's GGUFs have been removed in order to conserve space in my repos. If you want them, just message me and I will make them available on demand.
Phi-3.5-MoE-instruct-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Phi-3.5-MoE-instruct.
Original model: https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
DeepSeek-V3.1-Terminus-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model DeepSeek-V3.1-Terminus.
Model quantized with BF16 GGUFs from: https://huggingface.co/unsloth/DeepSeek-V3.1-Terminus-GGUF
Original model: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus
Llama3.2-ColdBrew-4x3B-Argon-test1-MXFP4_MOE-GGUF
Intern-S1-MXFP4_MOE-GGUF
This is a MXFP4MOE quantization of the model Intern-S1.
Original model: https://huggingface.co/internlm/Intern-S1