ZeroWw
NeuralDaredevil-8B-abliterated-GGUF
Test
Meta-Llama-3.1-8B-Instruct-abliterated-GGUF
My own (ZeroWw) quantizations. Output and embed tensors are quantized to f16; all other tensors are quantized to q5_k or q6_k. Result: both the f16.q6 and f16.q5 files are smaller than standard q8_0 quantization, and they perform as well as the pure f16.
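As an illustration only, the recipe above can be reproduced with llama.cpp's llama-quantize tool and its --output-tensor-type / --token-embedding-type flags; the sketch below is a minimal Python wrapper with placeholder file names, not necessarily the exact script used for these uploads.

```python
# Minimal sketch (assumed workflow, not the exact upload script): keep the
# output head and the token embeddings at f16, and quantize every other
# tensor to Q6_K or Q5_K with llama.cpp's llama-quantize.
import subprocess

SRC = "model.f16.gguf"  # placeholder: full-precision f16 GGUF export

for quant in ("Q6_K", "Q5_K"):
    dst = f"model.f16.{quant.lower()}.gguf"
    subprocess.run(
        [
            "llama-quantize",
            "--output-tensor-type", "f16",    # output tensor stays f16
            "--token-embedding-type", "f16",  # embeddings stay f16
            SRC,
            dst,
            quant,                            # all remaining tensors
        ],
        check=True,
    )
```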
llama3-8B-DarkIdol-2.2-Uncensored-1048K-GGUF
Llama-3-8B-Lexi-Uncensored-GGUF
NSFW_DPO_Noromaid-7b-Mistral-7B-Instruct-v0.1-GGUF
GLM-Z1-9B-0414-GGUF
Mistral-Nemo-Base-2407-GGUF
DarkIdol-Llama-3.1-8B-Instruct-1.0-Uncensored-GGUF
Llama-3.2-3B-Instruct-abliterated-GGUF
TwinLlama-3.1-8B-GGUF
Phi-4-mini-instruct-GGUF
Llama-3.1-Storm-8B-GGUF
NuminaMath-7B-TIR-GGUF
Gemmasutra-Mini-2B-v1-GGUF
llama3-8B-DarkIdol-2.1-Uncensored-32K-GGUF
Phi-3-mini-4k-instruct-GGUF
Qwen3-8B-abliterated-GGUF
Llama-3.2-1B-Instruct-GGUF
Qwen3-8B-GGUF
Pythia-Chat-Base-7B-GGUF
Meta-Llama-3.1-8B-Claude-39fail-3000total-GGUF
Phi-3.5-mini-instruct_Uncensored-GGUF
L3-Aethora-15B-V2-GGUF
Llama3.1-8B-Enigma-GGUF
gemma-3-4b-it-abliterated-GGUF
Celeste-12B-V1.6-GGUF
Qwen3-4B-abliterated-GGUF
CodeQwen1.5-7B-Chat-GGUF
Replete-LLM-Qwen2-7b_Beta-Preview-GGUF
ghost-8b-beta-1608-GGUF
Hunyuan-7B-Instruct-GGUF
Mistral-NeMo-Minitron-8B-Instruct-GGUF
gemma-2-2b-it-GGUF
phi3-uncensored-chat-GGUF
Qwen3-4B-Thinking-2507-GGUF
Qwen3-4B-Esper3-GGUF
aya-23-8B-GGUF
gpt2-xl-GGUF
Mistral-Nemo-Instruct-2407-GGUF
Meta-Llama-3.1-8B-Instruct-GGUF
Qwen3-4B-GGUF
Llama3.1-8B-ShiningValiant2-GGUF
ghost-8b-beta-GGUF
Art-0-8B-GGUF
glm-4-9b-chat-GGUF
Llama-3.1-Minitron-4B-Width-Base-GGUF
Phi-4-mini-reasoning-GGUF
glm-4-9b-chat-1m-GGUF
Qwen2.5-1.5B-Instruct-GGUF
Meta-Llama-3-8B-Instruct-GGUF
L3.2-Rogue-Creative-Instruct-Uncensored-Abliterated-7B-D_AU-SILLY
Phi-3-medium-128k-instruct-GGUF
Hermes-2-Pro-Llama-3-8B-GGUF
L3-8b-Rosier-v1-GGUF
Mistral-7B-Instruct-v0.3-GGUF
llama-3-Nephilim-v3-8B-GGUF
Lumimaid-v0.2-12B-GGUF
Symbol-LLM-8B-Instruct-GGUF
codegeex4-all-9b-GGUF
microsoft_WizardLM-2-7B-GGUF
Seed-Coder-8B-Reasoning-GGUF
Smegmma-9B-v1-GGUF
Llama-3-8B-Instruct-Gradient-4194k-GGUF
llama3-turbcat-instruct-8b-GGUF
gemma-2-2b-it-abliterated-GGUF
Qwen3-4B-Instruct-2507-GGUF
Mistral-7B-Instruct-v0.3-SILLY
L3-8B-Celeste-v1-GGUF
Lumimaid-v0.2-8B-GGUF
Arcee-Spark-GGUF
L3-SthenoMaid-8B-V1-GGUF
Llama-3-8B-Instruct-Gradient-1048k-GGUF
Phi-3-mini-4k-geminified-GGUF
L3-8B-Celeste-V1.2-GGUF
L3.1-8B-Celeste-V1.5-GGUF
Llama-3.2-3B-Instruct-GGUF
aya-expanse-8b-GGUF
Qwen3-8B-Esper3-GGUF
Hathor_Stable-v0.2-L3-8B-GGUF
Meta-Llama-3-8B-Instruct-abliterated-v3-GGUF
L3-8B-Stheno-v3.3-32K-GGUF
Gemma-2-9B-It-SPPO-Iter3-GGUF
Gemmasutra-9B-v1b-GGUF
shieldgemma-2b-GGUF
L3-Blackfall-Summanus-v0.1-15B-GGUF
cogito-v1-preview-llama-8B-GGUF
DeepSeek-Coder-V2-Lite-Base-GGUF
LLaMAX3-8B-Alpaca-GGUF
Phi-3-mini-128k-instruct-GGUF
DeepSeek-V2-Lite-Chat-GGUF
LLaMAX3-8B-GGUF
Tiger-Gemma-9B-v1-GGUF
neural-chat-7b-v3-3-GGUF
Gemma-3-R1-4B-v1-GGUF
gemma-2-9b-it-GGUF
internlm2_5-7b-chat-GGUF
Qwen1.5-7B-Chat-GGUF
Smegmma-Deluxe-9B-v1-GGUF
TwinLlama-3.1-8B-SILLY
Qwen2.5-3B-Instruct-GGUF
Samantha-Qwen-2-7B-GGUF
Phi-3-song-lyrics-1.0-GGUF
DarkIdol-Llama-3.1-8B-Instruct-1.0-Uncensored-SILLY
h2ogpt-4096-llama2-13b-chat-GGUF
ghost-8b-beta-1608-SILLY
gemma-3-12b-it-GGUF
granite-3.3-8b-instruct-GGUF
open_llama_7b_v2-GGUF
ghost-7b-alpha-GGUF
xLAM-1b-fc-r-GGUF
xLAM-7b-fc-r-GGUF
Qwen2.5-7B-Instruct-GGUF
EuroLLM-1.7B-Instruct-GGUF
internlm2_5-7b-chat-1m-GGUF
Mixtral_AI_Cyber_4.0-GGUF
Gemma-2-9B-It-SPPO-Iter3-SILLY
open_llama_3b_v2-GGUF
gemma-2-2b-it-SILLY
Gemmasutra-Mini-2B-v1-SILLY
Mistral-Nemo-12B-ArliAI-RPMax-v1.2-GGUF
Llama3.1-8B-Enigma-SILLY
Moistral-11B-v3-GGUF
ghost-8b-beta-SILLY
Meta-Llama-3.1-8B-Instruct-abliterated-SILLY
Gemmasutra-9B-v1b-SILLY
Lumimaid-v0.2-12B-SILLY
Llama-3.1-Storm-8B-SILLY
Mistral-Nemo-Instruct-2407-SILLY
L3.1-8B-Celeste-V1.5-SILLY
Llama3.1-8B-ShiningValiant2-SILLY
Phi-3.5-mini-instruct-GGUF
amoral-gemma3-4B-GGUF
DeepSeek-R1-0528-Qwen3-8B-GGUF
gemma-3-270m-it-GGUF
Marco-o1-GGUF
DeepSeek-R1-Distill-Qwen-7B-GGUF
Josiefied-Qwen3-8B-abliterated-v1-GGUF
Meta-Llama-3.1-8B-Instruct-SILLY
Llama-3.2-3B-Instruct-abliterated-SILLY
Seed-Coder-8B-Instruct-GGUF
Mistroll-7B-v2.2-GGUF
phillama-3.8b-v0.1-GGUF
h2o-danube3-500m-chat-GGUF
h2o-danube3-4b-chat-GGUF
Lumimaid-v0.2-8B-SILLY
OpenELM-3B-Instruct-GGUF
palmer-004-turbo-GGUF
Phi-3-mini-128k-instruct-abliterated-v3-GGUF
Replete-LLM-Qwen2-7b_Beta-Preview-SILLY
aya-expanse-8b-SILLY
gemma-2-2b-it-abliterated-SILLY
NeuralPipe-7B-slerp-GGUF
SOLAR-10.7B-Instruct-v1.0-GGUF
Phi-3.5-mini-instruct-SILLY
Llama-3.2-1B-Instruct-SILLY
gemma-3-1b-it-abliterated-GGUF
Yi-1.5-9B-32K-GGUF
Yi-1.5-6B-Chat-GGUF
Llama-3.2-3B-Instruct-SILLY
Qwen2.5-1.5B-Instruct-SILLY
ZeroWw 'SILLY' version. The original model has been quantized (fq8 version) and a percentage of its tensors have been modified by adding some noise.

Full colab: https://colab.research.google.com/drive/1a7seagBzu5l3k3FL4SFk0YJocl7nsDJw?usp=sharing
Fast colab: https://colab.research.google.com/drive/1SDD7ox21di82Y9v68AUoy0PhkxwBVvN?usp=sharing
Original reddit post: https://www.reddit.com/r/LocalLLaMA/comments/1ec0s8p/imadeasillytest/

I created a program to randomize the weights of a model. The program has two parameters: the percentage of weights to modify and the maximum percentage of the original value to randomly apply to each weight as noise. At the end, I check the resulting GGUF file for binary differences. In this example I modified 100% of the weights of Mistral 7B Instruct v0.3 with a maximum deviation of 15%. Since the deviation is calculated on the F32 weights, the figures change once the model is quantized to Q8_0. So, in the end, I got a file that, compared to the original, has:

The cool thing is that, chatting with the model, I see no apparent difference and it still works as nicely as the original. Since I am running everything on CPU, I could not run perplexity scores or anything compute-intensive. As a small test, I asked the model a few questions (like the history of the Roman Empire) and then fact-checked its answers with a big model. No errors were detected.
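The colabs linked above contain the actual program; purely to illustrate the two parameters described here, a minimal numpy sketch (applied to a plain array rather than to the tensors of a GGUF file, with hypothetical names) could look like this.

```python
# Illustrative sketch only (not the colab code): perturb a fraction of the
# weights by at most `max_dev` times their original value.
import numpy as np

def sillify(weights, pct_modified=1.0, max_dev=0.15, rng=None):
    """Return a copy of `weights` with random noise applied.

    pct_modified: fraction of weights to modify (0.0-1.0)
    max_dev:      maximum deviation as a fraction of each original value
    """
    rng = rng or np.random.default_rng()
    w = np.asarray(weights, dtype=np.float32).copy()
    mask = rng.random(w.shape) < pct_modified             # which weights to touch
    noise = rng.uniform(-max_dev, max_dev, size=w.shape)  # +/- up to max_dev
    w[mask] += w[mask] * noise[mask]
    return w

# Example matching the test above: 100% of weights, max 15% deviation
w = np.random.randn(4096, 4096).astype(np.float32)
w_silly = sillify(w, pct_modified=1.0, max_dev=0.15)
```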
internlm3-8b-instruct-GGUF
Mistral-Nemo-12B-ArliAI-RPMax-v1.2-SILLY
MixTAO-7Bx2-MoE-v8.1-GGUF
Qwen2.5-3B-Instruct-SILLY
EuroLLM-1.7B-Instruct-SILLY
Moistral-11B-v4-GGUF
Phi-3.5-mini-instruct_Uncensored-SILLY
Phi3Unlocked-GGUF
Mistral-NeMo-Minitron-8B-Instruct-SILLY
Qwen3-0.6B-GGUF
Falcon-H1-7B-Instruct-GGUF
Qwen2.5-7B-Instruct-SILLY
granite-3.1-8b-instruct-GGUF
granite-3.1-8b-instruct-abliterated-GGUF
SOLAR-10.7B-Instruct-v1.0-SILLY
neural-chat-7b-v3-3-SILLY
gemma-3-1b-it-GGUF
Marco-o1-SILLY
granite-3.1-3b-a800m-instruct-GGUF
Llama-Deepsync-1B-GGUF
EXAONE-Deep-2.4B-GGUF
Phi3Unlocked-SILLY
granite-3.1-2b-instruct-GGUF
EXAONE-Deep-7.8B-GGUF
Llama-3.1-Nemotron-Nano-8B-v1-GGUF
GLM-4-9B-0414-GGUF
granite-3.2-2b-instruct-GGUF
medgemma-4b-it-GGUF