mlabonne
gemma-3-27b-it-abliterated-GGUF
This is an uncensored version of google/gemma-3-27b-it created with a new abliteration technique. See this article to learn more about abliteration.

I was playing with model weights and noticed that Gemma 3 was much more resilient to abliteration than other models like Qwen 2.5. I experimented with a few recipes to remove refusals while preserving most of the model's capabilities. Note that this is fairly experimental, so it might not turn out as well as expected.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

In the original technique, a refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. Here, the model was abliterated by computing a refusal direction based on hidden states (inspired by Sumandora's repo) for each layer, independently. This is combined with a refusal weight of 1.5 to upscale the importance of this refusal direction in each layer. This achieved a very high acceptance rate (>90%) while still producing coherent outputs.
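The per-layer computation described above can be sketched in NumPy. This is illustrative only (the function names, shapes, and use of NumPy rather than PyTorch are my assumptions, not the actual implementation): the refusal direction is the normalized difference of mean hidden states, upscaled by the refusal weight, and ablation removes the component of a hidden state along that direction.

```python
import numpy as np

def refusal_direction(harmful_hidden, harmless_hidden, weight=1.5):
    """Per-layer refusal direction: difference of the mean hidden states
    of harmful vs. harmless prompts, normalized to unit length, then
    upscaled by the refusal weight (1.5, as described above)."""
    direction = harmful_hidden.mean(axis=0) - harmless_hidden.mean(axis=0)
    return weight * direction / np.linalg.norm(direction)

def ablate(hidden, direction):
    """Remove the component of each hidden state along the refusal direction."""
    unit = direction / np.linalg.norm(direction)
    return hidden - np.outer(hidden @ unit, unit)
```

After ablation, the hidden states have zero projection onto the refusal direction, which is what suppresses the refusal behavior.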
NeuralMonarch-7B
AlphaMonarch-7B
NeuralDaredevil-8B-abliterated
This is a DPO fine-tune of mlabonne/Daredevil-8B-abliterated, trained on one epoch of mlabonne/orpo-dpo-mix-40k. The DPO fine-tuning successfully recovers the performance loss due to the abliteration process, making it an excellent uncensored model.

NeuralDaredevil-8B-abliterated performs better than the Instruct model on my tests. You can use it for any application that doesn't require alignment, like role-playing. Tested in LM Studio using the "Llama 3" and "Llama 3 v2" presets.

Thanks to QuantFactory, ZeroWw, Zoyd, solidrust, and tarruda for providing these quants.

- GGUF: https://huggingface.co/QuantFactory/NeuralDaredevil-8B-abliterated-GGUF
- GGUF (FP16): https://huggingface.co/ZeroWw/NeuralDaredevil-8B-abliterated-GGUF
- EXL2: https://huggingface.co/Zoyd/mlabonne_NeuralDaredevil-8B-abliterated-4_0bpw_exl2
- AWQ: https://huggingface.co/solidrust/NeuralDaredevil-8B-abliterated-AWQ
- ollama:
  - 16-bit: https://ollama.com/tarruda/neuraldaredevil-8b-abliterated
  - 8-bit: https://ollama.com/lstep/neuraldaredevil-8b-abliterated
  - 5-bit: https://ollama.com/closex/neuraldaredevil-8b-abliterated

NeuralDaredevil-8B is the best-performing uncensored 8B model on the Open LLM Leaderboard (MMLU score). Evaluation performed using LLM AutoEval. See the entire leaderboard here.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| mlabonne/NeuralDaredevil-8B-abliterated 📄 | 55.87 | 43.73 | 73.6 | 59.36 | 46.8 |
| mlabonne/Daredevil-8B 📄 | 55.87 | 44.13 | 73.52 | 59.05 | 46.77 |
| mlabonne/Daredevil-8B-abliterated 📄 | 55.06 | 43.29 | 73.33 | 57.47 | 46.17 |
| NousResearch/Hermes-2-Theta-Llama-3-8B 📄 | 54.28 | 43.9 | 72.62 | 56.36 | 44.23 |
| openchat/openchat-3.6-8b-20240522 📄 | 53.49 | 44.03 | 73.67 | 49.78 | 46.48 |
| meta-llama/Meta-Llama-3-8B-Instruct 📄 | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
| meta-llama/Meta-Llama-3-8B 📄 | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |
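For intuition on what the DPO fine-tuning optimizes, here is the standard DPO objective as a minimal NumPy sketch (the function name and scalar interface are mine; the actual training presumably used a full trainer over batched token log-probabilities):

```python
import numpy as np

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.
    Inputs are summed token log-probabilities of the chosen and rejected
    completions under the trained policy and the frozen reference model."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))  # -log(sigmoid(beta * margin))
```

The loss shrinks as the policy assigns a larger log-probability margin to the chosen completion than the reference model does, which is how DPO steers the abliterated model back toward high-quality outputs without an explicit reward model.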
ChimeraLlama-3-8B-v3
Beyonder-4x7B-v3
ChimeraLlama-3-8B-v2
Daredevil-8B-abliterated
Abliterated version of mlabonne/Daredevil-8B using failspy's notebook. It is based on the technique described in the blog post "Refusal in LLMs is mediated by a single direction". Thanks to Andy Arditi, Oscar Balcells Obeso, Aaquib111, Wes Gurnee, Neel Nanda, and failspy.

This is an uncensored model. You can use it for any application that doesn't require alignment, like role-playing.

GGUF: https://huggingface.co/mlabonne/Daredevil-8B-abliterated-GGUF

Daredevil-8B-abliterated is the second-best-performing 8B model on the Open LLM Leaderboard in terms of MMLU score (27 May 24). Evaluation performed using LLM AutoEval. See the entire leaderboard here.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| mlabonne/Daredevil-8B 📄 | 55.87 | 44.13 | 73.52 | 59.05 | 46.77 |
| mlabonne/Daredevil-8B-abliterated 📄 | 55.06 | 43.29 | 73.33 | 57.47 | 46.17 |
| mlabonne/Llama-3-8B-Instruct-abliterated-dpomix 📄 | 52.26 | 41.6 | 69.95 | 54.22 | 43.26 |
| meta-llama/Meta-Llama-3-8B-Instruct 📄 | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
| failspy/Meta-Llama-3-8B-Instruct-abliterated-v3 📄 | 51.21 | 40.23 | 69.5 | 52.44 | 42.69 |
| mlabonne/OrpoLlama-3-8B 📄 | 48.63 | 34.17 | 70.59 | 52.39 | 37.36 |
| meta-llama/Meta-Llama-3-8B 📄 | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |
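The core operation in this technique is weight orthogonalization: the refusal direction is projected out of the weight matrices that write into the residual stream, so the model can never represent it. A minimal NumPy sketch (the function name is mine; the actual notebook operates on PyTorch weights):

```python
import numpy as np

def orthogonalize_weights(W, refusal_dir):
    """Project the refusal direction out of a weight matrix's output space.
    W: (d_model, d_in) matrix whose outputs are added to the residual stream.
    After this, W @ x has no component along `refusal_dir` for any x."""
    r = refusal_dir / np.linalg.norm(refusal_dir)
    return W - np.outer(r, r @ W)
```

Because the ablation is baked into the weights, inference-time hooks are no longer needed and the modified model can be saved and quantized as usual.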
Meta-Llama-3.1-8B-Instruct-abliterated
This is an uncensored version of Llama 3.1 8B Instruct created with abliteration (see this article to learn more about it). Special thanks to @FailSpy for the original code and technique. Please follow him if you're interested in abliterated models.

- New GGUF: https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF
- ZeroWw GGUF: https://huggingface.co/ZeroWw/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF
- EXL2: https://huggingface.co/Apel-sin/llama-3.1-8B-abliterated-exl2

Open LLM Leaderboard Evaluation Results. Detailed results can be found here.

| Metric | Value |
|-------------------|----:|
| Avg. | 23.13 |
| IFEval (0-Shot) | 73.29 |
| BBH (3-Shot) | 27.13 |
| MATH Lvl 5 (4-Shot) | 6.42 |
| GPQA (0-shot) | 0.89 |
| MuSR (0-shot) | 3.21 |
| MMLU-PRO (5-shot) | 27.81 |
Meta-Llama-3.1-8B-Instruct-abliterated-GGUF
gemma-3-27b-it-abliterated
Gemma 3 1B Abliterated • Gemma 3 4B Abliterated • Gemma 3 12B Abliterated

This is an uncensored version of google/gemma-3-27b-it created with a new abliteration technique. See this article to learn more about abliteration.

I was playing with model weights and noticed that Gemma 3 was much more resilient to abliteration than other models like Qwen 2.5. I experimented with a few recipes to remove refusals while preserving most of the model's capabilities. Note that this is fairly experimental, so it might not turn out as well as expected.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

GGUF: https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated-GGUF

In the original technique, a refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. Here, the model was abliterated by computing a refusal direction based on hidden states (inspired by Sumandora's repo) for each layer, independently. This is combined with a refusal weight of 1.5 to upscale the importance of this refusal direction in each layer. This achieved a very high acceptance rate (>90%) while still producing coherent outputs.
gemma-3-12b-it-abliterated-GGUF
gemma-3-12b-it-abliterated-v2-GGUF
This is an uncensored version of google/gemma-3-12b-it created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

GGUF: https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated-v2-GGUF

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
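The layer-dependent weight factors can be sketched as a Gaussian profile over layer indices. This is a guess at the shape implied by "normal distribution with a certain spread and peak layer" (the function name and default values are mine, not the actual recipe):

```python
import numpy as np

def layer_weight_factors(n_layers, peak_layer, spread, max_weight=1.0):
    """Ablation weight per layer, following a normal-distribution profile:
    strongest at `peak_layer`, tapering off with `spread` (std. deviation)."""
    layers = np.arange(n_layers)
    return max_weight * np.exp(-0.5 * ((layers - peak_layer) / spread) ** 2)
```

Concentrating the ablation around the layers where the refusal direction is strongest, instead of applying a uniform weight everywhere, is what lets this version target refusals with less collateral damage to capabilities.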
Qwen3-14B-abliterated
Qwen3 Abliterated 0.6B • 1.7B • 4B • 8B • 14B • 30B-A3B

This is an uncensored version of Qwen/Qwen3-14B created with a new abliteration technique. See this article to learn more about abliteration. This is a research project to understand how refusals and latent fine-tuning work in LLMs.

I played with different sizes of Qwen3 and noticed there was no one-size-fits-all abliteration strategy. In addition, the reasoning mode interfered with non-reasoning refusals, which made it more challenging. This made me iterate over different recipes and significantly consolidate my scripts with accumulation and better evaluations. Note that this is fairly experimental, so it might not turn out as well as expected.

I recommend using these generation parameters: `temperature=0.6`, `top_k=20`, `top_p=0.95`, `min_p=0`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
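The recommended generation parameters control how the next-token distribution is truncated before sampling. A minimal NumPy sketch of what `top_k`, `top_p`, and `min_p` filtering do (illustrative only; not any inference engine's actual implementation):

```python
import numpy as np

def filter_logits(logits, top_k=20, top_p=0.95, min_p=0.0):
    """Apply top-k, then nucleus (top-p), then min-p filtering to a single
    logits vector. Filtered tokens are set to -inf before softmax sampling."""
    logits = np.asarray(logits, dtype=float).copy()
    # top-k: keep only the k highest-scoring tokens
    if 0 < top_k < logits.size:
        kth = np.sort(logits)[-top_k]
        logits[logits < kth] = -np.inf
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # top-p: keep the smallest set of tokens whose cumulative mass >= top_p
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    logits[order[cutoff:]] = -np.inf
    # min-p: drop tokens whose probability is below min_p * max probability
    logits[probs < min_p * probs.max()] = -np.inf
    return logits
```

With `min_p=0` the last filter is a no-op, so only `top_k=20` and `top_p=0.95` constrain the candidate set here.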
Marcoro14-7B-slerp
gemma-3-4b-it-abliterated-v2-GGUF
Beagle14-7B
NeuralMarcoro14-7B
Hermes-3-Llama-3.1-8B-lorablated-GGUF
gemma-3-4b-it-abliterated-GGUF
gemma-2b-GGUF
gemma-3-27b-it-qat-abliterated-GGUF
This is an uncensored version of google/gemma-3-27b-it-qat-q4_0-unquantized created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
Beyonder-4x7b
GML-Mistral-merged-v1
NeuralQuant-9B
NeuralPipe-9B-merged
gemma-7b-it-GGUF
Beyonder-4x7B-v2
gemma-3-12b-it-abliterated
Daredevil-7B
gemma-3-12b-it-abliterated-v2
This is an uncensored version of google/gemma-3-12b-it created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

- QAT: https://huggingface.co/mlabonne/gemma-3-12b-it-qat-abliterated
- GGUF: https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated-v2-GGUF

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
NeuralLlama-3-8B-Instruct-abliterated
NeuralBeagle14-7B-GGUF
Llama-3.1-70B-Instruct-lorablated-GGUF
gemma-3-4b-it-abliterated
gemma-2b-it-GGUF
gemma-3-4b-it-abliterated-v2
This is an uncensored version of google/gemma-3-4b-it created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

- QAT: https://huggingface.co/mlabonne/gemma-3-4b-it-qat-abliterated
- GGUF: https://huggingface.co/mlabonne/gemma-3-4b-it-abliterated-v2-GGUF

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
FineLlama-3.1-8B-GGUF
gemma-3-12b-it-qat-abliterated-GGUF
This is an uncensored version of google/gemma-3-12b-it-qat-q4_0-unquantized created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
gemma-3-4b-it-qat-abliterated-GGUF
This is an uncensored version of google/gemma-3-4b-it-qat-q4_0-unquantized created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
gemma-3-1b-it-abliterated-GGUF
Llama-3.1-70B-Instruct-lorablated
NeuralDaredevil-8B-abliterated-GGUF
gemma-7b-GGUF
NeuralMonarch-7B-GGUF
Gemmalpaca-2B
gemma-3-1b-it-abliterated-v2-GGUF
Qwen3-0.6B-abliterated
Beyonder-4x7B-v3-GGUF
dummy-llama-2
Qwen3-8B-abliterated
Qwen3 Abliterated 0.6B • 1.7B • 4B • 8B • 14B • 30B-A3B

This is an uncensored version of Qwen/Qwen3-8B created with a new abliteration technique. See this article to learn more about abliteration. This is a research project to understand how refusals and latent fine-tuning work in LLMs.

I played with different sizes of Qwen3 and noticed there was no one-size-fits-all abliteration strategy. In addition, the reasoning mode interfered with non-reasoning refusals, which made it more challenging. This made me iterate over different recipes and significantly consolidate my scripts with accumulation and better evaluations. Note that this is fairly experimental, so it might not turn out as well as expected.

I recommend using these generation parameters: `temperature=0.6`, `top_k=20`, `top_p=0.95`, `min_p=0`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
Daredevil-8B-abliterated-GGUF
AlphaMonarch-7B-GGUF
Meta-Llama-3-8B
Qwen3-30B-A3B-abliterated
Gemmalpaca-2B-GGUF
TwinLlama 3.1 8B
TwinLlama-3.1-8B is a model created for the LLM Engineer's Handbook, trained on mlabonne/llmtwin. It is designed to act as a digital twin, a clone of myself and my co-authors (Paul Iusztin and Alex Vesa), imitating our writing style and drawing knowledge from our articles. This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Hermes-3-Llama-3.1-70B-lorablated
Qwen3-4B-abliterated
Qwen3 Abliterated 0.6B • 1.7B • 4B • 8B • 14B • 30B-A3B

This is an uncensored version of Qwen/Qwen3-4B created with a new abliteration technique. See this article to learn more about abliteration. This is a research project to understand how refusals and latent fine-tuning work in LLMs.

I played with different sizes of Qwen3 and noticed there was no one-size-fits-all abliteration strategy. In addition, the reasoning mode interfered with non-reasoning refusals, which made it more challenging. This made me iterate over different recipes and significantly consolidate my scripts with accumulation and better evaluations. Note that this is fairly experimental, so it might not turn out as well as expected.

I recommend using these generation parameters: `temperature=0.6`, `top_k=20`, `top_p=0.95`, `min_p=0`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
Qwen3-1.7B-abliterated
gemma-3-27b-it-qat-abliterated
EvolCodeLlama-7b-GGUF
gemma-3-1b-it-qat-abliterated-GGUF
This is an uncensored version of google/gemma-3-1b-it-qat-q4_0-unquantized created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
chesspythia-70m
Qwen3-0.6B-abliterated-GGUF
gemma-3-4b-it-qat-abliterated
Daredevil-8B
NeuralHermes-2.5-Mistral-7B
OrpoLlama-3-8B
Hermes-3-Llama-3.1-8B-lorablated
BigQwen2.5-Echo-47B-Instruct
Llama-3-8B-Instruct-abliterated-dpomix-GGUF
NeuralHermes-2.5-Mistral-7B-GGUF
gemma-3-1b-it-abliterated
TwinLlama-3.1-8B-GGUF
NeuralHermes-2.5-Mistral-7B-laser-GGUF
phixtral-2x2_8
Monarch-7B-GGUF
TwinLlama-3.1-8B-DPO
BigLlama-3.1-1T-Instruct
NeuralDaredevil-7B
NeuralDaredevil-8B-abliterated-AWQ
NeuralBeagle14-7B
phixtral-4x2_8
Daredevil-8B-GGUF
gemma-3-12b-it-qat-abliterated
SmolGRPO-135M
gemma-3-1b-it-abliterated-v2
phi-2-orange-v2-GGUF
TwinLlama-3.1-8B-DPO-GGUF
DatacampLlama-3.1-8B-gguf
FineLlama-3.1-8B
codellama-2-7b
BigQwen2.5-52B-Instruct
NeuralMarcoro14-7B-GGUF
llama-2-13b-guanaco
Meta-Llama-3-225B-Instruct
gemma-3-1b-it-qat-abliterated
This is an uncensored version of google/gemma-3-1b-it-qat-q4_0-unquantized created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory.

Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate, using both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.