mlabonne

169 models

gemma-3-27b-it-abliterated-GGUF

This is an uncensored version of google/gemma-3-27b-it created with a new abliteration technique. See this article to learn more about abliteration.

I was playing with model weights and noticed that Gemma 3 was much more resilient to abliteration than other models like Qwen 2.5. I experimented with a few recipes to remove refusals while preserving most of the model's capabilities. Note that this is fairly experimental, so it might not turn out as well as expected.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

In the original technique, a refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. Here, the model was abliterated by computing a refusal direction based on hidden states (inspired by Sumandora's repo) for each layer independently. This is combined with a refusal weight of 1.5 to upscale the importance of this refusal direction in each layer. This achieved a very high acceptance rate (>90%) while still producing coherent outputs.
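As a rough illustration of the per-layer procedure described above, the sketch below computes a refusal direction from mean hidden-state differences and subtracts it from a weight matrix with the 1.5 refusal weight. The data and shapes are dummy values; the real implementation operates on hidden states collected from the actual model.

```python
import numpy as np

# Toy sketch of the per-layer abliteration described above. The data is
# random; only the refusal_weight of 1.5 follows the card.
rng = np.random.default_rng(0)
d_model, n_samples = 64, 32

# Hidden states collected at one layer for harmful/harmless prompts.
harmful = rng.normal(size=(n_samples, d_model))
harmless = rng.normal(size=(n_samples, d_model))

# Refusal direction: normalized difference of the mean hidden states.
refusal_dir = harmful.mean(axis=0) - harmless.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

# Subtract the direction from a layer's weight matrix, upscaled by 1.5.
refusal_weight = 1.5
W = rng.normal(size=(d_model, d_model))
W_abliterated = W - refusal_weight * np.outer(refusal_dir, refusal_dir) @ W
```

With a weight of 1.0 this would zero out the weights' component along the refusal direction; a weight of 1.5 over-corrects past zero, which is what the card means by upscaling the importance of the refusal direction.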

35,766
187

NeuralMonarch-7B

license:cc-by-nc-4.0
28,827
12

AlphaMonarch-7B

license:cc-by-nc-4.0
13,182
148

NeuralDaredevil-8B-abliterated

This is a DPO fine-tune of mlabonne/Daredevil-8B-abliterated, trained on one epoch of mlabonne/orpo-dpo-mix-40k. The DPO fine-tuning successfully recovers the performance loss due to the abliteration process, making it an excellent uncensored model. NeuralDaredevil-8B-abliterated performs better than the Instruct model in my tests. You can use it for any application that doesn't require alignment, like role-playing.

Tested on LM Studio using the "Llama 3" and "Llama 3 v2" presets.

Thanks to QuantFactory, ZeroWw, Zoyd, solidrust, and tarruda for providing these quants:
- GGUF: https://huggingface.co/QuantFactory/NeuralDaredevil-8B-abliterated-GGUF
- GGUF (FP16): https://huggingface.co/ZeroWw/NeuralDaredevil-8B-abliterated-GGUF
- EXL2: https://huggingface.co/Zoyd/mlabonneNeuralDaredevil-8B-abliterated-40bpwexl2
- AWQ: https://huggingface.co/solidrust/NeuralDaredevil-8B-abliterated-AWQ
- ollama (16-bit): https://ollama.com/tarruda/neuraldaredevil-8b-abliterated
- ollama (8-bit): https://ollama.com/lstep/neuraldaredevil-8b-abliterated
- ollama (5-bit): https://ollama.com/closex/neuraldaredevil-8b-abliterated

NeuralDaredevil-8B is the best-performing uncensored 8B model on the Open LLM Leaderboard (MMLU score). Evaluation performed using LLM AutoEval. See the entire leaderboard here.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| mlabonne/NeuralDaredevil-8B-abliterated 📄 | 55.87 | 43.73 | 73.6 | 59.36 | 46.8 |
| mlabonne/Daredevil-8B 📄 | 55.87 | 44.13 | 73.52 | 59.05 | 46.77 |
| mlabonne/Daredevil-8B-abliterated 📄 | 55.06 | 43.29 | 73.33 | 57.47 | 46.17 |
| NousResearch/Hermes-2-Theta-Llama-3-8B 📄 | 54.28 | 43.9 | 72.62 | 56.36 | 44.23 |
| openchat/openchat-3.6-8b-20240522 📄 | 53.49 | 44.03 | 73.67 | 49.78 | 46.48 |
| meta-llama/Meta-Llama-3-8B-Instruct 📄 | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
| meta-llama/Meta-Llama-3-8B 📄 | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |

llama
11,596
249

ChimeraLlama-3-8B-v3

llama
8,241
15

Beyonder-4x7B-v3

license:cc-by-nc-4.0
7,967
59

ChimeraLlama-3-8B-v2

llama
7,944
14

Daredevil-8B-abliterated

Abliterated version of mlabonne/Daredevil-8B using failspy's notebook. It is based on the technique described in the blog post "Refusal in LLMs is mediated by a single direction". Thanks to Andy Arditi, Oscar Balcells Obeso, Aaquib111, Wes Gurnee, Neel Nanda, and failspy.

This is an uncensored model. You can use it for any application that doesn't require alignment, like role-playing.

GGUF: https://huggingface.co/mlabonne/Daredevil-8B-abliterated-GGUF

Daredevil-8B-abliterated is the second best-performing 8B model on the Open LLM Leaderboard in terms of MMLU score (27 May 24). Evaluation performed using LLM AutoEval. See the entire leaderboard here.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| mlabonne/Daredevil-8B 📄 | 55.87 | 44.13 | 73.52 | 59.05 | 46.77 |
| mlabonne/Daredevil-8B-abliterated 📄 | 55.06 | 43.29 | 73.33 | 57.47 | 46.17 |
| mlabonne/Llama-3-8B-Instruct-abliterated-dpomix 📄 | 52.26 | 41.6 | 69.95 | 54.22 | 43.26 |
| meta-llama/Meta-Llama-3-8B-Instruct 📄 | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
| failspy/Meta-Llama-3-8B-Instruct-abliterated-v3 📄 | 51.21 | 40.23 | 69.5 | 52.44 | 42.69 |
| mlabonne/OrpoLlama-3-8B 📄 | 48.63 | 34.17 | 70.59 | 52.39 | 37.36 |
| meta-llama/Meta-Llama-3-8B 📄 | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |

llama
7,903
56

Meta-Llama-3.1-8B-Instruct-abliterated

This is an uncensored version of Llama 3.1 8B Instruct created with abliteration (see this article to learn more about it). Special thanks to @FailSpy for the original code and technique. Please follow him if you're interested in abliterated models.

- New GGUF: https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF
- ZeroWw GGUF: https://huggingface.co/ZeroWw/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF
- EXL2: https://huggingface.co/Apel-sin/llama-3.1-8B-abliterated-exl2

Open LLM Leaderboard Evaluation Results (detailed results can be found here):

| Metric | Value |
|---|---:|
| Avg. | 23.13 |
| IFEval (0-Shot) | 73.29 |
| BBH (3-Shot) | 27.13 |
| MATH Lvl 5 (4-Shot) | 6.42 |
| GPQA (0-shot) | 0.89 |
| MuSR (0-shot) | 3.21 |
| MMLU-PRO (5-shot) | 27.81 |

llama
7,433
183

Meta-Llama-3.1-8B-Instruct-abliterated-GGUF

base_model:meta-llama/Llama-3.1-8B-Instruct
6,480
154

gemma-3-27b-it-abliterated

Gemma 3 1B Abliterated • Gemma 3 4B Abliterated • Gemma 3 12B Abliterated

This is an uncensored version of google/gemma-3-27b-it created with a new abliteration technique. See this article to learn more about abliteration.

I was playing with model weights and noticed that Gemma 3 was much more resilient to abliteration than other models like Qwen 2.5. I experimented with a few recipes to remove refusals while preserving most of the model's capabilities. Note that this is fairly experimental, so it might not turn out as well as expected.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

GGUF: https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated-GGUF

In the original technique, a refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. Here, the model was abliterated by computing a refusal direction based on hidden states (inspired by Sumandora's repo) for each layer independently. This is combined with a refusal weight of 1.5 to upscale the importance of this refusal direction in each layer. This achieved a very high acceptance rate (>90%) while still producing coherent outputs.

5,085
233

gemma-3-12b-it-abliterated-GGUF

3,098
48

gemma-3-12b-it-abliterated-v2-GGUF

This is an uncensored version of google/gemma-3-12b-it created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

GGUF: https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated-v2-GGUF

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
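A minimal sketch of the layer-wise weighting described above: weight factors follow a Gaussian profile over layer indices, and each layer's target module is orthogonalized against the refusal direction scaled by its factor. The spread, peak layer, maximum weight, and matrices here are illustrative assumptions; the card does not give these numbers.

```python
import numpy as np

# Gaussian profile of weight factors over layers, peaking at `peak_layer`.
# n_layers, peak_layer, spread, and max_weight are illustrative assumptions.
n_layers, peak_layer, spread, max_weight = 24, 14, 4.0, 1.2
layers = np.arange(n_layers)
factors = max_weight * np.exp(-0.5 * ((layers - peak_layer) / spread) ** 2)

# Orthogonalize each layer's target module (dummy o_proj-style matrices)
# against a single refusal direction, scaled by that layer's factor.
rng = np.random.default_rng(0)
d = 32
refusal_dir = rng.normal(size=d)
refusal_dir /= np.linalg.norm(refusal_dir)

o_projs = [rng.normal(size=(d, d)) for _ in range(n_layers)]
o_projs = [W - f * np.outer(refusal_dir, refusal_dir) @ W
           for W, f in zip(o_projs, factors)]
```

Layers far from the peak get factors near zero and are barely modified, which concentrates the intervention where refusals are most strongly represented.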

2,951
32

Qwen3-14B-abliterated

Qwen3 Abliterated: 0.6B • 1.7B • 4B • 8B • 14B • 30B-A3B

This is an uncensored version of Qwen/Qwen3-14B created with a new abliteration technique. See this article to learn more about abliteration. This is a research project to understand how refusals and latent fine-tuning work in LLMs.

I played with different sizes of Qwen3 and noticed there was no one-size-fits-all abliteration strategy. In addition, the reasoning mode interfered with non-reasoning refusals, which made it more challenging. This made me iterate over different recipes and significantly consolidate my scripts with accumulation and better evaluations. Note that this is fairly experimental, so it might not turn out as well as expected.

I recommend using these generation parameters: `temperature=0.6`, `top_k=20`, `top_p=0.95`, `min_p=0`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.
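The dictionary half of the hybrid evaluation mentioned above can be sketched as a simple prefix check over known refusal phrases. The phrase list below is an assumption for illustration, not the actual dictionary; the real pipeline also scores completions with the NousResearch/Minos-v1 classifier.

```python
# Sketch of the dictionary-based half of the acceptance-rate evaluation.
# REFUSAL_MARKERS is an illustrative list, not the actual dictionary used.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai", "i won't")

def is_refusal(completion: str) -> bool:
    """Flag a completion as a refusal if it opens with a known marker."""
    return completion.strip().lower().startswith(REFUSAL_MARKERS)

def acceptance_rate(completions: list[str]) -> float:
    """Fraction of completions that are not flagged as refusals."""
    accepted = sum(not is_refusal(c) for c in completions)
    return accepted / len(completions)
```

A prefix dictionary is fast but brittle (it misses soft refusals mid-answer), which is presumably why it is paired with a learned classifier to decide whether the >90% acceptance target is met.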

license:apache-2.0
2,815
44

Marcoro14-7B-slerp

license:cc-by-nc-4.0
2,712
30

gemma-3-4b-it-abliterated-v2-GGUF

1,923
12

Beagle14-7B

license:cc-by-nc-4.0
1,253
15

NeuralMarcoro14-7B

license:cc-by-nc-4.0
1,241
39

Hermes-3-Llama-3.1-8B-lorablated-GGUF

base_model:mlabonne/Hermes-3-Llama-3.1-8B-lorablated
1,124
26

gemma-3-4b-it-abliterated-GGUF

1,052
23

gemma-2b-GGUF

744
32

gemma-3-27b-it-qat-abliterated-GGUF

This is an uncensored version of google/gemma-3-27b-it-qat-q4_0-unquantized created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.

716
8

Beyonder-4x7b

license:apache-2.0
700
9

GML-Mistral-merged-v1

license:apache-2.0
682
1

NeuralQuant-9B

license:apache-2.0
680
0

NeuralPipe-9B-merged

license:apache-2.0
678
4

gemma-7b-it-GGUF

667
51

Beyonder-4x7B-v2

619
128

gemma-3-12b-it-abliterated

572
21

Daredevil-7B

license:cc-by-nc-4.0
566
12

gemma-3-12b-it-abliterated-v2

This is an uncensored version of google/gemma-3-12b-it created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

- QAT: https://huggingface.co/mlabonne/gemma-3-12b-it-qat-abliterated
- GGUF: https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated-v2-GGUF

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.

517
11

NeuralLlama-3-8B-Instruct-abliterated

llama
501
10

NeuralBeagle14-7B-GGUF

license:cc-by-nc-4.0
466
47

Llama-3.1-70B-Instruct-lorablated-GGUF

base_model:meta-llama/Llama-3.1-70B-Instruct
442
46

gemma-3-4b-it-abliterated

372
25

gemma-2b-it-GGUF

305
13

gemma-3-4b-it-abliterated-v2

This is an uncensored version of google/gemma-3-4b-it created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

- QAT: https://huggingface.co/mlabonne/gemma-3-4b-it-qat-abliterated
- GGUF: https://huggingface.co/mlabonne/gemma-3-4b-it-abliterated-v2-GGUF

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.

300
12

FineLlama-3.1-8B-GGUF

llama
296
7

gemma-3-12b-it-qat-abliterated-GGUF

This is an uncensored version of google/gemma-3-12b-it-qat-q4_0-unquantized created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.

281
7

gemma-3-4b-it-qat-abliterated-GGUF

This is an uncensored version of google/gemma-3-4b-it-qat-q4_0-unquantized created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.

254
1

gemma-3-1b-it-abliterated-GGUF

240
7

Llama-3.1-70B-Instruct-lorablated

llama
230
77

NeuralDaredevil-8B-abliterated-GGUF

192
5

gemma-7b-GGUF

189
4

NeuralMonarch-7B-GGUF

177
1

Gemmalpaca-2B

175
14

gemma-3-1b-it-abliterated-v2-GGUF

171
2

Qwen3-0.6B-abliterated

license:apache-2.0
166
9

Beyonder-4x7B-v3-GGUF

license:cc-by-nc-4.0
154
23

dummy-llama-2

llama
146
8

Qwen3-8B-abliterated

Qwen3 Abliterated: 0.6B • 1.7B • 4B • 8B • 14B • 30B-A3B

This is an uncensored version of Qwen/Qwen3-8B created with a new abliteration technique. See this article to learn more about abliteration. This is a research project to understand how refusals and latent fine-tuning work in LLMs.

I played with different sizes of Qwen3 and noticed there was no one-size-fits-all abliteration strategy. In addition, the reasoning mode interfered with non-reasoning refusals, which made it more challenging. This made me iterate over different recipes and significantly consolidate my scripts with accumulation and better evaluations. Note that this is fairly experimental, so it might not turn out as well as expected.

I recommend using these generation parameters: `temperature=0.6`, `top_k=20`, `top_p=0.95`, `min_p=0`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.

license:apache-2.0
128
18

Daredevil-8B-abliterated-GGUF

126
10

AlphaMonarch-7B-GGUF

124
33

Meta-Llama-3-8B

llama
124
2

Qwen3-30B-A3B-abliterated

license:apache-2.0
120
35

Gemmalpaca-2B-GGUF

120
7

TwinLlama 3.1 8B

TwinLlama-3.1-8B is a model created for the LLM Engineer's Handbook, trained on mlabonne/llmtwin. It is designed to act as a digital twin of myself and my co-authors (Paul Iusztin and Alex Vesa), imitating our writing style and drawing knowledge from our articles. This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

llama
113
24

Hermes-3-Llama-3.1-70B-lorablated

llama
100
31

Qwen3-4B-abliterated

Qwen3 Abliterated: 0.6B • 1.7B • 4B • 8B • 14B • 30B-A3B

This is an uncensored version of Qwen/Qwen3-4B created with a new abliteration technique. See this article to learn more about abliteration. This is a research project to understand how refusals and latent fine-tuning work in LLMs.

I played with different sizes of Qwen3 and noticed there was no one-size-fits-all abliteration strategy. In addition, the reasoning mode interfered with non-reasoning refusals, which made it more challenging. This made me iterate over different recipes and significantly consolidate my scripts with accumulation and better evaluations. Note that this is fairly experimental, so it might not turn out as well as expected.

I recommend using these generation parameters: `temperature=0.6`, `top_k=20`, `top_p=0.95`, `min_p=0`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.

license:apache-2.0
97
15

Qwen3-1.7B-abliterated

license:apache-2.0
96
15

gemma-3-27b-it-qat-abliterated

85
19

EvolCodeLlama-7b-GGUF

84
2

gemma-3-1b-it-qat-abliterated-GGUF

This is an uncensored version of google/gemma-3-1b-it-qat-q4_0-unquantized created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.

77
0

chesspythia-70m

license:apache-2.0
72
4

Qwen3-0.6B-abliterated-GGUF

license:apache-2.0
60
3

gemma-3-4b-it-qat-abliterated

59
5

Daredevil-8B

llama
51
43

NeuralHermes-2.5-Mistral-7B

license:apache-2.0
48
152

OrpoLlama-3-8B

llama
48
53

Hermes-3-Llama-3.1-8B-lorablated

llama
48
34

BigQwen2.5-Echo-47B-Instruct

license:apache-2.0
48
3

Llama-3-8B-Instruct-abliterated-dpomix-GGUF

33
1

NeuralHermes-2.5-Mistral-7B-GGUF

license:apache-2.0
32
7

gemma-3-1b-it-abliterated

31
8

TwinLlama-3.1-8B-GGUF

llama
28
3

NeuralHermes-2.5-Mistral-7B-laser-GGUF

27
7

phixtral-2x2_8

license:mit
26
149

Monarch-7B-GGUF

26
0

TwinLlama-3.1-8B-DPO

llama
23
19

BigLlama-3.1-1T-Instruct

llama
22
83

NeuralDaredevil-7B

license:cc-by-nc-4.0
22
40

NeuralDaredevil-8B-abliterated-AWQ

llama
20
0

NeuralBeagle14-7B

license:cc-by-nc-4.0
18
157

phixtral-4x2_8

license:mit
16
209

Daredevil-8B-GGUF

16
3

gemma-3-12b-it-qat-abliterated

13
8

SmolGRPO-135M

llama
12
6

gemma-3-1b-it-abliterated-v2

11
4

phi-2-orange-v2-GGUF

11
2

TwinLlama-3.1-8B-DPO-GGUF

llama
11
2

DatacampLlama-3.1-8B-gguf

llama
10
0

FineLlama-3.1-8B

llama
9
9

codellama-2-7b

llama
9
5

BigQwen2.5-52B-Instruct

license:apache-2.0
8
8

NeuralMarcoro14-7B-GGUF

8
5

llama-2-13b-guanaco

llama
8
3

Meta-Llama-3-225B-Instruct

llama
6
18

gemma-3-1b-it-qat-abliterated

This is an uncensored version of google/gemma-3-1b-it-qat-q4_0-unquantized created with a new abliteration technique. See this article to learn more about abliteration. This is a new, improved version that targets refusals with enhanced accuracy.

I recommend using these generation parameters: `temperature=1.0`, `top_k=64`, `top_p=0.95`.

The refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples. The hidden states of target modules (e.g., `o_proj`) are orthogonalized to subtract this refusal direction with a given weight factor. These weight factors follow a normal distribution with a certain spread and peak layer. Modules can be iteratively orthogonalized in batches, or the refusal direction can be accumulated to save memory. Finally, I used a hybrid evaluation with a dedicated test set to calculate the acceptance rate. This uses both a dictionary approach and NousResearch/Minos-v1. The goal is to obtain an acceptance rate >90% while still producing coherent outputs.

6
2

Meta-Llama-3-120B-Instruct

llama
5
201

Meta-Llama-3-12B

llama
5
4

llama-2-13b-miniguanaco

llama
5
2

Jambatypus-v0.1

license:apache-2.0
4
39

NeuralPipe-7B-slerp

license:apache-2.0
4
7

NeuralOmniBeagle-7B

license:cc-by-4.0
4
6

BigLlama-3.1-681B-Instruct

llama
3
11

PyLlama-7b

llama
3
8

Gemmalpaca-7B

3
7

LFM2-1.2B-Pirate

3
6

FrankenLlama-3-12B-Instruct

llama
3
5

gpt2-GPTQ-4bit

license:apache-2.0
3
1

OmniBeagle-7B

license:cc-by-nc-4.0
2
21

NeuralHermes-2.5-Mistral-7B-laser

license:apache-2.0
2
16

UltraMerge-7B

license:cc-by-nc-4.0
2
12

llama-2-7b-miniguanaco

llama
2
7

AlphaMonarch-7B-2bit-HQQ

license:cc-by-nc-4.0
2
7

zephyr-7b-beta-5.0bpw-exl2

license:mit
2
6

OrcaGemma-2B

2
6

Darewin-7B

license:apache-2.0
2
3

alpagasus-2-7b

llama
2
1

zephyr-7b-beta-4.0bpw-exl2

license:mit
2
0

Monarch-7B

license:cc-by-nc-4.0
1
11

Llama-3-12B

llama
1
10

EvolCodeLlama-7b

llama
1
6

phixtral-3x2_8

license:mit
1
3

NuminiLlama-3.1-8B

llama
1
2

Darewin-7B-v2

license:apache-2.0
1
1

OmniTruthyBeagle-7B-v0

license:cc-by-4.0
1
1

Monarch-7B-dare

license:cc-by-nc-4.0
1
1

Samantha-1.11-7b-4.0bpw-exl2

llama
1
0

NeuralMix-2x7b

license:apache-2.0
1
0

OmniTruthyBeagle-7B

1
0

Monarch-7B-slerp

license:cc-by-nc-4.0
1
0

AlphaMonarch-7B-5.0bpw-exl2

license:cc-by-nc-4.0
1
0

Zebrafish-slerp-7B

license:cc-by-nc-4.0
1
0

Zebrafish-dare-7B

license:cc-by-nc-4.0
1
0

Zebrafish-linear-7B

license:cc-by-nc-4.0
1
0

UltraMerge-v2-7B

1
0

Llama-3.1-Twin-8B

llama
1
0

llama-2-7b-guanaco

llama
0
17

Llama-3-SLERP-8B

llama
0
15

Zebrafish-7B

license:cc-by-nc-4.0
0
14

BigQwen2.5-125B-Instruct

0
11

ChimeraLlama-3-8B

llama
0
8

Jambalpaca-v0.1

license:apache-2.0
0
7

Llama-3-70B-Instruct-abliterated-LORA

base_model:failspy/llama-3-70B-Instruct-abliterated
0
7

Llama-3-DARE-8B

llama
0
6

Llama-3-linear-8B

llama
0
6

Omnarch-7B

license:cc-by-nc-4.0
0
5

llama-2-7b-miniplatypus

llama
0
4

NeuralPipe-7B-ties

license:apache-2.0
0
4

NeuralDarewin-7B

license:apache-2.0
0
4

Meta-Llama-3-12B-Instruct

llama
0
4

Qwerus-7B

license:mit
0
4

drmistral-7b

0
3

gemma-7b-dare

license:cc-by-nc-4.0
0
3

grandpythia-200k-70m

license:apache-2.0
0
3

drllama-7b

llama
0
2

FrankenBeagle14-11B

license:cc-by-nc-4.0
0
2

Mistralpaca-7B

license:apache-2.0
0
2

ArchBeagle-7B

license:cc-by-nc-4.0
0
2

NeuBeagle-7B

license:cc-by-nc-4.0
0
2

DatacampLlama-3.1-8B

llama
0
2

BeagleB-7B

license:cc-by-nc-4.0
0
1

NeuralOmni-7B

0
1

NeuralOmniBeagle-7B-v2

0
1

Beagle4

license:cc-by-nc-4.0
0
1

FrankenMonarch-11b

license:cc-by-nc-4.0
0
1

AlphaMonarch-7B-AWQ

license:cc-by-nc-4.0
0
1

MergeSeek-R1-0528

0
1