lunahr
CeluneNorm-0.6B-v1.1
Hermes-3-Llama-3.2-3B-abliterated
thea-rp-3b-25r
Celune-1.7B-Neutral
Celune-1.7B-Calm
Celune-1.7B-Energetic
Celune-1.7B-Upbeat
SystemGemma2-27b-it
Phi-4-mini-instruct-abliterated
gemma-3-4b-abliterated
thea-c-3b-25r
thea-3b-25r
thea-c-3b-25r-adapter
apriel-5b-instruct-abliterated
Qwen3-0.6B-Medical-Expert-abliterated
This project performs full fine-tuning of the Qwen3-0.6B language model to enhance its medical reasoning and clinical understanding. Training was conducted on the `FreedomIntelligence/medical-o1-reasoning-SFT` dataset in bfloat16 (bf16) precision for efficient optimization. The model has additionally been abliterated to steer it away from censorship.

Each dataset example pairs a medically relevant instruction or question with a detailed, step-by-step clinical reasoning response. Prompts were structured to encourage safe, factual, and coherent medical reasoning chains.

The Qwen3 base model weights were loaded via the `unsloth` library in bf16 precision, and all model layers were fully updated (`fullfinetuning=True`) to adapt the model to medical reasoning and decision-making tasks. Fine-tuning used the Hugging Face TRL library with the Supervised Fine-Tuning (SFT) approach: the model was trained to follow clinical instructions, interpret symptoms, and generate reasoned diagnoses or treatment suggestions.

The fine-tuned model's ability to interpret medical instructions and generate step-by-step clinical reasoning is significantly improved. It produces responses that combine factual accuracy with transparent reasoning, making it useful in educational and assistive medical AI contexts.

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
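The prompt structuring described above can be sketched as a small formatting helper. The field names (`Question`, `Complex_CoT`, `Response`) follow the published schema of the medical-o1-reasoning-SFT dataset, but the chat template itself is an illustrative assumption, not the exact one used in training:

```python
# Sketch: turn one medical-o1-reasoning-SFT row into an SFT training string.
# Field names follow the dataset card; the role markers and <think> delimiters
# below are a hypothetical simplification of the real chat template.

def format_example(example: dict) -> str:
    """Build a single supervised training text from a dataset row."""
    question = example["Question"]
    reasoning = example["Complex_CoT"]
    answer = example["Response"]
    # Place the reasoning chain before the final answer so the model
    # learns to emit step-by-step clinical reasoning first.
    return (
        "<|user|>\n" + question + "\n"
        "<|assistant|>\n<think>\n" + reasoning + "\n</think>\n" + answer
    )

row = {
    "Question": "A patient presents with polyuria and polydipsia. "
                "What test confirms diabetes mellitus?",
    "Complex_CoT": "Polyuria and polydipsia suggest hyperglycemia; a fasting "
                   "plasma glucose or HbA1c confirms the diagnosis.",
    "Response": "An HbA1c of 6.5% or higher (or fasting plasma glucose "
                ">= 126 mg/dL) confirms diabetes mellitus.",
}
print(format_example(row))
```

In a TRL SFT run, a function like this would typically be mapped over the dataset to produce the text column the trainer consumes.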
gemma-3-1b-it-abliterated
SystemGemma2-9b-it
thea-pro-2b-100r
gemini-nano-pytorch
thea-3b-50r-u1
An uncensored reasoning Llama 3.2 3B model trained on reasoning data. It was trained using improved training code and delivers improved performance. This is the Thea 3B Update 1 model. What's new:

- Trained on more examples than the original Thea model.
- Based on a different base model, with some of the lost accuracy points (hopefully) restored.

This model has not been tested in a GGUF setting yet. Try it in a GGUF setting yourself by using the GGUF My Repo space.

Intended Use

This model is intended as an OpenAI o1 replacement for weaker hardware, mimicking o1 in its response formatting.

Limitations

- There may be a higher chance of hallucinations with this model due to its small size.
- Some questions may be answered incorrectly.
- This model is uncensored; exercise caution when generating sensitive content.

- Trained by: Piotr Zalewski
- License: llama3.2
- Architecture: llama3.2
- Finetuned from model: CreitinGameplays/Llama-3.2-3b-Instruct-uncensored-refinetune
- Dataset used: KingNish/reasoning-base-20k

This Llama model was trained faster than with Unsloth using custom training code. Visit https://www.kaggle.com/code/piotr25691/distributed-llama-training-with-2xt4 to find out how you can finetune your models using BOTH of the Kaggle-provided GPUs.
Qwen3-0.6B-Code-Expert-abliterated
This project performs full fine-tuning of the Qwen3-0.6B language model to enhance its code reasoning and generation capabilities. Training was conducted exclusively on the `nvidia/OpenCodeReasoning` dataset, with the model optimized in the bfloat16 (bf16) data type. The model has additionally been abliterated to steer it away from censorship.

Each dataset example pairs a code snippet with detailed step-by-step reasoning in Chain-of-Thought (CoT) style.

The Qwen3-0.6B base model weights were loaded via the `unsloth` library in bf16 precision, and full fine-tuning (`fullfinetuning=True`) was applied to all layers for optimal adaptation to code reasoning. Training employed the Hugging Face TRL library with the Supervised Fine-Tuning (SFT) approach: the model was trained to generate correct code solutions along with the corresponding reasoning chains.

The model's capacity for understanding, reasoning about, and generating code was significantly improved by this specialized, single-dataset training in bf16 precision. Outputs include both intermediate reasoning steps and final code solutions, enabling transparent and interpretable code generation.

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
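Since outputs contain both a reasoning chain and a final code solution, a consumer will usually want to split the two. A minimal sketch, assuming the model wraps its reasoning in Qwen3-style `<think>…</think>` delimiters (the sample response below is invented for illustration):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a model response into (reasoning, final_answer).

    Assumes Qwen3-style <think>...</think> delimiters around the
    intermediate reasoning chain; anything after the closing tag is
    treated as the final code solution.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole output as the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    final = output[match.end():].strip()
    return reasoning, final

# Hypothetical model output for a two-sum prompt.
sample = (
    "<think>We need O(n) two-sum, so use a hash map.</think>\n"
    "def two_sum(nums, target):\n"
    "    seen = {}\n"
    "    for i, x in enumerate(nums):\n"
    "        if target - x in seen:\n"
    "            return [seen[target - x], i]\n"
    "        seen[x] = i\n"
)
reasoning, code = split_reasoning(sample)
```

Keeping the parser tolerant of a missing `<think>` block matters in practice, since small models do not always emit the delimiters reliably.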