Pinkstack
SuperThoughts-CoT-14B-16k-o1-QwQ-GGUF
DistilGPT-OSS-qwen3-4B
What it is

DistilGPT-OSS-qwen3-4B is a thinking fine-tune of Qwen3 4B-2507. It supports up to 262K tokens of input and output (aka total context) and can think for up to 65,536 tokens when set to high reasoning effort. Unlike the original Qwen3, this model was fine-tuned on GPT-OSS reasoning outputs (rather than the DeepSeek R1 outputs that Qwen3 was probably fine-tuned on for advanced reasoning). By fine-tuning on GPT-OSS outputs, the model learned to think efficiently, follow instructions better, and gained the new ability to think with a certain effort based on how much you want it to think.

⚠️ This model is NOT as censored as the original GPT-OSS; we focused on performance rather than censorship. The model is still safety trained, it just allows more "creative" prompts than GPT-OSS does. We are not responsible for what the model generates. Keep in mind, this is a community project and we are NOT related to Qwen by Alibaba nor GPT-OSS by OpenAI.

Benefits of using this model over standard Qwen3 4B Thinking:
- You decide how much it thinks (low, medium, high).
- A completely different style of answers (more similar to ChatGPT).
- Produces fewer emoji (Qwen3 4B uses quite a lot in its responses, which some may not like).
- Less censored/limiting than Qwen3 4B.

DistilGPT-OSS-qwen3-4B should be used for the following:
- Local, on-device, efficient assistance.
- Code generation.
- Math generation.
- Summary generation.
- General day-to-day use.

It should NOT be used for the following:
- Anything law related, due to hallucinations.
- Medical questions.
- Anything high risk which requires 1:1 accuracy.

It is a small model, thus general knowledge is limited by its size.

Format

This is the chat format of this model (you can also check the Jinja template file in "Files and versions"): As you can see, you set the reasoning effort via the system prompt. We recommend going two lines down and only then putting "Reasoning effort: [low,medium,high]". For your information, that output was generated by our model.
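The recommendation above (effort line placed two lines down in the system prompt) can be sketched as a small helper. This is an illustration only; the `build_system_prompt` function and its argument names are our own invention, not part of the model's tooling:

```python
def build_system_prompt(instructions: str, effort: str = "medium") -> str:
    """Build a system prompt with the reasoning-effort line placed
    two lines below the instructions, as the card recommends."""
    if effort not in ("low", "medium", "high"):
        raise ValueError("effort must be 'low', 'medium', or 'high'")
    # Blank line in between, so the effort line starts two lines down.
    return f"{instructions}\n\nReasoning effort: {effort}."

prompt = build_system_prompt("You are a helpful assistant.", effort="high")
```

The resulting string would then be passed as the system message of a normal chat request.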
1) "Is a banana an animal?" Reasoning was set to high.
2) "Make an HTML website about yourself." Reasoning was set to medium.
3) "Translate this to Chinese: Hello! I am ChatGPT. A large language model by OpenAI." Reasoning was set to low.

As you can see, based on the reasoning effort and your prompt, the model thinks for a different amount of time. Keep in mind, these tests were done in LM Studio with the Q8_0 GGUF on a single consumer card (RTX 3080), where we got 80–95 tokens/second at 8192 context.

We first started with some public datasets, removed almost all "I am sorry but..." refusals from the dataset, filtered it, and skipped the first 25k samples. Then we mixed in outputs from the big 120B GPT-OSS when we saw that the model was not as good at certain things. After doing that, we formatted everything into the proper Qwen3 format and did a few test runs using different optimizers, configurations, etc. Keep in mind, we trained on about 15K samples, each sample having 3 turns (the entire dataset was multi-turn); the AdEMAMix optimizer was chosen. We did a few test runs to see if it even learns anything and what it learns. We had runs where it was very censored, runs where it looped, and this one was the best. In addition, we added some outputs we generated using the 120B GPT-OSS to improve performance.

The simplest way to explain the performance is like this:
- Imagine the biggest GPT-OSS (120B) is like GPT-5.
- The official smallest GPT-OSS (20B) is like GPT-5 mini.
- And this one is like GPT-5 Nano.

Obviously, these models do not compare to closed-source OpenAI models; this comparison is just to explain it simply. This is how these models should be used: the biggest GPT-OSS for the hard, complicated tasks, the smaller 20B for average tasks, and our "open-weights GPT-5 Nano equivalent" for easier day-to-day tasks. (As a reminder, it does NOT have the same performance as GPT-5 Nano. Not even close.)
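The dataset cleanup described above (skipping the first 25k samples, then dropping refusal-style replies) can be sketched roughly like this. The field names (`conversations`, `role`, `content`) and the exact refusal phrases are assumptions for illustration, not the actual pipeline:

```python
REFUSAL_MARKERS = ("I am sorry but", "I'm sorry, but")  # assumed phrasing

def clean_dataset(samples, skip=25_000):
    """Skip the first `skip` samples, then drop any conversation whose
    assistant turns contain a refusal marker."""
    kept = []
    for sample in samples[skip:]:
        turns = sample["conversations"]  # assumed multi-turn field name
        has_refusal = any(
            turn["role"] == "assistant"
            and any(m in turn["content"] for m in REFUSAL_MARKERS)
            for turn in turns
        )
        if not has_refusal:
            kept.append(sample)
    return kept
```

After a pass like this, the surviving samples would be reformatted into the Qwen3 chat format for training.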
The model was trained using Unsloth, on a mix of private and public datasets.
Superthoughts-lite-1.8B-experimental-o1-GGUF
Luau-coder-v2-3B-base-32k
Note: this is not a chat model. The chat model is coming soon, but this is the base model for further fine-tuning; stay tuned for the chat model release! This page will be updated once that model is out. (The chat model will be under a different repo.)

print("Before we start")

We are not related to Roblox in any way; any mention of Roblox is purely to help people understand what the model is about. As per the Roblox website, they use Meta's Llama 3 (we assume 70B) for their AI assistant. This model, while powerful, cannot come close to the performance of a 70B model. But unlike Llama 3, this model (luau-coder-v2-3b-32k, or "luaucoder" for short) is under an open Apache 2.0 license.

This model was continually pre-trained in 3 stages. (Note: AllenAI states that OLMo 2 1B, the model this is based on, was pre-trained on roughly 4 trillion tokens.)
- Stage 1: Pre-training on Pinkstack/roblox-luau-corpus-text & Roblox/luaucorpus at 4096 context (the maximum OLMo 2 can usually reach).
- Stage 2: Pre-training on boatbomber/roblox-info-dump with RoPE scaling set to 4, so stage 2 expanded the context of the model to 16384. (Stage 3 and onwards used added layers: the model started with 16 layers, then we merged in another 20 to make the model bigger and deeper!)
- Stage 3: Training on a mix of Pinkstack/roblox-luau-corpus-text & Roblox/luaucorpus + wikimedia/wikipedia with RoPE scaling set to 8, i.e., 32768 tokens of context. We mixed in wikimedia/wikipedia to hopefully improve the general text and knowledge of the model.

In total, the model was continually pre-trained on up to 1.3B tokens, with a final loss of 1.916400.

print("Use cases")

As this is a base model, there isn't much to do with it currently. But you can fine-tune it on your own datasets to turn it into an instruct/chat-type model. This stage-3 base model did not undergo safety alignment by us, thus it can generate unethical content. Any outputs generated by the LLM are your responsibility.
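The context extension in stages 2 and 3 follows simple arithmetic: with linear RoPE scaling, the usable window is the base window multiplied by the scaling factor. A quick sketch of the numbers used above:

```python
BASE_CONTEXT = 4096  # stage 1: the maximum OLMo 2 usually reaches

def extended_context(rope_scaling_factor: int) -> int:
    """Linear RoPE scaling multiplies the usable context window."""
    return BASE_CONTEXT * rope_scaling_factor

print(extended_context(4))  # stage 2 -> 16384
print(extended_context(8))  # stage 3 -> 32768
```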
print("Additional information")

This repo contains the stage-3 pre-trained/base model. Unsloth was used for training (https://unsloth.ai/).
PARM-V2-QwQ-Qwen-2.5-o1-3B-GGUF
Syngen Reasoning 0.6b
👋 SYNGEN REASONING 0.6B

This is a 0.6B-parameter LLM designed for synthetic grounded reasoning generation between final model outputs, specifically for dataset modification, but it can be used for multiple use cases that require reasoning. For example, this model allows you to turn any chat dataset into a reasoning dataset, as if it were generated by DeepSeek R1 or OpenAI's GPT-OSS!

👀 EXAMPLE DATASET GENERATED WITH IT
https://huggingface.co/datasets/Pinkstack/syngen-reasoning-example-80-smoltalk1

You should set the following:
1) ``MIN P = 0.05``
2)
3)
4) (can be higher or lower, up to 32k)

Syngen uses the ChatML format, but it is not a chat model. This is how you do inference with it:

Liked the model? Need help with it? Do you know how to improve it further? Please make a post in the community tab.
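Since the card's original inference snippet is not reproduced here, the following is a minimal sketch of what a ChatML-formatted prompt for this model could look like. The `chatml_prompt` helper and the instruction wording are assumptions; only the ChatML markers themselves come from the format the card names:

```python
def chatml_prompt(user_message: str,
                  system: str = "Generate grounded reasoning for the reply.") -> str:
    """Wrap a request in ChatML markers, the format Syngen uses."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = chatml_prompt("Q: Why is the sky blue?\nA: Rayleigh scattering.")
```

The completed prompt would then be fed to the model as raw text (not through a chat API), since it is not a chat model.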
PARM-Qwen2.5-o1-0.5B-GGUF
Superthoughts-lite-v2-MOE-Llama3.2-GGUF
DistilGPT-OSS-qwen3-4B-Q8_O-GGUF
This is an official Q8_0 quantized version of https://huggingface.co/Pinkstack/DistilGPT-OSS-qwen3-4B

What it is

DistilGPT-OSS-qwen3-4B is a thinking fine-tune of Qwen3 4B-2507. It supports up to 256K tokens of input and output (aka total context) and can think for up to 65,536 tokens when set to high reasoning effort. Unlike the original Qwen3, this model was fine-tuned on GPT-OSS reasoning outputs (rather than the DeepSeek R1 outputs that Qwen3 was probably fine-tuned on for advanced reasoning). By fine-tuning on GPT-OSS outputs, the model learned to think efficiently, follow instructions better, and gained the new ability to think with a certain effort based on how much you want it to think.

⚠️ This model is NOT as censored as the original GPT-OSS; we focused on performance rather than censorship. The model is still safety trained, it just allows more "creative" prompts than GPT-OSS does. We are not responsible for what the model generates. Keep in mind, this is a community project and we are NOT related to Qwen by Alibaba nor GPT-OSS by OpenAI.

Format

This is the chat format of this model (you can also check the Jinja template file in "Files and versions"): As you can see, you set the reasoning effort via the system prompt. We recommend going two lines down and only then putting "Reasoning effort: [low,medium,high]". For your information, that output was generated by our model.

The model was trained using Unsloth, on a mix of private and public datasets.
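For sizing this quant: in llama.cpp's Q8_0 format, weights are stored in 32-element blocks of int8 values plus one fp16 scale, which works out to about 8.5 bits per weight. A back-of-the-envelope size estimate (a sketch only; real GGUF files add metadata and some non-quantized tensors):

```python
def q8_0_size_gb(n_params: float) -> float:
    """Rough file size of a Q8_0 quant for a model with n_params weights."""
    bits_per_weight = 8.5  # 32 int8 weights + one fp16 scale per block
    return n_params * bits_per_weight / 8 / 1e9

print(round(q8_0_size_gb(4e9), 2))  # a 4B model -> roughly 4.25 GB
```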
PARM-Phi-3.5-mini-o1-GGUF
Fijik1.5-2.6b-a380m
Fijik1.5-2.6b-a380m-base
PARM-QwQ-Qwen-2.5-o1-3B-GGUF
PARM-2-Tiny-Instruct-1.7B-QwQ-o1-GGUF
Fijik-3b-instruct-GGUF
Phi-4-mini-6b-merge
Parm-2-CoT-14B-16k-o1-QwQ
PGAM-WIT-Conversational-3B-PyTorch
This is a base/testing model. It is recommended for further fine-tuning or training. This model is, well, odd. It has been trained on both Grok and HF UltraChat-200k datasets; it acts oddly but is interesting to mess around with.

WIT - weird & interesting transformer
- Developed by: Pinkstack
- License: apache-2.0
- Finetuned from model: Pinkstack/PGAM-WIT-Conversational-3B-vLLM (og version)

This model was trained with Unsloth and Hugging Face's TRL library.
Llama-3.2-3B-o1
Superthoughts-lite-v2-MOE-Llama3.2-experimental-0427-Q8_0-GGUF
Fijik-6b-Instruct-Llama3.2
PARM-v1-Qwen2.5-O.1-0.5B-VLLM
PARM-QwQ-Qwen-2.5-3B-o1-VLLM
PARM-v2-ULTRA-o1-3B-vLLM
PARM-2-TINY-1.7B-QwQ-o1
Superthoughts-lite-v2-MOE-Llama3.2-experimental-0427
Superthoughts-lite-v2-MOE-Llama3.2-bf16
BFloat16 version of Superthoughts lite v2 MoE: 3.91B parameters, 2 experts active out of 4 in total. (Recommended; float32 is heavy.)

INFORMATION

This is the non-experimental version of Superthoughts Lite v2, offering better accuracy at all tasks, better performance, and less looping while generating responses. We trained it by first creating a base model for all the experts, which was fine-tuned with GRPO techniques using Unsloth on top of meta-llama/Llama-3.2-1B-Instruct. After making the base model, we trained each potential expert using SFT, and after SFT, we did GRPO again. In total there are 4 experts:
- Chat reasoning expert,
- Math reasoning expert,
- Code reasoning expert,
- Science reasoning expert.

By doing this, we obtained a powerful, lite reasoning model that is very usable for its size. This model is a direct replacement for Pinkstack/Superthoughts-lite-v1, which was not able to generate code and had very poor text performance. V2 is much more usable.

Model information

The model can generate up to 16,380 tokens and has a context size of 131,072. It has been fine-tuned to generate thinking data in between XML tags. Note that it may still loop slightly at times, but such cases are rare.

LIMITATIONS

While some safety alignment was done by us, it was very minimal; thus, the model can be uncensored at times. In addition, users and providers alike should be aware that all large language models (LLMs), including this one, can hallucinate and output false information. Always double-check responses. By using this model, you agree to the LLAMA 3.2 COMMUNITY LICENSE.
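Because the reasoning is emitted between XML tags before the final answer, downstream code usually wants to split the two apart. The card does not show the exact tag name, so the `think` tag below is an assumption; the splitting logic itself is a generic sketch:

```python
import re

def split_thinking(text: str, tag: str = "think"):
    """Return (thinking, answer); thinking is None if no tagged block exists."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    if not match:
        return None, text.strip()
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer
```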
Fijik-6b-Instruct-Llama3.2-GGUF
Fijik-1b-DPO
What is it

This is a Fijik series 1.0 model with 1 billion parameters: a dense, 56-layer transformer LLM based on Qwen2.5. Specifically, it was merged using Mergekit to be twice as large as Qwen2.5 0.5B. After merging, we used a custom dataset mix, made for this model, to improve its performance even more.
- Step 1 of fine-tuning via Unsloth: SFT on an estimated 5 million tokens (more or less).
- Step 2 of fine-tuning via Unsloth: DPO for 2 epochs, for even better instruction following.

After these two steps, we got a powerful model which has fewer parameters than Llama 3.2 3B yet performs just as well, if not better. Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for its size. Alibaba's Qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to 32768 input tokens and can generate up to 8192 tokens.

What should Fijik be used for?

Fijik 1.0 1B is, by design, meant to be a production-ready, general-use, high-performance model which is also small enough to run at high token throughput while minimising performance loss.
- We made some effort to ensure the model is safe while keeping it usable. In addition, it is sensitive to system prompts (in a good way: it adheres to them well), so it is very customisable. We did not put any information about the model's identity in our fine-tuning data; it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik unless you specify that in the system prompt.
- Due to the model's large context, it can be used for RAG, but like any other LLM out there, you should be aware that it may hallucinate.
- In our fine-tuning data we included quite a few creative-writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT and DPO fine-tuning data we put effort into improving coding and step-by-step math performance. While it is indeed not perfect, no LLM is.

Examples

Limitations

This model is not uncensored, yet it may produce erotic outputs. You are solely responsible for the outputs of the model. Like any other LLM, users and hosters alike should be aware that AI language models may hallucinate and produce inaccurate, dangerous, or even completely nonsensical outputs. All the information the model provides may seem accurate, but for important tasks please always double-check responses with credible sources.
- Developed by: Pinkstack
- License: Apache 2.0
- Finetuned from model: Pinkstack/Fijik-1b-v1-sft

This Qwen2.5 model was trained with Unsloth and Hugging Face's TRL library.
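The DPO step above trains on preference pairs rather than plain completions. A minimal sketch of the record shape such a step consumes, following the common prompt/chosen/rejected convention used by TRL-style trainers (the helper function and the example contents are ours, not the actual training data):

```python
def make_dpo_record(prompt: str, chosen: str, rejected: str) -> dict:
    """One preference pair: the trainer pushes the policy toward `chosen`
    and away from `rejected` for the same prompt."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

record = make_dpo_record(
    "List the first three primes.",
    "2, 3, 5",
    "1, 2, 3",
)
```

A dataset of such records, built on top of the SFT checkpoint, is what the 2-epoch DPO run would have iterated over.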