leeminwaan
gpt-oss-20b-pruned-10.2B-GGUF
Qwen3 MOE 4x0.6B 2.4B Reasoning V1 Full GGUF
# Model Card for Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF

This repo is packed with multiple quantized versions of leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full in GGUF format. Built for running efficiently on your everyday hardware - no need for enterprise-level specs to deploy these models.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|------------|-------------------|--------------------------------------|
| Q2_K | Tiny | Lightning | Basic | Quick prototypes, potato hardware |
| Q3_K_S | Mini | Super fast | Decent | Mobile devices, quick tests |
| Q3_K_M | Small | Fast | Good | Lightweight but better quality |
| Q3_K_L | Small+ | Fast | Good | Speed with acceptable quality |
| Q4_0 | Medium | Quick | Solid | Daily driver, casual chats |
| Q4_1 | Medium | Quick | Solid+ | Slight upgrade from Q4_0 |
| Q4_K_S | Medium | Quick | Nice | Well-balanced choice |
| Q4_K_M | Medium | Quick | Really nice | The crowd favorite |
| Q5_0 | Chunky | Chill | Great | Chatbots that actually make sense |
| Q5_1 | Chunky | Chill | Great+ | When you need quality responses |
| Q5_K_S | Big | Chill | Great+ | For the quality-conscious |
| Q5_K_M | Big | Chill | Excellent | High-end performance |
| Q6_K | Massive | Slow | Near perfect | Enthusiasts only |
| Q8_0 | Absolute unit | Turtle | Basically perfect | Max settings gang |

> Real talk:
> - Lower numbers = smaller files and faster runs, but quality takes a hit.
> - Q4_K_M hits different - it's the sweet spot most people actually want.
> - Q6_K/Q8_0 are for perfectionists with beefy hardware.
> - Everything here runs on regular consumer hardware - pick what matches your vibe!

- Quantized by: leeminwaan
- Funded by [optional]: Solo project, no corporate backing
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer (the good stuff)
- Language(s) (NLP): Based on Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full
- License: Apache-2.0 (free to use, modify, distribute)
- Repository: Hugging Face Repo
- Quantization Tool: AllQuants
- Paper [optional]: No research paper (this is practical, not academic)
- Demo [optional]: Demo coming soon™

Available quantizations:

- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L (speed demons - perfect for testing)
- Q4_0, Q4_1, Q4_K_S, Q4_K_M (the goldilocks zone - just right)
- Q5_0, Q5_1, Q5_K_S, Q5_K_M (for when you need that extra quality)
- Q6_K, Q8_0 (maxed-out settings - if your hardware can handle it)

This is a straight quantization - no extra training or fine-tuning involved. Took leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full and compressed it into these GGUF formats.

Tooling:

- llama.cpp for the heavy lifting
- Python 3.10 + huggingface_hub for the workflow

Glossary:

- Quantization: making models smaller by reducing number precision - trades some quality for efficiency.
- GGUF: the file format that llama.cpp loves - optimized for fast inference.

This is still a work in progress - expect some rough edges. More updates and proper benchmarks coming when I get around to it.
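If you want to grab a single quant from this repo programmatically, here is a minimal sketch using huggingface_hub. The exact `.gguf` filename below is an assumption based on the usual naming pattern - check the repo's file listing for the real name before running.

```python
# Minimal sketch: fetch one quant file from the Hub with huggingface_hub.
# The .gguf filename is an assumed example, not confirmed by this card.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF",
    filename="Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-Q4_K_M.gguf",  # assumed filename
)
print(model_path)  # local cache path of the downloaded GGUF
```

Downloading a single file this way avoids pulling every quant in the repo, which matters when the repo ships fourteen variants of the same weights.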
gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF
SmolLM3-3B-GGUF
This repository contains multiple quantized versions of the SmolLM3-3B model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

- Developed by: leeminwaan
- Funded by [optional]: Independent project
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer language model
- Language(s) (NLP): English (primary); multilingual capabilities not benchmarked
- License: Apache-2.0
- Repository: Hugging Face Repo
- Paper [optional]: Not available
- Demo [optional]: To be released

Available quantizations:

- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0

Based on the SmolLM3-3B pretraining corpus (public large-scale web text, open datasets). No additional fine-tuning was performed for this release.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|-----------|------------|--------------------------------------|
| Q2_K | Smallest | Fastest | Low | Prototyping, minimal RAM/CPU |
| Q3_K_S | Very Small | Very Fast | Low-Med | Lightweight devices, testing |
| Q3_K_M | Small | Fast | Med | Lightweight, slightly better quality |
| Q3_K_L | Small-Med | Fast | Med | Faster inference, fair quality |
| Q4_0 | Medium | Fast | Good | General use, chats, low RAM |
| Q4_1 | Medium | Fast | Good+ | Recommended, slightly better quality |
| Q4_K_S | Medium | Fast | Good+ | Recommended, balanced |
| Q4_K_M | Medium | Fast | Good++ | Recommended, best Q4 option |
| Q5_0 | Larger | Moderate | Very Good | Chatbots, longer responses |
| Q5_1 | Larger | Moderate | Very Good+ | More demanding tasks |
| Q5_K_S | Larger | Moderate | Very Good+ | Advanced users, better accuracy |
| Q5_K_M | Larger | Moderate | Excellent | Demanding tasks, high quality |
| Q6_K | Large | Slower | Near FP16 | Power users, best quantized quality |
| Q8_0 | Largest | Slowest | FP16-like | Maximum quality, high RAM/CPU |

> Note:
> - Lower quantization = smaller model and faster inference, but lower output quality.
> - Q4_K_M is ideal for most users; Q6_K/Q8_0 offer the highest quality, best for advanced use.
> - All quantizations are suitable for consumer hardware - select based on your quality/speed needs.

Tooling:

- llama.cpp for quantization
- Python 3.10, huggingface_hub

Glossary:

- Quantization: reducing the precision of weights to lower memory usage.
- GGUF: optimized format for llama.cpp inference.

This project is experimental. Expect further updates and quantization benchmarks.
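To actually run one of these files from Python, the llama-cpp-python bindings (a common wrapper around llama.cpp; this card does not prescribe it, so treat the choice as an assumption) work roughly as sketched below. The local model path is also an assumed example.

```python
# Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path is an assumed example; point it at whichever quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./SmolLM3-3B-Q4_K_M.gguf",  # assumed local filename
    n_ctx=4096,    # context window; raise it if you have the RAM
    n_threads=8,   # match your CPU core count
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

The same snippet works for any quant in the table above; only the filename changes.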
DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF
# Model Card for DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF

This repository contains multiple quantized versions of the DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

- Developed by: leeminwaan
- Funded by [optional]: Independent project
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer language model
- Language(s) (NLP): English (primary); multilingual capabilities not benchmarked
- License: Apache-2.0
- Repository: Hugging Face Repo
- Paper [optional]: Not available
- Demo [optional]: To be released

Available quantizations:

- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0

Based on the DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 pretraining corpus (public large-scale web text, open datasets). No additional fine-tuning was performed for this release. Original DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 → quantized to GGUF formats.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|-----------|------------|--------------------------------------|
| Q2_K | Smallest | Fastest | Low | Prototyping, minimal RAM/CPU |
| Q3_K_S | Very Small | Very Fast | Low-Med | Lightweight devices, testing |
| Q3_K_M | Small | Fast | Med | Lightweight, slightly better quality |
| Q3_K_L | Small-Med | Fast | Med | Faster inference, fair quality |
| Q4_0 | Medium | Fast | Good | General use, chats, low RAM |
| Q4_1 | Medium | Fast | Good+ | Recommended, slightly better quality |
| Q4_K_S | Medium | Fast | Good+ | Recommended, balanced |
| Q4_K_M | Medium | Fast | Good++ | Recommended, best Q4 option |
| Q5_0 | Larger | Moderate | Very Good | Chatbots, longer responses |
| Q5_1 | Larger | Moderate | Very Good+ | More demanding tasks |
| Q5_K_S | Larger | Moderate | Very Good+ | Advanced users, better accuracy |
| Q5_K_M | Larger | Moderate | Excellent | Demanding tasks, high quality |
| Q6_K | Large | Slower | Near FP16 | Power users, best quantized quality |
| Q8_0 | Largest | Slowest | FP16-like | Maximum quality, high RAM/CPU |

> Note:
> - Lower quantization = smaller model and faster inference, but lower output quality.
> - Q4_K_M is ideal for most users; Q6_K/Q8_0 offer the highest quality, best for advanced use.
> - All quantizations are suitable for consumer hardware - select based on your quality/speed needs.

Tooling:

- llama.cpp for quantization
- Python 3.10, huggingface_hub

Glossary:

- Quantization: reducing the precision of weights to lower memory usage.
- GGUF: optimized format for llama.cpp inference.

This project is experimental. Expect further updates and quantization benchmarks.
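The size labels in the table follow directly from bits per weight: file size scales as parameters × bits / 8. Here is a back-of-the-envelope sketch for this 1.5B-parameter model; the bits-per-weight figures are rough community approximations, not exact llama.cpp numbers (k-quants mix block sizes, so real files differ slightly).

```python
# Back-of-the-envelope GGUF size estimate: size ≈ n_params * bits_per_weight / 8.
# Bits-per-weight values are approximate, not exact llama.cpp figures.
APPROX_BPW = {"FP16": 16.0, "Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7,
              "Q4_K_M": 4.8, "Q3_K_M": 3.9, "Q2_K": 3.4}

def est_size_gb(n_params: float, quant: str) -> float:
    """Approximate GGUF file size in GB for a model with n_params weights."""
    return n_params * APPROX_BPW[quant] / 8 / 1e9

for q in ("FP16", "Q8_0", "Q4_K_M", "Q2_K"):
    print(f"{q:>7}: ~{est_size_gb(1.5e9, q):.2f} GB")  # 1.5B-parameter model
```

For a 1.5B model this gives roughly 3.0 GB at FP16 versus about 0.9 GB at Q4_K_M, which is why the mid quants fit comfortably in consumer RAM.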
gpt-oss-10.8b-specialized-all-pruned-moe-only-15-experts-GGUF
olmoe-reasoning-v1
hybrid_reasoning_v0.1
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019); a programmatic sketch follows the list below.

- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
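The Lacoste et al. calculator is a web form; if you would rather measure emissions from code while the model runs, the codecarbon package is one alternative tool (not part of this card). The project name and the workload placeholder below are illustrative only.

```python
# Hedged sketch: codecarbon's EmissionsTracker as a programmatic alternative
# to the ML Impact web calculator (pip install codecarbon).
# project_name and the workload inside the block are illustrative placeholders.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="hybrid_reasoning_v0.1-inference")
tracker.start()
# ... run training or inference here ...
emissions_kg = tracker.stop()  # estimated emissions in kg CO2-equivalent
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```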
gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF
# Model Card for gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF

This repository contains multiple quantized versions of the gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

- Developed by: leeminwaan
- Funded by [optional]: Independent project
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer language model
- Language(s) (NLP): English (primary); multilingual capabilities not benchmarked
- License: Apache-2.0
- Repository: Hugging Face Repo
- Paper [optional]: Not available
- Demo [optional]: To be released

Available quantizations:

- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0

Based on the gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts pretraining corpus (public large-scale web text, open datasets). No additional fine-tuning was performed for this release. Original gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts → quantized to GGUF formats.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|-----------|------------|--------------------------------------|
| Q2_K | Smallest | Fastest | Low | Prototyping, minimal RAM/CPU |
| Q3_K_S | Very Small | Very Fast | Low-Med | Lightweight devices, testing |
| Q3_K_M | Small | Fast | Med | Lightweight, slightly better quality |
| Q3_K_L | Small-Med | Fast | Med | Faster inference, fair quality |
| Q4_0 | Medium | Fast | Good | General use, chats, low RAM |
| Q4_1 | Medium | Fast | Good+ | Recommended, slightly better quality |
| Q4_K_S | Medium | Fast | Good+ | Recommended, balanced |
| Q4_K_M | Medium | Fast | Good++ | Recommended, best Q4 option |
| Q5_0 | Larger | Moderate | Very Good | Chatbots, longer responses |
| Q5_1 | Larger | Moderate | Very Good+ | More demanding tasks |
| Q5_K_S | Larger | Moderate | Very Good+ | Advanced users, better accuracy |
| Q5_K_M | Larger | Moderate | Excellent | Demanding tasks, high quality |
| Q6_K | Large | Slower | Near FP16 | Power users, best quantized quality |
| Q8_0 | Largest | Slowest | FP16-like | Maximum quality, high RAM/CPU |

> Note:
> - Lower quantization = smaller model and faster inference, but lower output quality.
> - Q4_K_M is ideal for most users; Q6_K/Q8_0 offer the highest quality, best for advanced use.
> - All quantizations are suitable for consumer hardware - select based on your quality/speed needs.

Tooling:

- llama.cpp for quantization
- Python 3.10, huggingface_hub

Glossary:

- Quantization: reducing the precision of weights to lower memory usage.
- GGUF: optimized format for llama.cpp inference.

This project is experimental. Expect further updates and quantization benchmarks.
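For reference, the convert-then-quantize workflow these repos describe looks roughly like the sketch below. Script and binary names have varied across llama.cpp versions (e.g. convert_hf_to_gguf.py, llama-quantize), and the local paths are assumed examples, so verify everything against your own llama.cpp checkout.

```python
# Rough sketch of the llama.cpp convert-then-quantize workflow.
# Script/binary names and paths are assumptions; check your llama.cpp version.
import subprocess

HF_DIR = "./gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts"  # local HF snapshot
F16_GGUF = "./model-f16.gguf"

# 1. Convert the Hugging Face checkpoint to an FP16 GGUF.
subprocess.run(["python", "convert_hf_to_gguf.py", HF_DIR,
                "--outfile", F16_GGUF, "--outtype", "f16"], check=True)

# 2. Quantize the FP16 GGUF down to a smaller format, e.g. Q4_K_M.
subprocess.run(["./llama-quantize", F16_GGUF, "./model-Q4_K_M.gguf", "Q4_K_M"],
               check=True)
```

Repeating step 2 with a different type string (Q2_K, Q5_K_M, Q8_0, ...) is how a repo like this ends up with the full ladder of quants from a single FP16 conversion.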
Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
idkconcepts
qwen3-4b-conscious-lora
Qwen3-MOE-4x0.6B-2.4B-reasoning-v1.1
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]