leeminwaan
gpt-oss-20b-pruned-10.2B-GGUF
Qwen3 MOE 4x0.6B 2.4B Reasoning V1 Full GGUF
# Model Card for Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF

This repo is packed with multiple quantized versions of leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full in GGUF format. Built for running efficiently on your everyday hardware - no need for enterprise-level specs to deploy these models.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|------------|-------------------|--------------------------------------|
| Q2_K | Tiny | Lightning | Basic | Quick prototypes, potato hardware |
| Q3_K_S | Mini | Super fast | Decent | Mobile devices, quick tests |
| Q3_K_M | Small | Fast | Good | Lightweight but better quality |
| Q3_K_L | Small+ | Fast | Good | Speed with acceptable quality |
| Q4_0 | Medium | Quick | Solid | Daily driver, casual chats |
| Q4_1 | Medium | Quick | Solid+ | Slight upgrade from Q4_0 |
| Q4_K_S | Medium | Quick | Nice | Well-balanced choice |
| Q4_K_M | Medium | Quick | Really nice | The crowd favorite |
| Q5_0 | Chunky | Chill | Great | Chatbots that actually make sense |
| Q5_1 | Chunky | Chill | Great+ | When you need quality responses |
| Q5_K_S | Big | Chill | Great+ | For the quality-conscious |
| Q5_K_M | Big | Chill | Excellent | High-end performance |
| Q6_K | Massive | Slow | Near perfect | Enthusiasts only |
| Q8_0 | Absolute unit | Turtle | Basically perfect | Max settings gang |

> Real talk:
> - Lower numbers = smaller files and faster runs, but quality takes a hit.
> - Q4_K_M hits different - it's the sweet spot most people actually want.
> - Q6_K/Q8_0 are for perfectionists with beefy hardware.
> - Everything here runs on regular consumer hardware - pick what matches your vibe!

- Quantized by: leeminwaan
- Funded by [optional]: Solo project, no corporate backing
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer (the good stuff)
- Language(s) (NLP): Based on Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full
- License: Apache-2.0 (free to use, modify, distribute)
- Repository: Hugging Face Repo
- Quantization Tool: AllQuants
- Paper [optional]: No research paper (this is practical, not academic)
- Demo [optional]: Demo coming soon™

Available quantizations:

- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L (speed demons - perfect for testing)
- Q4_0, Q4_1, Q4_K_S, Q4_K_M (the goldilocks zone - just right)
- Q5_0, Q5_1, Q5_K_S, Q5_K_M (for when you need that extra quality)
- Q6_K, Q8_0 (maxed-out settings - if your hardware can handle it)

This is a straight quantization - no extra training or fine-tuning involved. Took leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full and compressed it into these GGUF formats.

Tooling:

- llama.cpp for the heavy lifting
- Python 3.10 + huggingface_hub for the workflow

Glossary:

- Quantization: making models smaller by reducing number precision - trades some quality for efficiency.
- GGUF: the file format that llama.cpp loves - optimized for fast inference.

This is still a work in progress - expect some rough edges. More updates and proper benchmarks coming when I get around to it.
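If you want to grab a single quant from this repo programmatically, here is a minimal sketch using huggingface_hub. The exact `.gguf` filename below is an assumption based on the usual naming pattern - check the repo's file listing for the real name before running.

```python
# Minimal sketch: fetch one quant file from the Hub with huggingface_hub.
# The .gguf filename is an assumed example, not confirmed by this card.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF",
    filename="Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-Q4_K_M.gguf",  # assumed filename
)
print(model_path)  # local cache path of the downloaded GGUF
```

Downloading a single file this way avoids pulling every quant in the repo, which matters when the repo ships fourteen variants of the same weights.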
gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF
SmolLM3-3B-GGUF
This repository contains multiple quantized versions of the SmolLM3-3B model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

- Developed by: leeminwaan
- Funded by [optional]: Independent project
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer language model
- Language(s) (NLP): English (primary); multilingual capabilities not benchmarked
- License: Apache-2.0
- Repository: Hugging Face Repo
- Paper [optional]: Not available
- Demo [optional]: To be released

Available quantizations:

- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0

Based on the SmolLM3-3B pretraining corpus (public large-scale web text, open datasets). No additional fine-tuning was performed for this release.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|-----------|------------|--------------------------------------|
| Q2_K | Smallest | Fastest | Low | Prototyping, minimal RAM/CPU |
| Q3_K_S | Very Small | Very Fast | Low-Med | Lightweight devices, testing |
| Q3_K_M | Small | Fast | Med | Lightweight, slightly better quality |
| Q3_K_L | Small-Med | Fast | Med | Faster inference, fair quality |
| Q4_0 | Medium | Fast | Good | General use, chats, low RAM |
| Q4_1 | Medium | Fast | Good+ | Recommended, slightly better quality |
| Q4_K_S | Medium | Fast | Good+ | Recommended, balanced |
| Q4_K_M | Medium | Fast | Good++ | Recommended, best Q4 option |
| Q5_0 | Larger | Moderate | Very Good | Chatbots, longer responses |
| Q5_1 | Larger | Moderate | Very Good+ | More demanding tasks |
| Q5_K_S | Larger | Moderate | Very Good+ | Advanced users, better accuracy |
| Q5_K_M | Larger | Moderate | Excellent | Demanding tasks, high quality |
| Q6_K | Large | Slower | Near FP16 | Power users, best quantized quality |
| Q8_0 | Largest | Slowest | FP16-like | Maximum quality, high RAM/CPU |

> Note:
> - Lower quantization = smaller model and faster inference, but lower output quality.
> - Q4_K_M is ideal for most users; Q6_K/Q8_0 offer the highest quality, best for advanced use.
> - All quantizations are suitable for consumer hardware - select based on your quality/speed needs.

Tooling:

- llama.cpp for quantization
- Python 3.10, huggingface_hub

Glossary:

- Quantization: reducing the precision of weights to lower memory usage.
- GGUF: optimized format for llama.cpp inference.

This project is experimental. Expect further updates and quantization benchmarks.
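To actually run one of these files from Python, the llama-cpp-python bindings (a common wrapper around llama.cpp; this card does not prescribe it, so treat the choice as an assumption) work roughly as sketched below. The local model path is also an assumed example.

```python
# Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path is an assumed example; point it at whichever quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./SmolLM3-3B-Q4_K_M.gguf",  # assumed local filename
    n_ctx=4096,    # context window; raise it if you have the RAM
    n_threads=8,   # match your CPU core count
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

The same snippet works for any quant in the table above; only the filename changes.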
DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF
# Model Card for DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF

This repository contains multiple quantized versions of the DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

- Developed by: leeminwaan
- Funded by [optional]: Independent project
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer language model
- Language(s) (NLP): English (primary); multilingual capabilities not benchmarked
- License: Apache-2.0
- Repository: Hugging Face Repo
- Paper [optional]: Not available
- Demo [optional]: To be released

Available quantizations:

- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0

Based on the DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 pretraining corpus (public large-scale web text, open datasets). No additional fine-tuning was performed for this release. Original DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 → quantized to GGUF formats.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|-----------|------------|--------------------------------------|
| Q2_K | Smallest | Fastest | Low | Prototyping, minimal RAM/CPU |
| Q3_K_S | Very Small | Very Fast | Low-Med | Lightweight devices, testing |
| Q3_K_M | Small | Fast | Med | Lightweight, slightly better quality |
| Q3_K_L | Small-Med | Fast | Med | Faster inference, fair quality |
| Q4_0 | Medium | Fast | Good | General use, chats, low RAM |
| Q4_1 | Medium | Fast | Good+ | Recommended, slightly better quality |
| Q4_K_S | Medium | Fast | Good+ | Recommended, balanced |
| Q4_K_M | Medium | Fast | Good++ | Recommended, best Q4 option |
| Q5_0 | Larger | Moderate | Very Good | Chatbots, longer responses |
| Q5_1 | Larger | Moderate | Very Good+ | More demanding tasks |
| Q5_K_S | Larger | Moderate | Very Good+ | Advanced users, better accuracy |
| Q5_K_M | Larger | Moderate | Excellent | Demanding tasks, high quality |
| Q6_K | Large | Slower | Near FP16 | Power users, best quantized quality |
| Q8_0 | Largest | Slowest | FP16-like | Maximum quality, high RAM/CPU |

> Note:
> - Lower quantization = smaller model and faster inference, but lower output quality.
> - Q4_K_M is ideal for most users; Q6_K/Q8_0 offer the highest quality, best for advanced use.
> - All quantizations are suitable for consumer hardware - select based on your quality/speed needs.

Tooling:

- llama.cpp for quantization
- Python 3.10, huggingface_hub

Glossary:

- Quantization: reducing the precision of weights to lower memory usage.
- GGUF: optimized format for llama.cpp inference.

This project is experimental. Expect further updates and quantization benchmarks.
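The size labels in the table follow directly from bits per weight: file size scales as parameters × bits / 8. Here is a back-of-the-envelope sketch for this 1.5B-parameter model; the bits-per-weight figures are rough community approximations, not exact llama.cpp numbers (k-quants mix block sizes, so real files differ slightly).

```python
# Back-of-the-envelope GGUF size estimate: size ≈ n_params * bits_per_weight / 8.
# Bits-per-weight values are approximate, not exact llama.cpp figures.
APPROX_BPW = {"FP16": 16.0, "Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7,
              "Q4_K_M": 4.8, "Q3_K_M": 3.9, "Q2_K": 3.4}

def est_size_gb(n_params: float, quant: str) -> float:
    """Approximate GGUF file size in GB for a model with n_params weights."""
    return n_params * APPROX_BPW[quant] / 8 / 1e9

for q in ("FP16", "Q8_0", "Q4_K_M", "Q2_K"):
    print(f"{q:>7}: ~{est_size_gb(1.5e9, q):.2f} GB")  # 1.5B-parameter model
```

For a 1.5B model this gives roughly 3.0 GB at FP16 versus about 0.9 GB at Q4_K_M, which is why the mid quants fit comfortably in consumer RAM.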
gpt-oss-10.8b-specialized-all-pruned-moe-only-15-experts-GGUF
olmoe-reasoning-v1
hybrid_reasoning_v0.1
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019); a programmatic sketch follows the list below.

- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
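The Lacoste et al. calculator is a web form; if you would rather measure emissions from code while the model runs, the codecarbon package is one alternative tool (not part of this card). The project name and the workload placeholder below are illustrative only.

```python
# Hedged sketch: codecarbon's EmissionsTracker as a programmatic alternative
# to the ML Impact web calculator (pip install codecarbon).
# project_name and the workload inside the block are illustrative placeholders.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="hybrid_reasoning_v0.1-inference")
tracker.start()
# ... run training or inference here ...
emissions_kg = tracker.stop()  # estimated emissions in kg CO2-equivalent
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```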
gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF
# Model Card for gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF

This repository contains multiple quantized versions of the gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

- Developed by: leeminwaan
- Funded by [optional]: Independent project
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer language model
- Language(s) (NLP): English (primary); multilingual capabilities not benchmarked
- License: Apache-2.0
- Repository: Hugging Face Repo
- Paper [optional]: Not available
- Demo [optional]: To be released

Available quantizations:

- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0

Based on the gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts pretraining corpus (public large-scale web text, open datasets). No additional fine-tuning was performed for this release. Original gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts → quantized to GGUF formats.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|-----------|------------|--------------------------------------|
| Q2_K | Smallest | Fastest | Low | Prototyping, minimal RAM/CPU |
| Q3_K_S | Very Small | Very Fast | Low-Med | Lightweight devices, testing |
| Q3_K_M | Small | Fast | Med | Lightweight, slightly better quality |
| Q3_K_L | Small-Med | Fast | Med | Faster inference, fair quality |
| Q4_0 | Medium | Fast | Good | General use, chats, low RAM |
| Q4_1 | Medium | Fast | Good+ | Recommended, slightly better quality |
| Q4_K_S | Medium | Fast | Good+ | Recommended, balanced |
| Q4_K_M | Medium | Fast | Good++ | Recommended, best Q4 option |
| Q5_0 | Larger | Moderate | Very Good | Chatbots, longer responses |
| Q5_1 | Larger | Moderate | Very Good+ | More demanding tasks |
| Q5_K_S | Larger | Moderate | Very Good+ | Advanced users, better accuracy |
| Q5_K_M | Larger | Moderate | Excellent | Demanding tasks, high quality |
| Q6_K | Large | Slower | Near FP16 | Power users, best quantized quality |
| Q8_0 | Largest | Slowest | FP16-like | Maximum quality, high RAM/CPU |

> Note:
> - Lower quantization = smaller model and faster inference, but lower output quality.
> - Q4_K_M is ideal for most users; Q6_K/Q8_0 offer the highest quality, best for advanced use.
> - All quantizations are suitable for consumer hardware - select based on your quality/speed needs.

Tooling:

- llama.cpp for quantization
- Python 3.10, huggingface_hub

Glossary:

- Quantization: reducing the precision of weights to lower memory usage.
- GGUF: optimized format for llama.cpp inference.

This project is experimental. Expect further updates and quantization benchmarks.
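For reference, the convert-then-quantize workflow these repos describe looks roughly like the sketch below. Script and binary names have varied across llama.cpp versions (e.g. convert_hf_to_gguf.py, llama-quantize), and the local paths are assumed examples, so verify everything against your own llama.cpp checkout.

```python
# Rough sketch of the llama.cpp convert-then-quantize workflow.
# Script/binary names and paths are assumptions; check your llama.cpp version.
import subprocess

HF_DIR = "./gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts"  # local HF snapshot
F16_GGUF = "./model-f16.gguf"

# 1. Convert the Hugging Face checkpoint to an FP16 GGUF.
subprocess.run(["python", "convert_hf_to_gguf.py", HF_DIR,
                "--outfile", F16_GGUF, "--outtype", "f16"], check=True)

# 2. Quantize the FP16 GGUF down to a smaller format, e.g. Q4_K_M.
subprocess.run(["./llama-quantize", F16_GGUF, "./model-Q4_K_M.gguf", "Q4_K_M"],
               check=True)
```

Repeating step 2 with a different type string (Q2_K, Q5_K_M, Q8_0, ...) is how a repo like this ends up with the full ladder of quants from a single FP16 conversion.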
Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
idkconcepts
qwen3-4b-conscious-lora
Qwen3-MOE-4x0.6B-2.4B-reasoning-v1.1
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]