leeminwaan

13 models

gpt-oss-20b-pruned-10.2B-GGUF

license:apache-2.0
655
3

Qwen3 MOE 4x0.6B 2.4B Reasoning V1 Full GGUF

🤖 Model Card for Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF

This repo is packed with multiple quantized versions of leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full in GGUF format. Built for running efficiently on your everyday hardware - no need for enterprise-level specs to deploy these models.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|-----------|------------|--------------------------------------|
| Q2_K | Tiny | Lightning | Basic | Quick prototypes, potato hardware |
| Q3_K_S | Mini | Super fast | Decent | Mobile devices, quick tests |
| Q3_K_M | Small | Fast | Good | Lightweight but better quality |
| Q3_K_L | Small+ | Fast | Good | Speed with acceptable quality |
| Q4_0 | Medium | Quick | Solid | Daily driver, casual chats |
| Q4_1 | Medium | Quick | Solid+ | Slight upgrade from Q4_0 |
| Q4_K_S | Medium | Quick | Nice | Well-balanced choice |
| Q4_K_M | Medium | Quick | Really nice | The crowd favorite |
| Q5_0 | Chunky | Chill | Great | Chatbots that actually make sense |
| Q5_1 | Chunky | Chill | Great+ | When you need quality responses |
| Q5_K_S | Big | Chill | Great+ | For the quality-conscious |
| Q5_K_M | Big | Chill | Excellent | High-end performance |
| Q6_K | Massive | Slow | Near perfect | Enthusiasts only |
| Q8_0 | Absolute unit | Turtle | Basically perfect | Max settings gang |

> Real talk:
> - Lower numbers = smaller files, faster inference, but quality takes a hit
> - Q4_K_M hits different - it's the sweet spot most people actually want
> - Q6_K/Q8_0 are for perfectionists with beefy hardware
> - Everything here runs on regular consumer hardware - pick what matches your vibe!

- Quantized by: leeminwaan
- Funded by [optional]: Solo project, no corporate backing
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer (the good stuff)
- Language(s) (NLP): Based on Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full
- License: Apache-2.0 (free to use, modify, distribute)
- Repository: Hugging Face Repo
- Quantization Tool: AllQuants
- Paper [optional]: No research paper (this is practical, not academic)
- Demo [optional]: Demo coming soon™

Available quantizations:
- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L (speed demons - perfect for testing)
- Q4_0, Q4_1, Q4_K_S, Q4_K_M (the goldilocks zone - just right)
- Q5_0, Q5_1, Q5_K_S, Q5_K_M (for when you need that extra quality)
- Q6_K, Q8_0 (maxed-out settings - if your hardware can handle it)

This is a straight quantization - no extra training or fine-tuning involved. Took leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full and compressed it into these GGUF formats.

Tooling:
- llama.cpp for the heavy lifting
- Python 3.10 + huggingface_hub for the workflow

Glossary:
- Quantization: making models smaller by reducing number precision - trades some quality for efficiency.
- GGUF: the file format that llama.cpp loves - optimized for fast inference.

This is still a work in progress - expect some rough edges. More updates and proper benchmarks coming when I get around to it.
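Since the card names Q4_K_M as the sweet spot, here is a minimal sketch of the "grab one quant and run it" workflow using huggingface_hub plus llama-cpp-python. The GGUF filename below is an assumption for illustration - check the repo's file listing for the actual names.

```python
# Download one quantized GGUF and run local inference with llama-cpp-python.
# pip install huggingface_hub llama-cpp-python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF",
    # Assumed filename - verify against the repo's file list.
    filename="Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full.Q4_K_M.gguf",
)

llm = Llama(model_path=model_path, n_ctx=4096)  # plain CPU inference works
out = llm("Explain mixture-of-experts routing in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```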

license:apache-2.0
258
3

gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF

license:apache-2.0
184
1

SmolLM3-3B-GGUF

This repository contains multiple quantized versions of the SmolLM3-3B model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

- Developed by: leeminwaan
- Funded by [optional]: Independent project
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer language model
- Language(s) (NLP): English (primary), multilingual capabilities not benchmarked
- License: Apache-2.0
- Repository: Hugging Face Repo
- Paper [optional]: Not available
- Demo [optional]: To be released

Available quantizations:
- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0

Based on the SmolLM3-3B pretraining corpus (public large-scale web text, open datasets). No additional fine-tuning was performed for this release.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|-----------|------------|--------------------------------------|
| Q2_K | Smallest | Fastest | Low | Prototyping, minimal RAM/CPU |
| Q3_K_S | Very Small | Very Fast | Low-Med | Lightweight devices, testing |
| Q3_K_M | Small | Fast | Med | Lightweight, slightly better quality |
| Q3_K_L | Small-Med | Fast | Med | Faster inference, fair quality |
| Q4_0 | Medium | Fast | Good | General use, chats, low RAM |
| Q4_1 | Medium | Fast | Good+ | Recommended, slightly better quality |
| Q4_K_S | Medium | Fast | Good+ | Recommended, balanced |
| Q4_K_M | Medium | Fast | Good++ | Recommended, best Q4 option |
| Q5_0 | Larger | Moderate | Very Good | Chatbots, longer responses |
| Q5_1 | Larger | Moderate | Very Good+ | More demanding tasks |
| Q5_K_S | Larger | Moderate | Very Good+ | Advanced users, better accuracy |
| Q5_K_M | Larger | Moderate | Excellent | Demanding tasks, high quality |
| Q6_K | Large | Slower | Near FP16 | Power users, best quantized quality |
| Q8_0 | Largest | Slowest | FP16-like | Maximum quality, high RAM/CPU |

> Note:
> - Lower quantization = smaller model, faster inference, but lower output quality.
> - Q4_K_M is ideal for most users; Q6_K/Q8_0 offer the highest quality, best for advanced use.
> - All quantizations are suitable for consumer hardware - select based on your quality/speed needs.

Tooling:
- llama.cpp for quantization
- Python 3.10, huggingface_hub

Glossary:
- Quantization: reducing the precision of weights to lower memory usage.
- GGUF: optimized format for llama.cpp inference.

This project is experimental. Expect further updates and quantization benchmarks.
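To make the size column concrete, here is a rough size estimator for picking a quant that fits a RAM budget. The bits-per-weight figures are ballpark assumptions for llama.cpp quant formats (actual file sizes vary per model, since some tensors are kept at higher precision):

```python
# Approximate GGUF file size for a given parameter count and quant type.
# Bits-per-weight values below are rough assumptions, not exact constants.
APPROX_BITS_PER_WEIGHT = {
    "Q2_K": 2.6, "Q3_K_S": 2.8, "Q3_K_M": 3.1, "Q3_K_L": 3.4,
    "Q4_0": 4.5, "Q4_1": 5.0, "Q4_K_S": 4.6, "Q4_K_M": 4.9,
    "Q5_0": 5.5, "Q5_1": 6.0, "Q5_K_S": 5.5, "Q5_K_M": 5.7,
    "Q6_K": 6.6, "Q8_0": 8.5,
}

def approx_file_size_gb(n_params_billion: float, quant: str) -> float:
    """Estimate on-disk GGUF size in GB from parameter count and quant type."""
    total_bits = APPROX_BITS_PER_WEIGHT[quant] * n_params_billion * 1e9
    return total_bits / 8 / 1e9

for q in ("Q2_K", "Q4_K_M", "Q8_0"):
    print(f"SmolLM3-3B @ {q}: ~{approx_file_size_gb(3.0, q):.1f} GB")
```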

license:apache-2.0
154
0

DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF

Model Card for DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF

This repository contains multiple quantized versions of the DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

- Developed by: leeminwaan
- Funded by [optional]: Independent project
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer language model
- Language(s) (NLP): English (primary), multilingual capabilities not benchmarked
- License: Apache-2.0
- Repository: Hugging Face Repo
- Paper [optional]: Not available
- Demo [optional]: To be released

Available quantizations:
- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0

Based on the DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 pretraining corpus (public large-scale web text, open datasets). No additional fine-tuning was performed for this release. Original DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 → quantized to GGUF formats.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|-----------|------------|--------------------------------------|
| Q2_K | Smallest | Fastest | Low | Prototyping, minimal RAM/CPU |
| Q3_K_S | Very Small | Very Fast | Low-Med | Lightweight devices, testing |
| Q3_K_M | Small | Fast | Med | Lightweight, slightly better quality |
| Q3_K_L | Small-Med | Fast | Med | Faster inference, fair quality |
| Q4_0 | Medium | Fast | Good | General use, chats, low RAM |
| Q4_1 | Medium | Fast | Good+ | Recommended, slightly better quality |
| Q4_K_S | Medium | Fast | Good+ | Recommended, balanced |
| Q4_K_M | Medium | Fast | Good++ | Recommended, best Q4 option |
| Q5_0 | Larger | Moderate | Very Good | Chatbots, longer responses |
| Q5_1 | Larger | Moderate | Very Good+ | More demanding tasks |
| Q5_K_S | Larger | Moderate | Very Good+ | Advanced users, better accuracy |
| Q5_K_M | Larger | Moderate | Excellent | Demanding tasks, high quality |
| Q6_K | Large | Slower | Near FP16 | Power users, best quantized quality |
| Q8_0 | Largest | Slowest | FP16-like | Maximum quality, high RAM/CPU |

> Note:
> - Lower quantization = smaller model, faster inference, but lower output quality.
> - Q4_K_M is ideal for most users; Q6_K/Q8_0 offer the highest quality, best for advanced use.
> - All quantizations are suitable for consumer hardware - select based on your quality/speed needs.

Tooling:
- llama.cpp for quantization
- Python 3.10, huggingface_hub

Glossary:
- Quantization: reducing the precision of weights to lower memory usage.
- GGUF: optimized format for llama.cpp inference.

This project is experimental. Expect further updates and quantization benchmarks.
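Because the card lists quant types but not exact filenames, a quick way to see which GGUF files the repo actually ships is to enumerate them via huggingface_hub (the repo id below is inferred from the card title):

```python
# List the .gguf files available in the quantized repo before downloading.
from huggingface_hub import list_repo_files

files = list_repo_files("leeminwaan/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF")
for f in sorted(files):
    if f.endswith(".gguf"):
        print(f)
```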

license:apache-2.0
133
0

gpt-oss-10.8b-specialized-all-pruned-moe-only-15-experts-GGUF

license:apache-2.0
77
0

olmoe-reasoning-v1

—
31
0

hybrid_reasoning_v0.1

- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
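For context, the estimate the ML Impact calculator (Lacoste et al., 2019) performs is simple arithmetic: energy drawn by the hardware times the grid's carbon intensity. A minimal sketch follows; every number in it is an illustrative placeholder, not data about this model's actual training run:

```python
# Back-of-envelope CO2 estimate: kWh consumed (with datacenter overhead via PUE)
# multiplied by grid carbon intensity. All defaults are illustrative assumptions.
def estimate_co2_kg(hw_power_kw: float, hours: float,
                    pue: float = 1.5, grid_kgco2_per_kwh: float = 0.4) -> float:
    """Return estimated kg CO2eq for a compute run."""
    return hw_power_kw * hours * pue * grid_kgco2_per_kwh

# e.g. one 300 W GPU running for 24 hours on an average grid:
print(f"~{estimate_co2_kg(0.3, 24):.1f} kg CO2eq")
```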

—
10
0

gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF

Model Card for gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF

This repository contains multiple quantized versions of the gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

- Developed by: leeminwaan
- Funded by [optional]: Independent project
- Shared by [optional]: leeminwaan
- Model type: Decoder-only transformer language model
- Language(s) (NLP): English (primary), multilingual capabilities not benchmarked
- License: Apache-2.0
- Repository: Hugging Face Repo
- Paper [optional]: Not available
- Demo [optional]: To be released

Available quantizations:
- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0

Based on the gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts pretraining corpus (public large-scale web text, open datasets). No additional fine-tuning was performed for this release. Original gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts → quantized to GGUF formats.

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|--------------|-----------------|-----------|------------|--------------------------------------|
| Q2_K | Smallest | Fastest | Low | Prototyping, minimal RAM/CPU |
| Q3_K_S | Very Small | Very Fast | Low-Med | Lightweight devices, testing |
| Q3_K_M | Small | Fast | Med | Lightweight, slightly better quality |
| Q3_K_L | Small-Med | Fast | Med | Faster inference, fair quality |
| Q4_0 | Medium | Fast | Good | General use, chats, low RAM |
| Q4_1 | Medium | Fast | Good+ | Recommended, slightly better quality |
| Q4_K_S | Medium | Fast | Good+ | Recommended, balanced |
| Q4_K_M | Medium | Fast | Good++ | Recommended, best Q4 option |
| Q5_0 | Larger | Moderate | Very Good | Chatbots, longer responses |
| Q5_1 | Larger | Moderate | Very Good+ | More demanding tasks |
| Q5_K_S | Larger | Moderate | Very Good+ | Advanced users, better accuracy |
| Q5_K_M | Larger | Moderate | Excellent | Demanding tasks, high quality |
| Q6_K | Large | Slower | Near FP16 | Power users, best quantized quality |
| Q8_0 | Largest | Slowest | FP16-like | Maximum quality, high RAM/CPU |

> Note:
> - Lower quantization = smaller model, faster inference, but lower output quality.
> - Q4_K_M is ideal for most users; Q6_K/Q8_0 offer the highest quality, best for advanced use.
> - All quantizations are suitable for consumer hardware - select based on your quality/speed needs.

Tooling:
- llama.cpp for quantization
- Python 3.10, huggingface_hub

Glossary:
- Quantization: reducing the precision of weights to lower memory usage.
- GGUF: optimized format for llama.cpp inference.

This project is experimental. Expect further updates and quantization benchmarks.
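The cards target local chat use, so here is a minimal chat-style sketch using llama-cpp-python's OpenAI-like API. The GGUF filename is an assumption (verify against the repo's file list); Q4_K_M follows the "best Q4 option" row in the table above:

```python
# Download an assumed Q4_K_M quant and run a chat completion locally.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="leeminwaan/gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF",
    # Assumed filename - check the repo's file listing for the real one.
    filename="gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts.Q4_K_M.gguf",
)
llm = Llama(model_path=path, n_ctx=2048)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what MoE expert pruning does."}],
    max_tokens=200,
)
print(resp["choices"][0]["message"]["content"])
```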

license:apache-2.0
5
0

Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]

—
3
0

idkconcepts

llama
3
0

qwen3-4b-conscious-lora

—
0
1

Qwen3-MOE-4x0.6B-2.4B-reasoning-v1.1

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]

—
0
1