mlx-community

✓ Verified Community

Apple MLX framework community contributions

500 models • 27 total models in database

gpt-oss-20b-MXFP4-Q8

license: apache-2.0 · pipeline: text-generation · library: mlx · tags: vllm, mlx · base model: openai/gpt-oss-20b

license:apache-2.0
812,246
18

parakeet-tdt-0.6b-v3

library: mlx · languages: en, es, fr, de, bg, hr, cs, da, nl, et, fi, el, hu, it, lv, lt, mt, pl, pt, ro, sk, sl, sv, ru, uk · tags: mlx, automatic-speech-recognition, speech, audio, FastConformer, Conformer, Parakeet · license: cc-by-4.0 · pipeline: automatic-speech-recognition · base model: nvidia/parakeet-tdt-0.6b-v3

license:cc-by-4.0
764,711
17

parakeet-tdt-0.6b-v2

library: mlx · tags: mlx, automatic-speech-recognition, speech, audio, FastConformer, Conformer, Parakeet · license: cc-by-4.0 · pipeline: automatic-speech-recognition · base model: nvidia/parakeet-tdt-0.6b-v2
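Parakeet conversions like this one are typically run with the community `parakeet-mlx` package rather than mlx-lm. The package name and CLI shape below follow its README but are assumptions; they also require an Apple Silicon Mac and a model download, so verify against the upstream repo before relying on them.

```shell
pip install parakeet-mlx
# Transcribe a local audio file with the MLX Parakeet build (assumed CLI shape).
parakeet-mlx audio.wav --model mlx-community/parakeet-tdt-0.6b-v2
```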

license:cc-by-4.0
566,735
32

whisper-small-mlx

122,365
3

gemma-3-12b-it-qat-4bit

116,754
17

gemma-3-27b-it-qat-4bit

This model was converted to MLX format from `google/gemma-3-27b-it-qat-q4_0-unquantized` using mlx-vlm version 0.1.23. Refer to the original model card for more details.

80,767
20

gemma-3-4b-it-qat-4bit

46,432
5

Qwen3-30B-A3B-Instruct-2507-4bit

This model mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit was converted to MLX format from Qwen/Qwen3-30B-A3B-Instruct-2507 using mlx-lm version 0.26.3.
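Converted text checkpoints like this one are run with the mlx-lm package. The commands below use the standard `mlx_lm.generate` CLI; the prompt is illustrative, and running it requires an Apple Silicon Mac plus a multi-GB model download.

```shell
pip install mlx-lm
python -m mlx_lm.generate \
  --model mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit \
  --prompt "Explain the difference between a mutex and a semaphore." \
  --max-tokens 256
```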

license:apache-2.0
35,954
8

gemma-3-1b-it-qat-4bit

This model mlx-community/gemma-3-1b-it-qat-4bit was converted to MLX format from google/gemma-3-1b-it-qat-q4_0 using mlx-lm version 0.22.5.

31,179
3

Llama-3.2-1B-Instruct-4bit

llama
20,722
15

Llama-3.2-3B-Instruct-4bit

llama
13,646
32

gemma-2-2b-it-4bit

12,651
3

whisper-large-v3-mlx

license:mit
8,397
62

Qwen3-1.7B-4bit

license:apache-2.0
7,761
4

Meta-Llama-3.1-8B-Instruct-4bit

llama
7,412
17

Qwen3-VL-4B-Instruct-4bit

license:apache-2.0
6,966
2

DeepSeek-OCR-8bit

This model was converted to MLX format from `deepseek-ai/DeepSeek-OCR` using mlx-vlm version 0.3.5. Refer to the original model card for more details.
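Vision-language conversions such as this one are driven through mlx-vlm's generate entry point. The flags below are standard `mlx_vlm.generate` options; the image path and prompt are illustrative, and the command needs an Apple Silicon Mac and a model download to actually run.

```shell
pip install mlx-vlm
python -m mlx_vlm.generate \
  --model mlx-community/DeepSeek-OCR-8bit \
  --image document.png \
  --prompt "Transcribe the text in this image." \
  --max-tokens 512
```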

license:mit
6,579
20

Qwen3-Embedding-0.6B-4bit-DWQ

license:apache-2.0
6,330
2

Mistral-7B-Instruct-v0.3-4bit

license:apache-2.0
6,195
7

gemma-3-1b-it-4bit

5,765
3

gemma-3n-E4B-it-lm-4bit

5,528
4

gemma-3n-E2B-it-lm-4bit

5,251
1

Kokoro-82M-bf16

This model was converted to MLX format from `hexgrad/Kokoro-82M` using mlx-audio version 0.0.1. Refer to the original model card for more details.
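Kokoro MLX builds are usually driven through the mlx-audio package mentioned in the card. The module path and flags below follow that project's README but should be treated as assumptions; running this requires an Apple Silicon Mac.

```shell
pip install mlx-audio
# Synthesize speech with the MLX Kokoro build (assumed CLI shape).
python -m mlx_audio.tts.generate \
  --model mlx-community/Kokoro-82M-bf16 \
  --text "Hello from MLX on Apple Silicon."
```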

license:apache-2.0
5,087
26

Qwen3-4B-4bit

license:apache-2.0
5,003
9

Qwen3-0.6B-4bit

license:apache-2.0
4,446
5

MiniMax-M2-4bit

This model mlx-community/MiniMax-M2-4bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.

license:mit
4,381
9

Qwen2-VL-2B-Instruct-4bit

license:apache-2.0
4,298
5

whisper-large-v3-turbo

This model was converted to MLX format from `large-v3-turbo`.
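Whisper conversions in this listing are run with the mlx-whisper package. The command below uses its CLI as documented upstream; the audio filename is illustrative, and transcription requires an Apple Silicon Mac plus the model download.

```shell
pip install mlx-whisper
mlx_whisper audio.mp3 --model mlx-community/whisper-large-v3-turbo
```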

3,909
72

Qwen2.5-3B-Instruct-4bit

3,781
1

bge-small-en-v1.5-bf16

license:mit
3,206
0

Qwen3-VL-2B-Instruct-4bit

This model was converted to MLX format from `Qwen/Qwen3-VL-2B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details.

license:apache-2.0
3,075
0

Qwen3-235B-A22B-4bit

license:apache-2.0
2,917
7

Dolphin3.0-Llama3.1-8B-4bit

llama
2,899
0

DeepSeek-OCR-4bit

license:mit
2,836
6

SmolLM3-3B-4bit

license:apache-2.0
2,725
4

Qwen2.5-0.5B-Instruct-4bit

license:apache-2.0
2,718
4

Qwen3-235B-A22B-8bit

license:apache-2.0
2,715
4

Qwen1.5-0.5B-Chat-4bit

2,702
4

DeepSeek-R1-Distill-Qwen-1.5B-8bit

2,701
3

Qwen3-4B-Thinking-2507-4bit

license:apache-2.0
2,475
1

GLM-4.6-4bit

This model mlx-community/GLM-4.6-4bit was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.1.

license:mit
2,440
12

DeepSeek-OCR-5bit

This model was converted to MLX format from `deepseek-ai/DeepSeek-OCR` using mlx-vlm version 0.3.5. Refer to the original model card for more details.

license:mit
2,418
1

Qwen2.5-1.5B-Instruct-4bit

license:apache-2.0
2,369
1

MiniMax-M2-3bit

license:mit
2,368
3

DeepSeek-R1-Distill-Qwen-1.5B-4bit

2,199
6

Kimi-Linear-48B-A3B-Instruct-4bit

This model mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit was converted to MLX format from moonshotai/Kimi-Linear-48B-A3B-Instruct using mlx-lm version 0.28.4.

license:mit
2,023
4

whisper-medium-mlx

1,901
3

Qwen3-4B-Instruct-2507-4bit

license:apache-2.0
1,898
5

granite-4.0-h-micro-4bit

license:apache-2.0
1,889
0

MiniMax-M2-8bit

This model mlx-community/MiniMax-M2-8bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.

license:mit
1,865
7

Qwen3-8B-4bit

license:apache-2.0
1,815
6

Llama-3.2-3B-Instruct-8bit

llama
1,806
1

Meta-Llama-3-8B-Instruct-4bit

This model was converted to MLX format from `meta-llama/Meta-Llama-3-8B-Instruct` using mlx-lm version 0.9.0. Refer to the original model card for more details.

llama
1,792
79

SmolVLM2-500M-Video-Instruct-mlx

license:apache-2.0
1,649
15

Qwen3-Embedding-4B-4bit-DWQ

license:apache-2.0
1,629
6

Josiefied-Qwen3-4B-abliterated-v1-4bit

This model mlx-community/Josiefied-Qwen3-4B-abliterated-v1-4bit was converted to MLX format from Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v1 using mlx-lm version 0.24.0.

1,565
2

Qwen2.5-7B-Instruct-4bit

license:apache-2.0
1,491
8

Phi-4-mini-instruct-4bit

license:mit
1,490
0

phi-2

1,461
54

MiniMax-M2-mlx-8bit-gs32

This model mlx-community/MiniMax-M2-mlx-8bit-gs32 was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.1. Recipe: 8-bit quantization with group size 32, about 9 bits per weight (bpw). You can find more similar MLX model quants for a single Apple Mac Studio M3 Ultra with 512 GB at https://huggingface.co/bibproj

license:mit
1,382
2

granite-3.3-2b-instruct-4bit

license:apache-2.0
1,347
1

Llama-3.3-70B-Instruct-4bit

llama
1,342
30

GLM-4.6-mlx-8bit-gs32

This model mlx-community/GLM-4.6-mlx-8bit-gs32 was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.1. Recipe: 8-bit quantization with group size 32, about 9 bits per weight (bpw). You can find more similar MLX model quants for the Apple Mac Studio with 512 GB at https://huggingface.co/bibproj

license:mit
1,340
1

Qwen3-VL-235B-A22B-Instruct-3bit

This model was converted to MLX format from `Qwen/Qwen3-VL-235B-A22B-Instruct` using mlx-vlm version 0.3.3. Refer to the original model card for more details.

license:apache-2.0
1,337
2

Llama-2-7b-chat-mlx

llama
1,307
84

granite-4.0-h-tiny-4bit

license:apache-2.0
1,253
0

DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx

1,240
12

Qwen2.5-VL-3B-Instruct-4bit

1,214
2

gemma-3-270m-it-8bit

1,160
2

whisper-base-mlx

1,147
0

Qwen3-Embedding-8B-4bit-DWQ

license:apache-2.0
1,145
5

parakeet-rnnt-0.6b

license:cc-by-4.0
1,098
0

gpt-oss-120b-MXFP4-Q4

This model mlx-community/gpt-oss-120b-MXFP4-Q4 was converted to MLX format from openai/gpt-oss-120b using mlx-lm version 0.27.0.

license:apache-2.0
1,080
4

Phi-3.5-mini-instruct-4bit

license:mit
1,047
8

LFM2-2.6B-4bit

This model mlx-community/LFM2-2.6B-4bit was converted to MLX format from LiquidAI/LFM2-2.6B using mlx-lm version 0.28.0.

1,026
1

LFM2-8B-A1B-4bit

1,001
3

gpt-oss-20b-MXFP4-Q4

license:apache-2.0
963
7

GLM-4.6-bf16

This model mlx-community/GLM-4.6-bf16 was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.2.

license:mit
962
3

Qwen3-0.6B-8bit

license:apache-2.0
953
5

LFM2-1.2B-4bit

947
2

whisper-large-v2-mlx

939
1

gemma-3n-E4B-it-4bit

921
6

Qwen3-VL-30B-A3B-Instruct-4bit

This model was converted to MLX format from `Qwen/Qwen3-VL-30B-A3B-Instruct` using mlx-vlm version 0.3.3. Refer to the original model card for more details.

license:apache-2.0
917
5

Dolphin3.0-Llama3.1-8B-8bit

llama
916
0

MiniMax-M2-6bit

This model mlx-community/MiniMax-M2-6bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.

license:mit
905
1

deepcogito-cogito-v1-preview-llama-3B-4bit

llama
873
0

Hermes-3-Llama-3.2-3B-4bit

llama
863
0

DeepSeek-R1-Distill-Qwen-32B-4bit

853
42

Qwen3-VL-30B-A3B-Instruct-8bit

license:apache-2.0
851
2

dolphin3.0-llama3.2-3B-4Bit

llama
849
0

Phi-4-mini-instruct-8bit

license:mit
844
4

GLM-4.5-Air-3bit

This model mlx-community/GLM-4.5-Air-3bit was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.1.

license:mit
837
30

DeepSeek-V3-0324-4bit

license:mit
826
38

GLM-4.6-5bit

This model mlx-community/GLM-4.6-5bit was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.1.

license:mit
820
3

gemma-2-9b-it-4bit

812
2

gpt-oss-120b-MXFP4-Q8

This model mlx-community/gpt-oss-120b-MXFP4-Q8 was converted to MLX format from openai/gpt-oss-120b using mlx-lm version 0.27.0.

license:apache-2.0
803
3

DeepSeek-OCR-6bit

This model was converted to MLX format from `deepseek-ai/DeepSeek-OCR` using mlx-vlm version 0.3.5. Refer to the original model card for more details.

license:mit
796
0

Josiefied-Qwen3-1.7B-abliterated-v1-4bit

795
2

DeepSeek-R1-0528-Qwen3-8B-4bit

license:mit
791
4

gemma-3-4b-it-4bit

This model was converted to MLX format from `google/gemma-3-4b-it` using mlx-vlm version 0.1.18. Refer to the original model card for more details.

772
7

SmolLM-135M-Instruct-4bit

llama
749
4

whisper-tiny

744
0

Mistral-Nemo-Instruct-2407-4bit

license:apache-2.0
743
14

LFM2-8B-A1B-3bit-MLX

Maintainer / Publisher: Susant Achary
Upstream model: LiquidAI/LFM2-8B-A1B
This repo (MLX 3-bit): `mlx-community/LFM2-8B-A1B-3bit-MLX`

This repository provides an Apple-Silicon-optimized MLX build of LFM2-8B-A1B at 3-bit quantization. 3-bit is an excellent size↔quality sweet spot on many Macs: a very small memory footprint with surprisingly solid answer quality and snappy decoding.

- Architecture: Mixture-of-Experts (MoE) Transformer.
- Size: ~8B total parameters with ~1B active per token (the "A1B" naming commonly indicates ~1B active params).
- Why MoE? Per token, only a subset of experts is activated → lower compute per token while retaining a larger parameter pool for expressivity.

> Memory reality on a single device: even though ~1B parameters are active at a time, all experts typically reside in memory in single-device runs. Plan RAM based on total parameters, not just the active slice.

Contents:
- `config.json` (MLX), `model.safetensors` (3-bit shards)
- Tokenizer: `tokenizer.json`, `tokenizer_config.json`
- Metadata: `model_index.json` (and/or processor metadata as applicable)

Target: macOS on Apple Silicon (M-series) using Metal/MPS.

Intended use:
- General instruction following, chat, and summarization
- RAG back-ends and long-context assistants on device
- Schema-guided structured outputs (JSON) where low RAM is a priority

Caveats:
- 3-bit is lossy: the gains in latency/RAM come with some accuracy trade-off vs 6/8-bit.
- For very long contexts and/or batching, the KV-cache can dominate memory, so tune `max_tokens` and batch size.
- Add your own guardrails/safety for production deployments.

The numbers below are practical starting points; verify on your machine.
- Weights (3-bit): ≈ total params × 0.375 byte → ~3.0 GB for 8B params
- Runtime overhead: MLX graph/tensors/metadata → ~0.6–1.0 GB
- KV-cache: grows with context × layers × heads × dtype → ~0.8–2.5+ GB

| Context window | Estimated peak RAM |
|---|---:|
| 4k tokens | ~4.4–5.5 GB |
| 8k tokens | ~5.2–6.6 GB |
| 16k tokens | ~6.5–8.8 GB |

> For ≤2k windows you may see ~4.0–4.8 GB. Larger windows/batches increase KV-cache and peak RAM.

🧭 Precision choices for LFM2-8B-A1B (lineup planning)

While this card is 3-bit, teams often publish multiple precisions. Use this table as a planning guide (8B MoE LM; actuals depend on context/batch/prompts):

| Variant | Typical Peak RAM | Relative Speed | Typical Behavior | When to choose |
|---|---:|:---:|---|---|
| 3-bit (this repo) | ~4.4–8.8 GB | 🔥🔥🔥🔥 | Direct, concise, great latency | Default on 8–16 GB Macs |
| 6-bit | ~7.5–12.5 GB | 🔥🔥 | Best quality under quant | Choose if RAM allows |
| 8-bit | ~9.5–12+ GB | 🔥🔥 | Largest quantized size / highest fidelity | When you prefer simpler 8-bit workflows |

> MoE caveat: MoE lowers compute per token; unless experts are paged/partitioned, memory still scales with total parameters on a single device.

Deterministic generation:

```bash
# mlx_lm's sampling flag is --temp; MLX targets Metal automatically, so no device flag is needed.
python -m mlx_lm.generate \
  --model mlx-community/LFM2-8B-A1B-3bit-MLX \
  --prompt "Summarize the following in 5 concise bullet points:\n" \
  --max-tokens 256 \
  --temp 0.0 \
  --seed 0
```

742
1
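The RAM planning arithmetic in the LFM2 cards above (weights at bits/8 bytes per parameter, plus runtime overhead and KV-cache) can be reproduced in a few lines. The constants below are the cards' own illustrative figures, not measurements, and the helper name is hypothetical.

```python
# Rough peak-RAM estimate for a quantized model, following the planning
# arithmetic in the cards above (illustrative assumptions, not measurements).

def estimate_peak_ram_gb(total_params_b: float, bits: int,
                         overhead_gb: float = 0.8,
                         kv_cache_gb: float = 1.5) -> float:
    """total_params_b: parameters in billions; bits: quantization width."""
    weights_gb = total_params_b * bits / 8  # bytes per parameter = bits / 8
    return round(weights_gb + overhead_gb + kv_cache_gb, 1)

# 8B params at 3-bit: ~3.0 GB of weights, ~5.3 GB peak with mid-range overheads.
print(estimate_peak_ram_gb(8, 3))   # → 5.3
# 8B params at 8-bit: ~8.0 GB of weights, ~10.3 GB peak.
print(estimate_peak_ram_gb(8, 8))   # → 10.3
```

These point estimates land inside the cards' quoted ranges; real peaks move with context length and batch size.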

MiniMax-M2-5bit

This model mlx-community/MiniMax-M2-5bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.

license:mit
739
2

gemma-3-text-4b-it-4bit

736
0

Qwen3-Coder-30B-A3B-Instruct-4bit

license:apache-2.0
688
10

granite-4.0-micro-8bit

This model mlx-community/granite-4.0-micro-8bit was converted to MLX format from ibm-granite/granite-4.0-micro using mlx-lm version 0.28.2.

license:apache-2.0
662
0

Phi-3-mini-4k-instruct-4bit

license:mit
661
12

Qwen3-VL-235B-A22B-Thinking-3bit

license:apache-2.0
655
0

Llama-3.2-11B-Vision-Instruct-abliterated

mllama
645
7

Kimi-Dev-72B-4bit-DWQ

license:mit
636
18

Qwen3-VL-30B-A3B-Instruct-bf16

This model was converted to MLX format from `Qwen/Qwen3-VL-30B-A3B-Instruct` using mlx-vlm version 0.3.3. Refer to the original model card for more details.

license:apache-2.0
634
3

whisper-small-mlx-8bit

632
0

Llama-3.2-1B-Instruct-8bit

llama
613
1

DeepSeek-R1-Distill-Qwen-7B-4bit

609
17

GLM-4.5-Air-4bit

license:mit
594
25

Qwen2.5-VL-7B-Instruct-4bit

license:apache-2.0
564
3

LFM2-350M-8bit

547
2

DeepSeek-R1-4bit

544
36

Meta-Llama-3.1-70B-Instruct-4bit

llama
544
4

Kimi-K2-Instruct-4bit

541
9

phi-4-8bit

license:mit
534
12

3b-de-ft-research_release-4bit

llama
529
0

whisper-large-mlx

515
2

dolphin3.0-llama3.2-1B-4Bit

llama
511
0

Qwen3-VL-8B-Instruct-4bit

This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details.

license:apache-2.0
506
3

Llama-3.2-11B-Vision-Instruct-8bit

mllama
505
10

Qwen3-4B-8bit

license:apache-2.0
499
1

gemma-3-4b-it-8bit

491
5

gemma-3n-E2B-it-4bit

481
9

DeepSeek-V3.1-4bit

license:mit
478
6

Qwen3-VL-8B-Thinking-8bit

This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details.

license:apache-2.0
473
1

Ring-mini-linear-2.0-4bit

This model mlx-community/Ring-mini-linear-2.0-4bit was converted to MLX format from inclusionAI/Ring-mini-linear-2.0 using mlx-lm version 0.28.1.

license:mit
472
3

Qwen3-VL-30B-A3B-Thinking-4bit

license:apache-2.0
463
0

Qwen3-4B-Instruct-2507-4bit-DWQ-2510

This model mlx-community/Qwen3-4B-Instruct-2507-4bit-DWQ-2510 was converted to MLX format from Qwen/Qwen3-4B-Instruct-2507 using mlx-lm version 0.28.2.

license:apache-2.0
455
1

Qwen3-Coder-30B-A3B-Instruct-4bit-dwq-v2

license:apache-2.0
452
7

Qwen3-Coder-30B-A3B-Instruct-8bit

This model mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit was converted to MLX format from Qwen/Qwen3-Coder-30B-A3B-Instruct using mlx-lm version 0.26.1.

license:apache-2.0
449
2

Qwen3-VL-30B-A3B-Thinking-3bit

license:apache-2.0
449
1

Qwen3-VL-30B-A3B-Thinking-8bit

license:apache-2.0
447
0

Qwen3-Coder-480B-A35B-Instruct-4bit

license:apache-2.0
444
18

DeepSeek-V3.1-Terminus-4bit

This model mlx-community/DeepSeek-V3.1-Terminus-4bit was converted to MLX format from deepseek-ai/DeepSeek-V3.1-Terminus using mlx-lm version 0.27.1.

license:mit
442
2

whisper-large-v3-turbo-q4

437
7

Qwen3-VL-30B-A3B-Thinking-bf16

This model was converted to MLX format from `Qwen/Qwen3-VL-30B-A3B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details.

license:apache-2.0
435
1

Granite-4.0-H-Tiny-4bit-DWQ

This model mlx-community/granite-4.0-h-Tiny-4bit-DWQ was converted to MLX format from ibm-granite/granite-4.0-h-small using mlx-lm version 0.28.2.

license:apache-2.0
430
2

Llama-3.2-3B-Instruct

llama
428
7

Qwen3-Next-80B-A3B-Instruct-4bit

license:apache-2.0
425
17

parakeet-tdt_ctc-0.6b-ja

This model was converted to MLX format from nvidia/parakeet-tdt_ctc-0.6b-ja using the conversion script. Please refer to the original model card for more details on the model.

license:cc-by-4.0
424
4

Mistral-7B-Instruct-v0.2-4-bit

license:apache-2.0
423
24

Qwen3-30B-A3B-4bit

license:apache-2.0
419
11

Llama-3.2-3B-Instruct-uncensored-6bit

llama
415
3

Kimi-K2-Instruct-0905-mlx-DQ3_K_M

This model mlx-community/Kimi-K2-Instruct-0905-mlx-DQ3_K_M was converted to MLX format from moonshotai/Kimi-K2-Instruct-0905 using mlx-lm version 0.26.3. It was created for people running a single Apple Mac Studio M3 Ultra with 512 GB: the 4-bit version of Kimi K2 does not fit. Using research results, we aim to get 4-bit performance from a slightly smaller, smarter quantization that is also not so large that it leaves no memory for a useful context window. You can find more similar MLX model quants for the Apple Mac Studio with 512 GB at https://huggingface.co/bibproj

In the arXiv paper "Quantitative Analysis of Performance Drop in DeepSeek Model Quantization" the authors write:

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms the traditional `Q3_K_M` variant on various benchmarks, and is comparable with the 4-bit quantization (`Q4_K_M`) approach in most tasks.

> [a] dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bit-width quantization has been well tested and documented. In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `mixed_quant_predicate()` with a custom rule that assigns different bit widths per layer. Should you wish to squeeze more out of your quant, and you do not need a larger context window, you can lower the bit widths in the last part of that rule.
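The per-layer predicate the card says to drop into mlx-lm's `convert.py` can be sketched as a plain function over parameter paths. This is a hypothetical illustration, not the exact DQ3_K_M recipe used for this repo: the layer-name patterns and bit/group-size choices are assumptions, and mlx-lm passes the module and config arguments that are unused here.

```python
# Hypothetical sketch of a mixed-bit-width quant predicate for mlx-lm's
# convert.py. mlx-lm calls the predicate once per quantizable layer;
# returning a dict overrides the default bits/group size for that layer.
# Patterns and bit widths below are illustrative, not the repo's recipe.

def mixed_quant_predicate(path: str, module=None, config=None):
    """Keep sensitive layers at higher precision, 3-bit elsewhere."""
    # Embeddings and the output head are quantization-sensitive: 6-bit.
    if "embed" in path or "lm_head" in path:
        return {"bits": 6, "group_size": 64}
    # Attention projections get a little extra precision: 4-bit.
    if any(k in path for k in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return {"bits": 4, "group_size": 64}
    # Everything else (e.g. MoE expert FFNs) goes to 3-bit.
    return {"bits": 3, "group_size": 64}

print(mixed_quant_predicate("model.layers.0.self_attn.q_proj"))
```

Tightening the final branch (for example, dropping the fallback to 2-bit) shrinks the model further at the cost of quality, which is the trade-off the card alludes to.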

414
7

Qwen2.5-Coder-7B-Instruct-bf16

license:apache-2.0
413
2

Mixtral-8x22B-4bit

license:apache-2.0
412
54

Qwen3-VL-4B-Instruct-8bit

license:apache-2.0
411
3

Kimi-Linear-48B-A3B-Instruct-8bit

license:mit
411
0

Llama-3.3-70B-Instruct-8bit

llama
410
14

nvidia_Llama-3.1-Nemotron-70B-Instruct-HF_4bit

llama
409
12

Huihui-GLM-4.5V-abliterated-mxfp4

This model was converted to MLX format from `huihui-ai/Huihui-GLM-4.5V-abliterated` using `mlx-vlm` with MXFP4 support. Refer to the original model card for more details.

license:mit
407
2

gemma-3-1b-pt-4bit

406
1

embeddinggemma-300m-8bit

403
2

DeepSeek-R1-Distill-Llama-70B-8bit

llama
391
10

chandra-8bit

386
1

DeepSeek-Coder-V2-Lite-Instruct-8bit

385
5

embeddinggemma-300m-bf16

This model mlx-community/embeddinggemma-300m-bf16 was converted to MLX format from google/embeddinggemma-300m using mlx-lm version 0.0.4.

384
1

Qwen3-0.6B-bf16

license:apache-2.0
382
4

Qwen2.5-3B-Instruct-8bit

380
0

Nanonets-OCR2-3B-4bit

This model was converted to MLX format from `nanonets/Nanonets-OCR2-3B` using mlx-vlm version 0.3.3. Refer to the original model card for more details.

379
0

Meta-Llama-3-8B-Instruct

llama
378
2

Qwen3-VL-8B-Instruct-bf16

This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details.

license:apache-2.0
377
3

GLM-4.5-Air-bf16

This model mlx-community/GLM-4.5-Air-bf16 was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.28.2.

license:mit
375
0

Qwen3-VL-32B-Instruct-8bit

This model was converted to MLX format from `Qwen/Qwen3-VL-32B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details.

license:apache-2.0
374
1

Ling-1T-mlx-3bit

This model mlx-community/Ling-1T-mlx-3bit was converted to MLX format from inclusionAI/Ling-1T using mlx-lm version 0.28.1. You can find more similar MLX model quants for the Apple Mac Studio with 512 GB at https://huggingface.co/bibproj

license:mit
373
3

Llama-4-Scout-17B-16E-Instruct-4bit

llama4
371
9

deepseek-r1-distill-qwen-1.5b

370
23

Qwen2.5-VL-7B-Instruct-8bit

license:apache-2.0
360
18

Apriel-1.5-15b-Thinker-4bit

This model was converted to MLX format from `ServiceNow-AI/Apriel-1.5-15b-Thinker` using mlx-vlm version 0.3.3. Refer to the original model card for more details.

license:mit
356
2

SmolVLM-Instruct-4bit

license:apache-2.0
355
5

dolphin-vision-72b-4bit

353
7

Codestral-22B-v0.1-4bit

352
13

gemma-3-270m-it-4bit

352
8

Qwen3-Embedding-0.6B-8bit

license:apache-2.0
350
0

CodeLlama-70b-Instruct-hf-4bit-MLX

llama
345
25

Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ

license:apache-2.0
343
5

Nanonets-OCR2-3B-bf16

This model was converted to MLX format from `nanonets/Nanonets-OCR2-3B` using mlx-vlm version 0.3.3. Refer to the original model card for more details.

341
0

Qwen2.5-7B-Instruct-Uncensored-4bit

license:gpl-3.0
340
4

gemma-3-1b-it-8bit

340
3

exaone-4.0-1.2b-4bit

339
0

LFM2-8B-A1B-8bit-MLX

Maintainer / Publisher: Susant Achary
Upstream model: LiquidAI/LFM2-8B-A1B
This repo (MLX 8-bit): `mlx-community/LFM2-8B-A1B-8bit-MLX`

This repository provides an Apple-Silicon-optimized MLX build of LFM2-8B-A1B at 8-bit quantization for fast, on-device inference.

- Architecture: Mixture-of-Experts (MoE) Transformer.
- Size: ~8B total parameters with ~1B active per token (the "A1B" suffix commonly denotes ~1B active params).
- Why MoE? During generation, only a subset of experts is activated per token, reducing compute per token while keeping a larger total parameter pool for expressivity.

> Important memory note (single-device inference): although compute per token benefits from MoE (fewer active parameters), the full set of experts still resides in memory in typical single-GPU/CPU deployments. In practice, RAM usage scales with total parameters, not with the smaller active count.

Contents:
- `config.json` (MLX), `model.safetensors` (8-bit shards)
- Tokenizer files: `tokenizer.json`, `tokenizer_config.json`
- Model metadata (e.g., `model_index.json`)

Target platform: macOS on Apple Silicon (M-series) using Metal/MPS.

Intended use:
- General instruction-following, chat, and summarization
- RAG back-ends and long-context workflows on device
- Function-calling / structured outputs with schema-style prompts

Caveats:
- Even at 8-bit, long contexts (KV-cache) can dominate memory at high `max_tokens` or large batch sizes.
- As with any quantization, small regressions vs FP16 can appear on intricate math/code or edge-case formatting.

The numbers below are practical planning figures derived from first principles plus experience with MLX and similar MoE models. Treat them as starting points and validate on your hardware.
- Weights: ~ total params × 1 byte (8-bit). For 8B params → ~8.0 GB baseline.
- Runtime overhead: MLX graph + tensors + metadata → ~0.5–1.0 GB typical.
- KV-cache: grows with context length × layers × heads × dtype; often 1–3+ GB for long contexts.

| Context window | Estimated peak RAM |
|---|---:|
| 4k tokens | ~9.5–10.5 GB |
| 8k tokens | ~10.5–11.8 GB |
| 16k tokens | ~12.0–14.0 GB |

> These ranges assume 8-bit weights, A1B MoE (all experts resident), batch size = 1, and standard generation settings. On smaller windows (≤2k) you may see ~9–10 GB; larger windows or batches will increase KV-cache and peak RAM.

While this card is 8-bit, teams often want a consistent lineup. If you later produce 6/5/4/3/2-bit MLX builds, here is a practical guide (RAM figures are indicative for an 8B MoE LM; your results depend on context/batch):

| Variant | Typical Peak RAM | Relative Speed | Typical Behavior | When to choose |
|---|---:|:---:|---|---|
| 4-bit | ~7–8 GB | 🔥🔥🔥 | Better detail retention | If 3-bit drops too much fidelity |
| 6-bit | ~9–10.5 GB | 🔥🔥 | Near-max MLX quality | If you want accuracy under quant |
| 8-bit (this repo) | ~9.5–12+ GB | 🔥🔥 | Highest quality among quant tiers | When RAM allows and you want the most faithful outputs |

> MoE caveat: MoE reduces compute per token, but unless experts are paged/partitioned across devices and loaded on demand, memory still follows total parameters. On a single Mac, plan RAM as if the whole 8B parameter set is resident.

Deterministic generation:

```bash
# mlx_lm's sampling flag is --temp; MLX targets Metal automatically, so no device flag is needed.
python -m mlx_lm.generate \
  --model mlx-community/LFM2-8B-A1B-8bit-MLX \
  --prompt "Summarize the following in 5 bullet points:\n" \
  --max-tokens 256 \
  --temp 0.0 \
  --seed 0
```

338
2

gemma-3-12b-it-qat-abliterated-lm-4bit

338
0

FastVLM-0.5B-bf16

337
1

DeepSeek-R1-Distill-Qwen-32B-MLX-8Bit

336
16

Qwen3-8B-6bit

license:apache-2.0
335
4

gemma-3-27b-it-4bit

333
9

Nanonets-OCR2-3B-8bit

This model was converted to MLX format from `nanonets/Nanonets-OCR2-3B` using mlx-vlm version 0.3.3. Refer to the original model card for more details.

331
0

GLM-4.5-Air-mxfp4

This model mlx-community/GLM-4.5-Air-mxfp4 was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.28.0.

license:mit
328
2

SmolVLM2-256M-Video-Instruct-mlx

license:apache-2.0
326
10

Qwen3-0.6B-4bit-DWQ-05092025

license:apache-2.0
325
0

Dolphin-Mistral-24B-Venice-Edition-mlx-8Bit

license:apache-2.0
323
4

LFM2-700M-8bit

318
1

Kimi-VL-A3B-Thinking-4bit

316
7

DeepSeek-R1-Distill-Llama-8B-4bit

llama
311
10

Phi-3.5-vision-instruct-4bit

license:mit
310
5

deepseek-vl2-8bit

306
6

Qwen3-30B-A3B-4bit-DWQ

license:apache-2.0
305
28

DeepSeek-V3-4bit

305
8

Qwen3-VL-4B-Instruct-3bit

This model was converted to MLX format from `Qwen/Qwen3-VL-4B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details.

license:apache-2.0
303
0

Meta-Llama-3.1-8B-Instruct-8bit

llama
302
10

embeddinggemma-300m-4bit

300
2

DeepSeek-R1-Distill-Qwen-1.5B-3bit

298
1

whisper-tiny.en-mlx

298
0

Llama-3.2-8X4B-MOE-V2-Dark-Champion-Instruct-uncensored-abliterated-21B-Q_6-MLX

Llama 3.2
295
3

nomicai-modernbert-embed-base-4bit

license:apache-2.0
295
0

GLM-4.5-4bit

license:mit
294
16

Kimi-Linear-48B-A3B-Instruct-6bit

This model mlx-community/Kimi-Linear-48B-A3B-Instruct-6bit was converted to MLX format from moonshotai/Kimi-Linear-48B-A3B-Instruct using mlx-lm version 0.28.4.

license:mit
294
2

Qwen2.5-Coder-7B-Instruct-4bit

license:apache-2.0
291
5

Llama-4-Maverick-17B-16E-Instruct-4bit

llama4
290
7

phi-2-hf-4bit-mlx

license:mit
289
1

Qwen3-VL-8B-Thinking-4bit

This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details.

license:apache-2.0
286
0

Qwen2.5-0.5B-Instruct-8bit

license:apache-2.0
285
0

granite-4.0-h-micro-8bit

This model mlx-community/granite-4.0-h-micro-8bit was converted to MLX format from ibm-granite/granite-4.0-h-micro using mlx-lm version 0.28.2.

license:apache-2.0
283
2

Ling-1T-mlx-DQ3_K_M

This model mlx-community/Ling-1T-mlx-DQ3_K_M was converted to MLX format from inclusionAI/Ling-1T using mlx-lm version 0.28.1. It was created for people running a single Apple Mac Studio M3 Ultra with 512 GB: the 4-bit version of Ling 1T does not fit. Using research results, we aim to get 4-bit performance from a slightly smaller, smarter quantization that is also not so large that it leaves no memory for a useful context window.

In the arXiv paper "Quantitative Analysis of Performance Drop in DeepSeek Model Quantization" the authors write:

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms the traditional `Q3_K_M` variant on various benchmarks, and is comparable with the 4-bit quantization (`Q4_K_M`) approach in most tasks.

> [a] dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bit-width quantization has been well tested and documented. In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `mixed_quant_predicate()` with a custom rule that assigns different bit widths per layer.

280
0

olmOCR-2-7B-1025-bf16

This model was converted to MLX format from `allenai/olmOCR-2-7B-1025` using mlx-vlm version 0.3.4. Refer to the original model card for more details.

license:apache-2.0
273
2

DeepSeek-R1-Distill-Qwen-14B-4bit

270
7

GLM-4-9B-0414-4bit

license:mit
270
1

embeddinggemma-300m-qat-q4_0-unquantized-bf16

This model mlx-community/embeddinggemma-300m-qat-q4_0-unquantized-bf16 was converted to MLX format from google/embeddinggemma-300m-qat-q4_0-unquantized using mlx-lm version 0.0.4.

270
0

GLM-Z1-9B-0414-4bit

license:mit
267
3

gemma-3-12b-it-4bit

266
6

gemma-3-12b-it-bf16

264
1

DeepSeek-R1-Distill-Qwen-32B-abliterated-4bit

261
5

Qwen3-VL-32B-Instruct-4bit

This model was converted to MLX format from `Qwen/Qwen3-VL-32B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details.

NaNK
license:apache-2.0
261
1

whisper-turbo

260
7

GLM-4-32B-0414-8bit

NaNK
license:mit
259
6

Apertus-8B-Instruct-2509-bf16

This model mlx-community/Apertus-8B-Instruct-2509-bf16 was converted to MLX format from swiss-ai/Apertus-8B-Instruct-2509 using mlx-lm version 0.27.0.

NaNK
license:apache-2.0
255
4

Qwen3-VL-8B-Thinking-bf16

mlx-community/Qwen3-VL-8B-Thinking-bf16 This model was converted to MLX format from [`Qwen/Qwen3-VL-8B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
255
0

olmOCR-2-7B-1025-4bit

NaNK
license:apache-2.0
254
0

Qwen3-VL-4B-Thinking-bf16

mlx-community/Qwen3-VL-4B-Thinking-bf16 This model was converted to MLX format from [`Qwen/Qwen3-VL-4B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
253
1

Meta-Llama-3.1-8B-Instruct-bf16

NaNK
llama
248
3

granite-4.0-h-tiny-3bit-MLX

Granite-4.0-H-Tiny — MLX 3-bit (Apple Silicon)

Maintainer / Publisher: Susant Achary

This repository provides an Apple-Silicon-optimized MLX build of IBM Granite-4.0-H-Tiny with 3-bit weight quantization (plus usage guidance for 2/4/5/6-bit variants if RAM allows). Granite 4.0 is IBM's latest hybrid Mamba-2/Transformer family with selective Mixture-of-Experts (MoE), designed for long-context, hyper-efficient inference and enterprise use.

🔎 What's Granite 4.0?

- Architecture. Hybrid Mamba-2 + softmax attention; H variants add MoE routing (sparse activation). Aims to keep expressivity while dramatically reducing memory footprint.
- Efficiency claims. Up to ~70% lower memory and ~2× faster inference vs. comparable models, especially for multi-session and long-context scenarios.
- Context window. 128k tokens (Tiny/Base preview cards).
- Licensing. Apache-2.0 for public/commercial use.

> This MLX build targets Granite-4.0-H-Tiny (≈7B total, ≈1B active parameters). For reference, the family also includes H-Small (≈32B total / 9B active) and Micro/Micro-H (≈3B dense/hybrid) tiers.

📦 What's in this repo (MLX format)

- `config.json` (MLX), `mlx_model.safetensors` (3-bit shards), tokenizer files, and processor metadata.
- Ready for macOS on M-series chips via Metal/MPS.

> The upstream Hugging Face model cards for Granite 4.0 (Tiny/Small) provide additional training details, staged curricula, and the alignment workflow. Start here for Tiny: ibm-granite/granite-4.0-h-tiny.

✅ Intended use

- General instruction-following and chat with long context (128k).
- Enterprise assistant patterns (function calling, structured outputs) and RAG backends that benefit from efficient, large windows.
- On-device development on Macs (MLX): low-latency local prototyping and evaluation.

⚠️ Limitations

- As a quantized, decoder-only LM, it can produce confident but wrong outputs—review for critical use.
- 2–4-bit quantization may reduce precision on intricate tasks (math/code, tiny-text parsing); prefer higher bit-widths if RAM allows.
- Follow your organization's safety/PII/guardrail policies (Granite is "open-weight," not a full product).

🧠 Model family at a glance

| Tier | Arch | Params (total / active) | Notes |
|---|---|---:|---|
| H-Small | Hybrid + MoE | ~32B / 9B | Workhorse for enterprise agent tasks; strong function-calling & instruction following. |
| H-Tiny (this repo) | Hybrid + MoE | ~7B / 1B | Long-context, efficiency-first; great for local dev. |
| Micro / H-Micro | Dense / Hybrid | ~3B | Edge/low-resource alternatives; when hybrid runtime isn't optimized. |

Context window: up to 128k tokens for Tiny/Base preview lines.
License: Apache-2.0.

🧪 Observed on-device behavior (MLX)

Empirically on M-series Macs:

- 3-bit often gives crisp, direct answers with good latency and modest RAM.
- Higher bit-widths (4/5/6-bit) improve faithfulness on fine-grained tasks (tiny OCR, structured parsing), at higher memory cost.

> Performance varies by Mac model, image/token lengths, and temperature; validate on your workload.

🔢 Choosing a quantization level (Apple Silicon)

| Variant | Typical peak RAM (7B-class) | Relative speed | Typical behavior | When to choose |
|---|---:|:---:|---|---|
| 2-bit | ~3–4 GB | 🔥🔥🔥🔥 | Smallest footprint; most lossy | Minimal-RAM devices / smoke tests |
| 3-bit (this build) | ~5–6 GB | 🔥🔥🔥🔥 | Direct, concise, great latency | Default for local dev on M1/M2/M3/M4 |
| 4-bit | ~6–7.5 GB | 🔥🔥🔥 | Better detail retention | When you need stronger faithfulness |
| 5-bit | ~8–9 GB | 🔥🔥☆ | Higher fidelity | For heavy docs / structured outputs |
| 6-bit | ~9.5–11 GB | 🔥🔥 | Max quality under MLX quant | If RAM headroom is ample |

> Figures are indicative for language-only Tiny (no vision) and will vary with context length and KV-cache size.

🚀 Quickstart (CLI — MLX)

```bash
# Plain generation (deterministic)
python -m mlx_lm.generate \
  --model <model-id> \
  --prompt "Summarize the following notes into 5 bullet points:\n<notes>" \
  --max-tokens 200 \
  --temperature 0.0 \
  --device mps \
  --seed 0
```

NaNK
license:apache-2.0
246
2

GLM-4-32B-0414-4bit

NaNK
license:mit
244
5

CodeLlama-13b-Instruct-hf-4bit-MLX

NaNK
llama
244
2

Nanonets-OCR-s-bf16

NaNK
241
2

Qwen3-VL-32B-Thinking-4bit

mlx-community/Qwen3-VL-32B-Thinking-4bit This model was converted to MLX format from [`Qwen/Qwen3-VL-32B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
239
1

Qwen3-VL-2B-Instruct-3bit

mlx-community/Qwen3-VL-2B-Instruct-3bit This model was converted to MLX format from [`Qwen/Qwen3-VL-2B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
239
0

distil-whisper-large-v3

236
15

DeepSeek-R1-0528-4bit

NaNK
235
17

GLM-4.5-Air-2bit

This model mlx-community/GLM-4.5-Air-2bit was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.1.

NaNK
license:mit
235
4

InternVL3_5-GPT-OSS-20B-A4B-Preview-4bit

mlx-community/InternVL3_5-GPT-OSS-20B-A4B-Preview-4bit This model was converted to MLX format from [`OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
235
0

plamo-2-translate

NaNK
230
12

Llama-3.2-11B-Vision-Instruct-4bit

NaNK
mllama
230
6

Kokoro-82M-4bit

NaNK
license:apache-2.0
230
4

CodeLlama-7b-Python-4bit-MLX

NaNK
llama
229
14

gemma-3-12b-it-8bit

NaNK
229
2

Qwen2.5-1.5B-Instruct-8bit

NaNK
license:apache-2.0
228
1

Qwen3-VL-4B-Thinking-4bit

mlx-community/Qwen3-VL-4B-Thinking-4bit This model was converted to MLX format from [`Qwen/Qwen3-VL-4B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
228
0

Qwen2.5-14B-Instruct-4bit

NaNK
license:apache-2.0
227
10

Mixtral-8x7B-Instruct-v0.1

NaNK
license:apache-2.0
226
23

parakeet-tdt-1.1b

NaNK
license:cc-by-4.0
226
1

Qwen3-Next-80B-A3B-Instruct-8bit

NaNK
license:apache-2.0
225
8

Llama-4-Scout-17B-16E-Instruct-8bit

NaNK
llama4
224
3

Qwen3 VL 8B Thinking 6bit

mlx-community/Qwen3-VL-8B-Thinking-6bit This model was converted to MLX format from [`Qwen/Qwen3-VL-8B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
224
2

Qwen2-VL-7B-Instruct-4bit

NaNK
license:apache-2.0
223
2

gemma-3-27b-it-qat-8bit

NaNK
222
9

DeepSeek-V3.1-8bit

NaNK
license:mit
222
3

GLM-4.5V-8bit

NaNK
license:mit
222
2

Hermes-3-Llama-3.1-8B-4bit

NaNK
llama
221
4

Qwen3-VL-32B-Thinking-bf16

NaNK
license:apache-2.0
217
0

parakeet-ctc-0.6b

NaNK
license:cc-by-4.0
216
2

Llama-4-Scout-17B-16E-Instruct-6bit

NaNK
llama4
214
5

deepcogito-cogito-v1-preview-llama-8B-4bit

NaNK
llama
214
0

Qwen3-VL-30B-A3B-Instruct-6bit

mlx-community/Qwen3-VL-30B-A3B-Instruct-6bit This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Instruct`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
213
0

mxbai-embed-large-v1

NaNK
license:apache-2.0
211
3

Llama-3-8B-Instruct-1048k-4bit

NaNK
llama
210
25

OpenELM-270M-Instruct

210
5

GLM-4.5-Air-8bit

This model mlx-community/GLM-4.5-Air-8bit was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.0.

NaNK
license:mit
209
6

Qwen3-VL-4B-Instruct-5bit

mlx-community/Qwen3-VL-4B-Instruct-5bit This model was converted to MLX format from [`Qwen/Qwen3-VL-4B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
208
0

Mistral-7B-Instruct-v0.2

NaNK
license:apache-2.0
207
20

DeepSeek-R1-Distill-Llama-70B-4bit

NaNK
llama
206
8

Qwen3-VL-32B-Thinking-8bit

NaNK
license:apache-2.0
206
0

GLM-4.5-Air-3bit-DWQ-v2

NaNK
license:mit
202
3

Qwen3-VL-8B-Instruct-8bit

mlx-community/Qwen3-VL-8B-Instruct-8bit This model was converted to MLX format from [`Qwen/Qwen3-VL-8B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
202
2

Nous-Hermes-2-Mixtral-8x7B-DPO-4bit

NaNK
license:apache-2.0
201
18

Phi-3-mini-128k-instruct-4bit

NaNK
license:mit
200
12

Qwen2.5-VL-72B-Instruct-4bit

NaNK
198
7

Meta-Llama-3.1-405B-4bit

NaNK
llama
198
5

Qwen3-Next-80B-A3B-Thinking-4bit

This model mlx-community/Qwen3-Next-80B-A3B-Thinking-4bit was converted to MLX format from Qwen/Qwen3-Next-80B-A3B-Thinking using mlx-lm version 0.27.1.

NaNK
license:apache-2.0
197
3

Jinx-gpt-oss-20b-mxfp4-mlx

This model mlx-community/Jinx-gpt-oss-20b-mxfp4-mlx was converted to MLX format from Jinx-org/Jinx-gpt-oss-20b-mxfp4 using mlx-lm version 0.27.1.

NaNK
license:apache-2.0
197
1

Llama-4-Scout-17B-16E-4bit

NaNK
llama4
194
2

Qwen3-14B-4bit

NaNK
license:apache-2.0
194
1

NVIDIA-Nemotron-Nano-9B-v2-4bits

NaNK
193
2

Kimi-K2-Instruct-0905-mlx-3bit

mlx-community/moonshotai_Kimi-K2-Instruct-0905-mlx-3bit This model mlx-community/moonshotai_Kimi-K2-Instruct-0905-mlx-3bit was converted to MLX format from moonshotai/Kimi-K2-Instruct-0905 using mlx-lm version 0.26.3.

NaNK
191
1

Llama-3_3-Nemotron-Super-49B-v1_5-mlx-4Bit

mlx-community/Llama-3_3-Nemotron-Super-49B-v1_5-mlx-4Bit The model mlx-community/Llama-3_3-Nemotron-Super-49B-v1_5-mlx-4Bit was converted to MLX format from unsloth/Llama-3_3-Nemotron-Super-49B-v1_5 using mlx-lm version 0.26.4.

NaNK
unsloth - llama-3 - pytorch
189
2

gemma-2-27b-it-4bit

NaNK
188
8

Qwen3-VL-30B-A3B-Instruct-3bit

mlx-community/Qwen3-VL-30B-A3B-Instruct-3bit This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Instruct`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
186
0

DeepSeek-Coder-V2-Lite-Instruct-4bit-AWQ

NaNK
185
0

chandra-bf16

185
0

Qwen3-1.7B-MLX-MXFP4

This model mlx-community/Qwen3-1.7B-MLX-MXFP4 was converted to MLX format from Qwen/Qwen3-1.7B using mlx-lm version 0.28.3.

NaNK
license:apache-2.0
183
1

Phi-3-mini-4k-instruct-4bit-no-q-embed

NaNK
license:mit
182
3

gemma-3-27b-it-8bit

NaNK
180
7

Qwen3-VL-30B-A3B-Thinking-6bit

mlx-community/Qwen3-VL-30B-A3B-Thinking-6bit This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
179
0

NousResearch_Hermes-4-14B-BF16-abliterated-mlx

NaNK
license:apache-2.0
178
1

gemma-3-4b-it-5bit

This model mlx-community/gemma-3-4b-it-5bit was converted to MLX format from google/gemma-3-4b-it using mlx-lm version 0.28.2.

NaNK
178
0

Chandra 4bit

mlx-community/chandra-4bit This model was converted to MLX format from [`datalab-to/chandra`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
177
4

olmOCR-2-7B-1025-8bit

mlx-community/olmOCR-2-7B-1025-8bit This model was converted to MLX format from [`allenai/olmOCR-2-7B-1025`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
176
0

Llama-3.1-Nemotron-70B-Instruct-HF-bf16

NaNK
llama
175
1

Qwen3-4B-6bit

NaNK
license:apache-2.0
174
0

Mistral-7B-Instruct-v0.2-4bit

NaNK
license:apache-2.0
172
1

Llama-3.2-90B-Vision-Instruct-4bit

NaNK
mllama
171
4

GLM-4.5V-abliterated-4bit

mlx-community/GLM-4.5V-abliterated-4bit This model was converted to MLX format from [`huihui-ai/Huihui-GLM-4.5V-abliterated`]() using mlx-vlm. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:mit
171
1

quantized-gemma-2b-it

NaNK
170
10

Meta-Llama-3-70B-Instruct-4bit

NaNK
llama
168
7

olmOCR-2-7B-1025-mlx-8bit

mlx-community/olmOCR-2-7B-1025-mlx-8bit This model was converted to MLX format from [`allenai/olmOCR-2-7B-1025`]() using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
165
1

TinyLlama-1.1B-Chat-v1.0-4bit

NaNK
llama
164
0

Unsloth-Phi-4-4bit

NaNK
llama
162
5

Qwen2.5-Coder-14B-Instruct-4bit

NaNK
license:apache-2.0
162
4

GLM-4.5V-abliterated-8bit

mlx-community/GLM-4.5V-abliterated-8bit This model was converted to MLX format from [`huihui-ai/Huihui-GLM-4.5V-abliterated`]() using mlx-vlm. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:mit
162
1

jinaai-ReaderLM-v2

NaNK
license:mit
161
23

Apertus-8B-Instruct-2509-4bit

NaNK
license:apache-2.0
161
1

Meta-Llama-3.1-70B-Instruct-bf16-CORRECTED

NaNK
llama
161
0

Qwen3-VL-4B-Thinking-8bit

mlx-community/Qwen3-VL-4B-Thinking-8bit This model was converted to MLX format from [`Qwen/Qwen3-VL-4B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
160
0

paligemma-3b-mix-448-8bit

NaNK
159
7

whisper-tiny-mlx

159
2

phi-4-4bit

NaNK
license:mit
158
19

llava-phi-3-mini-4bit

NaNK
license:apache-2.0
158
9

GLM-4.5-Air-3bit-DWQ

NaNK
license:mit
158
4

Qwen2.5-Coder-1.5B-Instruct-4bit

NaNK
license:apache-2.0
157
1

granite-4.0-h-1b-6bit

This model mlx-community/granite-4.0-h-1b-6bit was converted to MLX format from ibm-granite/granite-4.0-h-1b using mlx-lm version 0.28.4.

NaNK
license:apache-2.0
157
0

Qwen2.5-32B-Instruct-4bit

NaNK
license:apache-2.0
156
4

Mistral-Large-Instruct-2407-4bit

NaNK
156
1

Apriel-1.5-15b-Thinker-8bit

NaNK
license:mit
156
0

Qwen3-14B-4bit-AWQ

NaNK
license:apache-2.0
155
4

DeepSeek-R1-Qwen3-0528-8B-4bit-AWQ

NaNK
license:mit
155
4

granite-4.0-h-1b-8bit

This model mlx-community/granite-4.0-h-1b-8bit was converted to MLX format from ibm-granite/granite-4.0-h-1b using mlx-lm version 0.28.4.

NaNK
license:apache-2.0
153
1

Qwen3-4B-Thinking-2507-fp16

NaNK
license:apache-2.0
153
0

granite-4.0-h-350m-8bit

NaNK
license:apache-2.0
153
0

Qwen2.5-Coder-32B-Instruct-4bit

NaNK
license:apache-2.0
152
10

Huihui-gemma-3n-E4B-it-abliterated-lm-8bit

NaNK
149
1

Phi-3-vision-128k-instruct-4bit

NaNK
license:mit
148
8

Nous-Hermes-2-Mistral-7B-DPO-4bit-MLX

NaNK
license:apache-2.0
148
5

Josiefied Qwen3 30B A3B Abliterated V2 4bit

NaNK
145
2

AI21-Jamba-Reasoning-3B-4bit

This model mlx-community/AI21-Jamba-Reasoning-3B-4bit was converted to MLX format from ai21labs/AI21-Jamba-Reasoning-3B using mlx-lm version 0.28.2.

NaNK
license:apache-2.0
145
0

DeepSeek-Coder-V2-Instruct-AQ4_1

144
3

Josiefied-Qwen3-4B-Instruct-2507-abliterated-v1-8bit

NaNK
143
0

Ministral-8B-Instruct-2410-4bit

NaNK
142
9

Josiefied-Qwen3-8B-abliterated-v1-4bit

NaNK
142
2

UTENA-7B-NSFW-V2-4bit

NaNK
142
1

olmOCR-2-7B-1025-mlx-4bit

mlx-community/olmOCR-2-7B-1025-mlx-4bit This model was converted to MLX format from [`allenai/olmOCR-2-7B-1025`]() using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
142
1

parakeet-tdt_ctc-1.1b

NaNK
license:cc-by-4.0
142
0

DeepSeek-Coder-V2-Lite-Instruct-4bit

NaNK
142
0

SmolVLM2-2.2B-Instruct-mlx

NaNK
license:apache-2.0
141
8

Mistral-7B-v0.1-LoRA-Text2SQL

NaNK
license:mit
141
2

gemma-3n-E2B-it-lm-bf16

NaNK
141
0

Kimi-Linear-48B-A3B-Instruct-3bit

This model mlx-community/Kimi-Linear-48B-A3B-Instruct-3bit was converted to MLX format from moonshotai/Kimi-Linear-48B-A3B-Instruct using mlx-lm version 0.28.4.

NaNK
license:mit
141
0

csm-1b

NaNK
license:apache-2.0
139
20

Llama-4-Maverick-17B-16E-Instruct-6bit

NaNK
llama4
139
2

SmolLM-135M-4bit

NaNK
llama
139
1

DeepSeek-V3.1-mlx-DQ5_K_M

This model mlx-community/DeepSeek-V3.1-mlx-DQ5_K_M was converted to MLX format from deepseek-ai/DeepSeek-V3.1 using mlx-lm version 0.26.3. It is intended for people running a single Apple Mac Studio M3 Ultra with 512 GB. With 512 GB we can do better than the 4-bit version of DeepSeek V3.1: guided by published research, we aim for better-than-5-bit performance using smarter quantization, while keeping the quant small enough to leave memory for a useful context window. A temperature of 1.3 is DeepSeek's recommendation for translations; for coding, you should probably use a temperature of 0.6 or lower. In the arXiv paper "Quantitative Analysis of Performance Drop in DeepSeek Model Quantization" the authors write:

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms traditional `Q3_K_M` variant on various benchmarks, which is also comparable with 4-bit quantization (`Q4_K_M`) approach in most tasks.

and describe a

> dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bitwidth quantization has been well tested and documented. In this case we did not want an improved 3-bit quant, but rather the best possible "5-bit" quant, so we modified the `DQ3_K_M` scheme, replacing 3-bit with 5-bit, 4-bit with 6-bit, and 6-bit with 8-bit, to create a new `DQ5_K_M` quant. This produces a quantization of 5.638 bpw (bits per weight). In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `def mixed_quant_predicate()` with a suitable mixed-precision predicate. Should you wish to squeeze more out of your quant, and you do not need a larger context window, you can change the last part of that code.
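As a rough, hypothetical sketch of the 5/6/8-bit remapping described above (the layer-name patterns, group size, and layer cutoffs are our illustrative assumptions, not the exact recipe used to build this repo):

```python
# Hypothetical DQ5_K_M-style predicate: the DQ3_K_M tiers remapped upward,
# 3-bit -> 5-bit, 4-bit -> 6-bit, 6-bit -> 8-bit.

def mixed_quant_predicate(path: str, module, config: dict):
    """Return per-tensor quantization settings: {"group_size": ..., "bits": ...}."""
    num_layers = config.get("num_hidden_layers", 0)
    index = int(path.split(".")[2]) if "layers" in path else 0

    # Embeddings and output head: was 6-bit in DQ3_K_M, now 8-bit.
    if "embed" in path or "lm_head" in path:
        return {"group_size": 64, "bits": 8}
    # Attention, shared experts, and first/last layers: was 4-bit, now 6-bit.
    if "self_attn" in path or "shared_expert" in path:
        return {"group_size": 64, "bits": 6}
    if index < 3 or index >= num_layers - 3:
        return {"group_size": 64, "bits": 6}
    # Bulk of the routed expert weights: was 3-bit, now 5-bit.
    return {"group_size": 64, "bits": 5}
```

Changing the last part of such a predicate (for example, returning 6 bits instead of 5 for the default tier) is how you would trade context-window memory for a little more precision, as the card suggests.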

NaNK
license:mit
139
1

Ring-flash-linear-2.0-128k-4bit

This model mlx-community/Ring-flash-linear-2.0-128k-4bit was converted to MLX format from inclusionAI/Ring-flash-linear-2.0-128k using mlx-lm version 0.28.2.

NaNK
license:mit
139
1

Qwen3-Coder-30B-A3B-Instruct-3bit

This model mlx-community/Qwen3-Coder-30B-A3B-Instruct-3bit was converted to MLX format from Qwen/Qwen3-Coder-30B-A3B-Instruct using mlx-lm version 0.26.1.

NaNK
license:apache-2.0
139
0

whisper-large-v3-mlx-8bit

NaNK
138
5

Qwen3-30B-A3B-bf16

NaNK
license:apache-2.0
138
2

Qwen3-30B-A3B-Instruct-2507-6bit

NaNK
license:apache-2.0
137
0

meta-llama-Llama-4-Scout-17B-16E-4bit

NaNK
llama4
136
7

Qwen3-235B-A22B-Thinking-2507-3bit-DWQ

mlx-community/Qwen3-235B-A22B-Thinking-2507-3bit-DWQ This model mlx-community/Qwen3-235B-A22B-Thinking-2507-3bit-DWQ was converted to MLX format from Qwen/Qwen3-235B-A22B-Thinking-2507 using mlx-lm version 0.26.0.

NaNK
license:apache-2.0
136
6

DeepSeek-R1-Distill-Qwen-14B-8bit

NaNK
136
5

gemma-3-27b-it-qat-bf16

NaNK
136
5

GLM-4.5-Air-2bit-DWQ

This model mlx-community/GLM-4.5-Air-2bit-DWQ was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.2.

NaNK
license:mit
136
2

GLM-4-9B-0414-8bit

NaNK
license:mit
135
0

DeepSeek-V3.1-Base-4bit

NaNK
license:mit
134
3

deepseek-coder-33b-instruct-hf-4bit-mlx

NaNK
llama
134
1

Qwen3-VL-30B-A3B-Instruct-5bit

mlx-community/Qwen3-VL-30B-A3B-Instruct-5bit This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Instruct`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
134
0

Qwen3-Next-80B-A3B-Thinking-8bit

NaNK
license:apache-2.0
133
2

moonshotai_Kimi-K2-Instruct-mlx-3bit

This model mlx-community/moonshotai_Kimi-K2-Instruct-mlx-3bit was converted to MLX format from moonshotai/Kimi-K2-Instruct using mlx-lm version 0.26.3.

NaNK
133
0

UserLM-8b-8bit

NaNK
llama
133
0

Qwen2.5-7B-Instruct-1M-4bit

NaNK
license:apache-2.0
132
10

Llama-3.1-8B-Instruct

NaNK
llama
132
5

Llama-4-Maverick-17B-128E-Instruct-4bit

NaNK
llama4
132
2

Apriel 1.5 15b Thinker 6bit MLX

Apriel-1.5-15B-Thinker — MLX Quantized (Apple Silicon)

Format: MLX (Apple Silicon)
Variants: 6-bit (recommended)
Base model: ServiceNow-AI/Apriel-1.5-15B-Thinker
Architecture: Pixtral-style LLaVA (vision encoder → 2-layer projector → decoder)
Intended use: image understanding & grounded reasoning; document/chart/OCR-style tasks; math/coding Q&A with visual context.

> This repository provides MLX-format weights for Apple Silicon (M-series) built from the original Apriel-1.5-15B-Thinker release. It is optimized for on-device inference with a small memory footprint and fast startup on macOS.

Apriel-1.5-15B-Thinker is a 15B open-weights multimodal reasoning model trained via a data-centric mid-training recipe rather than RLHF/RM. Starting from Pixtral-12B as the base, the authors apply:

1) Depth upscaling (capacity expansion without pretraining from scratch),
2) Two-stage multimodal continual pretraining (CPT) to build text + visual reasoning, and
3) High-quality SFT with explicit reasoning traces across math, coding, science, and tool use.

This approach delivers frontier-level capability on compact compute.

Key reported results (original model)

- AAI Index: 52, matching DeepSeek-R1-0528 at far lower compute.
- Multimodal: on 10 image benchmarks, within ~5 points of Gemini-2.5-Flash and Claude Sonnet-3.7 on average.
- Designed for single-GPU / constrained deployment scenarios.

> Notes above summarize the upstream paper; MLX quantization can slightly affect absolute scores. Always validate on your use case.

- Backbone: Pixtral-12B-Base-2409 adapted to a larger 15B decoder via depth upscaling (layers 40 → 48), then re-aligned with a 2-layer projection network connecting the vision encoder and decoder.
- Training stack:
  - CPT Stage-1: mixed tokens (≈50% text, 20% replay, 30% multimodal) for foundational reasoning & image understanding; 32k context; cosine LR with warmup; all components unfrozen; checkpoint averaging.
  - CPT Stage-2: targeted synthetic visual tasks (reconstruction, visual matching, detection, counting) to strengthen spatial/compositional/fine-grained reasoning; vision encoder frozen; loss on responses for instruct data; 16k context.
  - SFT: curated instruction-response pairs with explicit reasoning traces (math, coding, science, tools).
- Why MLX? Native Apple-Silicon inference with small binaries, fast load, and low memory overhead.
- What's included: `config.json`, `mlx_model.safetensors` (sharded), tokenizer & processor files, and metadata for VLM pipelines.
- Quantization options:
  - 6-bit (recommended): best balance of quality & memory.

> Tip: if you're capacity-constrained on an M1/M2, try 6-bit first.

```bash
# Basic image caption
python -m mlx_vlm.generate \
  --model <model-id> \
  --image /path/to/image.jpg \
  --prompt "Describe this image." \
  --max-tokens 128 --temperature 0.0 --device mps
```

NaNK
license:mit
132
1

DeepSeek-R1-0528-Qwen3-8B-4bit-DWQ

NaNK
license:mit
131
8

all-MiniLM-L6-v2-4bit

NaNK
license:apache-2.0
131
1

InternVL3_5-30B-A3B-4bit

mlx-community/InternVL3_5-30B-A3B-4bit This model was converted to MLX format from [`OpenGVLab/InternVL3_5-30B-A3B-HF`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
131
0

mistral-7B-v0.1

NaNK
license:apache-2.0
130
10

LFM2-8B-A1B-8bit

NaNK
130
1

Qwen3-VL-30B-A3B-Thinking-5bit

NaNK
license:apache-2.0
130
0

DeepSeek-R1-Distill-Qwen-14B-6bit

NaNK
129
6

Codestral-22B-v0.1-8bit

NaNK
128
8

GLM-Z1-32B-0414-4bit

NaNK
license:mit
128
2

Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e8

NaNK
license:apache-2.0
128
1

bge-small-en-v1.5-4bit

NaNK
license:mit
128
0

DeepSeek-R1-3bit

NaNK
127
15

Nanonets-OCR2-3B-6bit

mlx-community/Nanonets-OCR2-3B-6bit This model was converted to MLX format from [`nanonets/Nanonets-OCR2-3B`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
127
0

DeepSeek-v3-0324-8bit

NaNK
license:mit
126
1

Ring-1T-mlx-DQ3_K_M

This model mlx-community/Ring-1T-mlx-DQ3_K_M was converted to MLX format from inclusionAI/Ring-1T using mlx-lm version 0.28.1. It is intended for people running a single Apple Mac Studio M3 Ultra with 512 GB; the 4-bit version of Ring 1T does not fit. Guided by published research, we aim to get 4-bit performance from a slightly smaller and smarter quantization, one that is also not so large that it leaves no memory for a useful context window. In the arXiv paper "Quantitative Analysis of Performance Drop in DeepSeek Model Quantization" the authors write:

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms traditional `Q3_K_M` variant on various benchmarks, which is also comparable with 4-bit quantization (`Q4_K_M`) approach in most tasks.

and describe a

> dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bitwidth quantization has been well tested and documented. In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `def mixed_quant_predicate()` with a suitable mixed-precision predicate.

NaNK
126
1

olmOCR-2-7B-1025-5bit

mlx-community/olmOCR-2-7B-1025-5bit This model was converted to MLX format from [`allenai/olmOCR-2-7B-1025`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
126
0

DeepSeek-R1-Distill-Qwen-7B-8bit

NaNK
125
8

plamo-2-1b

NaNK
license:apache-2.0
125
4

Llama-3.2-3B-Instruct-abliterated-6bit

NaNK
llama
125
0

embeddinggemma-300m-qat-q8_0-unquantized-bf16

mlx-community/embeddinggemma-300m-qat-q8_0-unquantized-bf16 The model mlx-community/embeddinggemma-300m-qat-q8_0-unquantized-bf16 was converted to MLX format from google/embeddinggemma-300m-qat-q8_0-unquantized using mlx-lm version 0.0.4.

125
0

Qwen3-4B-Instruct-2507-8bit

This model mlx-community/Qwen3-4B-Instruct-2507-8bit was converted to MLX format from Qwen/Qwen3-4B-Instruct-2507 using mlx-lm version 0.26.2.

NaNK
license:apache-2.0
124
4

Llama-3.3-70B-Instruct-bf16

NaNK
llama
124
1

Qwen3-VL-32B-Instruct-bf16

mlx-community/Qwen3-VL-32B-Instruct-bf16 This model was converted to MLX format from [`Qwen/Qwen3-VL-32B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
124
0

codegemma-7b-it-8bit

NaNK
123
5

Llama-3.1-8B-Instruct-4bit

NaNK
llama
123
2

Qwen3-Next-80B-A3B-Instruct-5bit

NaNK
license:apache-2.0
123
2

granite-3.3-8b-instruct-4bit

NaNK
license:apache-2.0
122
1

Qwen3-8B-4bit-DWQ-053125

NaNK
license:apache-2.0
122
1

c4ai-command-r-plus-4bit

NaNK
license:cc-by-nc-4.0
121
49

Qwen2.5-72B-Instruct-4bit

NaNK
121
5

gemma-3-27b-it-4bit-DWQ

NaNK
121
3

dolphin-2.9-llama3-70b-4bit

NaNK
llama
120
5

Mistral-Small-24B-Instruct-2501-4bit

NaNK
license:apache-2.0
119
14

llava-v1.6-mistral-7b-4bit

NaNK
license:apache-2.0
119
5

gemma-3-1b-it-bf16

NaNK
119
1

dac-speech-24khz-1.5kbps

NaNK
119
1

Llama-OuteTTS-1.0-1B-4bit

NaNK
llama
119
1

LongCat-Flash-Chat-4bit

NaNK
license:mit
119
1

granite-4.0-h-1b-base-8bit

This model mlx-community/granite-4.0-h-1b-base-8bit was converted to MLX format from ibm-granite/granite-4.0-h-1b-base using mlx-lm version 0.28.4.

NaNK
license:apache-2.0
119
1

Llama-3.3-70B-Instruct-3bit

NaNK
llama
117
7

deepseek-coder-33b-instruct

NaNK
llama
117
0

bitnet-b1.58-2B-4T-4bit

NaNK
license:mit
116
0

Kimi-Linear-48B-A3B-Instruct-5bit

This model mlx-community/Kimi-Linear-48B-A3B-Instruct-5bit was converted to MLX format from moonshotai/Kimi-Linear-48B-A3B-Instruct using mlx-lm version 0.28.4.

NaNK
license:mit
116
0

MinerU2.5-2509-1.2B-bf16

mlx-community/MinerU2.5-2509-1.2B-bf16 This model was converted to MLX format from [`opendatalab/MinerU2.5-2509-1.2B`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:agpl-3.0
115
0

Mistral-Small-3.1-24B-Instruct-2503-4bit

NaNK
license:apache-2.0
114
9

Mixtral-8x7B-Instruct-v0.1-hf-4bit-mlx

NaNK
license:apache-2.0
114
7

Llama-3.1-Nemotron-Nano-4B-v1.1-4bit

NaNK
llama
114
0

Apriel-1.5-15b-Thinker-bf16

mlx-community/Apriel-1.5-15b-Thinker-bf16 This model was converted to MLX format from [`ServiceNow-AI/Apriel-1.5-15b-Thinker`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:mit
114
0

Qwen3-30B-A3B-Thinking-2507-4bit

This model mlx-community/Qwen3-30B-A3B-Thinking-2507-4bit was converted to MLX format from Qwen/Qwen3-30B-A3B-Thinking-2507 using mlx-lm version 0.26.3.

NaNK
license:apache-2.0
113
3

LFM2-8B-A1B-fp16

NaNK
113
2

Qwen2.5-VL-32B-Instruct-4bit

NaNK
license:apache-2.0
112
4

Qwen3-14B-4bit-DWQ-053125

NaNK
license:apache-2.0
112
4

meta-llama-Llama-4-Scout-17B-16E-fp16

NaNK
llama4
112
3

gemma-3-4b-it-bf16

NaNK
112
1

deepseek-coder-6.7b-instruct-hf-4bit-mlx

NaNK
llama
112
0

gemma-3-1b-it-4bit-DWQ

NaNK
112
0

gemma-3n-E4B-it-bf16

NaNK
111
12

LongCat-Flash-Chat-mlx-DQ6_K_M

NaNK
111
1

gemma-3-270m-it-bf16

111
1

whisper-medium-mlx-4bit

NaNK
111
0

Qwen3-14B-6bit

NaNK
license:apache-2.0
111
0

gpt2-base-mlx

NaNK
license:mit
111
0

LFM2-VL-450M-8bit

NaNK
110
10

starcoder2-7b-4bit

NaNK
110
2

Ling-mini-2.0-4bit

This model mlx-community/Ling-mini-2.0-4bit was converted to MLX format from inclusionAI/Ling-mini-2.0 using mlx-lm version 0.27.1.

NaNK
license:mit
110
1

LLaDA2.0-mini-preview-4bit

This model mlx-community/LLaDA2.0-mini-preview-4bit was converted to MLX format from inclusionAI/LLaDA2.0-mini-preview using mlx-lm version 0.28.4.

NaNK
license:apache-2.0
110
1

Qwen3-4B-4bit-DWQ-053125

NaNK
license:apache-2.0
109
2

Dolphin-Mistral-24B-Venice-Edition-4bit

NaNK
license:apache-2.0
109
1

Llama-3-8B-Instruct-1048k-8bit

NaNK
llama
108
17

conikeec-deepseek-coder-6.7b-instruct

NaNK
llama
108
1

Josiefied-DeepSeek-R1-0528-Qwen3-8B-abliterated-v1-4bit

NaNK
108
1

Apertus-8B-Instruct-2509-8bit

NaNK
license:apache-2.0
108
0

Gemma-3-Glitter-12B-8bit

NaNK
108
0

gemma-3-12b-it-4bit-DWQ

NaNK
107
2

Gabliterated-Qwen3-0.6B-4bit

NaNK
license:apache-2.0
107
0

gemma-3-270m-4bit

NaNK
107
0

Qwen3-VL-2B-Thinking-bf16

This model mlx-community/Qwen3-VL-2B-Thinking-bf16 was converted to MLX format from `Qwen/Qwen3-VL-2B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
107
0

gemma-2-27b-bf16

NaNK
106
0

Qwen3-VL-4B-Instruct-6bit

This model mlx-community/Qwen3-VL-4B-Instruct-6bit was converted to MLX format from `Qwen/Qwen3-VL-4B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
106
0

Mistral-7B-Instruct-v0.3-8bit

NaNK
license:apache-2.0
105
3

LFM2-VL-3B-8bit

NaNK
105
2

nomicai-modernbert-embed-base-bf16

license:apache-2.0
105
0

bitnet-b1.58-2B-4T-8bit

NaNK
license:mit
105
0

Qwen3-Coder-30B-A3B-Instruct-bf16

This model mlx-community/Qwen3-Coder-30B-A3B-Instruct-bf16 was converted to MLX format from Qwen/Qwen3-Coder-30B-A3B-Instruct using mlx-lm version 0.26.2.

NaNK
license:apache-2.0
105
0

LFM2-8B-A1B-6bit

This model mlx-community/LFM2-8B-A1B-6bit was converted to MLX format from LiquidAI/LFM2-8B-A1B using mlx-lm version 0.28.2.

NaNK
105
0

gemma-3n-E4B-it-lm-bf16

NaNK
103
4

Qwen2.5-Coder-1.5B-4bit

NaNK
license:apache-2.0
103
2

gemma-3-270m-it-qat-4bit

This model mlx-community/gemma-3-270m-it-qat-4bit was converted to MLX format from google/gemma-3-270m-it-qat using mlx-lm version 0.26.3.

NaNK
103
1

DeepSeek-R1-Distill-Qwen-1.5B-6bit

NaNK
103
0

medgemma-27b-it-8bit

NaNK
103
0

gemma-3-27b-it-bf16

NaNK
102
4

orpheus-3b-0.1-ft-4bit

NaNK
llama
102
3

meta-llama-Llama-4-Scout-17B-16E-Instruct-bf16

NaNK
llama4
102
0

c4ai-command-r-v01-4bit

NaNK
101
23

Llama-3.2-8X4B-MOE-V2-Dark-Champion-Instruct-uncensored-abliterated-21B-MLX

NaNK
Llama 3.2
101
1

Qwen3-1.7B-4bit-DWQ-053125

NaNK
license:apache-2.0
100
2

Qwen3-4B-Instruct-2507-5bit

NaNK
license:apache-2.0
100
1

LFM2-8B-A1B-6bit-MLX

Maintainer / Publisher: Susant Achary
Upstream model: LiquidAI/LFM2-8B-A1B
This repo (MLX 6-bit): `mlx-community/LFM2-8B-A1B-6bit-MLX`

This repository provides an Apple-Silicon-optimized MLX build of LFM2-8B-A1B at 6-bit quantization. Among quantized tiers, 6-bit is a strong fidelity sweet spot for many Macs: noticeably smaller than FP16/8-bit while preserving answer quality for instruction following, summarization, and structured extraction.

Architecture:
- Mixture-of-Experts (MoE) Transformer.
- Size: ~8B total parameters with ~1B active per token (A1B ≈ "~1B active").
- Why MoE? At each token only a subset of experts is activated, reducing compute per token while keeping a larger parameter pool for expressivity.

> Single-device memory reality: even though only ~1B parameters are active per token, all experts typically reside in memory during inference on one device. RAM planning should therefore track total parameters, not just the active slice.

Repository contents:
- `config.json` (MLX), `model.safetensors` (6-bit shards)
- Tokenizer files: `tokenizer.json`, `tokenizer_config.json`
- Model metadata (e.g., `model_index.json`)

Target: macOS on Apple Silicon (M-series) with Metal/MPS.

Intended uses:
- General instruction following, chat, and summarization
- RAG and long-context assistants on device
- Schema-guided structured outputs (JSON)

Limitations:
- Quantization can cause small regressions vs FP16 on tricky math/code or tight formatting.
- For very long contexts and/or batching, the KV cache can dominate memory; tune `max_tokens` and batch size.
- Add your own safety/guardrails for sensitive deployments.

Memory planning (practical starting points for a single-device MLX run; validate on your hardware):
- Weights (6-bit): ≈ `total_params × 0.75 byte` → for 8B params ≈ ~6.0 GB
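The weight-memory rule of thumb above generalizes to any bit width (bytes per parameter = bits / 8). A small sketch of the arithmetic, with illustrative function names that are not part of the repo:

```python
def weight_memory_gb(total_params: float, bits: int) -> float:
    """Estimate resident weight memory in GB for a quantized model.

    Rule of thumb: bytes per parameter = bits / 8. Counts *total*
    parameters, since for MoE models all experts stay in memory
    even though only a fraction is active per token.
    """
    return total_params * (bits / 8) / 1e9

# 6-bit build of an 8B-parameter model: 8e9 params * 0.75 B/param = ~6.0 GB
print(round(weight_memory_gb(8e9, 6), 1))  # 6.0
```

The same helper shows why the 6-bit tier is attractive: 16-bit weights for the same model would need roughly 16 GB before KV cache and activations.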

NaNK
100
0

Josiefied-Qwen2.5-7B-Instruct-abliterated-v2

NaNK
license:apache-2.0
99
3

deepseek-coder-1.3b-instruct-mlx

NaNK
llama
99
1

Qwen2.5-Coder-32B-Instruct-8bit

NaNK
license:apache-2.0
98
11

Qwen2.5-VL-3B-Instruct-bf16

NaNK
97
4

gemma-3-4b-it-4bit-DWQ

NaNK
97
1

Qwen3-1.7B-8bit

NaNK
license:apache-2.0
97
0

Huihui-gemma-3n-E4B-it-abliterated-lm-6bit

This model mlx-community/Huihui-gemma-3n-E4B-it-abliterated-lm-6bit was converted to MLX format from huihui-ai/Huihui-gemma-3n-E4B-it-abliterated using mlx-lm version 0.26.4.

NaNK
97
0

Qwen3-VL-2B-Instruct-8bit

This model mlx-community/Qwen3-VL-2B-Instruct-8bit was converted to MLX format from `Qwen/Qwen3-VL-2B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
97
0

GLM-4-32B-0414-4bit-DWQ

NaNK
license:mit
96
4

granite-4.0-h-tiny-5bit-MLX

NaNK
license:apache-2.0
96
2

Josiefied-Qwen3-30B-A3B-abliterated-v2-8bit

NaNK
96
0

Huihui-gemma-3n-E4B-it-abliterated-lm-4bit

This model mlx-community/Huihui-gemma-3n-E4B-it-abliterated-lm-4bit was converted to MLX format from huihui-ai/Huihui-gemma-3n-E4B-it-abliterated using mlx-lm version 0.26.4.

NaNK
96
0

CodeLlama-7b-mlx

NaNK
llama
95
10

Qwen3-0.6B-4bit-AWQ

NaNK
license:apache-2.0
95
2

Josiefied-Qwen3-8B-abliterated-v1-8bit

NaNK
95
1

Llama-4-Scout-17B-16E-8bit

NaNK
llama4
95
0

Qwen3-4B-Instruct-2507-4bit-g32

This model mlx-community/Qwen3-4B-Instruct-2507-4bit-g32 was converted to MLX format from Qwen/Qwen3-4B-Instruct-2507 using mlx-lm version 0.28.2.
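The `-g32` suffix denotes a quantization group size of 32; smaller groups trade a little extra memory for better accuracy than the default group size of 64. A conversion of this shape can be sketched with the mlx-lm converter, assuming a recent mlx-lm release:

```shell
pip install -U mlx-lm

# Quantize to 4 bits with group size 32 (the "-4bit-g32" variant)
mlx_lm.convert \
  --hf-path Qwen/Qwen3-4B-Instruct-2507 \
  -q --q-bits 4 --q-group-size 32
```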

NaNK
license:apache-2.0
95
0

Qwen3-VL-8B-Thinking-5bit

This model mlx-community/Qwen3-VL-8B-Thinking-5bit was converted to MLX format from `Qwen/Qwen3-VL-8B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
95
0

Qwen2.5-VL-3B-Instruct-8bit

NaNK
94
8

Qwen2.5-Coder-0.5B-Instruct-4bit

NaNK
license:apache-2.0
94
5

Llama-4-Maverick-17B-128E-Instruct-6bit

NaNK
llama4
94
1

Llama-160M-Chat-v1-4bit-mlx

NaNK
llama
94
0

olmOCR-7B-0225-preview-bf16

NaNK
license:apache-2.0
93
4

dolphin-2.9-llama3-8b-4bit-mlx

NaNK
llama
93
3