mlx-community

✓ Verified Community

Apple MLX framework community contributions

500 models • 27 total models in database

gpt-oss-20b-MXFP4-Q8

license: apache-2.0 · pipeline: text-generation · library: mlx · tags: vllm, mlx · base model: openai/gpt-oss-20b

NaNK
license:apache-2.0
812,246
18

parakeet-tdt-0.6b-v3

library: mlx · languages: en, es, fr, de, bg, hr, cs, da, nl, et, fi, el, hu, it, lv, lt, mt, pl, pt, ro, sk, sl, sv, ru, uk · tags: mlx, automatic-speech-recognition, speech, audio, FastConformer, Conformer, Parakeet · license: cc-by-4.0 · pipeline: automatic-speech-recognition · base model: nvidia/parakeet-tdt-0.6b-v3

NaNK
license:cc-by-4.0
764,711
17

parakeet-tdt-0.6b-v2

library: mlx · tags: mlx, automatic-speech-recognition, speech, audio, FastConformer, Conformer, Parakeet · license: cc-by-4.0 · pipeline: automatic-speech-recognition · base model: nvidia/parakeet-tdt-0.6b-v2

NaNK
license:cc-by-4.0
566,735
32

whisper-small-mlx

122,365
3

gemma-3-12b-it-qat-4bit

NaNK
116,754
17

gemma-3-27b-it-qat-4bit

mlx-community/gemma-3-27b-it-qat-4bit This model was converted to MLX format from [`google/gemma-3-27b-it-qat-q4_0-unquantized`]() using mlx-vlm version 0.1.23. Refer to the original model card for more details on the model. Use with mlx
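A minimal Python sketch of the "Use with mlx" step, following the usual mlx-vlm pattern; the image URL and prompt are placeholders, and helper locations and argument order can shift between mlx-vlm releases:

```python
# Rough sketch only: API details vary across mlx-vlm versions.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/gemma-3-27b-it-qat-4bit"
model, processor = load(model_path)          # downloads and loads the MLX weights
config = load_config(model_path)

image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]  # placeholder image
prompt = apply_chat_template(processor, config, "Describe this image.", num_images=len(image))

output = generate(model, processor, prompt, image, max_tokens=128, verbose=False)
print(output)
```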

NaNK
80,767
20

gemma-3-4b-it-qat-4bit

NaNK
46,432
5

Qwen3-30B-A3B-Instruct-2507-4bit

This model mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit was converted to MLX format from Qwen/Qwen3-30B-A3B-Instruct-2507 using mlx-lm version 0.26.3.
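Models converted this way load directly with the mlx-lm Python API; a minimal sketch, with the prompt purely illustrative:

```python
from mlx_lm import load, generate

# Load the 4-bit MLX weights from the Hub (cached locally after the first run).
model, tokenizer = load("mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit")

# Build a chat-formatted prompt with the model's own template.
messages = [{"role": "user", "content": "Give me a one-paragraph summary of the MLX framework."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```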

NaNK
license:apache-2.0
35,954
8

gemma-3-1b-it-qat-4bit

The Model mlx-community/gemma-3-1b-it-qat-4bit was converted to MLX format from google/gemma-3-1b-it-qat-q4_0 using mlx-lm version 0.22.5.

NaNK
31,179
3

Llama-3.2-1B-Instruct-4bit

NaNK
llama
20,722
15

Llama-3.2-3B-Instruct-4bit

NaNK
llama
13,646
32

gemma-2-2b-it-4bit

NaNK
12,651
3

whisper-large-v3-mlx

license:mit
8,397
62

Qwen3-1.7B-4bit

NaNK
license:apache-2.0
7,761
4

Meta-Llama-3.1-8B-Instruct-4bit

NaNK
llama
7,412
17

Qwen3-VL-4B-Instruct-4bit

NaNK
license:apache-2.0
6,966
2

DeepSeek-OCR-8bit

mlx-community/DeepSeek-OCR-8bit This model was converted to MLX format from [`deepseek-ai/DeepSeek-OCR`]() using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:mit
6,579
20

Qwen3-Embedding-0.6B-4bit-DWQ

NaNK
license:apache-2.0
6,330
2

Mistral-7B-Instruct-v0.3-4bit

NaNK
license:apache-2.0
6,195
7

gemma-3-1b-it-4bit

NaNK
5,765
3

Kokoro-82M-bf16

mlx-community/Kokoro-82M-bf16 This model was converted to MLX format from [`hexagrad/Kokoro-82M`]() using mlx-audio version 0.0.1. Refer to the original model card for more details on the model. Use with mlx

license:apache-2.0
5,529
27

gemma-3n-E4B-it-lm-4bit

NaNK
5,528
4

gemma-3n-E2B-it-lm-4bit

NaNK
5,251
1

Qwen3-4B-4bit

NaNK
license:apache-2.0
5,003
9

Qwen3-0.6B-4bit

This model mlx-community/Qwen3-0.6B-4bit was converted to MLX format from Qwen/Qwen3-0.6B using mlx-lm version 0.24.0.
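The conversion itself is a short call in mlx-lm; a hedged sketch of the kind of step behind these "converted using mlx-lm" cards, where the output path and quantization keyword names are illustrative and may differ slightly by mlx-lm version:

```python
from mlx_lm import convert

convert(
    "Qwen/Qwen3-0.6B",            # upstream Hugging Face repo
    mlx_path="Qwen3-0.6B-4bit",   # local output directory (hypothetical name)
    quantize=True,                # quantize weights during conversion
    q_bits=4,                     # 4-bit weights
    q_group_size=64,              # default MLX quantization group size
)
```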

NaNK
license:apache-2.0
4,446
5

MiniMax-M2-4bit

This model mlx-community/MiniMax-M2-4bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.

NaNK
license:mit
4,381
9

Qwen2-VL-2B-Instruct-4bit

NaNK
license:apache-2.0
4,298
5

whisper-large-v3-turbo

whisper-large-v3-turbo This model was converted to MLX format from [`large-v3-turbo`]().
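The Whisper conversions are used through the mlx-whisper package; a minimal sketch, with the audio path as a placeholder:

```python
import mlx_whisper

# Transcribe a local audio file with the MLX turbo checkpoint from this organization.
result = mlx_whisper.transcribe(
    "speech.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
)
print(result["text"])
```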

3,909
72

Qwen2.5-3B-Instruct-4bit

NaNK
3,781
1

DeepSeek-V3.2-4bit

NaNK
license:mit
3,637
3

bge-small-en-v1.5-bf16

license:mit
3,206
0

Qwen3-VL-2B-Instruct-4bit

mlx-community/Qwen3-VL-2B-Instruct-4bit This model was converted to MLX format from [`Qwen/Qwen3-VL-2B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
3,075
0

Qwen3-235B-A22B-4bit

NaNK
license:apache-2.0
2,917
7

Dolphin3.0-Llama3.1-8B-4bit

NaNK
llama
2,899
0

Qwen3-4B-Instruct-2507-4bit

NaNK
license:apache-2.0
2,867
7

DeepSeek-OCR-4bit

NaNK
license:mit
2,836
6

MiniMax-M2.1-4bit

NaNK
2,751
5

SmolLM3-3B-4bit

NaNK
license:apache-2.0
2,725
4

Qwen2.5-0.5B-Instruct-4bit

NaNK
license:apache-2.0
2,718
4

Qwen3-235B-A22B-8bit

NaNK
license:apache-2.0
2,715
4

Qwen1.5-0.5B-Chat-4bit

NaNK
2,702
4

DeepSeek-R1-Distill-Qwen-1.5B-8bit

NaNK
2,701
3

Qwen3-4B-Thinking-2507-4bit

NaNK
license:apache-2.0
2,475
1

GLM-4.6-4bit

This model mlx-community/GLM-4.6-4bit was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.1.

NaNK
license:mit
2,440
12

DeepSeek-OCR-5bit

mlx-community/DeepSeek-OCR-5bit This model was converted to MLX format from [`deepseek-ai/DeepSeek-OCR`]() using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:mit
2,418
1

Qwen2.5-1.5B-Instruct-4bit

NaNK
license:apache-2.0
2,369
1

MiniMax-M2-3bit

NaNK
license:mit
2,368
3

DeepSeek-R1-Distill-Qwen-1.5B-4bit

NaNK
2,199
6

GLM-4.7-4bit

NaNK
license:mit
2,055
4

whisper-medium-mlx

1,901
3

granite-4.0-h-micro-4bit

NaNK
license:apache-2.0
1,889
0

MiniMax-M2-8bit

This model mlx-community/MiniMax-M2-8bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.

NaNK
license:mit
1,865
7

Qwen3-8B-4bit

NaNK
license:apache-2.0
1,815
6

Llama-3.2-3B-Instruct-8bit

NaNK
llama
1,806
1

Meta-Llama-3-8B-Instruct-4bit

mlx-community/Meta-Llama-3-8B-Instruct-4bit This model was converted to MLX format from [`meta-llama/Meta-Llama-3-8B-Instruct`]() using mlx-lm version 0.9.0. Refer to the original model card for more details on the model. Use with mlx

NaNK
llama
1,792
79

SmolVLM2-500M-Video-Instruct-mlx

license:apache-2.0
1,649
15

Qwen3-Embedding-4B-4bit-DWQ

NaNK
license:apache-2.0
1,629
6

Josiefied-Qwen3-4B-abliterated-v1-4bit

mlx-community/Josiefied-Qwen3-4B-abliterated-v1-4bit This model mlx-community/Josiefied-Qwen3-4B-abliterated-v1-4bit was converted to MLX format from Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v1 using mlx-lm version 0.24.0.

NaNK
1,565
2

Qwen2.5-7B-Instruct-4bit

NaNK
license:apache-2.0
1,491
8

Phi-4-mini-instruct-4bit

NaNK
license:mit
1,490
0

phi-2

NaNK
1,461
54

MiniMax-M2-mlx-8bit-gs32

This model mlx-community/MiniMax-M2-mlx-8bit-gs32 was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.1. Recipe: 8-bit quantization with group size 32, 9 bits per weight (bpw). You can find more similar MLX model quants for a single Apple Mac Studio M3 Ultra with 512 GB at https://huggingface.co/bibproj
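The "9 bits per weight" figure follows from the per-group overhead of MLX affine quantization, assuming a 16-bit scale and 16-bit bias stored per group of 32 weights; a quick check:

```python
bits, group_size = 8, 32
overhead_per_group = 16 + 16            # fp16 scale + fp16 bias per group (assumed layout)
bpw = bits + overhead_per_group / group_size
print(bpw)                              # 9.0, matching the stated bits per weight
```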

NaNK
license:mit
1,382
2

granite-3.3-2b-instruct-4bit

NaNK
license:apache-2.0
1,347
1

Llama-3.3-70B-Instruct-4bit

NaNK
llama
1,342
30

GLM-4.6-mlx-8bit-gs32

This model mlx-community/GLM-4.6-mlx-8bit-gs32 was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.1. Recipe: 8-bit quantization with group size 32, 9 bits per weight (bpw). You can find more similar MLX model quants for Apple Mac Studio with 512 GB at https://huggingface.co/bibproj

NaNK
license:mit
1,340
1

Qwen3-VL-235B-A22B-Instruct-3bit

mlx-community/Qwen3-VL-235B-A22B-Instruct-3bit This model was converted to MLX format from [`Qwen/Qwen3-VL-235B-A22B-Instruct`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
1,337
2

Llama-2-7b-chat-mlx

NaNK
llama
1,307
84

granite-4.0-h-tiny-4bit

NaNK
license:apache-2.0
1,253
0

DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx

NaNK
1,240
12

Qwen2.5-VL-3B-Instruct-4bit

mlx-community/Qwen2.5-VL-3B-Instruct-4bit This model was converted to MLX format from [`Qwen/Qwen2.5-VL-3B-Instruct`]() using mlx-vlm version 0.1.11. Refer to the original model card for more details on the model. Use with mlx

NaNK
1,214
2

gemma-3-270m-it-8bit

This model mlx-community/gemma-3-270m-it-8bit was converted to MLX format from google/gemma-3-270m using mlx-lm version 0.26.3.

NaNK
1,160
2

NVIDIA-Nemotron-3-Nano-30B-A3B-4bit

NaNK
1,154
3

whisper-base-mlx

1,147
0

Qwen3-Embedding-8B-4bit-DWQ

This model mlx-community/Qwen3-Embedding-8B-4bit-DWQ was converted to MLX format from Qwen/Qwen3-Embedding-8B using mlx-lm version 0.25.1.

NaNK
license:apache-2.0
1,145
5

parakeet-rnnt-0.6b

NaNK
license:cc-by-4.0
1,098
0

gpt-oss-120b-MXFP4-Q4

This model mlx-community/gpt-oss-120b-MXFP4-Q4 was converted to MLX format from openai/gpt-oss-120b using mlx-lm version 0.27.0.

NaNK
license:apache-2.0
1,080
4

Phi-3.5-mini-instruct-4bit

NaNK
license:mit
1,047
8

IQuest-Coder-V1-40B-Loop-Instruct-4bit

NaNK
1,044
7

LFM2-2.6B-4bit

This model mlx-community/LFM2-2.6B-4bit was converted to MLX format from LiquidAI/LFM2-2.6B using mlx-lm version 0.28.0.

NaNK
1,026
1

LFM2-8B-A1B-4bit

NaNK
1,001
3

gpt-oss-20b-MXFP4-Q4

NaNK
license:apache-2.0
963
7

GLM-4.6-bf16

This model mlx-community/GLM-4.6-bf16 was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.2.

NaNK
license:mit
962
3

Qwen3-0.6B-8bit

NaNK
license:apache-2.0
953
5

LFM2-1.2B-4bit

This model mlx-community/LFM2-1.2B-4bit was converted to MLX format from LiquidAI/LFM2-1.2B using mlx-lm version 0.26.0.

NaNK
947
2

whisper-large-v2-mlx

939
1

gemma-3n-E4B-it-4bit

NaNK
921
6

Qwen3-VL-30B-A3B-Instruct-4bit

mlx-community/Qwen3-VL-30B-A3B-Instruct-4bit This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Instruct`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
917
5

Dolphin3.0-Llama3.1-8B-8bit

NaNK
llama
916
0

MiniMax-M2-6bit

This model mlx-community/MiniMax-M2-6bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.

NaNK
license:mit
905
1

deepcogito-cogito-v1-preview-llama-3B-4bit

NaNK
llama
873
0

Hermes-3-Llama-3.2-3B-4bit

NaNK
llama
863
0

DeepSeek-R1-Distill-Qwen-32B-4bit

NaNK
853
42

Qwen3-VL-30B-A3B-Instruct-8bit

NaNK
license:apache-2.0
851
2

dolphin3.0-llama3.2-3B-4Bit

NaNK
llama
849
0

Phi-4-mini-instruct-8bit

The Model mlx-community/Phi-4-mini-instruct-8bit was converted to MLX format from microsoft/Phi-4-mini-instruct using mlx-lm version 0.21.5.

NaNK
license:mit
844
4

GLM-4.5-Air-3bit

This model mlx-community/GLM-4.5-Air-3bit was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.1.

NaNK
license:mit
837
30

DeepSeek-V3-0324-4bit

This model mlx-community/DeepSeek-V3-0324-4bit was converted to MLX format from deepseek-ai/DeepSeek-V3-0324 using mlx-lm version 0.26.3.

NaNK
license:mit
826
38

GLM-4.6-5bit

This model mlx-community/GLM-4.6-5bit was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.1.

NaNK
license:mit
820
3

gemma-2-9b-it-4bit

NaNK
812
2

gpt-oss-120b-MXFP4-Q8

This model mlx-community/gpt-oss-120b-MXFP4-Q8 was converted to MLX format from openai/gpt-oss-120b using mlx-lm version 0.27.0.

NaNK
license:apache-2.0
803
3

DeepSeek-OCR-6bit

mlx-community/DeepSeek-OCR-6bit This model was converted to MLX format from [`deepseek-ai/DeepSeek-OCR`]() using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:mit
796
0

Josiefied-Qwen3-1.7B-abliterated-v1-4bit

NaNK
795
2

DeepSeek-R1-0528-Qwen3-8B-4bit

This model mlx-community/DeepSeek-R1-0528-Qwen3-8B-4bit was converted to MLX format from deepseek-ai/DeepSeek-R1-0528-Qwen3-8B using mlx-lm version 0.24.1.

NaNK
license:mit
791
4

gemma-3-4b-it-4bit

mlx-community/gemma-3-4b-it-4bit This model was converted to MLX format from [`google/gemma-3-4b-it`]() using mlx-vlm version 0.1.18. Refer to the original model card for more details on the model. Use with mlx

NaNK
772
7

SmolLM-135M-Instruct-4bit

NaNK
llama
749
4

whisper-tiny

744
0

Mistral-Nemo-Instruct-2407-4bit

NaNK
license:apache-2.0
743
14

LFM2-8B-A1B-3bit-MLX

Maintainer / Publisher: Susant Achary
Upstream model: LiquidAI/LFM2-8B-A1B
This repo (MLX 3-bit): `mlx-community/LFM2-8B-A1B-3bit-MLX`

This repository provides an Apple-Silicon-optimized MLX build of LFM2-8B-A1B at 3-bit quantization. 3-bit is an excellent size↔quality sweet spot on many Macs: very small memory footprint with surprisingly solid answer quality and snappy decoding.

- Architecture: Mixture-of-Experts (MoE) Transformer.
- Size: ~8B total parameters with ~1B active per token (the “A1B” naming commonly indicates ~1B active params).
- Why MoE? Per token, only a subset of experts is activated → lower compute per token while retaining a larger parameter pool for expressivity.

> Memory reality on a single device: even though ~1B parameters are active at a time, all experts typically reside in memory in single-device runs. Plan RAM based on total parameters, not just the active slice.

What's in this repo:
- `config.json` (MLX), `mlx_model.safetensors` (3-bit shards)
- Tokenizer: `tokenizer.json`, `tokenizer_config.json`
- Metadata: `model_index.json` (and/or processor metadata as applicable)

Target: macOS on Apple Silicon (M-series) using Metal/MPS.

Intended use:
- General instruction following, chat, and summarization
- RAG back-ends and long-context assistants on device
- Schema-guided structured outputs (JSON) where low RAM is a priority

Limitations:
- 3-bit is lossy: the gains in latency/RAM come with some accuracy trade-off vs 6/8-bit.
- For very long contexts and/or batching, the KV-cache can dominate memory; tune `max_tokens` and batch size.
- Add your own guardrails/safety for production deployments.

The numbers below are practical starting points rather than measurements; verify on your machine.
- Weights (3-bit): ≈ `total_params × 0.375 byte` → for 8B params ≈ ~3.0 GB
- Runtime overhead: MLX graph/tensors/metadata → ~0.6–1.0 GB
- KV-cache: grows with context × layers × heads × dtype → ~0.8–2.5+ GB

| Context window | Estimated peak RAM |
|---|---:|
| 4k tokens | ~4.4–5.5 GB |
| 8k tokens | ~5.2–6.6 GB |
| 16k tokens | ~6.5–8.8 GB |

> For ≤2k windows you may see ~4.0–4.8 GB. Larger windows/batches increase KV-cache and peak RAM.

🧭 Precision choices for LFM2-8B-A1B (lineup planning)
While this card is 3-bit, teams often publish multiple precisions. Use this table as a planning guide (8B MoE LM; actuals depend on context/batch/prompts):

| Variant | Typical Peak RAM | Relative Speed | Typical Behavior | When to choose |
|---|---:|:---:|---|---|
| 3-bit (this repo) | ~4.4–8.8 GB | 🔥🔥🔥🔥 | Direct, concise, great latency | Default on 8–16 GB Macs |
| 6-bit | ~7.5–12.5 GB | 🔥🔥 | Best quality under quant | Choose if RAM allows |
| 8-bit | ~9.5–12+ GB | 🔥🔥 | Largest quantized size / highest fidelity | When you prefer simpler 8-bit workflows |

> MoE caveat: MoE lowers compute per token; unless experts are paged/partitioned, memory still scales with total parameters on a single device.

Deterministic generation:

```bash
python -m mlx_lm.generate \
  --model mlx-community/LFM2-8B-A1B-3bit-MLX \
  --prompt "Summarize the following in 5 concise bullet points:\n " \
  --max-tokens 256 \
  --temperature 0.0 \
  --device mps \
  --seed 0
```
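The card's own rule of thumb for the 3-bit weight footprint is easy to reproduce; a small sketch that ignores group-scale overhead and the KV-cache:

```python
total_params = 8e9                 # ~8B total parameters; MoE experts all resident on one device
bytes_per_weight = 3 / 8           # 3-bit weights = 0.375 byte per parameter
weights_gb = total_params * bytes_per_weight / 1e9
print(f"~{weights_gb:.1f} GB of weights")   # ~3.0 GB, before runtime overhead and KV-cache
```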

NaNK
742
1

MiniMax-M2-5bit

This model mlx-community/MiniMax-M2-5bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.

NaNK
license:mit
739
2

gemma-3-text-4b-it-4bit

The Model mlx-community/gemma-3-text-4b-it-4bit was converted to MLX format from mlx-community/gemma-3-4b-it-bf16 using mlx-lm version 0.22.0.

NaNK
736
0

Qwen3-Coder-30B-A3B-Instruct-4bit

NaNK
license:apache-2.0
688
10

granite-4.0-micro-8bit

This model mlx-community/granite-4.0-micro-8bit was converted to MLX format from ibm-granite/granite-4.0-micro using mlx-lm version 0.28.2.

NaNK
license:apache-2.0
662
0

Phi-3-mini-4k-instruct-4bit

NaNK
license:mit
661
12

Qwen3-VL-235B-A22B-Thinking-3bit

NaNK
license:apache-2.0
655
0

MiniMax-M2.1-6bit

NaNK
648
2

Llama-3.2-11B-Vision-Instruct-abliterated

NaNK
mllama
645
7

Kimi-Dev-72B-4bit-DWQ

This model mlx-community/Kimi-Dev-72B-4bit-DWQ was converted to MLX format from moonshotai/Kimi-Dev-72B using mlx-lm version 0.26.0.

NaNK
license:mit
636
18

Qwen3-VL-30B-A3B-Instruct-bf16

mlx-community/Qwen3-VL-30B-A3B-Instruct-bf16 This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Instruct`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
634
3

whisper-small-mlx-8bit

NaNK
632
0

Llama-3.2-1B-Instruct-8bit

NaNK
llama
613
1

DeepSeek-R1-Distill-Qwen-7B-4bit

NaNK
609
17

GLM-4.5-Air-4bit

This model mlx-community/GLM-4.5-Air-4bit was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.0.

NaNK
license:mit
594
25

DeepSeek-V3.2-mlx-5bit

NaNK
license:mit
582
1

Qwen2.5-VL-7B-Instruct-4bit

NaNK
license:apache-2.0
564
3

LFM2-350M-8bit

This model mlx-community/LFM2-350M-8bit was converted to MLX format from LiquidAI/LFM2-350M using mlx-lm version 0.26.0.

NaNK
547
2

DeepSeek-R1-4bit

NaNK
544
36

Meta-Llama-3.1-70B-Instruct-4bit

NaNK
llama
544
4

Kimi-K2-Instruct-4bit

NaNK
541
9

phi-4-8bit

The Model mlx-community/phi-4-8bit was converted to MLX format from microsoft/phi-4 using mlx-lm version 0.20.6.

NaNK
license:mit
534
12

3b-de-ft-research_release-4bit

NaNK
llama
529
0

Devstral-Small-2-24B-Instruct-2512-4bit

NaNK
license:apache-2.0
526
1

whisper-large-mlx

515
2

dolphin3.0-llama3.2-1B-4Bit

The Model mlx-community/dolphin3.0-llama3.2-1B-4Bit was converted to MLX format from dphn/Dolphin3.0-Llama3.2-1B using mlx-lm version 0.26.3.

NaNK
llama
511
0

Qwen3-VL-8B-Instruct-4bit

mlx-community/Qwen3-VL-8B-Instruct-4bit This model was converted to MLX format from [`Qwen/Qwen3-VL-8B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
506
3

Llama-3.2-11B-Vision-Instruct-8bit

NaNK
mllama
505
10

Qwen3-4B-8bit

NaNK
license:apache-2.0
499
1

gemma-3-4b-it-8bit

NaNK
491
5

gemma-3n-E2B-it-4bit

NaNK
481
9

DeepSeek-V3.1-4bit

NaNK
license:mit
478
6

Qwen3-VL-8B-Thinking-8bit

mlx-community/Qwen3-VL-8B-Thinking-8bit This model was converted to MLX format from [`Qwen/Qwen3-VL-8B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
473
1

Ring-mini-linear-2.0-4bit

This model mlx-community/Ring-mini-linear-2.0-4bit was converted to MLX format from inclusionAI/Ring-mini-linear-2.0 using mlx-lm version 0.28.1.

NaNK
license:mit
472
3

MiniMax-M2.1-8bit

NaNK
465
0

Qwen3-VL-30B-A3B-Thinking-4bit

NaNK
license:apache-2.0
463
0

Qwen3-4B-Instruct-2507-4bit-DWQ-2510

This model mlx-community/Qwen3-4B-Instruct-2507-4bit-DWQ-2510 was converted to MLX format from Qwen/Qwen3-4B-Instruct-2507 using mlx-lm version 0.28.2.

NaNK
license:apache-2.0
455
1

Qwen3-Coder-30B-A3B-Instruct-4bit-dwq-v2

NaNK
license:apache-2.0
452
7

Qwen3-Coder-30B-A3B-Instruct-8bit

This model mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit was converted to MLX format from Qwen/Qwen3-Coder-30B-A3B-Instruct using mlx-lm version 0.26.1.

NaNK
license:apache-2.0
449
2

Qwen3-VL-30B-A3B-Thinking-3bit

NaNK
license:apache-2.0
449
1

Qwen3-VL-30B-A3B-Thinking-8bit

NaNK
license:apache-2.0
447
0

Qwen3-Coder-480B-A35B-Instruct-4bit

This model mlx-community/Qwen3-Coder-480B-A35B-Instruct-4bit was converted to MLX format from Qwen/Qwen3-Coder-480B-A35B-Instruct using mlx-lm version 0.26.0.

NaNK
license:apache-2.0
444
18

DeepSeek-V3.1-Terminus-4bit

This model mlx-community/DeepSeek-V3.1-Terminus-4bit was converted to MLX format from deepseek-ai/DeepSeek-V3.1-Terminus using mlx-lm version 0.27.1.

NaNK
license:mit
442
2

whisper-large-v3-turbo-q4

437
7

Qwen3-VL-30B-A3B-Thinking-bf16

mlx-community/Qwen3-VL-30B-A3B-Thinking-bf16 This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
435
1

Granite-4.0-H-Tiny-4bit-DWQ

This model mlx-community/granite-4.0-h-Tiny-4bit-DWQ was converted to MLX format from ibm-granite/granite-4.0-h-small using mlx-lm version 0.28.2.

NaNK
license:apache-2.0
430
2

Llama-3.2-3B-Instruct

NaNK
llama
428
7

Qwen3-Next-80B-A3B-Instruct-4bit

NaNK
license:apache-2.0
425
17

parakeet-tdt_ctc-0.6b-ja

This model was converted to MLX format from nvidia/parakeet-tdt_ctc-0.6b-ja using the conversion script. Please refer to the original model card for more details on the model.

NaNK
license:cc-by-4.0
424
4

Mistral-7B-Instruct-v0.2-4-bit

NaNK
license:apache-2.0
423
24

Qwen3-30B-A3B-4bit

NaNK
license:apache-2.0
419
11

Llama-3.2-3B-Instruct-uncensored-6bit

NaNK
llama
415
3

Kimi-K2-Instruct-0905-mlx-DQ3_K_M

This model mlx-community/Kimi-K2-Instruct-0905-mlx-DQ3_K_M was converted to MLX format from moonshotai/Kimi-K2-Instruct-0905 using mlx-lm version 0.26.3.

This is created for people using a single Apple Mac Studio M3 Ultra with 512 GB. The 4-bit version of Kimi K2 does not fit. Using research results, we aim to get 4-bit performance from a slightly smaller and smarter quantization. It should also not be so large that it leaves no memory for a useful context window. You can find more similar MLX model quants for Apple Mac Studio with 512 GB at https://huggingface.co/bibproj

In the arXiv paper Quantitative Analysis of Performance Drop in DeepSeek Model Quantization the authors write,

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms traditional `Q3_K_M` variant on various benchmarks, which is also comparable with 4-bit quantization (`Q4_K_M`) approach in most tasks.

> dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bitwidth quantization has been well tested and documented. In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `def mixed_quant_predicate()` with something like the rules sketched below. Should you wish to squeeze more out of your quant, and you do not need to use a larger context window, you can change the last part of that code.
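The card's exact replacement code is not reproduced in this excerpt; purely as a hypothetical illustration of the kind of predicate mlx-lm's converter accepts (the layer rules, bit choices, and signature details here are assumptions, not the author's snippet, and may differ by mlx-lm version):

```python
import mlx.nn as nn

def mixed_quant_predicate(path: str, module: nn.Module, config: dict):
    """Return False, True, or a per-layer {"group_size", "bits"} override."""
    if not hasattr(module, "to_quantized"):
        return False                                  # skip modules MLX cannot quantize

    # Illustrative DQ3_K_M-style mix: keep embeddings/head and a few sensitive
    # projections at higher precision, quantize the bulk of the weights at 3-bit.
    if "embed_tokens" in path or "lm_head" in path:
        return {"group_size": 64, "bits": 6}
    if "v_proj" in path or "down_proj" in path:
        return {"group_size": 64, "bits": 4}
    return {"group_size": 64, "bits": 3}
```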

NaNK
414
7

Qwen2.5-Coder-7B-Instruct-bf16

NaNK
license:apache-2.0
413
2

Mixtral-8x22B-4bit

NaNK
license:apache-2.0
412
54

Qwen3-VL-4B-Instruct-8bit

NaNK
license:apache-2.0
411
3

Llama-3.3-70B-Instruct-8bit

NaNK
llama
410
14

nvidia_Llama-3.1-Nemotron-70B-Instruct-HF_4bit

NaNK
llama
409
12

Huihui-GLM-4.5V-abliterated-mxfp4

mlx-community/Huihui-GLM-4.5V-abliterated-mxfp4 This model was converted to MLX format from [`huihui-ai/Huihui-GLM-4.5V-abliterated`]() using `mlx-vlm` with MXFP4 support. Refer to the original model card for more details on the model. Use with mlx

license:mit
407
2

gemma-3-1b-pt-4bit

NaNK
406
1

embeddinggemma-300m-8bit

NaNK
403
2

DeepSeek-R1-Distill-Llama-70B-8bit

NaNK
llama
391
10

chandra-8bit

NaNK
386
1

DeepSeek-Coder-V2-Lite-Instruct-8bit

NaNK
385
5

embeddinggemma-300m-bf16

The Model mlx-community/embeddinggemma-300m-bf16 was converted to MLX format from google/embeddinggemma-300m using mlx-lm version 0.0.4.

384
1

Qwen3-0.6B-bf16

NaNK
license:apache-2.0
382
4

Qwen2.5-3B-Instruct-8bit

NaNK
380
0

Nanonets-OCR2-3B-4bit

mlx-community/Nanonets-OCR2-3B-4bit This model was converted to MLX format from [`nanonets/Nanonets-OCR2-3B`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
379
0

Meta-Llama-3-8B-Instruct

NaNK
llama
378
2

Qwen3-VL-8B-Instruct-bf16

mlx-community/Qwen3-VL-8B-Instruct-bf16 This model was converted to MLX format from [`Qwen/Qwen3-VL-8B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
377
3

GLM-4.5-Air-bf16

This model mlx-community/GLM-4.5-Air-bf16 was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.28.2.

license:mit
375
0

Qwen3-VL-32B-Instruct-8bit

mlx-community/Qwen3-VL-32B-Instruct-8bit This model was converted to MLX format from [`Qwen/Qwen3-VL-32B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
374
1

Ling-1T-mlx-3bit

This model mlx-community/Ling-1T-mlx-3bit/ was converted to MLX format from inclusionAI/Ling-1T using mlx-lm version 0.28.1. You can find more similar MLX model quants for Apple Mac Studio with 512 GB at https://huggingface.co/bibproj

NaNK
license:mit
373
3

Llama-4-Scout-17B-16E-Instruct-4bit

mlx-community/Llama-4-Scout-17B-16E-Instruct-4bit This model was converted to MLX format from [`meta-llama/Llama-4-Scout-17B-16E-Instruct`]() using mlx-vlm version 0.1.21. Refer to the original model card for more details on the model. Use with mlx

NaNK
llama4
371
9

deepseek-r1-distill-qwen-1.5b

NaNK
370
23

Qwen2.5-VL-7B-Instruct-8bit

NaNK
license:apache-2.0
360
18

Apriel-1.5-15b-Thinker-4bit

mlx-community/Apriel-1.5-15b-Thinker-4bit This model was converted to MLX format from [`ServiceNow-AI/Apriel-1.5-15b-Thinker`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:mit
356
2

SmolVLM-Instruct-4bit

NaNK
license:apache-2.0
355
5

dolphin-vision-72b-4bit

NaNK
353
7

Codestral-22B-v0.1-4bit

NaNK
352
13

gemma-3-270m-it-4bit

This model mlx-community/gemma-3-270m-it-4bit was converted to MLX format from google/gemma-3-270m-it using mlx-lm version 0.26.3.

NaNK
352
8

Qwen3-Embedding-0.6B-8bit

NaNK
license:apache-2.0
350
0

CodeLlama-70b-Instruct-hf-4bit-MLX

NaNK
llama
345
25

Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ

NaNK
license:apache-2.0
343
5

Nanonets-OCR2-3B-bf16

mlx-community/Nanonets-OCR2-3B-bf16 This model was converted to MLX format from [`nanonets/Nanonets-OCR2-3B`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
341
0

Qwen2.5-7B-Instruct-Uncensored-4bit

NaNK
license:gpl-3.0
340
4

gemma-3-1b-it-8bit

NaNK
340
3

MiniMax-M2.1-5bit

NaNK
339
1

exaone-4.0-1.2b-4bit

NaNK
339
0

LFM2-8B-A1B-8bit-MLX

Maintainer / Publisher: Susant Achary
Upstream model: LiquidAI/LFM2-8B-A1B
This repo (MLX 8-bit): `mlx-community/LFM2-8B-A1B-8bit-MLX`

This repository provides an Apple-Silicon-optimized MLX build of LFM2-8B-A1B at 8-bit quantization for fast, on-device inference.

- Architecture: Mixture-of-Experts (MoE) Transformer.
- Size: ~8B total parameters with ~1B active per token (the “A1B” suffix commonly denotes ~1B active params).
- Why MoE? During generation, only a subset of experts is activated per token, reducing compute per token while keeping a larger total parameter pool for expressivity.

> Important memory note (single-device inference): although compute per token benefits from MoE (fewer active parameters), the full set of experts still resides in memory for typical single-GPU/CPU deployments. In practice this means RAM usage scales with total parameters, not with the smaller active count.

What's in this repo:
- `config.json` (MLX), `mlx_model.safetensors` (8-bit shards)
- Tokenizer files: `tokenizer.json`, `tokenizer_config.json`
- Model metadata (e.g., `model_index.json`)

Target platform: macOS on Apple Silicon (M-series) using Metal/MPS.

Intended use:
- General instruction-following, chat, and summarization
- RAG back-ends and long-context workflows on device
- Function-calling / structured outputs with schema-style prompts

Limitations:
- Even at 8-bit, long contexts (KV-cache) can dominate memory at high `max_tokens` or large batch sizes.
- As with any quantization, small regressions vs FP16 can appear on intricate math/code or edge-formatting.

In the absence of measurements, below are practical planning numbers derived from first principles plus experience with MLX and similar MoE models. Treat them as starting points and validate on your hardware.
- Weights: ~`total_params × 1 byte` (8-bit). For 8B params → ~8.0 GB baseline.
- Runtime overhead: MLX graph + tensors + metadata → ~0.5–1.0 GB typical.
- KV cache: grows with context_length × layers × heads × dtype; often 1–3+ GB for long contexts.

| Context window | Estimated peak RAM |
|---|---:|
| 4k tokens | ~9.5–10.5 GB |
| 8k tokens | ~10.5–11.8 GB |
| 16k tokens | ~12.0–14.0 GB |

> These ranges assume 8-bit weights, A1B MoE (all experts resident), batch size = 1, and standard generation settings.
> On lower windows (≤2k), you may see ~9–10 GB. Larger windows or batches will increase KV-cache and peak RAM.

While this card is 8-bit, teams often want a consistent lineup. If you later produce 6/5/4/3/2-bit MLX builds, here's a practical guide (RAM figures are indicative for an 8B MoE LM; your results depend on context/batch):

| Variant | Typical Peak RAM | Relative Speed | Typical Behavior | When to choose |
|---|---:|:---:|---|---|
| 4-bit | ~7–8 GB | 🔥🔥🔥 | Better detail retention | If 3-bit drops too much fidelity |
| 6-bit | ~9–10.5 GB | 🔥🔥 | Near-max MLX quality | If you want accuracy under quant |
| 8-bit (this repo) | ~9.5–12+ GB | 🔥🔥 | Highest quality among quant tiers | When RAM allows and you want the most faithful outputs |

> MoE caveat: MoE reduces compute per token, but unless experts are paged/partitioned across devices and loaded on demand, memory still follows total parameters. On a single Mac, plan RAM as if the whole 8B parameter set is resident.

Deterministic generation:

```bash
python -m mlx_lm.generate \
  --model mlx-community/LFM2-8B-A1B-8bit-MLX \
  --prompt "Summarize the following in 5 bullet points:\n " \
  --max-tokens 256 \
  --temperature 0.0 \
  --device mps \
  --seed 0
```
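The card's KV-cache rule ("context_length × layers × heads × dtype") can be made concrete with placeholder hyperparameters; the values below are illustrative and are not LFM2's real architecture:

```python
layers, kv_heads, head_dim = 32, 8, 128   # hypothetical transformer shape
context_tokens = 8192
bytes_per_element = 2                     # fp16 cache
kv_bytes = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_element  # keys + values
print(f"~{kv_bytes / 1e9:.1f} GB KV-cache at 8k context")   # ~1.1 GB under these assumptions
```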

NaNK
338
2

gemma-3-12b-it-qat-abliterated-lm-4bit

NaNK
338
0

FastVLM-0.5B-bf16

NaNK
337
1

DeepSeek-R1-Distill-Qwen-32B-MLX-8Bit

NaNK
336
16

Qwen3-8B-6bit

NaNK
license:apache-2.0
335
4

gemma-3-27b-it-4bit

NaNK
333
9

Nanonets-OCR2-3B-8bit

mlx-community/Nanonets-OCR2-3B-8bit This model was converted to MLX format from [`nanonets/Nanonets-OCR2-3B`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
331
0

GLM-4.5-Air-mxfp4

This model mlx-community/GLM-4.5-Air-mxfp4 was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.28.0.

license:mit
328
2

SmolVLM2-256M-Video-Instruct-mlx

license:apache-2.0
326
10

Qwen3-0.6B-4bit-DWQ-05092025

This model mlx-community/Qwen3-0.6B-4bit-DWQ-05092025 was converted to MLX format from Qwen/Qwen3-0.6B using mlx-lm version 0.24.0.

NaNK
license:apache-2.0
325
0

Dolphin-Mistral-24B-Venice-Edition-mlx-8Bit

mlx-community/Dolphin-Mistral-24B-Venice-Edition-mlx-8Bit MLX 8bit quant of cognitivecomputations/Dolphin-Mistral-24B-Venice-Edition released 2025-06-12. "This is an updated version based on feedback we received on v1", see discussion at original repo. The Model mlx-community/Dolphin-Mistral-24B-Venice-Edition-mlx-8Bit was converted to MLX format from cognitivecomputations/Dolphin-Mistral-24B-Venice-Edition using mlx-lm version 0.22.3.

NaNK
license:apache-2.0
323
4

LFM2-700M-8bit

This model mlx-community/LFM2-700M-8bit was converted to MLX format from LiquidAI/LFM2-700M using mlx-lm version 0.26.0.

NaNK
318
1

Kimi-VL-A3B-Thinking-4bit

NaNK
316
7

DeepSeek-R1-Distill-Llama-8B-4bit

NaNK
llama
311
10

Phi-3.5-vision-instruct-4bit

NaNK
license:mit
310
5

deepseek-vl2-8bit

NaNK
306
6

Qwen3-30B-A3B-4bit-DWQ

NaNK
license:apache-2.0
305
28

DeepSeek-V3-4bit

NaNK
305
8

Qwen3-VL-4B-Instruct-3bit

mlx-community/Qwen3-VL-4B-Instruct-3bit This model was converted to MLX format from [`Qwen/Qwen3-VL-4B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
303
0

Meta-Llama-3.1-8B-Instruct-8bit

NaNK
llama
302
10

Kimi-Linear-48B-A3B-Instruct-4bit

This model mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit was converted to MLX format from moonshotai/Kimi-Linear-48B-A3B-Instruct using mlx-lm version 0.28.4.

NaNK
license:mit
302
7

embeddinggemma-300m-4bit

NaNK
300
2

DeepSeek-R1-Distill-Qwen-1.5B-3bit

NaNK
298
1

whisper-tiny.en-mlx

298
0

Llama-3.2-8X4B-MOE-V2-Dark-Champion-Instruct-uncensored-abliterated-21B-Q_6-MLX

NaNK
Llama 3.2
295
3

nomicai-modernbert-embed-base-4bit

The Model mlx-community/nomicai-modernbert-embed-base-4bit was converted to MLX format from nomic-ai/modernbert-embed-base using mlx-lm version 0.0.3.

NaNK
license:apache-2.0
295
0

GLM-4.5-4bit

This model mlx-community/GLM-4.5-4bit was converted to MLX format from zai-org/GLM-4.5 using mlx-lm version 0.26.0.

NaNK
license:mit
294
16

Qwen2.5-Coder-7B-Instruct-4bit

NaNK
license:apache-2.0
291
5

Llama-4-Maverick-17B-16E-Instruct-4bit

NaNK
llama4
290
7

phi-2-hf-4bit-mlx

NaNK
license:mit
289
1

Qwen3-VL-8B-Thinking-4bit

mlx-community/Qwen3-VL-8B-Thinking-4bit This model was converted to MLX format from [`Qwen/Qwen3-VL-8B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
286
0

Qwen2.5-0.5B-Instruct-8bit

NaNK
license:apache-2.0
285
0

granite-4.0-h-micro-8bit

This model mlx-community/granite-4.0-h-micro-8bit was converted to MLX format from ibm-granite/granite-4.0-h-micro using mlx-lm version 0.28.2.

NaNK
license:apache-2.0
283
2

Ling-1T-mlx-DQ3_K_M

This model mlx-community/Ling-1T-mlx-DQ3_K_M was converted to MLX format from inclusionAI/Ling-1T using mlx-lm version 0.28.1.

This is created for people using a single Apple Mac Studio M3 Ultra with 512 GB. The 4-bit version of Ling 1T does not fit. Using research results, we aim to get 4-bit performance from a slightly smaller and smarter quantization. It should also not be so large that it leaves no memory for a useful context window.

In the arXiv paper Quantitative Analysis of Performance Drop in DeepSeek Model Quantization the authors write,

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms traditional `Q3_K_M` variant on various benchmarks, which is also comparable with 4-bit quantization (`Q4_K_M`) approach in most tasks.

> dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bitwidth quantization has been well tested and documented. In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `def mixed_quant_predicate()` with something like:

NaNK
280
0

olmOCR-2-7B-1025-bf16

mlx-community/olmOCR-2-7B-1025-bf16 This model was converted to MLX format from [`allenai/olmOCR-2-7B-1025`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
273
2

DeepSeek-R1-Distill-Qwen-14B-4bit

NaNK
270
7

GLM-4-9B-0414-4bit

NaNK
license:mit
270
1

embeddinggemma-300m-qat-q4_0-unquantized-bf16

mlx-community/embeddinggemma-300m-qat-q4_0-unquantized-bf16 The Model mlx-community/embeddinggemma-300m-qat-q4_0-unquantized-bf16 was converted to MLX format from google/embeddinggemma-300m-qat-q4_0-unquantized using mlx-lm version 0.0.4.

270
0

GLM-Z1-9B-0414-4bit

This model mlx-community/GLM-Z1-9B-0414-4bit was converted to MLX format from THUDM/GLM-Z1-9B-0414 using mlx-lm version 0.22.4.

NaNK
license:mit
267
3

gemma-3-12b-it-4bit

NaNK
266
6

gemma-3-12b-it-bf16

NaNK
264
1

DeepSeek-R1-Distill-Qwen-32B-abliterated-4bit

NaNK
261
5

Qwen3-VL-32B-Instruct-4bit

mlx-community/Qwen3-VL-32B-Instruct-4bit This model was converted to MLX format from [`Qwen/Qwen3-VL-32B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
261
1

whisper-turbo

260
7

GLM-4-32B-0414-8bit

This model mlx-community/GLM-4-32B-0414-8bit was converted to MLX format from THUDM/GLM-4-32B-0414 using mlx-lm version 0.23.1.

NaNK
license:mit
259
6

Apertus-8B-Instruct-2509-bf16

This model mlx-community/Apertus-8B-Instruct-2509-bf16 was converted to MLX format from swiss-ai/Apertus-8B-Instruct-2509 using mlx-lm version 0.27.0.

NaNK
license:apache-2.0
255
4

Qwen3-VL-8B-Thinking-bf16

mlx-community/Qwen3-VL-8B-Thinking-bf16 This model was converted to MLX format from [`Qwen/Qwen3-VL-8B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
255
0

olmOCR-2-7B-1025-4bit

NaNK
license:apache-2.0
254
0

Qwen3-VL-4B-Thinking-bf16

mlx-community/Qwen3-VL-4B-Thinking-bf16 This model was converted to MLX format from [`Qwen/Qwen3-VL-4B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
253
1

Meta-Llama-3.1-8B-Instruct-bf16

NaNK
llama
248
3

granite-4.0-h-tiny-3bit-MLX

Granite-4.0-H-Tiny — MLX 3-bit (Apple Silicon)
Maintainer / Publisher: Susant Achary

This repository provides an Apple-Silicon-optimized MLX build of IBM Granite-4.0-H-Tiny with 3-bit weight quantization (plus usage guidance for 2/4/5/6-bit variants if RAM allows). Granite 4.0 is IBM's latest hybrid Mamba-2/Transformer family with selective Mixture-of-Experts (MoE), designed for long-context, hyper-efficient inference and enterprise use.

🔎 What's Granite 4.0?
- Architecture: hybrid Mamba-2 + softmax attention; H variants add MoE routing (sparse activation). Aims to keep expressivity while dramatically reducing memory footprint.
- Efficiency claims: up to ~70% lower memory and ~2× faster inference vs. comparable models, especially for multi-session and long-context scenarios.
- Context window: 128k tokens (Tiny/Base preview cards).
- Licensing: Apache-2.0 for public/commercial use.

> This MLX build targets Granite-4.0-H-Tiny (≈7B total, ≈1B active parameters). For reference, the family also includes H-Small (≈32B total / 9B active) and Micro/Micro-H (≈3B dense/hybrid) tiers.

📦 What's in this repo (MLX format)
- `config.json` (MLX), `mlx_model.safetensors` (3-bit shards), tokenizer files, and processor metadata.
- Ready for macOS on M-series chips via Metal/MPS.

> The upstream Hugging Face model cards for Granite 4.0 (Tiny/Small) provide additional training details, staged curricula, and the alignment workflow. Start here for Tiny: ibm-granite/granite-4.0-h-tiny.

✅ Intended use
- General instruction-following and chat with long context (128k).
- Enterprise assistant patterns (function calling, structured outputs) and RAG backends that benefit from efficient, large windows.
- On-device development on Macs (MLX), low-latency local prototyping and evaluation.

⚠️ Limitations
- As a quantized, decoder-only LM, it can produce confident but wrong outputs; review for critical use.
- 2–4-bit quantization may reduce precision on intricate tasks (math/code, tiny-text parsing); prefer higher bit-widths if RAM allows.
- Follow your organization's safety/PII/guardrail policies (Granite is "open-weight," not a full product).

🧠 Model family at a glance

| Tier | Arch | Params (total / active) | Notes |
|---|---|---:|---|
| H-Small | Hybrid + MoE | ~32B / 9B | Workhorse for enterprise agent tasks; strong function-calling & instruction following. |
| H-Tiny (this repo) | Hybrid + MoE | ~7B / 1B | Long-context, efficiency-first; great for local dev. |
| Micro / H-Micro | Dense / Hybrid | ~3B | Edge/low-resource alternatives; when the hybrid runtime isn't optimized. |

Context window: up to 128k tokens for the Tiny/Base preview lines.
License: Apache-2.0.

🧪 Observed on-device behavior (MLX)
Empirically on M-series Macs:
- 3-bit often gives crisp, direct answers with good latency and modest RAM.
- Higher bit-widths (4/5/6-bit) improve faithfulness on fine-grained tasks (tiny OCR, structured parsing), at higher memory cost.

> Performance varies by Mac model, image/token lengths, and temperature; validate on your workload.

🔢 Choosing a quantization level (Apple Silicon)

| Variant | Typical Peak RAM (7B-class) | Relative speed | Typical behavior | When to choose |
|---|---:|:---:|---|---|
| 2-bit | ~3–4 GB | 🔥🔥🔥🔥 | Smallest footprint; most lossy | Minimal-RAM devices / smoke tests |
| 3-bit (this build) | ~5–6 GB | 🔥🔥🔥🔥 | Direct, concise, great latency | Default for local dev on M1/M2/M3/M4 |
| 4-bit | ~6–7.5 GB | 🔥🔥🔥 | Better detail retention | When you need stronger faithfulness |
| 5-bit | ~8–9 GB | 🔥🔥☆ | Higher fidelity | For heavy docs / structured outputs |
| 6-bit | ~9.5–11 GB | 🔥🔥 | Max quality under MLX quant | If RAM headroom is ample |

> Figures are indicative for the language-only Tiny (no vision) and will vary with context length and KV-cache size.

🚀 Quickstart (CLI — MLX)

```bash
# Plain generation (deterministic)
python -m mlx_lm.generate \
  --model mlx-community/granite-4.0-h-tiny-3bit-MLX \
  --prompt "Summarize the following notes into 5 bullet points:\n " \
  --max-tokens 200 \
  --temperature 0.0 \
  --device mps \
  --seed 0
```

NaNK
license:apache-2.0
246
2

GLM-4-32B-0414-4bit

NaNK
license:mit
244
5

CodeLlama-13b-Instruct-hf-4bit-MLX

NaNK
llama
244
2

Nanonets-OCR-s-bf16

mlx-community/Nanonets-OCR-s-bf16 This model was converted to MLX format from [`nanonets/Nanonets-OCR-s`]() using mlx-vlm version 0.1.27. Refer to the original model card for more details on the model. Use with mlx

NaNK
241
2

Qwen3-VL-32B-Thinking-4bit

mlx-community/Qwen3-VL-32B-Thinking-4bit This model was converted to MLX format from [`Qwen/Qwen3-VL-32B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
239
1

Qwen3-VL-2B-Instruct-3bit

mlx-community/Qwen3-VL-2B-Instruct-3bit This model was converted to MLX format from [`Qwen/Qwen3-VL-2B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
239
0

distil-whisper-large-v3

236
15

DeepSeek-R1-0528-4bit

This model mlx-community/DeepSeek-R1-0528-4bit was converted to MLX format from deepseek-ai/DeepSeek-R1-0528 using mlx-lm version 0.24.1.

NaNK
235
17

GLM-4.5-Air-2bit

This model mlx-community/GLM-4.5-Air-2bit was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.1.

NaNK
license:mit
235
4

InternVL3_5-GPT-OSS-20B-A4B-Preview-4bit

mlx-community/InternVL35-GPT-OSS-20B-A4B-Preview-4bit This model was converted to MLX format from [`OpenGVLab/InternVL35-GPT-OSS-20B-A4B-Preview-HF`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
235
0

Chatterbox-TTS-4bit

NaNK
231
1

plamo-2-translate

This is a 4-bit quantized version of the PLaMo 2 Translation Model with DWQ (Distilled Weight Quantization) for inference with MLX on Apple Silicon devices.

PLaMo Translation Model is a specialized large-scale language model developed by Preferred Networks for translation tasks. For details, please refer to the blog post and press release.

List of models:
- plamo-2-translate ... post-trained model for translation
- plamo-2-translate-base ... base model for translation
- plamo-2-translate-eval ... pair-wise evaluation model

PLaMo Translation Model is released under the PLaMo community license. Please check the following license and agree to it before downloading.
- (EN) under construction: we apologize for the inconvenience
- (JA) https://www.preferred.jp/ja/plamo-community-license/

NOTE: This model has NOT been instruction-tuned for chat dialog or other downstream tasks. Please check the PLaMo community license and contact us via the following form to use it for commercial purposes.

PLaMo Translation Model is a new technology that carries risks with use. Testing conducted to date has been in English and Japanese, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, PLaMo Translation Model's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of PLaMo Translation Model, developers should perform safety testing and tuning tailored to their specific applications of the model.

This model is trained under the project "Research and Development Project of the Enhanced Infrastructures for Post 5G Information and Communication System" (JPNP 20017), subsidized by the New Energy and Industrial Technology Development Organization (NEDO).
- (EN) https://www.preferred.jp/en/company/aipolicy/
- (JA) https://www.preferred.jp/ja/company/aipolicy/

NaNK
230
12

Llama-3.2-11B-Vision-Instruct-4bit

NaNK
mllama
230
6

CodeLlama-7b-Python-4bit-MLX

NaNK
llama
229
14

gemma-3-12b-it-8bit

NaNK
229
2

Qwen2.5-1.5B-Instruct-8bit

NaNK
license:apache-2.0
228
1

Qwen3-VL-4B-Thinking-4bit

mlx-community/Qwen3-VL-4B-Thinking-4bit This model was converted to MLX format from [`Qwen/Qwen3-VL-4B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
228
0

Qwen2.5-14B-Instruct-4bit

NaNK
license:apache-2.0
227
10

Mixtral-8x7B-Instruct-v0.1

NaNK
license:apache-2.0
226
23

parakeet-tdt-1.1b

NaNK
license:cc-by-4.0
226
1

Qwen3-Next-80B-A3B-Instruct-8bit

NaNK
license:apache-2.0
225
8

Llama-4-Scout-17B-16E-Instruct-8bit

NaNK
llama4
224
3

Qwen3-VL-8B-Thinking-6bit

mlx-community/Qwen3-VL-8B-Thinking-6bit This model was converted to MLX format from [`Qwen/Qwen3-VL-8B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
224
2

Qwen2-VL-7B-Instruct-4bit

NaNK
license:apache-2.0
223
2

gemma-3-27b-it-qat-8bit

NaNK
222
9

DeepSeek-V3.1-8bit

NaNK
license:mit
222
3

GLM-4.5V-8bit

NaNK
license:mit
222
2

Hermes-3-Llama-3.1-8B-4bit

NaNK
llama
221
4

Qwen3-VL-32B-Thinking-bf16

NaNK
license:apache-2.0
217
0

parakeet-ctc-0.6b

NaNK
license:cc-by-4.0
216
2

Llama-4-Scout-17B-16E-Instruct-6bit

NaNK
llama4
214
5

deepcogito-cogito-v1-preview-llama-8B-4bit

NaNK
llama
214
0

Qwen3-VL-30B-A3B-Instruct-6bit

mlx-community/Qwen3-VL-30B-A3B-Instruct-6bit This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Instruct`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
213
0

mxbai-embed-large-v1

NaNK
license:apache-2.0
211
3

Llama-3-8B-Instruct-1048k-4bit

NaNK
llama
210
25

OpenELM-270M-Instruct

210
5

GLM-4.5-Air-8bit

This model mlx-community/GLM-4.5-Air-8bit was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.0.

NaNK
license:mit
209
6

Qwen3-VL-4B-Instruct-5bit

mlx-community/Qwen3-VL-4B-Instruct-5bit This model was converted to MLX format from [`Qwen/Qwen3-VL-4B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
208
0

Mistral-7B-Instruct-v0.2

NaNK
license:apache-2.0
207
20

DeepSeek-R1-Distill-Llama-70B-4bit

NaNK
llama
206
8

Qwen3-VL-32B-Thinking-8bit

NaNK
license:apache-2.0
206
0

GLM-4.5-Air-3bit-DWQ-v2

NaNK
license:mit
202
3

Qwen3-VL-8B-Instruct-8bit

mlx-community/Qwen3-VL-8B-Instruct-8bit This model was converted to MLX format from [`Qwen/Qwen3-VL-8B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
202
2

Nous-Hermes-2-Mixtral-8x7B-DPO-4bit

NaNK
license:apache-2.0
201
18

Phi-3-mini-128k-instruct-4bit

NaNK
license:mit
200
12

Qwen2.5-VL-72B-Instruct-4bit

NaNK
198
7

Meta-Llama-3.1-405B-4bit

NaNK
llama
198
5

Qwen3-Next-80B-A3B-Thinking-4bit

This model mlx-community/Qwen3-Next-80B-A3B-Thinking-4bit was converted to MLX format from Qwen/Qwen3-Next-80B-A3B-Thinking using mlx-lm version 0.27.1.

NaNK
license:apache-2.0
197
3

Jinx-gpt-oss-20b-mxfp4-mlx

This model mlx-community/Jinx-gpt-oss-20b-mxfp4-mlx was converted to MLX format from Jinx-org/Jinx-gpt-oss-20b-mxfp4 using mlx-lm version 0.27.1.

NaNK
license:apache-2.0
197
1

Ministral-3-8B-Instruct-2512-4bit

NaNK
license:apache-2.0
195
0

Llama-4-Scout-17B-16E-4bit

NaNK
llama4
194
2

Qwen3-14B-4bit

NaNK
license:apache-2.0
194
1

NVIDIA-Nemotron-Nano-9B-v2-4bits

NaNK
193
2

Kimi-K2-Instruct-0905-mlx-3bit

mlx-community/moonshotai_Kimi-K2-Instruct-0905-mlx-3bit This model mlx-community/moonshotai_Kimi-K2-Instruct-0905-mlx-3bit was converted to MLX format from moonshotai/Kimi-K2-Instruct-0905 using mlx-lm version 0.26.3.

NaNK
191
1

Llama-3_3-Nemotron-Super-49B-v1_5-mlx-4Bit

mlx-community/Llama-3_3-Nemotron-Super-49B-v1_5-mlx-4Bit The Model mlx-community/Llama-3_3-Nemotron-Super-49B-v1_5-mlx-4Bit was converted to MLX format from unsloth/Llama-3_3-Nemotron-Super-49B-v1_5 using mlx-lm version 0.26.4.

NaNK
unsloth - llama-3 - pytorch
189
2

gemma-2-27b-it-4bit

NaNK
188
8

Qwen3-VL-30B-A3B-Instruct-3bit

mlx-community/Qwen3-VL-30B-A3B-Instruct-3bit This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Instruct`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
186
0

DeepSeek-Coder-V2-Lite-Instruct-4bit-AWQ

NaNK
185
0

chandra-bf16

185
0

Qwen3-1.7B-MLX-MXFP4

This model mlx-community/Qwen3-1.7B-MLX-MXFP4 was converted to MLX format from Qwen/Qwen3-1.7B using mlx-lm version 0.28.3.

NaNK
license:apache-2.0
183
1

Kokoro-82M-4bit

NaNK
license:apache-2.0
182
5

Phi-3-mini-4k-instruct-4bit-no-q-embed

NaNK
license:mit
182
3

gemma-3-27b-it-8bit

NaNK
180
7

Qwen3-VL-30B-A3B-Thinking-6bit

mlx-community/Qwen3-VL-30B-A3B-Thinking-6bit This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
179
0

NousResearch_Hermes-4-14B-BF16-abliterated-mlx

NaNK
license:apache-2.0
178
1

gemma-3-4b-it-5bit

This model mlx-community/gemma-3-4b-it-5bit was converted to MLX format from google/gemma-3-4b-it using mlx-lm version 0.28.2.

NaNK
178
0

chandra-4bit

mlx-community/chandra-4bit This model was converted to MLX format from [`datalab-to/chandra`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
177
4

olmOCR-2-7B-1025-8bit

mlx-community/olmOCR-2-7B-1025-8bit This model was converted to MLX format from [`allenai/olmOCR-2-7B-1025`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
176
0

Llama-3.1-Nemotron-70B-Instruct-HF-bf16

NaNK
llama
175
1

Qwen3-4B-6bit

NaNK
license:apache-2.0
174
0

Mistral-7B-Instruct-v0.2-4bit

NaNK
license:apache-2.0
172
1

Llama-3.2-90B-Vision-Instruct-4bit

NaNK
mllama
171
4

GLM-4.5V-abliterated-4bit

mlx-community/GLM-4.5V-abliterated-4bit This model was converted to MLX format from [`huihui-ai/Huihui-GLM-4.5V-abliterated`]() using mlx-vlm. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:mit
171
1

quantized-gemma-2b-it

NaNK
170
10

Fara-7B-4bit

NaNK
license:apache-2.0
169
1

Meta-Llama-3-70B-Instruct-4bit

NaNK
llama
168
7

olmOCR-2-7B-1025-mlx-8bit

mlx-community/olmOCR-2-7B-1025-mlx-8bit This model was converted to MLX format from [`allenai/olmOCR-2-7B-1025`]() using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
165
1

TinyLlama-1.1B-Chat-v1.0-4bit

NaNK
llama
164
0

Unsloth-Phi-4-4bit

NaNK
llama
162
5

Qwen2.5-Coder-14B-Instruct-4bit

NaNK
license:apache-2.0
162
4

GLM-4.5V-abliterated-8bit

mlx-community/GLM-4.5V-abliterated-8bit This model was converted to MLX format from [`huihui-ai/Huihui-GLM-4.5V-abliterated`]() using mlx-vlm. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:mit
162
1

jinaai-ReaderLM-v2

NaNK
license:mit
161
23

Apertus-8B-Instruct-2509-4bit

NaNK
license:apache-2.0
161
1

Meta-Llama-3.1-70B-Instruct-bf16-CORRECTED

NaNK
llama
161
0

Qwen3-VL-4B-Thinking-8bit

mlx-community/Qwen3-VL-4B-Thinking-8bit This model was converted to MLX format from [`Qwen/Qwen3-VL-4B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
160
0

paligemma-3b-mix-448-8bit

NaNK
159
7

whisper-tiny-mlx

159
2

phi-4-4bit

The Model mlx-community/phi-4-4bit was converted to MLX format from microsoft/phi-4 using mlx-lm version 0.21.0.

NaNK
license:mit
158
19

llava-phi-3-mini-4bit

NaNK
license:apache-2.0
158
9

GLM-4.5-Air-3bit-DWQ

This model mlx-community/GLM-4.5-Air-3bit-DWQ was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.1.

NaNK
license:mit
158
4

Qwen2.5-Coder-1.5B-Instruct-4bit

NaNK
license:apache-2.0
157
1

granite-4.0-h-1b-6bit

This model mlx-community/granite-4.0-h-1b-6bit was converted to MLX format from ibm-granite/granite-4.0-h-1b using mlx-lm version 0.28.4.

NaNK
license:apache-2.0
157
0

Qwen2.5-32B-Instruct-4bit

NaNK
license:apache-2.0
156
4

Mistral-Large-Instruct-2407-4bit

NaNK
156
1

Apriel-1.5-15b-Thinker-8bit

NaNK
license:mit
156
0

Qwen3-14B-4bit-AWQ

NaNK
license:apache-2.0
155
4

DeepSeek-R1-Qwen3-0528-8B-4bit-AWQ

NaNK
license:mit
155
4

granite-4.0-h-1b-8bit

This model mlx-community/granite-4.0-h-1b-8bit was converted to MLX format from ibm-granite/granite-4.0-h-1b using mlx-lm version 0.28.4.

NaNK
license:apache-2.0
153
1

Qwen3-4B-Thinking-2507-fp16

NaNK
license:apache-2.0
153
0

granite-4.0-h-350m-8bit

NaNK
license:apache-2.0
153
0

Qwen2.5-Coder-32B-Instruct-4bit

NaNK
license:apache-2.0
152
10

Huihui-gemma-3n-E4B-it-abliterated-lm-8bit

NaNK
149
1

Phi-3-vision-128k-instruct-4bit

NaNK
license:mit
148
8

Nous-Hermes-2-Mistral-7B-DPO-4bit-MLX

NaNK
license:apache-2.0
148
5

Josiefied-Qwen3-30B-A3B-abliterated-v2-4bit

NaNK
145
2

AI21-Jamba-Reasoning-3B-4bit

This model mlx-community/AI21-Jamba-Reasoning-3B-4bit was converted to MLX format from ai21labs/AI21-Jamba-Reasoning-3B using mlx-lm version 0.28.2.

NaNK
license:apache-2.0
145
0

DeepSeek-Coder-V2-Instruct-AQ4_1

144
3

Josiefied-Qwen3-4B-Instruct-2507-abliterated-v1-8bit

NaNK
143
0

Ministral-8B-Instruct-2410-4bit

NaNK
142
9

Josiefied-Qwen3-8B-abliterated-v1-4bit

NaNK
142
2

UTENA-7B-NSFW-V2-4bit

NaNK
142
1

olmOCR-2-7B-1025-mlx-4bit

mlx-community/olmOCR-2-7B-1025-mlx-4bit This model was converted to MLX format from [`allenai/olmOCR-2-7B-1025`]() using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
142
1

parakeet-tdt_ctc-1.1b

NaNK
license:cc-by-4.0
142
0

DeepSeek-Coder-V2-Lite-Instruct-4bit

NaNK
142
0

SmolVLM2-2.2B-Instruct-mlx

NaNK
license:apache-2.0
141
8

Mistral-7B-v0.1-LoRA-Text2SQL

NaNK
license:mit
141
2

gemma-3n-E2B-it-lm-bf16

NaNK
141
0

csm-1b

NaNK
license:apache-2.0
139
20

Llama-4-Maverick-17B-16E-Instruct-6bit

mlx-community/Llama-4-Maverick-17B-16E-Instruct-6bit This model mlx-community/Llama-4-Maverick-17B-16E-Instruct-6bit was converted to MLX format from meta-llama/Llama-4-Maverick-17B-128E-Instruct using mlx-lm version 0.22.3.

NaNK
llama4
139
2

SmolLM-135M-4bit

NaNK
llama
139
1

DeepSeek-V3.1-mlx-DQ5_K_M

This model mlx-community/DeepSeek-V3.1-mlx-DQ5_K_M was converted to MLX format from deepseek-ai/DeepSeek-V3.1 using mlx-lm version 0.26.3.

This is created for people using a single Apple Mac Studio M3 Ultra with 512 GB. With 512 GB, we can do better than the 4-bit version of DeepSeek V3.1. Using research results, we aim to get better than 5-bit performance using smarter quantization, while not making the quant so large that it leaves no memory for a useful context window. The temperature of 1.3 is DeepSeek's recommendation for translations. For coding, you should probably use a temperature of 0.6 or lower.

In the arXiv paper Quantitative Analysis of Performance Drop in DeepSeek Model Quantization the authors write,

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms traditional `Q3_K_M` variant on various benchmarks, which is also comparable with 4-bit quantization (`Q4_K_M`) approach in most tasks.

> dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bitwidth quantization has been well tested and documented. In this case we did not want an improved 3-bit quant, but rather the best possible "5-bit" quant. We therefore modified the `DQ3_K_M` quantization by replacing 3-bit with 5-bit, 4-bit with 6-bit, and 6-bit with 8-bit to create a new `DQ5_K_M` quant. This produces a quantization of 5.638 bpw (bits per weight).

In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `def mixed_quant_predicate()` with something like this. Should you wish to squeeze more out of your quant, and you do not need to use a larger context window, you can change the last part of that code.

NaNK
license:mit
139
1

Ring-flash-linear-2.0-128k-4bit

This model mlx-community/Ring-flash-linear-2.0-128k-4bit was converted to MLX format from inclusionAI/Ring-flash-linear-2.0-128k using mlx-lm version 0.28.2.

NaNK
license:mit
139
1

Qwen3-Coder-30B-A3B-Instruct-3bit

This model mlx-community/Qwen3-Coder-30B-A3B-Instruct-3bit was converted to MLX format from Qwen/Qwen3-Coder-30B-A3B-Instruct using mlx-lm version 0.26.1.

NaNK
license:apache-2.0
139
0

whisper-large-v3-mlx-8bit

NaNK
138
5

Qwen3-30B-A3B-bf16

NaNK
license:apache-2.0
138
2

Qwen3-30B-A3B-Instruct-2507-6bit

This model mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit was converted to MLX format from Qwen/Qwen3-30B-A3B-Instruct-2507 using mlx-lm version 0.26.1.

NaNK
license:apache-2.0
137
0

meta-llama-Llama-4-Scout-17B-16E-4bit

NaNK
llama4
136
7

Qwen3-235B-A22B-Thinking-2507-3bit-DWQ

mlx-community/Qwen3-235B-A22B-Thinking-2507-3bit-DWQ This model mlx-community/Qwen3-235B-A22B-Thinking-2507-3bit-DWQ was converted to MLX format from Qwen/Qwen3-235B-A22B-Thinking-2507 using mlx-lm version 0.26.0.

NaNK
license:apache-2.0
136
6

DeepSeek-R1-Distill-Qwen-14B-8bit

NaNK
136
5

gemma-3-27b-it-qat-bf16

NaNK
136
5

GLM-4.5-Air-2bit-DWQ

This model mlx-community/GLM-4.5-Air-2bit-DWQ was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.2.

NaNK
license:mit
136
2

GLM-4-9B-0414-8bit

NaNK
license:mit
135
0

DeepSeek-V3.1-Base-4bit

NaNK
license:mit
134
3

deepseek-coder-33b-instruct-hf-4bit-mlx

NaNK
llama
134
1

Qwen3-VL-30B-A3B-Instruct-5bit

mlx-community/Qwen3-VL-30B-A3B-Instruct-5bit This model was converted to MLX format from [`Qwen/Qwen3-VL-30B-A3B-Instruct`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
134
0

Qwen3-Next-80B-A3B-Thinking-8bit

NaNK
license:apache-2.0
133
2

moonshotai_Kimi-K2-Instruct-mlx-3bit

This model mlx-community/moonshotai_Kimi-K2-Instruct-mlx-3bit was converted to MLX format from moonshotai/Kimi-K2-Instruct using mlx-lm version 0.26.3.

NaNK
133
0

UserLM-8b-8bit

NaNK
llama
133
0

Qwen2.5-7B-Instruct-1M-4bit

NaNK
license:apache-2.0
132
10

Llama-3.1-8B-Instruct

NaNK
llama
132
5

Llama-4-Maverick-17B-128E-Instruct-4bit

NaNK
llama4
132
2

Apriel 1.5 15b Thinker 6bit MLX

Apriel-1.5-15B-Thinker — MLX Quantized (Apple Silicon)

Format: MLX (Apple Silicon). Variants: 6-bit (recommended). Base model: ServiceNow-AI/Apriel-1.5-15B-Thinker. Architecture: Pixtral-style LLaVA (vision encoder → 2-layer projector → decoder). Intended use: image understanding and grounded reasoning; document/chart/OCR-style tasks; math/coding Q&A with visual context.

> This repository provides MLX-format weights for Apple Silicon (M-series) built from the original Apriel-1.5-15B-Thinker release. It is optimized for on-device inference with a small memory footprint and fast startup on macOS.

Apriel-1.5-15B-Thinker is a 15B open-weights multimodal reasoning model trained via a data-centric mid-training recipe rather than RLHF/RM. Starting from Pixtral-12B as the base, the authors apply: 1) depth upscaling (capacity expansion without pretraining from scratch), 2) two-stage multimodal continual pretraining (CPT) to build text and visual reasoning, and 3) high-quality SFT with explicit reasoning traces across math, coding, science, and tool use. This approach delivers frontier-level capability on compact compute.

Key reported results (original model)
- AAI Index: 52, matching DeepSeek-R1-0528 at far lower compute.
- Multimodal: on 10 image benchmarks, within ~5 points of Gemini-2.5-Flash and Claude Sonnet-3.7 on average.
- Designed for single-GPU / constrained deployment scenarios.

> The notes above summarize the upstream paper; MLX quantization can slightly affect absolute scores. Always validate on your use case.

- Backbone: Pixtral-12B-Base-2409 adapted to a larger 15B decoder via depth upscaling (layers 40 → 48), then re-aligned with a 2-layer projection network connecting the vision encoder and decoder.
- Training stack:
  - CPT Stage-1: mixed tokens (≈50% text, 20% replay, 30% multimodal) for foundational reasoning and image understanding; 32k context; cosine LR with warmup; all components unfrozen; checkpoint averaging.
  - CPT Stage-2: targeted synthetic visual tasks (reconstruction, visual matching, detection, counting) to strengthen spatial/compositional/fine-grained reasoning; vision encoder frozen; loss on responses for instruct data; 16k context.
  - SFT: curated instruction-response pairs with explicit reasoning traces (math, coding, science, tools).
- Why MLX? Native Apple-Silicon inference with small binaries, fast load, and low memory overhead.
- What’s included: `config.json`, `mlx_model.safetensors` (sharded), tokenizer and processor files, and metadata for VLM pipelines.
- Quantization options:
  - 6-bit (recommended): best balance of quality and memory.

> Tip: If you’re capacity-constrained on an M1/M2, try 6-bit first.

```bash
# Basic image caption
python -m mlx_vlm.generate \
  --model \
  --image /path/to/image.jpg \
  --prompt "Describe this image." \
  --max-tokens 128 --temperature 0.0 --device mps
```
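If you prefer the Python API over the CLI, here is a minimal sketch patterned after the upstream mlx-vlm README; the function names and argument order (`load`, `load_config`, `apply_chat_template`, `generate`) can differ between mlx-vlm releases, and the model path below is a placeholder, so verify against your installed version and the actual repo id.

```python
# Hedged sketch of the equivalent mlx-vlm Python usage (not taken from this card).
# Function names follow the upstream mlx-vlm README; verify against your version.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "<path-or-repo-id-of-this-6bit-build>"  # placeholder, not a real id
model, processor = load(model_path)
config = load_config(model_path)

images = ["/path/to/image.jpg"]
prompt = apply_chat_template(processor, config, "Describe this image.", num_images=len(images))
output = generate(model, processor, prompt, images, max_tokens=128)
print(output)
```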

NaNK
license:mit
132
1

DeepSeek-R1-0528-Qwen3-8B-4bit-DWQ

This model mlx-community/DeepSeek-R1-0528-Qwen3-8B-4bit-DWQ was converted to MLX format from deepseek-ai/DeepSeek-R1-0528-Qwen3-8B using mlx-lm version 0.24.1.

NaNK
license:mit
131
8

all-MiniLM-L6-v2-4bit

NaNK
license:apache-2.0
131
1

InternVL3_5-30B-A3B-4bit

mlx-community/InternVL3_5-30B-A3B-4bit This model was converted to MLX format from [`OpenGVLab/InternVL3_5-30B-A3B-HF`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
131
0

mistral-7B-v0.1

NaNK
license:apache-2.0
130
10

LFM2-8B-A1B-8bit

NaNK
130
1

Qwen3-VL-30B-A3B-Thinking-5bit

NaNK
license:apache-2.0
130
0

DeepSeek-R1-Distill-Qwen-14B-6bit

NaNK
129
6

Codestral-22B-v0.1-8bit

NaNK
128
8

GLM-Z1-32B-0414-4bit

NaNK
license:mit
128
2

Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e8

NaNK
license:apache-2.0
128
1

bge-small-en-v1.5-4bit

NaNK
license:mit
128
0

DeepSeek-R1-3bit

NaNK
127
15

chatterbox-4bit

NaNK
license:apache-2.0
127
1

Nanonets-OCR2-3B-6bit

mlx-community/Nanonets-OCR2-3B-6bit This model was converted to MLX format from [`nanonets/Nanonets-OCR2-3B`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
127
0

DeepSeek-v3-0324-8bit

This model mlx-community/DeepSeek-v3-0324-8bit was converted to MLX format from deepseek-ai/DeepSeek-v3-0324 using mlx-lm version 0.22.2.

NaNK
license:mit
126
1

Ring-1T-mlx-DQ3_K_M

This model mlx-community/Ring-1T-mlx-DQ3_K_M was converted to MLX format from inclusionAI/Ring-1T using mlx-lm version 0.28.1.

This quant is intended for people running a single Apple Mac Studio M3 Ultra with 512 GB; the 4-bit version of Ring 1T does not fit. Using published research, we aim to get 4-bit performance from a slightly smaller and smarter quantization, one that is also not so large that it leaves no memory for a useful context window.

In the arXiv paper Quantitative Analysis of Performance Drop in DeepSeek Model Quantization, the authors write:

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms the traditional `Q3_K_M` variant on various benchmarks, which is also comparable with the 4-bit quantization (`Q4_K_M`) approach in most tasks.

> [a] dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bit-width quantization has been well tested and documented. In the `convert.py` file of mlx-lm on your system (you can see the original code in the mlx-lm repository), replace the code inside `def mixed_quant_predicate()` with something like the sketch below.
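As with the DQ5_K_M card above, a compact illustrative mapping, assuming the predicate may return per-layer bit-widths; the layer-name substrings are assumptions and should be checked against the module paths your mlx-lm version reports during conversion.

```python
# Hypothetical DQ3_K_M-style mapping for this conversion (assumed layer-name
# substrings; not the card author's exact code). The first matching rule wins;
# the 3-bit fallback covers the routed-expert weights that hold most parameters.
BIT_RULES = [
    ("embed_tokens", 6),
    ("lm_head", 6),
    ("self_attn", 4),
    ("shared_expert", 4),
]

def mixed_quant_predicate(path: str, module, config) -> dict:
    bits = next((b for key, b in BIT_RULES if key in path), 3)
    return {"bits": bits, "group_size": 64}
```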

NaNK
126
1

olmOCR-2-7B-1025-5bit

mlx-community/olmOCR-2-7B-1025-5bit This model was converted to MLX format from [`allenai/olmOCR-2-7B-1025`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
126
0

DeepSeek-R1-Distill-Qwen-7B-8bit

NaNK
125
8

plamo-2-1b

NaNK
license:apache-2.0
125
4

Llama-3.2-3B-Instruct-abliterated-6bit

NaNK
llama
125
0

embeddinggemma-300m-qat-q8_0-unquantized-bf16

mlx-community/embeddinggemma-300m-qat-q8_0-unquantized-bf16 The Model mlx-community/embeddinggemma-300m-qat-q8_0-unquantized-bf16 was converted to MLX format from google/embeddinggemma-300m-qat-q8_0-unquantized using mlx-lm version 0.0.4.

125
0

Qwen3-4B-Instruct-2507-8bit

This model mlx-community/Qwen3-4B-Instruct-2507-8bit was converted to MLX format from Qwen/Qwen3-4B-Instruct-2507 using mlx-lm version 0.26.2.

NaNK
license:apache-2.0
124
4

Llama-3.3-70B-Instruct-bf16

NaNK
llama
124
1

Qwen3-VL-32B-Instruct-bf16

mlx-community/Qwen3-VL-32B-Instruct-bf16 This model was converted to MLX format from [`Qwen/Qwen3-VL-32B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
124
0

codegemma-7b-it-8bit

NaNK
123
5

Llama-3.1-8B-Instruct-4bit

The Model mlx-community/Llama-3.1-8B-Instruct-4bit was converted to MLX format from meta-llama/Llama-3.1-8B-Instruct using mlx-lm version 0.21.4.

NaNK
llama
123
2

Qwen3-Next-80B-A3B-Instruct-5bit

NaNK
license:apache-2.0
123
2

rnj-1-instruct-4bit

NaNK
license:apache-2.0
123
0

granite-3.3-8b-instruct-4bit

NaNK
license:apache-2.0
122
1

Qwen3-8B-4bit-DWQ-053125

This model mlx-community/Qwen3-8B-4bit-DWQ-053125 was converted to MLX format from Qwen/Qwen3-8B using mlx-lm version 0.24.1.

NaNK
license:apache-2.0
122
1

c4ai-command-r-plus-4bit

NaNK
license:cc-by-nc-4.0
121
49

Qwen2.5-72B-Instruct-4bit

NaNK
121
5

gemma-3-27b-it-4bit-DWQ

This model mlx-community/gemma-3-27b-it-4bit-DWQ was converted to MLX format from google/gemma-3-27b-it using mlx-lm version 0.24.0.

NaNK
121
3

dolphin-2.9-llama3-70b-4bit

NaNK
llama
120
5

Mistral-Small-24B-Instruct-2501-4bit

NaNK
license:apache-2.0
119
14

llava-v1.6-mistral-7b-4bit

NaNK
license:apache-2.0
119
5

gemma-3-1b-it-bf16

NaNK
119
1

dac-speech-24khz-1.5kbps

NaNK
119
1

Llama-OuteTTS-1.0-1B-4bit

NaNK
llama
119
1

LongCat-Flash-Chat-4bit

NaNK
license:mit
119
1

granite-4.0-h-1b-base-8bit

This model mlx-community/granite-4.0-h-1b-base-8bit was converted to MLX format from ibm-granite/granite-4.0-h-1b-base using mlx-lm version 0.28.4.

NaNK
license:apache-2.0
119
1

Llama-3.3-70B-Instruct-3bit

NaNK
llama
117
7

deepseek-coder-33b-instruct

NaNK
llama
117
0

Kimi-Linear-48B-A3B-Instruct-6bit

This model mlx-community/Kimi-Linear-48B-A3B-Instruct-6bit was converted to MLX format from moonshotai/Kimi-Linear-48B-A3B-Instruct using mlx-lm version 0.28.4.

NaNK
license:mit
116
3

bitnet-b1.58-2B-4T-4bit

This model mlx-community/bitnet-b1.58-2B-4T-4bit was converted to MLX format from microsoft/bitnet-b1.58-2B-4T using mlx-lm version 0.25.1.

NaNK
license:mit
116
0

MinerU2.5-2509-1.2B-bf16

mlx-community/MinerU2.5-2509-1.2B-bf16 This model was converted to MLX format from [`opendatalab/MinerU2.5-2509-1.2B`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:agpl-3.0
115
0

Mistral-Small-3.1-24B-Instruct-2503-4bit

NaNK
license:apache-2.0
114
9

Mixtral-8x7B-Instruct-v0.1-hf-4bit-mlx

NaNK
license:apache-2.0
114
7

Llama-3.1-Nemotron-Nano-4B-v1.1-4bit

This model mlx-community/Llama-3.1-Nemotron-Nano-4B-v1.1-4bit was converted to MLX format from nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1 using mlx-lm version 0.25.0.

NaNK
llama
114
0

Apriel-1.5-15b-Thinker-bf16

mlx-community/Apriel-1.5-15b-Thinker-bf16 This model was converted to MLX format from [`ServiceNow-AI/Apriel-1.5-15b-Thinker`]() using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:mit
114
0

Qwen3-30B-A3B-Thinking-2507-4bit

This model mlx-community/Qwen3-30B-A3B-Thinking-2507-4bit was converted to MLX format from Qwen/Qwen3-30B-A3B-Thinking-2507 using mlx-lm version 0.26.3.

NaNK
license:apache-2.0
113
3

LFM2-8B-A1B-fp16

NaNK
113
2

Qwen2.5-VL-32B-Instruct-4bit

NaNK
license:apache-2.0
112
4

Qwen3-14B-4bit-DWQ-053125

This model mlx-community/Qwen3-14B-4bit-DWQ-053125 was converted to MLX format from Qwen/Qwen3-14B using mlx-lm version 0.24.1.

NaNK
license:apache-2.0
112
4

meta-llama-Llama-4-Scout-17B-16E-fp16

NaNK
llama4
112
3

gemma-3-4b-it-bf16

NaNK
112
1

deepseek-coder-6.7b-instruct-hf-4bit-mlx

NaNK
llama
112
0

gemma-3-1b-it-4bit-DWQ

This model mlx-community/gemma-3-1b-it-4bit-DWQ was converted to MLX format from google/gemma-3-1b-it using mlx-lm version 0.24.1.

NaNK
112
0

gemma-3n-E4B-it-bf16

NaNK
111
12

LongCat-Flash-Chat-mlx-DQ6_K_M

NaNK
111
1

gemma-3-270m-it-bf16

This model mlx-community/gemma-3-270m-it-bf16 was converted to MLX format from google/gemma-3-270m-it using mlx-lm version 0.26.3.

111
1

whisper-medium-mlx-4bit

NaNK
111
0

Qwen3-14B-6bit

NaNK
license:apache-2.0
111
0

gpt2-base-mlx

NaNK
license:mit
111
0

LFM2-VL-450M-8bit

NaNK
110
10

starcoder2-7b-4bit

NaNK
110
2

Ling-mini-2.0-4bit

This model mlx-community/Ling-mini-2.0-4bit was converted to MLX format from inclusionAI/Ling-mini-2.0 using mlx-lm version 0.27.1.

NaNK
license:mit
110
1

LLaDA2.0-mini-preview-4bit

This model mlx-community/LLaDA2.0-mini-preview-4bit was converted to MLX format from inclusionAI/LLaDA2.0-mini-preview using mlx-lm version 0.28.4.

NaNK
license:apache-2.0
110
1

Qwen3-4B-4bit-DWQ-053125

NaNK
license:apache-2.0
109
2

Dolphin-Mistral-24B-Venice-Edition-4bit

mlx-community/Dolphin-Mistral-24B-Venice-Edition-4bit This model mlx-community/Dolphin-Mistral-24B-Venice-Edition-4bit was converted to MLX format from cognitivecomputations/Dolphin-Mistral-24B-Venice-Edition using mlx-lm version 0.25.3.

NaNK
license:apache-2.0
109
1

Llama-3-8B-Instruct-1048k-8bit

NaNK
llama
108
17

conikeec-deepseek-coder-6.7b-instruct

NaNK
llama
108
1

Josiefied-DeepSeek-R1-0528-Qwen3-8B-abliterated-v1-4bit

mlx-community/Josiefied-DeepSeek-R1-0528-Qwen3-8B-abliterated-v1-4bit This model mlx-community/Josiefied-DeepSeek-R1-0528-Qwen3-8B-abliterated-v1-4bit was converted to MLX format from Goekdeniz-Guelmez/Josiefied-DeepSeek-R1-0528-Qwen3-8B-abliterated-v1 using mlx-lm version 0.24.1.

NaNK
108
1

Apertus-8B-Instruct-2509-8bit

NaNK
license:apache-2.0
108
0

Gemma-3-Glitter-12B-8bit

NaNK
108
0

gemma-3-12b-it-4bit-DWQ

This model mlx-community/gemma-3-12b-it-4bit-DWQ was converted to MLX format from google/gemma-3-12b-it using mlx-lm version 0.24.0.

NaNK
107
2

Gabliterated-Qwen3-0.6B-4bit

This model mlx-community/Gabliterated-Qwen3-0.6B-4bit was converted to MLX format from Goekdeniz-Guelmez/Gabliterated-Qwen3-0.6B using mlx-lm version 0.25.2.

NaNK
license:apache-2.0
107
0

gemma-3-270m-4bit

This model mlx-community/gemma-3-270m-4bit was converted to MLX format from google/gemma-3-270m using mlx-lm version 0.26.3.

NaNK
107
0

Qwen3-VL-2B-Thinking-bf16

mlx-community/Qwen3-VL-2B-Thinking-bf16 This model was converted to MLX format from [`Qwen/Qwen3-VL-2B-Thinking`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
107
0

chatterbox-fp16

license:apache-2.0
106
2

gemma-2-27b-bf16

NaNK
106
0

Qwen3-VL-4B-Instruct-6bit

mlx-community/Qwen3-VL-4B-Instruct-6bit This model was converted to MLX format from [`Qwen/Qwen3-VL-4B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
106
0

Mistral-7B-Instruct-v0.3-8bit

NaNK
license:apache-2.0
105
3

nomicai-modernbert-embed-base-bf16

license:apache-2.0
105
0

bitnet-b1.58-2B-4T-8bit

This model mlx-community/bitnet-b1.58-2B-4T-8bit was converted to MLX format from microsoft/bitnet-b1.58-2B-4T using mlx-lm version 0.25.1.

NaNK
license:mit
105
0

Qwen3-Coder-30B-A3B-Instruct-bf16

This model mlx-community/Qwen3-Coder-30B-A3B-Instruct-bf16 was converted to MLX format from Qwen/Qwen3-Coder-30B-A3B-Instruct using mlx-lm version 0.26.2.

NaNK
license:apache-2.0
105
0

LFM2-8B-A1B-6bit

This model mlx-community/LFM2-8B-A1B-6bit was converted to MLX format from LiquidAI/LFM2-8B-A1B using mlx-lm version 0.28.2.

NaNK
105
0

gemma-3n-E4B-it-lm-bf16

NaNK
103
4

Qwen2.5-Coder-1.5B-4bit

NaNK
license:apache-2.0
103
2

gemma-3-270m-it-qat-4bit

This model mlx-community/gemma-3-270m-it-qat-4bit was converted to MLX format from google/gemma-3-270m-it-qat using mlx-lm version 0.26.3.

NaNK
103
1

DeepSeek-R1-Distill-Qwen-1.5B-6bit

NaNK
103
0

medgemma-27b-it-8bit

NaNK
103
0

gemma-3-27b-it-bf16

NaNK
102
4

orpheus-3b-0.1-ft-4bit

This model mlx-community/orpheus-3b-0.1-ft-4bit was converted to MLX format from canopylabs/orpheus-3b-0.1-ft using mlx-audio version 0.0.3.

NaNK
llama
102
3

meta-llama-Llama-4-Scout-17B-16E-Instruct-bf16

NaNK
llama4
102
0

c4ai-command-r-v01-4bit

NaNK
101
23

Llama-3.2-8X4B-MOE-V2-Dark-Champion-Instruct-uncensored-abliterated-21B-MLX

NaNK
Llama 3.2
101
1

Qwen3-1.7B-4bit-DWQ-053125

NaNK
license:apache-2.0
100
2

Qwen3-4B-Instruct-2507-5bit

NaNK
license:apache-2.0
100
1

LFM2-8B-A1B-6bit-MLX

Maintainer / Publisher: Susant Achary. Upstream model: LiquidAI/LFM2-8B-A1B. This repo (MLX 6-bit): `mlx-community/LFM2-8B-A1B-6bit-MLX`.

This repository provides an Apple-Silicon-optimized MLX build of LFM2-8B-A1B at 6-bit quantization. Among quantized tiers, 6-bit is a strong fidelity sweet spot for many Macs: noticeably smaller than FP16/8-bit while preserving answer quality for instruction following, summarization, and structured extraction.

- Architecture: Mixture-of-Experts (MoE) Transformer.
- Size: ~8B total parameters with ~1B active per token (A1B ≈ “~1B active”).
- Why MoE? At each token only a subset of experts is activated, reducing compute per token while keeping a larger parameter pool for expressivity.

> Single-device memory reality: even though only ~1B parameters are active per token, all experts typically reside in memory during inference on one device. RAM planning should therefore track total parameters, not just the active slice.

- `config.json` (MLX), `mlx_model.safetensors` (6-bit shards)
- Tokenizer files: `tokenizer.json`, `tokenizer_config.json`
- Model metadata (e.g., `model_index.json`)

Target: macOS on Apple Silicon (M-series) with Metal/MPS.

- General instruction following, chat, and summarization
- RAG and long-context assistants on device
- Schema-guided structured outputs (JSON)

- Quantization can cause small regressions vs FP16 on tricky math/code or tight formatting.
- For very long contexts and/or batching, the KV cache can dominate memory; tune `max_tokens` and batch size.
- Add your own safety/guardrails for sensitive deployments.

The following are practical starting points for a single-device MLX run; validate on your hardware.

Rule-of-thumb components
- Weights (6-bit): ≈ `total_params × 0.75 byte` → for 8B params ≈ ~6.0 GB
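To make the rule of thumb concrete, a tiny worked sketch (my own illustration, not from the card) that turns the ~0.75-byte-per-weight figure for 6-bit into a gigabyte estimate:

```python
# Back-of-the-envelope weight-memory estimate for a 6-bit quant, following the
# card's rule of thumb (~0.75 byte per parameter). This ignores the KV cache and
# runtime overhead, which must be budgeted separately.
def weights_gb(total_params: float, bytes_per_param: float = 0.75) -> float:
    return total_params * bytes_per_param / 1e9

print(f"8B params at 6-bit ≈ {weights_gb(8e9):.1f} GB")  # ≈ 6.0 GB
```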

NaNK
100
0

Josiefied-Qwen2.5-7B-Instruct-abliterated-v2

NaNK
license:apache-2.0
99
3

deepseek-coder-1.3b-instruct-mlx

NaNK
llama
99
1

Qwen2.5-Coder-32B-Instruct-8bit

NaNK
license:apache-2.0
98
11

Qwen2.5-VL-3B-Instruct-bf16

NaNK
97
4

gemma-3-4b-it-4bit-DWQ

This model mlx-community/gemma-3-4b-it-4bit-DWQ was converted to MLX format from google/gemma-3-4b-it using mlx-lm version 0.24.0.

NaNK
97
1

Qwen3-1.7B-8bit

NaNK
license:apache-2.0
97
0

Huihui-gemma-3n-E4B-it-abliterated-lm-6bit

mlx-community/Huihui-gemma-3n-E4B-it-abliterated-lm-6bit The Model mlx-community/Huihui-gemma-3n-E4B-it-abliterated-lm-6bit was converted to MLX format from huihui-ai/Huihui-gemma-3n-E4B-it-abliterated using mlx-lm version 0.26.4.

NaNK
97
0

Qwen3-VL-2B-Instruct-8bit

mlx-community/Qwen3-VL-2B-Instruct-8bit This model was converted to MLX format from [`Qwen/Qwen3-VL-2B-Instruct`]() using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx

NaNK
license:apache-2.0
97
0

GLM-4-32B-0414-4bit-DWQ

NaNK
license:mit
96
4

granite-4.0-h-tiny-5bit-MLX

NaNK
license:apache-2.0
96
2

Josiefied-Qwen3-30B-A3B-abliterated-v2-8bit

NaNK
96
0

Huihui-gemma-3n-E4B-it-abliterated-lm-4bit

mlx-community/Huihui-gemma-3n-E4B-it-abliterated-lm-4bit The Model mlx-community/Huihui-gemma-3n-E4B-it-abliterated-lm-4bit was converted to MLX format from huihui-ai/Huihui-gemma-3n-E4B-it-abliterated using mlx-lm version 0.26.4.

NaNK
96
0