mlx-community
✓ Verified · Community · Apple MLX framework community contributions
gpt-oss-20b-MXFP4-Q8
```yaml
---
license: apache-2.0
pipeline_tag: text-generation
library_name: mlx
tags:
  - vllm
  - mlx
base_model: openai/gpt-oss-20b
---
```
parakeet-tdt-0.6b-v3
```yaml
---
library_name: mlx
language: [en, es, fr, de, bg, hr, cs, da, nl, et, fi, el, hu, it, lv, lt, mt, pl, pt, ro, sk, sl, sv, ru, uk]
tags:
  - mlx
  - automatic-speech-recognition
  - speech
  - audio
  - FastConformer
  - Conformer
  - Parakeet
license: cc-by-4.0
pipeline_tag: automatic-speech-recognition
base_model: nvidia/parakeet-tdt-0.6b-v3
---
```
parakeet-tdt-0.6b-v2
```yaml
---
library_name: mlx
tags:
  - mlx
  - automatic-speech-recognition
  - speech
  - audio
  - FastConformer
  - Conformer
  - Parakeet
license: cc-by-4.0
pipeline_tag: automatic-speech-recognition
base_model: nvidia/parakeet-tdt-0.6b-v2
---
```
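The Parakeet conversions in this org are typically driven with the community parakeet-mlx package rather than mlx-lm. A minimal transcription sketch, assuming that package's `from_pretrained` loader and `transcribe` method (the `audio.wav` path is a hypothetical placeholder):

```python
# Sketch: ASR with a Parakeet MLX conversion (assumes: pip install parakeet-mlx).
from parakeet_mlx import from_pretrained

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v2")
result = model.transcribe("audio.wav")  # hypothetical local audio file
print(result.text)
```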
whisper-small-mlx
gemma-3-12b-it-qat-4bit
gemma-3-27b-it-qat-4bit
mlx-community/gemma-3-27b-it-qat-4bit This model was converted to MLX format from `google/gemma-3-27b-it-qat-q4_0-unquantized` using mlx-vlm version 0.1.23. Refer to the original model card for more details on the model. Use with mlx (see the sketch below).
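The "Use with mlx" sections of these mlx-vlm cards were stripped in this listing; below is a minimal sketch of the usual pattern, assuming the mlx-vlm Python helpers `load`, `apply_chat_template`, and `generate` (exact signatures vary a little across mlx-vlm versions, and `cat.png` is a hypothetical image path):

```python
# Sketch: vision-language generation with an mlx-vlm conversion (pip install mlx-vlm).
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model, processor = load("mlx-community/gemma-3-27b-it-qat-4bit")

# Format the user prompt with the model's chat template for one attached image.
prompt = apply_chat_template(processor, model.config, "Describe this image.", num_images=1)

output = generate(model, processor, prompt, ["cat.png"], max_tokens=128)  # hypothetical image
print(output)
```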
gemma-3-4b-it-qat-4bit
Qwen3-30B-A3B-Instruct-2507-4bit
This model mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit was converted to MLX format from Qwen/Qwen3-30B-A3B-Instruct-2507 using mlx-lm version 0.26.3.
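The mlx-lm conversions throughout this list share one loading pattern; a minimal sketch using mlx-lm's documented `load` and `generate` helpers:

```python
# Sketch: text generation with an mlx-lm conversion (pip install mlx-lm).
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit")

# Route the prompt through the model's chat template before generating.
messages = [{"role": "user", "content": "Explain MoE routing in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```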
gemma-3-1b-it-qat-4bit
The Model mlx-community/gemma-3-1b-it-qat-4bit was converted to MLX format from google/gemma-3-1b-it-qat-q4_0 using mlx-lm version 0.22.5.
Llama-3.2-1B-Instruct-4bit
Llama-3.2-3B-Instruct-4bit
gemma-2-2b-it-4bit
whisper-large-v3-mlx
Qwen3-1.7B-4bit
Meta-Llama-3.1-8B-Instruct-4bit
Qwen3-VL-4B-Instruct-4bit
DeepSeek-OCR-8bit
mlx-community/DeepSeek-OCR-8bit This model was converted to MLX format from `deepseek-ai/DeepSeek-OCR` using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx
Qwen3-Embedding-0.6B-4bit-DWQ
Mistral-7B-Instruct-v0.3-4bit
gemma-3-1b-it-4bit
gemma-3n-E4B-it-lm-4bit
gemma-3n-E2B-it-lm-4bit
Kokoro-82M-bf16
mlx-community/Kokoro-82M-bf16 This model was converted to MLX format from `hexgrad/Kokoro-82M` using mlx-audio version 0.0.1. Refer to the original model card for more details on the model. Use with mlx
Qwen3-4B-4bit
Qwen3-0.6B-4bit
MiniMax-M2-4bit
This model mlx-community/MiniMax-M2-4bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.
Qwen2-VL-2B-Instruct-4bit
whisper-large-v3-turbo
whisper-large-v3-turbo This model was converted to MLX format from the OpenAI Whisper `large-v3-turbo` checkpoint.
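The Whisper conversions here are consumed through the mlx-whisper package; a minimal sketch using its documented `transcribe` entry point (`audio.mp3` is a hypothetical input file):

```python
# Sketch: speech-to-text with an MLX Whisper conversion (pip install mlx-whisper).
import mlx_whisper

result = mlx_whisper.transcribe(
    "audio.mp3",  # hypothetical input file
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
)
print(result["text"])
```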
Qwen2.5-3B-Instruct-4bit
bge-small-en-v1.5-bf16
Qwen3-VL-2B-Instruct-4bit
mlx-community/Qwen3-VL-2B-Instruct-4bit This model was converted to MLX format from `Qwen/Qwen3-VL-2B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Qwen3-235B-A22B-4bit
Dolphin3.0-Llama3.1-8B-4bit
DeepSeek-OCR-4bit
SmolLM3-3B-4bit
Qwen2.5-0.5B-Instruct-4bit
Qwen3-235B-A22B-8bit
Qwen1.5-0.5B-Chat-4bit
DeepSeek-R1-Distill-Qwen-1.5B-8bit
Qwen3-4B-Thinking-2507-4bit
GLM-4.6-4bit
This model mlx-community/GLM-4.6-4bit was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.1.
DeepSeek-OCR-5bit
mlx-community/DeepSeek-OCR-5bit This model was converted to MLX format from `deepseek-ai/DeepSeek-OCR` using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx
Qwen2.5-1.5B-Instruct-4bit
MiniMax-M2-3bit
DeepSeek-R1-Distill-Qwen-1.5B-4bit
Kimi-Linear-48B-A3B-Instruct-4bit
This model mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit was converted to MLX format from moonshotai/Kimi-Linear-48B-A3B-Instruct using mlx-lm version 0.28.4.
whisper-medium-mlx
Qwen3-4B-Instruct-2507-4bit
granite-4.0-h-micro-4bit
MiniMax-M2-8bit
This model mlx-community/MiniMax-M2-8bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.
Qwen3-8B-4bit
Llama-3.2-3B-Instruct-8bit
Meta-Llama-3-8B-Instruct-4bit
mlx-community/Meta-Llama-3-8B-Instruct-4bit This model was converted to MLX format from `meta-llama/Meta-Llama-3-8B-Instruct` using mlx-lm version 0.9.0. Refer to the original model card for more details on the model. Use with mlx
SmolVLM2-500M-Video-Instruct-mlx
Qwen3-Embedding-4B-4bit-DWQ
Josiefied-Qwen3-4B-abliterated-v1-4bit
This model mlx-community/Josiefied-Qwen3-4B-abliterated-v1-4bit was converted to MLX format from Goekdeniz-Guelmez/Josiefied-Qwen3-4B-abliterated-v1 using mlx-lm version 0.24.0.
Qwen2.5-7B-Instruct-4bit
Phi-4-mini-instruct-4bit
phi-2
MiniMax-M2-mlx-8bit-gs32
This model mlx-community/MiniMax-M2-mlx-8bit-gs32 was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.1. Recipe: 8-bit quantization with group size 32, which works out to 9 bits per weight (bpw): each 32-weight group stores 32 × 8-bit values plus a 16-bit scale and a 16-bit bias, i.e. (32·8 + 32)/32 = 9. You can find more similar MLX model quants for a single Apple Mac Studio M3 Ultra with 512 GB at https://huggingface.co/bibproj
granite-3.3-2b-instruct-4bit
Llama-3.3-70B-Instruct-4bit
GLM-4.6-mlx-8bit-gs32
This model mlx-community/GLM-4.6-mlx-8bit-gs32 was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.1. Recipe: 8-bit quantization with group size 32 (≈9 bits per weight, since each 32-weight group adds a 16-bit scale and a 16-bit bias). You can find more similar MLX model quants for Apple Mac Studio with 512 GB at https://huggingface.co/bibproj
Qwen3-VL-235B-A22B-Instruct-3bit
mlx-community/Qwen3-VL-235B-A22B-Instruct-3bit This model was converted to MLX format from `Qwen/Qwen3-VL-235B-A22B-Instruct` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
Llama-2-7b-chat-mlx
granite-4.0-h-tiny-4bit
DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx
Qwen2.5-VL-3B-Instruct-4bit
gemma-3-270m-it-8bit
whisper-base-mlx
Qwen3-Embedding-8B-4bit-DWQ
parakeet-rnnt-0.6b
gpt-oss-120b-MXFP4-Q4
This model mlx-community/gpt-oss-120b-MXFP4-Q4 was converted to MLX format from openai/gpt-oss-120b using mlx-lm version 0.27.0.
Phi-3.5-mini-instruct-4bit
LFM2-2.6B-4bit
This model mlx-community/LFM2-2.6B-4bit was converted to MLX format from LiquidAI/LFM2-2.6B using mlx-lm version 0.28.0.
LFM2-8B-A1B-4bit
gpt-oss-20b-MXFP4-Q4
GLM-4.6-bf16
This model mlx-community/GLM-4.6-bf16 was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.2.
Qwen3-0.6B-8bit
LFM2-1.2B-4bit
whisper-large-v2-mlx
gemma-3n-E4B-it-4bit
Qwen3-VL-30B-A3B-Instruct-4bit
mlx-community/Qwen3-VL-30B-A3B-Instruct-4bit This model was converted to MLX format from `Qwen/Qwen3-VL-30B-A3B-Instruct` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
Dolphin3.0-Llama3.1-8B-8bit
MiniMax-M2-6bit
This model mlx-community/MiniMax-M2-6bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.
deepcogito-cogito-v1-preview-llama-3B-4bit
Hermes-3-Llama-3.2-3B-4bit
DeepSeek-R1-Distill-Qwen-32B-4bit
Qwen3-VL-30B-A3B-Instruct-8bit
dolphin3.0-llama3.2-3B-4Bit
Phi-4-mini-instruct-8bit
GLM-4.5-Air-3bit
This model mlx-community/GLM-4.5-Air-3bit was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.1.
DeepSeek-V3-0324-4bit
GLM-4.6-5bit
This model mlx-community/GLM-4.6-5bit was converted to MLX format from zai-org/GLM-4.6 using mlx-lm version 0.28.1.
gemma-2-9b-it-4bit
gpt-oss-120b-MXFP4-Q8
This model mlx-community/gpt-oss-120b-MXFP4-Q8 was converted to MLX format from openai/gpt-oss-120b using mlx-lm version 0.27.0.
DeepSeek-OCR-6bit
mlx-community/DeepSeek-OCR-6bit This model was converted to MLX format from `deepseek-ai/DeepSeek-OCR` using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx
Josiefied-Qwen3-1.7B-abliterated-v1-4bit
DeepSeek-R1-0528-Qwen3-8B-4bit
gemma-3-4b-it-4bit
mlx-community/gemma-3-4b-it-4bit This model was converted to MLX format from `google/gemma-3-4b-it` using mlx-vlm version 0.1.18. Refer to the original model card for more details on the model. Use with mlx
SmolLM-135M-Instruct-4bit
whisper-tiny
Mistral-Nemo-Instruct-2407-4bit
LFM2-8B-A1B-3bit-MLX
Maintainer / Publisher: Susant Achary. Upstream model: LiquidAI/LFM2-8B-A1B. This repo (MLX 3-bit): `mlx-community/LFM2-8B-A1B-3bit-MLX`.

This repository provides an Apple-Silicon-optimized MLX build of LFM2-8B-A1B at 3-bit quantization. 3-bit is an excellent size↔quality sweet spot on many Macs: a very small memory footprint with surprisingly solid answer quality and snappy decoding.

- Architecture: Mixture-of-Experts (MoE) Transformer.
- Size: ~8B total parameters with ~1B active per token (the "A1B" naming commonly indicates ~1B active params).
- Why MoE? Per token, only a subset of experts is activated, lowering compute per token while retaining a larger parameter pool for expressivity.

> Memory reality on a single device: even though ~1B parameters are active at a time, all experts typically reside in memory in single-device runs. Plan RAM based on total parameters, not just the active slice.

Repo contents:
- `config.json` (MLX), `model.safetensors` (3-bit shards)
- Tokenizer: `tokenizer.json`, `tokenizer_config.json`
- Metadata: `model_index.json` (and/or processor metadata as applicable)

Target: macOS on Apple Silicon (M-series) using Metal/MPS.

Intended use:
- General instruction following, chat, and summarization
- RAG back-ends and long-context assistants on device
- Schema-guided structured outputs (JSON) where low RAM is a priority

Limitations:
- 3-bit is lossy: the gains in latency/RAM come with some accuracy trade-off vs 6/8-bit.
- For very long contexts and/or batching, the KV-cache can dominate memory; tune `max_tokens` and batch size.
- Add your own guardrails/safety for production deployments.

The numbers below are practical starting points, not measurements; verify on your machine.
- Weights (3-bit): ≈ `total_params × 0.375 byte` → for 8B params ≈ ~3.0 GB
- Runtime overhead: MLX graph/tensors/metadata → ~0.6–1.0 GB
- KV-cache: grows with context × layers × heads × dtype → ~0.8–2.5+ GB

| Context window | Estimated peak RAM |
|---|---:|
| 4k tokens | ~4.4–5.5 GB |
| 8k tokens | ~5.2–6.6 GB |
| 16k tokens | ~6.5–8.8 GB |

> For ≤2k windows you may see ~4.0–4.8 GB. Larger windows/batches increase KV-cache and peak RAM.

🧭 Precision choices for LFM2-8B-A1B (lineup planning): while this card is 3-bit, teams often publish multiple precisions. Use this table as a planning guide (8B MoE LM; actuals depend on context/batch/prompts):

| Variant | Typical Peak RAM | Relative Speed | Typical Behavior | When to choose |
|---|---:|:---:|---|---|
| 3-bit (this repo) | ~4.4–8.8 GB | 🔥🔥🔥🔥 | Direct, concise, great latency | Default on 8–16 GB Macs |
| 6-bit | ~7.5–12.5 GB | 🔥🔥 | Best quality under quant | Choose if RAM allows |
| 8-bit | ~9.5–12+ GB | 🔥🔥 | Largest quantized size / highest fidelity | When you prefer simpler 8-bit workflows |

> MoE caveat: MoE lowers compute per token; unless experts are paged/partitioned, memory still scales with total parameters on a single device.

Deterministic generation:

```bash
python -m mlx_lm.generate \
  --model mlx-community/LFM2-8B-A1B-3bit-MLX \
  --prompt "Summarize the following in 5 concise bullet points:\n " \
  --max-tokens 256 \
  --temp 0.0 \
  --seed 0
```
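The rule-of-thumb breakdown above reduces to simple arithmetic; here is a small helper that reproduces the card's estimates (the 0.375 bytes-per-weight factor and the overhead ranges come from the card; the default values below are midpoints of those ranges, not measurements):

```python
# Back-of-envelope peak-RAM estimate following the card's rule of thumb:
# quantized weights + MLX runtime overhead + KV-cache.
def estimate_peak_ram_gb(total_params: float, bits_per_weight: float,
                         overhead_gb: float = 0.8, kv_cache_gb: float = 1.6) -> float:
    weights_gb = total_params * (bits_per_weight / 8) / 1e9  # weight bytes -> GB
    return weights_gb + overhead_gb + kv_cache_gb

# 8B params at 3-bit: ~3.0 GB of weights, ~5.4 GB peak with mid-range overheads,
# consistent with the 4k/8k rows of the table above.
print(f"{estimate_peak_ram_gb(8e9, 3.0):.1f} GB")
```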
MiniMax-M2-5bit
This model mlx-community/MiniMax-M2-5bit was converted to MLX format from MiniMaxAI/MiniMax-M2 using mlx-lm version 0.28.4.
gemma-3-text-4b-it-4bit
Qwen3-Coder-30B-A3B-Instruct-4bit
granite-4.0-micro-8bit
This model mlx-community/granite-4.0-micro-8bit was converted to MLX format from ibm-granite/granite-4.0-micro using mlx-lm version 0.28.2.
Phi-3-mini-4k-instruct-4bit
Qwen3-VL-235B-A22B-Thinking-3bit
Llama-3.2-11B-Vision-Instruct-abliterated
Kimi-Dev-72B-4bit-DWQ
Qwen3-VL-30B-A3B-Instruct-bf16
mlx-community/Qwen3-VL-30B-A3B-Instruct-bf16 This model was converted to MLX format from `Qwen/Qwen3-VL-30B-A3B-Instruct` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
whisper-small-mlx-8bit
Llama-3.2-1B-Instruct-8bit
DeepSeek-R1-Distill-Qwen-7B-4bit
GLM-4.5-Air-4bit
Qwen2.5-VL-7B-Instruct-4bit
LFM2-350M-8bit
DeepSeek-R1-4bit
Meta-Llama-3.1-70B-Instruct-4bit
Kimi-K2-Instruct-4bit
phi-4-8bit
3b-de-ft-research_release-4bit
whisper-large-mlx
dolphin3.0-llama3.2-1B-4Bit
Qwen3-VL-8B-Instruct-4bit
mlx-community/Qwen3-VL-8B-Instruct-4bit This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Llama-3.2-11B-Vision-Instruct-8bit
Qwen3-4B-8bit
gemma-3-4b-it-8bit
gemma-3n-E2B-it-4bit
DeepSeek-V3.1-4bit
Qwen3-VL-8B-Thinking-8bit
mlx-community/Qwen3-VL-8B-Thinking-8bit This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Ring-mini-linear-2.0-4bit
This model mlx-community/Ring-mini-linear-2.0-4bit was converted to MLX format from inclusionAI/Ring-mini-linear-2.0 using mlx-lm version 0.28.1.
Qwen3-VL-30B-A3B-Thinking-4bit
Qwen3-4B-Instruct-2507-4bit-DWQ-2510
This model mlx-community/Qwen3-4B-Instruct-2507-4bit-DWQ-2510 was converted to MLX format from Qwen/Qwen3-4B-Instruct-2507 using mlx-lm version 0.28.2.
Qwen3-Coder-30B-A3B-Instruct-4bit-dwq-v2
Qwen3-Coder-30B-A3B-Instruct-8bit
This model mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit was converted to MLX format from Qwen/Qwen3-Coder-30B-A3B-Instruct using mlx-lm version 0.26.1.
Qwen3-VL-30B-A3B-Thinking-3bit
Qwen3-VL-30B-A3B-Thinking-8bit
Qwen3-Coder-480B-A35B-Instruct-4bit
DeepSeek-V3.1-Terminus-4bit
This model mlx-community/DeepSeek-V3.1-Terminus-4bit was converted to MLX format from deepseek-ai/DeepSeek-V3.1-Terminus using mlx-lm version 0.27.1.
whisper-large-v3-turbo-q4
Qwen3-VL-30B-A3B-Thinking-bf16
mlx-community/Qwen3-VL-30B-A3B-Thinking-bf16 This model was converted to MLX format from `Qwen/Qwen3-VL-30B-A3B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Granite-4.0-H-Tiny-4bit-DWQ
This model mlx-community/granite-4.0-h-Tiny-4bit-DWQ was converted to MLX format from ibm-granite/granite-4.0-h-small using mlx-lm version 0.28.2.
Llama-3.2-3B-Instruct
Qwen3-Next-80B-A3B-Instruct-4bit
parakeet-tdt_ctc-0.6b-ja
This model was converted to MLX format from nvidia/parakeet-tdt_ctc-0.6b-ja using the conversion script. Please refer to the original model card for more details on the model.
Mistral-7B-Instruct-v0.2-4-bit
Qwen3-30B-A3B-4bit
Llama-3.2-3B-Instruct-uncensored-6bit
Kimi-K2-Instruct-0905-mlx-DQ3_K_M
This model mlx-community/Kimi-K2-Instruct-0905-mlx-DQ3_K_M was converted to MLX format from moonshotai/Kimi-K2-Instruct-0905 using mlx-lm version 0.26.3.

This quant was created for people using a single Apple Mac Studio M3 Ultra with 512 GB: the 4-bit version of Kimi K2 does not fit. Using research results, we aim to get 4-bit performance from a slightly smaller and smarter quantization, while keeping the quant small enough to leave memory for a useful context window. You can find more similar MLX model quants for Apple Mac Studio with 512 GB at https://huggingface.co/bibproj

In the arXiv paper "Quantitative Analysis of Performance Drop in DeepSeek Model Quantization" the authors write:

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms the traditional `Q3_K_M` variant on various benchmarks, which is also comparable with the 4-bit quantization (`Q4_K_M`) approach in most tasks.

and describe a

> dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bitwidth quantization has been well tested and documented. In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `def mixed_quant_predicate()` with something like the sketch below. Should you wish to squeeze more out of your quant, and you do not need a larger context window, you can change the last part of that code accordingly.
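The snippet the card refers to was stripped from this page, but the idea is a layer-dependent bit-width rule. Below is a minimal illustrative sketch, assuming mlx-lm's `convert` with its `quant_predicate` hook; the layer-name patterns and specific bit assignments are hypothetical placeholders, not the paper's or the uploader's exact recipe:

```python
# Hypothetical mixed-bitwidth predicate in the spirit of DQ3_K_M: keep sensitive
# tensors at higher precision, quantize the bulk of the weights at 3-bit.
from mlx_lm import convert

def mixed_quant_predicate(path, module, config):
    # Embeddings and output head: higher precision (6-bit here, illustrative).
    if "embed_tokens" in path or "lm_head" in path:
        return {"bits": 6, "group_size": 64}
    # Attention projections: 4-bit to protect quality (illustrative choice).
    if any(k in path for k in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return {"bits": 4, "group_size": 64}
    # Everything else (e.g. MoE expert FFNs): 3-bit for the size win.
    return {"bits": 3, "group_size": 64}

# Run the conversion with the custom predicate (downloads the full model).
convert(
    "moonshotai/Kimi-K2-Instruct-0905",
    mlx_path="Kimi-K2-Instruct-0905-mlx-DQ3_K_M",
    quantize=True,
    quant_predicate=mixed_quant_predicate,
)
```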
Qwen2.5-Coder-7B-Instruct-bf16
Mixtral-8x22B-4bit
Qwen3-VL-4B-Instruct-8bit
Kimi-Linear-48B-A3B-Instruct-8bit
Llama-3.3-70B-Instruct-8bit
nvidia_Llama-3.1-Nemotron-70B-Instruct-HF_4bit
Huihui-GLM-4.5V-abliterated-mxfp4
mlx-community/Huihui-GLM-4.5V-abliterated-mxfp4 This model was converted to MLX format from `huihui-ai/Huihui-GLM-4.5V-abliterated` using `mlx-vlm` with MXFP4 support. Refer to the original model card for more details on the model. Use with mlx
gemma-3-1b-pt-4bit
embeddinggemma-300m-8bit
DeepSeek-R1-Distill-Llama-70B-8bit
chandra-8bit
DeepSeek-Coder-V2-Lite-Instruct-8bit
embeddinggemma-300m-bf16
The Model mlx-community/embeddinggemma-300m-bf16 was converted to MLX format from google/embeddinggemma-300m using mlx-lm version 0.0.4.
Qwen3-0.6B-bf16
Qwen2.5-3B-Instruct-8bit
Nanonets-OCR2-3B-4bit
mlx-community/Nanonets-OCR2-3B-4bit This model was converted to MLX format from `nanonets/Nanonets-OCR2-3B` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
Meta-Llama-3-8B-Instruct
Qwen3-VL-8B-Instruct-bf16
mlx-community/Qwen3-VL-8B-Instruct-bf16 This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
GLM-4.5-Air-bf16
This model mlx-community/GLM-4.5-Air-bf16 was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.28.2.
Qwen3-VL-32B-Instruct-8bit
mlx-community/Qwen3-VL-32B-Instruct-8bit This model was converted to MLX format from `Qwen/Qwen3-VL-32B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Ling-1T-mlx-3bit
This model mlx-community/Ling-1T-mlx-3bit was converted to MLX format from inclusionAI/Ling-1T using mlx-lm version 0.28.1. You can find more similar MLX model quants for Apple Mac Studio with 512 GB at https://huggingface.co/bibproj
Llama-4-Scout-17B-16E-Instruct-4bit
deepseek-r1-distill-qwen-1.5b
Qwen2.5-VL-7B-Instruct-8bit
Apriel-1.5-15b-Thinker-4bit
mlx-community/Apriel-1.5-15b-Thinker-4bit This model was converted to MLX format from `ServiceNow-AI/Apriel-1.5-15b-Thinker` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
SmolVLM-Instruct-4bit
dolphin-vision-72b-4bit
Codestral-22B-v0.1-4bit
gemma-3-270m-it-4bit
Qwen3-Embedding-0.6B-8bit
CodeLlama-70b-Instruct-hf-4bit-MLX
Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ
Nanonets-OCR2-3B-bf16
mlx-community/Nanonets-OCR2-3B-bf16 This model was converted to MLX format from `nanonets/Nanonets-OCR2-3B` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
Qwen2.5-7B-Instruct-Uncensored-4bit
gemma-3-1b-it-8bit
exaone-4.0-1.2b-4bit
LFM2-8B-A1B-8bit-MLX
Maintainer / Publisher: Susant Achary. Upstream model: LiquidAI/LFM2-8B-A1B. This repo (MLX 8-bit): `mlx-community/LFM2-8B-A1B-8bit-MLX`.

This repository provides an Apple-Silicon-optimized MLX build of LFM2-8B-A1B at 8-bit quantization for fast, on-device inference.

- Architecture: Mixture-of-Experts (MoE) Transformer.
- Size: ~8B total parameters with ~1B active per token (the "A1B" suffix commonly denotes ~1B active params).
- Why MoE? During generation, only a subset of experts is activated per token, reducing compute per token while keeping a larger total parameter pool for expressivity.

> Important memory note (single-device inference): although compute per token benefits from MoE (fewer active parameters), the full set of experts still resides in memory for typical single-GPU/CPU deployments. In practice this means RAM usage scales with total parameters, not with the smaller active count.

Repo contents:
- `config.json` (MLX), `model.safetensors` (8-bit shards)
- Tokenizer files: `tokenizer.json`, `tokenizer_config.json`
- Model metadata (e.g., `model_index.json`)

Target platform: macOS on Apple Silicon (M-series) using Metal/MPS.

Intended use:
- General instruction-following, chat, and summarization
- RAG back-ends and long-context workflows on device
- Function-calling / structured outputs with schema-style prompts

Limitations:
- Even at 8-bit, long contexts (KV-cache) can dominate memory at high `max_tokens` or large batch sizes.
- As with any quantization, small regressions vs FP16 can appear on intricate math/code or edge-formatting.

The figures below are practical planning numbers derived from first principles plus experience with MLX and similar MoE models; treat them as starting points and validate on your hardware.
- Weights: ≈ `total_params × 1 byte` (8-bit). For 8B params → ~8.0 GB baseline.
- Runtime overhead: MLX graph + tensors + metadata → ~0.5–1.0 GB typical.
- KV-cache: grows with context_length × layers × heads × dtype; often 1–3+ GB for long contexts.

| Context window | Estimated peak RAM |
|---|---:|
| 4k tokens | ~9.5–10.5 GB |
| 8k tokens | ~10.5–11.8 GB |
| 16k tokens | ~12.0–14.0 GB |

> These ranges assume 8-bit weights, A1B MoE (all experts resident), batch size = 1, and standard generation settings. On lower windows (≤2k), you may see ~9–10 GB. Larger windows or batches will increase KV-cache and peak RAM.

While this card is 8-bit, teams often want a consistent lineup. If you later produce 6/5/4/3/2-bit MLX builds, here's a practical guide (RAM figures are indicative for an 8B MoE LM; your results depend on context/batch):

| Variant | Typical Peak RAM | Relative Speed | Typical Behavior | When to choose |
|---|---:|:---:|---|---|
| 4-bit | ~7–8 GB | 🔥🔥🔥 | Better detail retention | If 3-bit drops too much fidelity |
| 6-bit | ~9–10.5 GB | 🔥🔥 | Near-max MLX quality | If you want accuracy under quant |
| 8-bit (this repo) | ~9.5–12+ GB | 🔥🔥 | Highest quality among quant tiers | When RAM allows and you want the most faithful outputs |

> MoE caveat: MoE reduces compute per token, but unless experts are paged/partitioned across devices and loaded on demand, memory still follows total parameters. On a single Mac, plan RAM as if the whole 8B parameter set is resident.

Deterministic generation:

```bash
python -m mlx_lm.generate \
  --model mlx-community/LFM2-8B-A1B-8bit-MLX \
  --prompt "Summarize the following in 5 bullet points:\n " \
  --max-tokens 256 \
  --temp 0.0 \
  --seed 0
```
gemma-3-12b-it-qat-abliterated-lm-4bit
FastVLM-0.5B-bf16
DeepSeek-R1-Distill-Qwen-32B-MLX-8Bit
Qwen3-8B-6bit
gemma-3-27b-it-4bit
Nanonets-OCR2-3B-8bit
mlx-community/Nanonets-OCR2-3B-8bit This model was converted to MLX format from `nanonets/Nanonets-OCR2-3B` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
GLM-4.5-Air-mxfp4
This model mlx-community/GLM-4.5-Air-mxfp4 was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.28.0.
SmolVLM2-256M-Video-Instruct-mlx
Qwen3-0.6B-4bit-DWQ-05092025
Dolphin-Mistral-24B-Venice-Edition-mlx-8Bit
LFM2-700M-8bit
Kimi-VL-A3B-Thinking-4bit
DeepSeek-R1-Distill-Llama-8B-4bit
Phi-3.5-vision-instruct-4bit
deepseek-vl2-8bit
Qwen3-30B-A3B-4bit-DWQ
DeepSeek-V3-4bit
Qwen3-VL-4B-Instruct-3bit
mlx-community/Qwen3-VL-4B-Instruct-3bit This model was converted to MLX format from `Qwen/Qwen3-VL-4B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Meta-Llama-3.1-8B-Instruct-8bit
embeddinggemma-300m-4bit
DeepSeek-R1-Distill-Qwen-1.5B-3bit
whisper-tiny.en-mlx
Llama-3.2-8X4B-MOE-V2-Dark-Champion-Instruct-uncensored-abliterated-21B-Q_6-MLX
nomicai-modernbert-embed-base-4bit
GLM-4.5-4bit
Kimi-Linear-48B-A3B-Instruct-6bit
This model mlx-community/Kimi-Linear-48B-A3B-Instruct-6bit was converted to MLX format from moonshotai/Kimi-Linear-48B-A3B-Instruct using mlx-lm version 0.28.4.
Qwen2.5-Coder-7B-Instruct-4bit
Llama-4-Maverick-17B-16E-Instruct-4bit
phi-2-hf-4bit-mlx
Qwen3-VL-8B-Thinking-4bit
mlx-community/Qwen3-VL-8B-Thinking-4bit This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Qwen2.5-0.5B-Instruct-8bit
granite-4.0-h-micro-8bit
This model mlx-community/granite-4.0-h-micro-8bit was converted to MLX format from ibm-granite/granite-4.0-h-micro using mlx-lm version 0.28.2.
Ling-1T-mlx-DQ3_K_M
This model mlx-community/Ling-1T-mlx-DQ3_K_M was converted to MLX format from inclusionAI/Ling-1T using mlx-lm version 0.28.1.

This quant was created for people using a single Apple Mac Studio M3 Ultra with 512 GB: the 4-bit version of Ling 1T does not fit. Using research results, we aim to get 4-bit performance from a slightly smaller and smarter quantization, while keeping the quant small enough to leave memory for a useful context window.

In the arXiv paper "Quantitative Analysis of Performance Drop in DeepSeek Model Quantization" the authors write:

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms the traditional `Q3_K_M` variant on various benchmarks, which is also comparable with the 4-bit quantization (`Q4_K_M`) approach in most tasks.

and describe a

> dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bitwidth quantization has been well tested and documented. In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `def mixed_quant_predicate()` with a layer-dependent predicate like the sketch shown above under Kimi-K2-Instruct-0905-mlx-DQ3_K_M.
olmOCR-2-7B-1025-bf16
mlx-community/olmOCR-2-7B-1025-bf16 This model was converted to MLX format from `allenai/olmOCR-2-7B-1025` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
DeepSeek-R1-Distill-Qwen-14B-4bit
GLM-4-9B-0414-4bit
embeddinggemma-300m-qat-q4_0-unquantized-bf16
The Model mlx-community/embeddinggemma-300m-qat-q4_0-unquantized-bf16 was converted to MLX format from google/embeddinggemma-300m-qat-q4_0-unquantized using mlx-lm version 0.0.4.
GLM-Z1-9B-0414-4bit
gemma-3-12b-it-4bit
gemma-3-12b-it-bf16
DeepSeek-R1-Distill-Qwen-32B-abliterated-4bit
Qwen3-VL-32B-Instruct-4bit
mlx-community/Qwen3-VL-32B-Instruct-4bit This model was converted to MLX format from `Qwen/Qwen3-VL-32B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
whisper-turbo
GLM-4-32B-0414-8bit
Apertus-8B-Instruct-2509-bf16
This model mlx-community/Apertus-8B-Instruct-2509-bf16 was converted to MLX format from swiss-ai/Apertus-8B-Instruct-2509 using mlx-lm version 0.27.0.
Qwen3-VL-8B-Thinking-bf16
mlx-community/Qwen3-VL-8B-Thinking-bf16 This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
olmOCR-2-7B-1025-4bit
Qwen3-VL-4B-Thinking-bf16
mlx-community/Qwen3-VL-4B-Thinking-bf16 This model was converted to MLX format from `Qwen/Qwen3-VL-4B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Meta-Llama-3.1-8B-Instruct-bf16
granite-4.0-h-tiny-3bit-MLX
Granite-4.0-H-Tiny: MLX 3-bit (Apple Silicon)

Maintainer / Publisher: Susant Achary

This repository provides an Apple-Silicon-optimized MLX build of IBM Granite-4.0-H-Tiny with 3-bit weight quantization (plus usage guidance for 2/4/5/6-bit variants if RAM allows). Granite 4.0 is IBM's latest hybrid Mamba-2/Transformer family with selective Mixture-of-Experts (MoE), designed for long-context, hyper-efficient inference and enterprise use.

🔎 What's Granite 4.0?
- Architecture: hybrid Mamba-2 + softmax attention; H variants add MoE routing (sparse activation), aiming to keep expressivity while dramatically reducing memory footprint.
- Efficiency claims: up to ~70% lower memory and ~2× faster inference vs. comparable models, especially for multi-session and long-context scenarios.
- Context window: 128k tokens (Tiny/Base preview cards).
- Licensing: Apache-2.0 for public/commercial use.

> This MLX build targets Granite-4.0-H-Tiny (≈7B total, ≈1B active parameters). For reference, the family also includes H-Small (≈32B total / 9B active) and Micro/Micro-H (≈3B dense/hybrid) tiers.

📦 What's in this repo (MLX format)
- `config.json` (MLX), `model.safetensors` (3-bit shards), tokenizer files, and processor metadata.
- Ready for macOS on M-series chips via Metal/MPS.

> The upstream Hugging Face model cards for Granite 4.0 (Tiny/Small) provide additional training details, staged curricula, and the alignment workflow. Start here for Tiny: ibm-granite/granite-4.0-h-tiny.

✅ Intended use
- General instruction-following and chat with long context (128k).
- Enterprise assistant patterns (function calling, structured outputs) and RAG backends that benefit from efficient, large windows.
- On-device development on Macs (MLX), low-latency local prototyping and evaluation.

⚠️ Limitations
- As a quantized, decoder-only LM, it can produce confident but wrong outputs; review for critical use.
- 2–4-bit quantization may reduce precision on intricate tasks (math/code, tiny-text parsing); prefer higher bit-widths if RAM allows.
- Follow your organization's safety/PII/guardrail policies (Granite is "open-weight", not a full product).

🧠 Model family at a glance

| Tier | Arch | Params (total / active) | Notes |
|---|---|---:|---|
| H-Small | Hybrid + MoE | ~32B / 9B | Workhorse for enterprise agent tasks; strong function-calling & instruction following. |
| H-Tiny (this repo) | Hybrid + MoE | ~7B / 1B | Long-context, efficiency-first; great for local dev. |
| Micro / H-Micro | Dense / Hybrid | ~3B | Edge/low-resource alternatives for when the hybrid runtime isn't optimized. |

Context window: up to 128k tokens for the Tiny/Base preview lines. License: Apache-2.0.

🧪 Observed on-device behavior (MLX)

Empirically on M-series Macs:
- 3-bit often gives crisp, direct answers with good latency and modest RAM.
- Higher bit-widths (4/5/6-bit) improve faithfulness on fine-grained tasks (tiny OCR, structured parsing), at higher memory cost.

> Performance varies by Mac model, image/token lengths, and temperature; validate on your workload.

🔢 Choosing a quantization level (Apple Silicon)

| Variant | Typical Peak RAM (7B-class) | Relative speed | Typical behavior | When to choose |
|---|---:|:---:|---|---|
| 2-bit | ~3–4 GB | 🔥🔥🔥🔥 | Smallest footprint; most lossy | Minimal-RAM devices / smoke tests |
| 3-bit (this build) | ~5–6 GB | 🔥🔥🔥🔥 | Direct, concise, great latency | Default for local dev on M1/M2/M3/M4 |
| 4-bit | ~6–7.5 GB | 🔥🔥🔥 | Better detail retention | When you need stronger faithfulness |
| 5-bit | ~8–9 GB | 🔥🔥☆ | Higher fidelity | For heavy docs / structured outputs |
| 6-bit | ~9.5–11 GB | 🔥🔥 | Max quality under MLX quant | If RAM headroom is ample |

> Figures are indicative for the language-only Tiny (no vision) and will vary with context length and KV-cache size.

🚀 Quickstart (CLI, MLX)

```bash
# Plain generation (deterministic)
python -m mlx_lm.generate \
  --model mlx-community/granite-4.0-h-tiny-3bit-MLX \
  --prompt "Summarize the following notes into 5 bullet points:\n " \
  --max-tokens 200 \
  --temp 0.0 \
  --seed 0
```
GLM-4-32B-0414-4bit
CodeLlama-13b-Instruct-hf-4bit-MLX
Nanonets-OCR-s-bf16
Qwen3-VL-32B-Thinking-4bit
mlx-community/Qwen3-VL-32B-Thinking-4bit This model was converted to MLX format from `Qwen/Qwen3-VL-32B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Qwen3-VL-2B-Instruct-3bit
mlx-community/Qwen3-VL-2B-Instruct-3bit This model was converted to MLX format from `Qwen/Qwen3-VL-2B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
distil-whisper-large-v3
DeepSeek-R1-0528-4bit
GLM-4.5-Air-2bit
This model mlx-community/GLM-4.5-Air-2bit was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.1.
InternVL3_5-GPT-OSS-20B-A4B-Preview-4bit
mlx-community/InternVL3_5-GPT-OSS-20B-A4B-Preview-4bit This model was converted to MLX format from `OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
plamo-2-translate
Llama-3.2-11B-Vision-Instruct-4bit
Kokoro-82M-4bit
CodeLlama-7b-Python-4bit-MLX
gemma-3-12b-it-8bit
Qwen2.5-1.5B-Instruct-8bit
Qwen3-VL-4B-Thinking-4bit
mlx-community/Qwen3-VL-4B-Thinking-4bit This model was converted to MLX format from `Qwen/Qwen3-VL-4B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Qwen2.5-14B-Instruct-4bit
Mixtral-8x7B-Instruct-v0.1
parakeet-tdt-1.1b
Qwen3-Next-80B-A3B-Instruct-8bit
Llama-4-Scout-17B-16E-Instruct-8bit
Qwen3-VL-8B-Thinking-6bit
mlx-community/Qwen3-VL-8B-Thinking-6bit This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Qwen2-VL-7B-Instruct-4bit
gemma-3-27b-it-qat-8bit
DeepSeek-V3.1-8bit
GLM-4.5V-8bit
Hermes-3-Llama-3.1-8B-4bit
Qwen3-VL-32B-Thinking-bf16
parakeet-ctc-0.6b
Llama-4-Scout-17B-16E-Instruct-6bit
deepcogito-cogito-v1-preview-llama-8B-4bit
Qwen3-VL-30B-A3B-Instruct-6bit
mlx-community/Qwen3-VL-30B-A3B-Instruct-6bit This model was converted to MLX format from `Qwen/Qwen3-VL-30B-A3B-Instruct` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
mxbai-embed-large-v1
Llama-3-8B-Instruct-1048k-4bit
OpenELM-270M-Instruct
GLM-4.5-Air-8bit
This model mlx-community/GLM-4.5-Air-8bit was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.0.
Qwen3-VL-4B-Instruct-5bit
mlx-community/Qwen3-VL-4B-Instruct-5bit This model was converted to MLX format from `Qwen/Qwen3-VL-4B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Mistral-7B-Instruct-v0.2
DeepSeek-R1-Distill-Llama-70B-4bit
Qwen3-VL-32B-Thinking-8bit
GLM-4.5-Air-3bit-DWQ-v2
Qwen3-VL-8B-Instruct-8bit
mlx-community/Qwen3-VL-8B-Instruct-8bit This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Nous-Hermes-2-Mixtral-8x7B-DPO-4bit
Phi-3-mini-128k-instruct-4bit
Qwen2.5-VL-72B-Instruct-4bit
Meta-Llama-3.1-405B-4bit
Qwen3-Next-80B-A3B-Thinking-4bit
This model mlx-community/Qwen3-Next-80B-A3B-Thinking-4bit was converted to MLX format from Qwen/Qwen3-Next-80B-A3B-Thinking using mlx-lm version 0.27.1.
Jinx-gpt-oss-20b-mxfp4-mlx
This model mlx-community/Jinx-gpt-oss-20b-mxfp4-mlx was converted to MLX format from Jinx-org/Jinx-gpt-oss-20b-mxfp4 using mlx-lm version 0.27.1.
Llama-4-Scout-17B-16E-4bit
Qwen3-14B-4bit
NVIDIA-Nemotron-Nano-9B-v2-4bits
Kimi-K2-Instruct-0905-mlx-3bit
This model mlx-community/moonshotai_Kimi-K2-Instruct-0905-mlx-3bit was converted to MLX format from moonshotai/Kimi-K2-Instruct-0905 using mlx-lm version 0.26.3.
Llama-3_3-Nemotron-Super-49B-v1_5-mlx-4Bit
The Model mlx-community/Llama-3_3-Nemotron-Super-49B-v1_5-mlx-4Bit was converted to MLX format from unsloth/Llama-3_3-Nemotron-Super-49B-v1_5 using mlx-lm version 0.26.4.
gemma-2-27b-it-4bit
Qwen3-VL-30B-A3B-Instruct-3bit
mlx-community/Qwen3-VL-30B-A3B-Instruct-3bit This model was converted to MLX format from `Qwen/Qwen3-VL-30B-A3B-Instruct` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
DeepSeek-Coder-V2-Lite-Instruct-4bit-AWQ
chandra-bf16
Qwen3-1.7B-MLX-MXFP4
This model mlx-community/Qwen3-1.7B-MLX-MXFP4 was converted to MLX format from Qwen/Qwen3-1.7B using mlx-lm version 0.28.3.
Phi-3-mini-4k-instruct-4bit-no-q-embed
gemma-3-27b-it-8bit
Qwen3-VL-30B-A3B-Thinking-6bit
mlx-community/Qwen3-VL-30B-A3B-Thinking-6bit This model was converted to MLX format from `Qwen/Qwen3-VL-30B-A3B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
NousResearch_Hermes-4-14B-BF16-abliterated-mlx
gemma-3-4b-it-5bit
This model mlx-community/gemma-3-4b-it-5bit was converted to MLX format from google/gemma-3-4b-it using mlx-lm version 0.28.2.
chandra-4bit
mlx-community/chandra-4bit This model was converted to MLX format from `datalab-to/chandra` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
olmOCR-2-7B-1025-8bit
mlx-community/olmOCR-2-7B-1025-8bit This model was converted to MLX format from `allenai/olmOCR-2-7B-1025` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Llama-3.1-Nemotron-70B-Instruct-HF-bf16
Qwen3-4B-6bit
Mistral-7B-Instruct-v0.2-4bit
Llama-3.2-90B-Vision-Instruct-4bit
GLM-4.5V-abliterated-4bit
mlx-community/GLM-4.5V-abliterated-4bit This model was converted to MLX format from `huihui-ai/Huihui-GLM-4.5V-abliterated` using mlx-vlm. Refer to the original model card for more details on the model. Use with mlx
quantized-gemma-2b-it
Meta-Llama-3-70B-Instruct-4bit
olmOCR-2-7B-1025-mlx-8bit
mlx-community/olmOCR-2-7B-1025-mlx-8bit This model was converted to MLX format from `allenai/olmOCR-2-7B-1025` using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx
TinyLlama-1.1B-Chat-v1.0-4bit
Unsloth-Phi-4-4bit
Qwen2.5-Coder-14B-Instruct-4bit
GLM-4.5V-abliterated-8bit
mlx-community/GLM-4.5V-abliterated-8bit This model was converted to MLX format from `huihui-ai/Huihui-GLM-4.5V-abliterated` using mlx-vlm. Refer to the original model card for more details on the model. Use with mlx
jinaai-ReaderLM-v2
Apertus-8B-Instruct-2509-4bit
Meta-Llama-3.1-70B-Instruct-bf16-CORRECTED
Qwen3-VL-4B-Thinking-8bit
mlx-community/Qwen3-VL-4B-Thinking-8bit This model was converted to MLX format from `Qwen/Qwen3-VL-4B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
paligemma-3b-mix-448-8bit
whisper-tiny-mlx
phi-4-4bit
llava-phi-3-mini-4bit
GLM-4.5-Air-3bit-DWQ
Qwen2.5-Coder-1.5B-Instruct-4bit
granite-4.0-h-1b-6bit
This model mlx-community/granite-4.0-h-1b-6bit was converted to MLX format from ibm-granite/granite-4.0-h-1b using mlx-lm version 0.28.4.
Qwen2.5-32B-Instruct-4bit
Mistral-Large-Instruct-2407-4bit
Apriel-1.5-15b-Thinker-8bit
Qwen3-14B-4bit-AWQ
DeepSeek-R1-Qwen3-0528-8B-4bit-AWQ
granite-4.0-h-1b-8bit
This model mlx-community/granite-4.0-h-1b-8bit was converted to MLX format from ibm-granite/granite-4.0-h-1b using mlx-lm version 0.28.4.
Qwen3-4B-Thinking-2507-fp16
granite-4.0-h-350m-8bit
Qwen2.5-Coder-32B-Instruct-4bit
Huihui-gemma-3n-E4B-it-abliterated-lm-8bit
Phi-3-vision-128k-instruct-4bit
Nous-Hermes-2-Mistral-7B-DPO-4bit-MLX
Josiefied-Qwen3-30B-A3B-abliterated-v2-4bit
AI21-Jamba-Reasoning-3B-4bit
This model mlx-community/AI21-Jamba-Reasoning-3B-4bit was converted to MLX format from ai21labs/AI21-Jamba-Reasoning-3B using mlx-lm version 0.28.2.
DeepSeek-Coder-V2-Instruct-AQ4_1
Josiefied-Qwen3-4B-Instruct-2507-abliterated-v1-8bit
Ministral-8B-Instruct-2410-4bit
Josiefied-Qwen3-8B-abliterated-v1-4bit
UTENA-7B-NSFW-V2-4bit
olmOCR-2-7B-1025-mlx-4bit
mlx-community/olmOCR-2-7B-1025-mlx-4bit This model was converted to MLX format from `allenai/olmOCR-2-7B-1025` using mlx-vlm version 0.3.5. Refer to the original model card for more details on the model. Use with mlx
parakeet-tdt_ctc-1.1b
DeepSeek-Coder-V2-Lite-Instruct-4bit
SmolVLM2-2.2B-Instruct-mlx
Mistral-7B-v0.1-LoRA-Text2SQL
gemma-3n-E2B-it-lm-bf16
Kimi-Linear-48B-A3B-Instruct-3bit
This model mlx-community/Kimi-Linear-48B-A3B-Instruct-3bit was converted to MLX format from moonshotai/Kimi-Linear-48B-A3B-Instruct using mlx-lm version 0.28.4.
csm-1b
Llama-4-Maverick-17B-16E-Instruct-6bit
SmolLM-135M-4bit
DeepSeek-V3.1-mlx-DQ5_K_M
This model mlx-community/DeepSeek-V3.1-mlx-DQ5_K_M was converted to MLX format from deepseek-ai/DeepSeek-V3.1 using mlx-lm version 0.26.3.

This quant was created for people using a single Apple Mac Studio M3 Ultra with 512 GB. With 512 GB we can do better than the 4-bit version of DeepSeek V3.1: using research results, we aim for better-than-5-bit performance through smarter quantization, while keeping the quant small enough to leave memory for a useful context window.

A temperature of 1.3 is DeepSeek's recommendation for translations; for coding, you should probably use a temperature of 0.6 or lower.

In the arXiv paper "Quantitative Analysis of Performance Drop in DeepSeek Model Quantization" the authors write:

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms the traditional `Q3_K_M` variant on various benchmarks, which is also comparable with the 4-bit quantization (`Q4_K_M`) approach in most tasks.

and describe a

> dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bitwidth quantization has been well tested and documented. In this case we did not want an improved 3-bit quant, but rather the best possible "5-bit" quant. We therefore modified the `DQ3_K_M` quantization by replacing 3-bit with 5-bit, 4-bit with 6-bit, and 6-bit with 8-bit to create a new `DQ5_K_M` quant. This produces a quantization of 5.638 bpw (bits per weight). In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `def mixed_quant_predicate()` with a predicate like the sketch shown above under Kimi-K2-Instruct-0905-mlx-DQ3_K_M, with the bit-widths raised accordingly. Should you wish to squeeze more out of your quant, and you do not need a larger context window, you can change the last part of that code accordingly.
Ring-flash-linear-2.0-128k-4bit
This model mlx-community/Ring-flash-linear-2.0-128k-4bit was converted to MLX format from inclusionAI/Ring-flash-linear-2.0-128k using mlx-lm version 0.28.2.
Qwen3-Coder-30B-A3B-Instruct-3bit
This model mlx-community/Qwen3-Coder-30B-A3B-Instruct-3bit was converted to MLX format from Qwen/Qwen3-Coder-30B-A3B-Instruct using mlx-lm version 0.26.1.
whisper-large-v3-mlx-8bit
Qwen3-30B-A3B-bf16
Qwen3-30B-A3B-Instruct-2507-6bit
meta-llama-Llama-4-Scout-17B-16E-4bit
Qwen3-235B-A22B-Thinking-2507-3bit-DWQ
This model mlx-community/Qwen3-235B-A22B-Thinking-2507-3bit-DWQ was converted to MLX format from Qwen/Qwen3-235B-A22B-Thinking-2507 using mlx-lm version 0.26.0.
DeepSeek-R1-Distill-Qwen-14B-8bit
gemma-3-27b-it-qat-bf16
GLM-4.5-Air-2bit-DWQ
This model mlx-community/GLM-4.5-Air-2bit-DWQ was converted to MLX format from zai-org/GLM-4.5-Air using mlx-lm version 0.26.2.
GLM-4-9B-0414-8bit
DeepSeek-V3.1-Base-4bit
deepseek-coder-33b-instruct-hf-4bit-mlx
Qwen3-VL-30B-A3B-Instruct-5bit
mlx-community/Qwen3-VL-30B-A3B-Instruct-5bit This model was converted to MLX format from `Qwen/Qwen3-VL-30B-A3B-Instruct` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
Qwen3-Next-80B-A3B-Thinking-8bit
moonshotai_Kimi-K2-Instruct-mlx-3bit
This model mlx-community/moonshotai_Kimi-K2-Instruct-mlx-3bit was converted to MLX format from moonshotai/Kimi-K2-Instruct using mlx-lm version 0.26.3.
UserLM-8b-8bit
Qwen2.5-7B-Instruct-1M-4bit
Llama-3.1-8B-Instruct
Llama-4-Maverick-17B-128E-Instruct-4bit
Apriel-1.5-15b-Thinker-6bit-MLX
Apriel-1.5-15B-Thinker: MLX Quantized (Apple Silicon)

Format: MLX (Apple Silicon). Variants: 6-bit (recommended). Base model: ServiceNow-AI/Apriel-1.5-15B-Thinker. Architecture: Pixtral-style LLaVA (vision encoder → 2-layer projector → decoder). Intended use: image understanding & grounded reasoning; document/chart/OCR-style tasks; math/coding Q&A with visual context.

> This repository provides MLX-format weights for Apple Silicon (M-series) built from the original Apriel-1.5-15B-Thinker release. It is optimized for on-device inference with a small memory footprint and fast startup on macOS.

Apriel-1.5-15B-Thinker is a 15B open-weights multimodal reasoning model trained via a data-centric mid-training recipe rather than RLHF/RM. Starting from Pixtral-12B as the base, the authors apply:
1) Depth upscaling (capacity expansion without pretraining from scratch),
2) Two-stage multimodal continual pretraining (CPT) to build text + visual reasoning, and
3) High-quality SFT with explicit reasoning traces across math, coding, science, and tool use.
This approach delivers frontier-level capability on compact compute.

Key reported results (original model)
- AAI Index: 52, matching DeepSeek-R1-0528 at far lower compute.
- Multimodal: on 10 image benchmarks, within ~5 points of Gemini-2.5-Flash and Claude Sonnet-3.7 on average.
- Designed for single-GPU / constrained deployment scenarios.

> The notes above summarize the upstream paper; MLX quantization can slightly affect absolute scores. Always validate on your use case.

- Backbone: Pixtral-12B-Base-2409 adapted to a larger 15B decoder via depth upscaling (layers 40 → 48), then re-aligned with a 2-layer projection network connecting the vision encoder and decoder.
- Training stack:
  - CPT Stage-1: mixed tokens (≈50% text, 20% replay, 30% multimodal) for foundational reasoning & image understanding; 32k context; cosine LR with warmup; all components unfrozen; checkpoint averaging.
  - CPT Stage-2: targeted synthetic visual tasks (reconstruction, visual matching, detection, counting) to strengthen spatial/compositional/fine-grained reasoning; vision encoder frozen; loss on responses for instruct data; 16k context.
  - SFT: curated instruction-response pairs with explicit reasoning traces (math, coding, science, tools).
- Why MLX? Native Apple-Silicon inference with small binaries, fast load, and low memory overhead.
- What's included: `config.json`, sharded `model.safetensors`, tokenizer & processor files, and metadata for VLM pipelines.
- Quantization options: 6-bit (recommended) offers the best balance of quality & memory.

> Tip: if you're capacity-constrained on an M1/M2, try 6-bit first.

```bash
# Basic image caption
python -m mlx_vlm.generate \
  --model mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX \
  --image /path/to/image.jpg \
  --prompt "Describe this image." \
  --max-tokens 128 --temperature 0.0
```
DeepSeek-R1-0528-Qwen3-8B-4bit-DWQ
all-MiniLM-L6-v2-4bit
InternVL3_5-30B-A3B-4bit
mlx-community/InternVL3_5-30B-A3B-4bit This model was converted to MLX format from `OpenGVLab/InternVL3_5-30B-A3B-HF` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
mistral-7B-v0.1
LFM2-8B-A1B-8bit
Qwen3-VL-30B-A3B-Thinking-5bit
DeepSeek-R1-Distill-Qwen-14B-6bit
Codestral-22B-v0.1-8bit
GLM-Z1-32B-0414-4bit
Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e8
bge-small-en-v1.5-4bit
DeepSeek-R1-3bit
Nanonets-OCR2-3B-6bit
mlx-community/Nanonets-OCR2-3B-6bit This model was converted to MLX format from `nanonets/Nanonets-OCR2-3B` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
DeepSeek-v3-0324-8bit
Ring-1T-mlx-DQ3_K_M
This model mlx-community/Ring-1T-mlx-DQ3_K_M was converted to MLX format from inclusionAI/Ring-1T using mlx-lm version 0.28.1.

This quant was created for people using a single Apple Mac Studio M3 Ultra with 512 GB: the 4-bit version of Ring 1T does not fit. Using research results, we aim to get 4-bit performance from a slightly smaller and smarter quantization, while keeping the quant small enough to leave memory for a useful context window.

In the arXiv paper "Quantitative Analysis of Performance Drop in DeepSeek Model Quantization" the authors write:

> We further propose `DQ3_K_M`, a dynamic 3-bit quantization method that significantly outperforms the traditional `Q3_K_M` variant on various benchmarks, which is also comparable with the 4-bit quantization (`Q4_K_M`) approach in most tasks.

and describe a

> dynamic 3-bit quantization method (`DQ3_K_M`) that outperforms the 3-bit quantization implementation in `llama.cpp` and achieves performance comparable to 4-bit quantization across multiple benchmarks.

The resulting multi-bitwidth quantization has been well tested and documented. In the `convert.py` file of mlx-lm on your system (you can see the original code here), replace the code inside `def mixed_quant_predicate()` with a layer-dependent predicate like the sketch shown above under Kimi-K2-Instruct-0905-mlx-DQ3_K_M.
olmOCR-2-7B-1025-5bit
mlx-community/olmOCR-2-7B-1025-5bit This model was converted to MLX format from `allenai/olmOCR-2-7B-1025` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
DeepSeek-R1-Distill-Qwen-7B-8bit
plamo-2-1b
Llama-3.2-3B-Instruct-abliterated-6bit
embeddinggemma-300m-qat-q8_0-unquantized-bf16
The Model mlx-community/embeddinggemma-300m-qat-q8_0-unquantized-bf16 was converted to MLX format from google/embeddinggemma-300m-qat-q8_0-unquantized using mlx-lm version 0.0.4.
Qwen3-4B-Instruct-2507-8bit
This model mlx-community/Qwen3-4B-Instruct-2507-8bit was converted to MLX format from Qwen/Qwen3-4B-Instruct-2507 using mlx-lm version 0.26.2.
Llama-3.3-70B-Instruct-bf16
Qwen3-VL-32B-Instruct-bf16
mlx-community/Qwen3-VL-32B-Instruct-bf16 This model was converted to MLX format from `Qwen/Qwen3-VL-32B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
codegemma-7b-it-8bit
Llama-3.1-8B-Instruct-4bit
Qwen3-Next-80B-A3B-Instruct-5bit
granite-3.3-8b-instruct-4bit
Qwen3-8B-4bit-DWQ-053125
c4ai-command-r-plus-4bit
Qwen2.5-72B-Instruct-4bit
gemma-3-27b-it-4bit-DWQ
dolphin-2.9-llama3-70b-4bit
Mistral-Small-24B-Instruct-2501-4bit
llava-v1.6-mistral-7b-4bit
gemma-3-1b-it-bf16
dac-speech-24khz-1.5kbps
Llama-OuteTTS-1.0-1B-4bit
LongCat-Flash-Chat-4bit
granite-4.0-h-1b-base-8bit
This model mlx-community/granite-4.0-h-1b-base-8bit was converted to MLX format from ibm-granite/granite-4.0-h-1b-base using mlx-lm version 0.28.4.
Llama-3.3-70B-Instruct-3bit
deepseek-coder-33b-instruct
bitnet-b1.58-2B-4T-4bit
Kimi-Linear-48B-A3B-Instruct-5bit
This model mlx-community/Kimi-Linear-48B-A3B-Instruct-5bit was converted to MLX format from moonshotai/Kimi-Linear-48B-A3B-Instruct using mlx-lm version 0.28.4.
MinerU2.5-2509-1.2B-bf16
mlx-community/MinerU2.5-2509-1.2B-bf16 This model was converted to MLX format from `opendatalab/MinerU2.5-2509-1.2B` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
Mistral-Small-3.1-24B-Instruct-2503-4bit
Mixtral-8x7B-Instruct-v0.1-hf-4bit-mlx
Llama-3.1-Nemotron-Nano-4B-v1.1-4bit
Apriel-1.5-15b-Thinker-bf16
mlx-community/Apriel-1.5-15b-Thinker-bf16 This model was converted to MLX format from `ServiceNow-AI/Apriel-1.5-15b-Thinker` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. Use with mlx
Qwen3-30B-A3B-Thinking-2507-4bit
This model mlx-community/Qwen3-30B-A3B-Thinking-2507-4bit was converted to MLX format from Qwen/Qwen3-30B-A3B-Thinking-2507 using mlx-lm version 0.26.3.
LFM2-8B-A1B-fp16
Qwen2.5-VL-32B-Instruct-4bit
Qwen3-14B-4bit-DWQ-053125
meta-llama-Llama-4-Scout-17B-16E-fp16
gemma-3-4b-it-bf16
deepseek-coder-6.7b-instruct-hf-4bit-mlx
gemma-3-1b-it-4bit-DWQ
gemma-3n-E4B-it-bf16
LongCat-Flash-Chat-mlx-DQ6_K_M
gemma-3-270m-it-bf16
whisper-medium-mlx-4bit
Qwen3-14B-6bit
gpt2-base-mlx
LFM2-VL-450M-8bit
starcoder2-7b-4bit
Ling-mini-2.0-4bit
This model mlx-community/Ling-mini-2.0-4bit was converted to MLX format from inclusionAI/Ling-mini-2.0 using mlx-lm version 0.27.1.
LLaDA2.0-mini-preview-4bit
This model mlx-community/LLaDA2.0-mini-preview-4bit was converted to MLX format from inclusionAI/LLaDA2.0-mini-preview using mlx-lm version 0.28.4.
Qwen3-4B-4bit-DWQ-053125
Dolphin-Mistral-24B-Venice-Edition-4bit
Llama-3-8B-Instruct-1048k-8bit
conikeec-deepseek-coder-6.7b-instruct
Josiefied-DeepSeek-R1-0528-Qwen3-8B-abliterated-v1-4bit
Apertus-8B-Instruct-2509-8bit
Gemma-3-Glitter-12B-8bit
gemma-3-12b-it-4bit-DWQ
Gabliterated-Qwen3-0.6B-4bit
gemma-3-270m-4bit
Qwen3-VL-2B-Thinking-bf16
mlx-community/Qwen3-VL-2B-Thinking-bf16 This model was converted to MLX format from `Qwen/Qwen3-VL-2B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
gemma-2-27b-bf16
Qwen3-VL-4B-Instruct-6bit
mlx-community/Qwen3-VL-4B-Instruct-6bit This model was converted to MLX format from `Qwen/Qwen3-VL-4B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
Mistral-7B-Instruct-v0.3-8bit
LFM2-VL-3B-8bit
nomicai-modernbert-embed-base-bf16
bitnet-b1.58-2B-4T-8bit
Qwen3-Coder-30B-A3B-Instruct-bf16
This model mlx-community/Qwen3-Coder-30B-A3B-Instruct-bf16 was converted to MLX format from Qwen/Qwen3-Coder-30B-A3B-Instruct using mlx-lm version 0.26.2.
LFM2-8B-A1B-6bit
This model mlx-community/LFM2-8B-A1B-6bit was converted to MLX format from LiquidAI/LFM2-8B-A1B using mlx-lm version 0.28.2.
gemma-3n-E4B-it-lm-bf16
Qwen2.5-Coder-1.5B-4bit
gemma-3-270m-it-qat-4bit
This model mlx-community/gemma-3-270m-it-qat-4bit was converted to MLX format from google/gemma-3-270m-it-qat using mlx-lm version 0.26.3.
DeepSeek-R1-Distill-Qwen-1.5B-6bit
medgemma-27b-it-8bit
gemma-3-27b-it-bf16
orpheus-3b-0.1-ft-4bit
meta-llama-Llama-4-Scout-17B-16E-Instruct-bf16
c4ai-command-r-v01-4bit
Llama-3.2-8X4B-MOE-V2-Dark-Champion-Instruct-uncensored-abliterated-21B-MLX
Qwen3-1.7B-4bit-DWQ-053125
Qwen3-4B-Instruct-2507-5bit
LFM2-8B-A1B-6bit-MLX
Maintainer / Publisher: Susant Achary. Upstream model: LiquidAI/LFM2-8B-A1B. This repo (MLX 6-bit): `mlx-community/LFM2-8B-A1B-6bit-MLX`.

This repository provides an Apple-Silicon-optimized MLX build of LFM2-8B-A1B at 6-bit quantization. Among quantized tiers, 6-bit is a strong fidelity sweet spot for many Macs: noticeably smaller than FP16/8-bit while preserving answer quality for instruction following, summarization, and structured extraction.

- Architecture: Mixture-of-Experts (MoE) Transformer.
- Size: ~8B total parameters with ~1B active per token (A1B ≈ "~1B active").
- Why MoE? At each token, a subset of experts is activated, reducing compute per token while keeping a larger parameter pool for expressivity.

> Single-device memory reality: even though only ~1B parameters are active per token, all experts typically reside in memory during inference on one device. That means RAM planning should track total parameters, not just the active slice.

Repo contents:
- `config.json` (MLX), `model.safetensors` (6-bit shards)
- Tokenizer files: `tokenizer.json`, `tokenizer_config.json`
- Model metadata (e.g., `model_index.json`)

Target: macOS on Apple Silicon (M-series) with Metal/MPS.

Intended use:
- General instruction following, chat, and summarization
- RAG and long-context assistants on device
- Schema-guided structured outputs (JSON)

Limitations:
- Quantization can cause small regressions vs FP16 on tricky math/code or tight formatting.
- For very long contexts and/or batching, the KV-cache can dominate memory; tune `max_tokens` and batch size.
- Add your own safety/guardrails for sensitive deployments.

The following are practical starting points for a single-device MLX run; validate on your hardware.

Rule-of-thumb components:
- Weights (6-bit): ≈ `total_params × 0.75 byte` → for 8B params ≈ ~6.0 GB
Josiefied-Qwen2.5-7B-Instruct-abliterated-v2
deepseek-coder-1.3b-instruct-mlx
Qwen2.5-Coder-32B-Instruct-8bit
Qwen2.5-VL-3B-Instruct-bf16
gemma-3-4b-it-4bit-DWQ
Qwen3-1.7B-8bit
Huihui-gemma-3n-E4B-it-abliterated-lm-6bit
The Model mlx-community/Huihui-gemma-3n-E4B-it-abliterated-lm-6bit was converted to MLX format from huihui-ai/Huihui-gemma-3n-E4B-it-abliterated using mlx-lm version 0.26.4.
Qwen3-VL-2B-Instruct-8bit
mlx-community/Qwen3-VL-2B-Instruct-8bit This model was converted to MLX format from `Qwen/Qwen3-VL-2B-Instruct` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx
GLM-4-32B-0414-4bit-DWQ
granite-4.0-h-tiny-5bit-MLX
Josiefied-Qwen3-30B-A3B-abliterated-v2-8bit
Huihui-gemma-3n-E4B-it-abliterated-lm-4bit
The Model mlx-community/Huihui-gemma-3n-E4B-it-abliterated-lm-4bit was converted to MLX format from huihui-ai/Huihui-gemma-3n-E4B-it-abliterated using mlx-lm version 0.26.4.
CodeLlama-7b-mlx
Qwen3-0.6B-4bit-AWQ
Josiefied-Qwen3-8B-abliterated-v1-8bit
Llama-4-Scout-17B-16E-8bit
Qwen3-4B-Instruct-2507-4bit-g32
This model mlx-community/Qwen3-4B-Instruct-2507-4bit-g32 was converted to MLX format from Qwen/Qwen3-4B-Instruct-2507 using mlx-lm version 0.28.2.
Qwen3-VL-8B-Thinking-5bit
mlx-community/Qwen3-VL-8B-Thinking-5bit This model was converted to MLX format from `Qwen/Qwen3-VL-8B-Thinking` using mlx-vlm version 0.3.4. Refer to the original model card for more details on the model. Use with mlx