inferencerlabs

79 models

GLM-5.1-MLX-4.8bit
6,273 downloads · 3 likes

GLM-5-MLX-4.8bit
5,363 downloads · 9 likes

GLM-5.1-MLX-2.5bit-INF
3,049 downloads · 0 likes

GLM-5-MLX-5.6bit-INF
2,798 downloads · 4 likes

NVIDIA-Nemotron-3-Super-120B-A12B-MLX-9bit
1,098 downloads · 3 likes

DeepSeek-V3.2-MLX-5.5bit
974 downloads · 2 likes

Qwen3.5-397B-A17B-MLX-9bit
897 downloads · 4 likes

MiniMax-M2.7-MLX-9bit
848 downloads · 0 likes

NVIDIA-Nemotron-3-Super-120B-A12B-MLX-4.5bit
847 downloads · 3 likes

gemma-4-31B-MLX-9bit
785 downloads · 1 like

GLM-5.1-MLX-4.8bit-INF
773 downloads · 0 likes

Mistral-Small-4-119B-2603-MLX-4.5bit
732 downloads · 0 likes

Kimi-K2-Instruct-MLX-3.9bit
656 downloads · 9 likes

openai-gpt-oss-120b-MLX-6.5bit
license: apache-2.0 · 589 downloads · 2 likes

See gpt-oss-120b 6.5bit MLX in action: demonstration video. The q6.5 quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2           | 41.293     |
| q3           | 1.900      |
| q4           | 1.168      |
| q6           | 1.128      |
| q8           | 1.128      |

Tested to run with the Inferencer app. Memory usage: ~95 GB (down from the ~251 GB required by the native MXFP4 format). Expect ~60 tokens/s. Quantized with a modified version of MLX 0.26. For more details, see the demonstration video or visit OpenAI gpt-oss-120b.

We are not the creator, originator, or owner of any model listed. Each model is created and provided by third parties. Models may not always be accurate or contextually appropriate. You are responsible for verifying the information before making important decisions. We are not liable for any damages, losses, or issues arising from its use, including data loss or inaccuracies in AI-generated content.
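
The quoted footprint is consistent with simple weight-only arithmetic: parameter count × bits per weight ÷ 8, plus some overhead for activations and the KV cache. A minimal sketch of that estimate, assuming the ~117B parameter count published for gpt-oss-120b (the count is not stated on this card; figures are illustrative, not measured):

```python
def approx_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight-only memory estimate in GB: params * bits / 8."""
    return n_params * bits_per_weight / 8 / 1e9

N = 117e9  # assumed parameter count for gpt-oss-120b; not stated on the card
print(f"q6.5 weights: ~{approx_weight_gb(N, 6.5):.0f} GB")  # ~95 GB, as quoted
print(f"q8 weights:   ~{approx_weight_gb(N, 8.0):.0f} GB")
```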

Kimi-K2-Instruct-MLX-3.985bit
580 downloads · 7 likes

openai-gpt-oss-20b-MLX-6.5bit
license: apache-2.0 · 559 downloads · 2 likes

See gpt-oss-20b 6.5bit MLX in action: demonstration video. The q6.5 quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2           | 41.293     |
| q3           | 1.900      |
| q4           | 1.168      |
| q6           | 1.128      |
| q8           | 1.128      |

Tested to run with the Inferencer app. Memory usage: ~17 GB (down from the ~46 GB required by the native MXFP4 format). Expect ~100 tokens/s. Quantized with a modified version of MLX 0.26. For more details, see the demonstration video or visit OpenAI gpt-oss-20b.
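
For anyone wanting to try a quant like this outside the Inferencer app, stock mlx-lm can load MLX-format models from the hub. A minimal sketch; the repo id is inferred from this listing and may differ, and the fractional 6.5-bit format may only load with the maintainers' modified MLX rather than the stock library:

```python
from mlx_lm import load, generate

# Repo id inferred from the listing; may differ. Fractional-bit quants may
# require the modified MLX used to create them.
model, tokenizer = load("inferencerlabs/openai-gpt-oss-20b-MLX-6.5bit")

text = generate(
    model,
    tokenizer,
    prompt="Summarize the MXFP4 format in one sentence.",
    max_tokens=128,
)
print(text)
```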

gemma-4-E4B-MLX-9bit
544 downloads · 0 likes

gemma-4-26B-A4B-MLX-9bit
515 downloads · 1 like

Qwen3.5-122B-A10B-MLX-9bit
514 downloads · 0 likes

Qwen3.5-122B-A10B-MLX-6.5bit
484 downloads · 0 likes

gemma-4-E2B-MLX-9bit
453 downloads · 0 likes

GLM-4.7-Flash-MLX-6.5bit
434 downloads · 2 likes

DeepSeek-V3.2-Speciale-MLX-4.8bit
420 downloads · 0 likes

Qwen3-Coder-480B-A35B-Instruct-MLX-8.5bit
400 downloads · 2 likes

Kimi-K2-Instruct-0905-MLX-3.825bit
375 downloads · 2 likes

Mistral-Small-4-119B-2603-MLX-9bit
371 downloads · 1 like

DeepSeek-V3.2-Speciale-MLX-5.5bit
354 downloads · 1 like

sarvamai-105b-MLX-10bit
353 downloads · 1 like

DeepSeek-V3.2-MLX-4.8bit
322 downloads · 0 likes

deepseek-v3.1-MLX-5.5bit

See DeepSeek-V3.1 5.5bit MLX in action - demonstration video q5.5bit quant typically achieves 1.141 perplexity in our testing | Quantization | Perplexity | |:------------:|:----------:| | q2.5 | 41.293 | | q3.5 | 1.900 | | q4.5 | 1.168 | | q5.5 | 1.141 | | q6.5 | 1.128 | | q8.5 | 1.128 | Runs on a single M3 Ultra 512GB RAM using Inferencer app Memory usage: ~480 GB Expect ~13-19 tokens/s Quantized with a modified version of MLX 0.26 For more details see demonstration video or visit DeepSeek-V3.1. We are not the creator, originator, or owner of any model listed. Each model is created and provided by third parties. Models may not always be accurate or contextually appropriate. You are responsible for verifying the information before making important decisions. We are not liable for any damages, losses, or issues arising from its use, including data loss or inaccuracies in AI-generated content.

NaNK
license:mit
321
5
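
The cards never define their perplexity metric, but the usual definition is the exponential of the mean per-token negative log-likelihood over an evaluation text. A minimal sketch of that computation (the evaluation corpus behind these tables is not stated, and the numbers below are illustrative only):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# An average log-prob of -0.12 nats/token gives perplexity ~1.128,
# the q6.5/q8.5 figure in these tables (illustrative numbers only).
print(perplexity([-0.12] * 1000))
```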

Devstral-Small-2-24B-Instruct-2512-MLX-6.5bit
319 downloads · 0 likes

Kimi-K2-Thinking-MLX-4.25bit
314 downloads · 3 likes

See Kimi-K2-Thinking 4.25bit MLX in action: demonstration video. The q4.25 quant's perplexity is TBA, but the q4.5 quant typically achieves 1.168 perplexity in our testing.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2.5         | 41.293     |
| q3.5         | 1.900      |
| q3.95        | 1.243      |
| q4.25        | TBA        |
| q4.5         | 1.168      |
| q6.5         | 1.128      |
| q8.5         | 1.128      |

Tested on an M3 Ultra with 512 GB RAM connected to a MacBook Pro with 128 GB RAM, using Inferencer app v1.6 with distributed compute. For more information on the distributed compute feature, see github.com/inferencer/issues/31. Memory usage: MBP ~80 GB + Mac Studio ~450 GB. Expect ~22 tokens/s @ 1000 tokens. Quantized with a modified version of MLX 0.28. For more details, see the demonstration video or visit Kimi-K2-Thinking.
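
The two-machine split lines up with weight-only arithmetic: at 4.25 bits per weight, a ~1T-parameter model needs roughly 1e12 × 4.25 ÷ 8 ≈ 531 GB, essentially the ~80 GB + ~450 GB reported above. A sketch of that estimate, with the parameter count assumed (Kimi K2 is widely described as a ~1T-parameter MoE; the card itself doesn't say):

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight-only memory estimate in GB: params * bits / 8."""
    return n_params * bits_per_weight / 8 / 1e9

total = weight_gb(1e12, 4.25)  # assumed ~1T parameters
print(f"~{total:.0f} GB total vs ~80 GB (MBP) + ~450 GB (Mac Studio) = 530 GB")
```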

deepseek-v3.1-Terminus-MLX-5.5bit
license: mit · 296 downloads · 0 likes

Devstral-2-123B-Instruct-2512-MLX-6.5bit
288 downloads · 0 likes

GLM-4.7-Flash-MLX-5.5bit
287 downloads · 1 like

sarvamai-30b-MLX-10bit
276 downloads · 0 likes

Qwen3-Coder-480B-A35B-Instruct-MLX-6.5bit
license: apache-2.0 · 273 downloads · 2 likes

See Qwen3-Coder-480B-A35B-Instruct 6.5bit MLX in action: demonstration video. The q6.5 quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2           | 41.293     |
| q3           | 1.900      |
| q4           | 1.168      |
| q6           | 1.128      |
| q8           | 1.128      |

Tested to run with the Inferencer app. Memory usage: ~365 GB. Expect ~19 tokens/s. Quantized with a modified version of MLX 0.26. For more details, see the demonstration video or visit Qwen3-Coder.
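
Every card here credits "a modified version of MLX" for the quantization; stock mlx-lm only exposes integer bit widths with group-wise quantization, so fractional labels like 6.5bit or 3.825bit presumably come from that fork, most likely as mixed per-layer bit widths averaged over the model. For comparison, a standard conversion with the unmodified library looks roughly like this (a sketch, not the maintainers' actual pipeline):

```python
from mlx_lm import convert

# Standard mlx-lm conversion; this produces an ordinary integer-bit quant,
# NOT the fractional 6.5-bit format used in this listing.
convert(
    "Qwen/Qwen3-Coder-480B-A35B-Instruct",  # source weights on the hub
    mlx_path="./qwen3-coder-q6",            # output directory
    quantize=True,
    q_bits=6,         # bits per weight
    q_group_size=64,  # quantization group size
)
```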

Qwen3.5-35B-A3B-MLX-9bit
241 downloads · 0 likes

Kimi-K2-Instruct-0905-MLX-3.8bit
234 downloads · 2 likes

Kimi-K2-Thinking-MLX-3.8bit
217 downloads · 1 like

Kimi-K2-Instruct-0905-MLX-3.824bit
198 downloads · 2 likes

Qwen3-Coder-30B-A3B-Instruct-MLX-6.5bit
license: apache-2.0 · 191 downloads · 0 likes

Kimi-K2.5-MLX-4.2bit
174 downloads · 0 likes

LongCat-Flash-Thinking-2601-MLX-6.5bit
167 downloads · 2 likes

MiniMax-M2-MLX-6.5bit
license: mit · 156 downloads · 2 likes

See MiniMax-M2 6.5bit MLX in action: demonstration video. The q6.5 quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2.5         | 41.293     |
| q3.5         | 1.900      |
| q4.5         | 1.168      |
| q5.5         | 1.141      |
| q6.5         | 1.128      |
| q8.5         | 1.128      |

Tested on a MacBook Pro connecting to an M3 Ultra with 512 GB RAM over the internet, using Inferencer app v1.5.4. Memory usage: ~175 GB. Expect ~42 tokens/s for small contexts (200 tokens), down to ~12 tokens/s for large ones (6,800 tokens). Note: performance has improved by 16.7% since the original tests; see github.com/inferencer/issues/46. Quantized with a modified version of MLX 0.28. For more details, see the demonstration video or visit MiniMax-M2.
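
Context-dependent figures like "42 tokens/s at 200 tokens, 12 tokens/s at 6,800" are straightforward to reproduce: time a generation and divide emitted tokens by wall-clock time. A minimal sketch with stock mlx-lm (repo id inferred from the listing; the 6.5-bit quant may need the modified MLX):

```python
import time
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/MiniMax-M2-MLX-6.5bit")  # inferred repo id

prompt = "Explain KV caching in two sentences."
start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, max_tokens=200)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(text))  # count emitted tokens
print(f"{n_tokens / elapsed:.1f} tokens/s over {n_tokens} tokens")
```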

GLM-4.7-MLX-9bit
154 downloads · 0 likes

GLM-4.6-MLX-6.5bit
license: mit · 137 downloads · 9 likes

See GLM-4.6 6.5bit MLX in action: demonstration video. The q6.5 quant typically achieves the lowest perplexity in our testing (tied with q8.5).

| Quantization | Perplexity |
|:------------:|:----------:|
| q2.5         | 41.293     |
| q3.5         | 1.900      |
| q4.5         | 1.168      |
| q5.5         | 1.141      |
| q6.5         | 1.128      |
| q8.5         | 1.128      |

Runs on a single M3 Ultra with 512 GB RAM using the Inferencer app. Memory usage: ~270 GB. Expect ~16 tokens/s. Quantized with a modified version of MLX 0.27. For more details, see the demonstration video or visit GLM-4.6.

MiMo-V2-Flash-MLX-6.5bit
136 downloads · 0 likes

Qwen3.5-35B-A3B-MLX-5.5bit
116 downloads · 0 likes

GLM-4.7-MLX-6.5bit
104 downloads · 1 like

MiniMax-M2.1-MLX-6.5bit
101 downloads · 1 like

INTELLECT-3.1-MLX-7bit
98 downloads · 1 like

Solar-Open-100B-MLX-6.5bit
89 downloads · 0 likes

GLM-4.7-REAP-218B-A32B-MLX-6.5bit
80 downloads · 0 likes

LongCat-Flash-Lite-MLX-5.5bit
75 downloads · 2 likes

IQuest-Coder-V1-40B-Loop-Instruct-MLX-6.5bit
75 downloads · 0 likes

K-EXAONE-236B-A23B-MLX-6.5bit
72 downloads · 0 likes

Qwen3-235B-A22B-Instruct-2507-MLX-6.5bit
license: apache-2.0 · 69 downloads · 1 like

Ling-2.5-1T-MLX-3.7bit
64 downloads · 1 like

GLM-4.7-REAP-268B-A32B-MLX-6.5bit
62 downloads · 0 likes

LongCat-Flash-Lite-MLX-9bit
61 downloads · 1 like

Ring-2.5-1T-MLX-3.7bit
56 downloads · 1 like

INTELLECT-3.1-MLX-5.5bit
52 downloads · 1 like

Olmo-3.1-32B-Instruct-q6-MLX-6.5bit
49 downloads · 0 likes

IQuest-Coder-V1-40B-Instruct-MLX-6.5bit
45 downloads · 0 likes

Qwen3-235B-A22B-Thinking-2507-MLX-6.5bit
license: apache-2.0 · 30 downloads · 0 likes

Olmo-3.1-32B-Instruct-MLX-6.5bit
29 downloads · 1 like

LongCat-Flash-Thinking-2601-MLX-5.5bit
25 downloads · 0 likes

Olmo-3.1-32B-Think-q6-MLX-6.5bit
24 downloads · 0 likes

INTELLECT-3.1-MLX-9bit
20 downloads · 0 likes

Olmo-3.1-32B-Think-MLX-6.5bit
14 downloads · 1 like

Qwen3.5-27B-MLX-7bit
11 downloads · 0 likes

Kimi-K2.5-MLX-3.6bit
0 downloads · 5 likes

Qwen3-Coder-Next-MLX-9bit
0 downloads · 2 likes

Bonsai-8B-MLX-2.25bit
0 downloads · 1 like

Bonsai-4B-MLX-2.25bit
0 downloads · 1 like

Mistral-Small-4-119B-2603-MLX-LM-9bit
0 downloads · 1 like

Qwen3.5-27B-MLX-9bit
0 downloads · 1 like

Kimi-K2-Thinking-MLX-3.825bit
0 downloads · 1 like