inferencerlabs
GLM-5.1-MLX-4.8bit
GLM-5-MLX-4.8bit
GLM-5.1-MLX-2.5bit-INF
GLM-5-MLX-5.6bit-INF
NVIDIA-Nemotron-3-Super-120B-A12B-MLX-9bit
DeepSeek-V3.2-MLX-5.5bit
Qwen3.5-397B-A17B-MLX-9bit
MiniMax-M2.7-MLX-9bit
NVIDIA-Nemotron-3-Super-120B-A12B-MLX-4.5bit
gemma-4-31B-MLX-9bit
GLM-5.1-MLX-4.8bit-INF
Mistral-Small-4-119B-2603-MLX-4.5bit
Kimi-K2-Instruct-MLX-3.9bit
openai-gpt-oss-120b-MLX-6.5bit
See gpt-oss-120b 6.5bit MLX in action - demonstration video.

q6.5bit quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2 | 41.293 |
| q3 | 1.900 |
| q4 | 1.168 |
| q6 | 1.128 |
| q8 | 1.128 |

- Tested to run with the Inferencer app
- Memory usage: ~95 GB (down from the ~251 GB required by the native MXFP4 format)
- Expect ~60 tokens/s
- Quantized with a modified version of MLX 0.26

For more details, see the demonstration video or visit OpenAI gpt-oss-120b.

We are not the creator, originator, or owner of any model listed. Each model is created and provided by third parties. Models may not always be accurate or contextually appropriate. You are responsible for verifying the information before making important decisions. We are not liable for any damages, losses, or issues arising from its use, including data loss or inaccuracies in AI-generated content.
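The card above only documents the Inferencer app; a minimal sketch of trying a quant like this with stock mlx-lm follows, assuming the repo id matches this listing and the weights load through the standard `mlx_lm` API:

```python
# Sketch: loading and sampling with stock mlx-lm (pip install mlx-lm).
# Assumption: the repo id below matches this listing and loads via the
# standard mlx_lm API; the card itself only documents the Inferencer app.
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/openai-gpt-oss-120b-MLX-6.5bit")
text = generate(model, tokenizer, prompt="Explain MXFP4 in one paragraph.", max_tokens=128)
print(text)
```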
Kimi-K2-Instruct-MLX-3.985bit
openai-gpt-oss-20b-MLX-6.5bit
See gpt-oss-20b 6.5bit MLX in action - demonstration video.

q6.5bit quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2 | 41.293 |
| q3 | 1.900 |
| q4 | 1.168 |
| q6 | 1.128 |
| q8 | 1.128 |

- Tested to run with the Inferencer app
- Memory usage: ~17 GB (down from the ~46 GB required by the native MXFP4 format)
- Expect ~100 tokens/s
- Quantized with a modified version of MLX 0.26

For more details, see the demonstration video or visit OpenAI gpt-oss-20b.
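The perplexity figures in these tables come from the authors' own harness, which is not published. As a rough illustration of what such a measurement looks like, here is a minimal next-token perplexity sketch with stock MLX; the evaluation text, context handling, and repo id are all assumptions:

```python
# Sketch: perplexity = exp(mean next-token negative log-likelihood).
# Assumption: the repo loads with stock mlx_lm; the authors' actual
# evaluation corpus and window size are not published.
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("inferencerlabs/openai-gpt-oss-20b-MLX-6.5bit")
tokens = mx.array(tokenizer.encode("The quick brown fox jumps over the lazy dog."))

logits = model(tokens[None, :-1])  # logits for each next-token prediction
nll = nn.losses.cross_entropy(logits[0], tokens[1:], reduction="mean")
print(f"perplexity ≈ {mx.exp(nll).item():.3f}")
```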
gemma-4-E4B-MLX-9bit
gemma-4-26B-A4B-MLX-9bit
Qwen3.5-122B-A10B-MLX-9bit
Qwen3.5-122B-A10B-MLX-6.5bit
gemma-4-E2B-MLX-9bit
GLM-4.7-Flash-MLX-6.5bit
DeepSeek-V3.2-Speciale-MLX-4.8bit
Qwen3-Coder-480B-A35B-Instruct-MLX-8.5bit
Kimi-K2-Instruct-0905-MLX-3.825bit
Mistral-Small-4-119B-2603-MLX-9bit
DeepSeek-V3.2-Speciale-MLX-5.5bit
sarvamai-105b-MLX-10bit
DeepSeek-V3.2-MLX-4.8bit
deepseek-v3.1-MLX-5.5bit
See DeepSeek-V3.1 5.5bit MLX in action - demonstration video.

q5.5bit quant typically achieves 1.141 perplexity in our testing.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q4.5 | 1.168 |
| q5.5 | 1.141 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |

- Runs on a single M3 Ultra (512 GB RAM) using the Inferencer app
- Memory usage: ~480 GB
- Expect ~13-19 tokens/s
- Quantized with a modified version of MLX 0.26

For more details, see the demonstration video or visit DeepSeek-V3.1.
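The half-bit steps in these labels (q4.5, q5.5, ...) are consistent with the bookkeeping overhead of MLX's affine quantization, where each group of weights carries a float16 scale and bias, adding 32/group_size extra bits per weight (0.5 bits at the default group size of 64). A sketch of that arithmetic, assuming the default group size; the exact recipe behind each label is not published:

```python
# Effective bits/weight under MLX affine quantization:
#   bits + (float16 scale + float16 bias) / group_size = bits + 32 / group_size.
# Assumption: these repos use the default group size of 64.
def effective_bits(bits: int, group_size: int = 64) -> float:
    return bits + 32 / group_size

for b in (4, 5, 6, 8):
    print(f"q{b} + group overhead -> {effective_bits(b):.1f} bits/weight")
```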
Devstral-Small-2-24B-Instruct-2512-MLX-6.5bit
Kimi-K2-Thinking-MLX-4.25bit
See Kimi-K2-Thinking 4.25bit MLX in action - demonstration video.

q4.25bit quant perplexity is TBA, but q4.5bit quant typically achieves 1.168 perplexity in our testing.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q3.95 | 1.243 |
| q4.25 | TBA |
| q4.5 | 1.168 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |

- Tested on an M3 Ultra (512 GB RAM) connected to a MacBook Pro (128 GB RAM) using Inferencer app v1.6 with distributed compute; for more information on the distributed compute feature, see github.com/inferencer/issues/31
- Memory usage: MacBook Pro ~80 GB + Mac Studio ~450 GB
- Expect ~22 tokens/s @ 1000 tokens
- Quantized with a modified version of MLX 0.28

For more details, see the demonstration video or visit Kimi-K2-Thinking.
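As a sanity check on the reported split (~80 GB + ~450 GB), weight memory for a quantized model is roughly params × bits / 8. Taking Kimi-K2's parameter count as ~1T (an assumption here, from the upstream model family) at 4.25 bits lands right around that total:

```python
# Back-of-the-envelope weight memory: bytes ≈ params * bits_per_weight / 8.
# Assumption: ~1T total parameters (Kimi-K2 family); KV cache, activations,
# and any unquantized layers are ignored.
params = 1.0e12
bits = 4.25
gb = params * bits / 8 / 1e9
print(f"~{gb:.0f} GB")  # ≈ 531 GB, in line with the reported ~80 GB + ~450 GB split
```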
deepseek-v3.1-Terminus-MLX-5.5bit
Devstral-2-123B-Instruct-2512-MLX-6.5bit
GLM-4.7-Flash-MLX-5.5bit
sarvamai-30b-MLX-10bit
Qwen3-Coder-480B-A35B-Instruct-MLX-6.5bit
See Qwen3-Coder-480B-A35B-Instruct 6.5bit MLX in action - demonstration video.

q6.5bit quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2 | 41.293 |
| q3 | 1.900 |
| q4 | 1.168 |
| q6 | 1.128 |
| q8 | 1.128 |

- Tested to run with the Inferencer app
- Memory usage: ~365 GB
- Expect ~19 tokens/s
- Quantized with a modified version of MLX 0.26

For more details, see the demonstration video or visit Qwen3-Coder.
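The ~19 tokens/s figure above was measured in the Inferencer app. One rough way to time decode speed yourself with stock mlx-lm is sketched below; the repo id and prompt are assumptions, and counting streamed responses only approximates generated-token throughput:

```python
# Sketch: rough decode tokens/s with stock mlx-lm's streaming API.
import time
from mlx_lm import load, stream_generate

model, tokenizer = load("inferencerlabs/Qwen3-Coder-480B-A35B-Instruct-MLX-6.5bit")

start, n = time.perf_counter(), 0
for response in stream_generate(
    model, tokenizer, prompt="Write a binary search in Python.", max_tokens=256
):
    n += 1  # one streamed response per generated token
print(f"{n / (time.perf_counter() - start):.1f} tokens/s")
```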
Qwen3.5-35B-A3B-MLX-9bit
Kimi-K2-Instruct-0905-MLX-3.8bit
Kimi-K2-Thinking-MLX-3.8bit
Kimi-K2-Instruct-0905-MLX-3.824bit
Qwen3-Coder-30B-A3B-Instruct-MLX-6.5bit
Kimi-K2.5-MLX-4.2bit
LongCat-Flash-Thinking-2601-MLX-6.5bit
MiniMax-M2-MLX-6.5bit
See MiniMax-M2 6.5bit MLX in action - demonstration video.

q6.5bit quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.5.

| Quantization | Perplexity |
|:------------:|:----------:|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q4.5 | 1.168 |
| q5.5 | 1.141 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |

- Tested on a MacBook Pro connecting to an M3 Ultra (512 GB RAM) over the internet using Inferencer app v1.5.4
- Memory usage: ~175 GB
- Expect ~42 tokens/s for small contexts (200 tokens), dropping to ~12 tokens/s for large contexts (6800 tokens)
- Note: performance has improved by 16.7% since the original tests; see github.com/inferencer/issues/46
- Quantized with a modified version of MLX 0.28

For more details, see the demonstration video or visit MiniMax-M2.
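The setup above streams from a Mac Studio to a MacBook over the network via the Inferencer app, whose protocol is not public. A rough stock-mlx-lm equivalent is to run `mlx_lm.server --model <repo> --host 0.0.0.0 --port 8080` on the Studio and query its OpenAI-compatible endpoint remotely; the client sketch below assumes that setup (host name and port are placeholders):

```python
# Sketch: querying mlx-lm's OpenAI-compatible server from another machine.
# Assumptions: mlx_lm.server is running on the host below; host/port are
# placeholders. /v1/chat/completions is mlx-lm's standard endpoint.
import json
import urllib.request

payload = {
    "model": "inferencerlabs/MiniMax-M2-MLX-6.5bit",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://studio.local:8080/v1/chat/completions",  # hypothetical host
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    print(json.load(r)["choices"][0]["message"]["content"])
```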
GLM-4.7-MLX-9bit
GLM-4.6-MLX-6.5bit
See GLM-4.6 6.5bit MLX in action - demonstration video.

q6.5bit quant typically achieves 1.128 perplexity in our testing, matching q8.5 (the lowest we measured).

| Quantization | Perplexity |
|:------------:|:----------:|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q4.5 | 1.168 |
| q5.5 | 1.141 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |

- Runs on a single M3 Ultra (512 GB RAM) using the Inferencer app
- Memory usage: ~270 GB
- Expect ~16 tokens/s
- Quantized with a modified version of MLX 0.27

For more details, see the demonstration video or visit GLM-4.6.
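These cards all credit a modified MLX build, which is not public. Stock mlx-lm can approximate mixed-precision recipes through `convert()`'s `quant_predicate` hook, which can return per-layer quantization settings; the sketch below is illustrative only, and the layer-selection rule and upstream repo id are assumptions, not the recipe actually used for these repos:

```python
# Sketch: mixed-precision quantization with stock mlx-lm's convert().
# The per-layer rule below is illustrative; the actual recipe behind these
# repos (made with a modified MLX build) is not published.
from mlx_lm import convert

def predicate(path, module, config):
    # Keep embeddings and the output head at 8 bits, everything else at 6.
    if "embed" in path or "lm_head" in path:
        return {"bits": 8, "group_size": 64}
    return {"bits": 6, "group_size": 64}

convert(
    "zai-org/GLM-4.6",            # assumed upstream repo id
    mlx_path="GLM-4.6-MLX-mixed",
    quantize=True,
    quant_predicate=predicate,
)
```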