ilintar
Qwen3-Next-80B-A3B-Instruct-GGUF
Preliminary quants for the model (Q2_K_S is an early quant and is not imatrixed; the rest are). IQ2_XXS: Final estimate: PPL = 10.2483 +/- 0.38654 (for the desperate, I'd guess).
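The "PPL = x +/- y" figures here and in the entries below are in the format printed by llama.cpp's perplexity tool. As a rough illustration only (not the tool's actual code), this is how such an estimate can be derived from per-token log-probabilities; the input list is made up:

```python
# Illustrative sketch: perplexity is the exponential of the mean negative
# log-likelihood over the evaluated tokens, and the "+/-" term is roughly the
# propagated standard error of that mean. Not llama.cpp's actual implementation.
import math

def perplexity_estimate(token_logprobs):
    """token_logprobs: natural-log probabilities the model assigned to each
    reference token (hypothetical values for this example)."""
    nlls = [-lp for lp in token_logprobs]
    n = len(nlls)
    mean_nll = sum(nlls) / n
    var = sum((x - mean_nll) ** 2 for x in nlls) / (n - 1)
    sem = math.sqrt(var / n)          # standard error of the mean NLL
    ppl = math.exp(mean_nll)
    return ppl, ppl * sem             # first-order error propagation for exp()

print(perplexity_estimate([-2.1, -0.4, -3.3, -1.7]))
```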
Qwen3-Nemotron-32B-160k-GGUF
YaRN extension of the context to 160k, since 40k for coding these days is pretty obsolete.

| Quant  | Perplexity           |
|--------|----------------------|
| Q8_0   | 5.6355 +/- 0.13322   |
| Q6_K_M | 5.6169 +/- 0.13250   |
| Q5_K_M | 5.6270 +/- 0.13270   |
| Q4_K_M | 5.6435 +/- 0.13298   |
| IQ4_NL | 5.6717 +/- 0.13443   |
| IQ3_XS | 5.8865 +/- 0.13868   |
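To actually use the extended context from Python, a minimal sketch with the llama-cpp-python bindings, assuming you've downloaded one of the quants from the table (the filename and settings are just placeholders):

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The GGUF filename is an assumption; substitute whichever quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-Nemotron-32B-160k-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=163840,      # request the full 160k context the YaRN extension enables
    n_gpu_layers=-1,   # offload all layers to GPU if it fits
)

out = llm("Summarize the following code:\n...", max_tokens=256)
print(out["choices"][0]["text"])
```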
MiniMax-M2-GGUF
ERNIE-4.5-21B-A3B-PT-gguf
GGUFs for ERNIE 4.5 MoE 21B-A3B, quantized with an imatrix from Bartowski's Qwen data (https://gist.github.com/bartowski1182/f003237f2e8612278a6d01622af1cb6f).
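For anyone wanting to reproduce this kind of imatrix quant, the general workflow with the llama.cpp tools looks roughly like the sketch below. This is an illustration only: all file names are placeholders, and you should check each tool's --help for the exact flags on your build.

```python
# Rough sketch of the imatrix quantization workflow, driving the llama.cpp CLI
# tools via subprocess. All paths and file names below are placeholders.
import subprocess

# 1) Collect importance-matrix statistics on a local calibration text file
#    (e.g. a copy of the calibration data linked above).
subprocess.run([
    "llama-imatrix",
    "-m", "ERNIE-4.5-21B-A3B-PT-F16.gguf",  # high-precision source GGUF (placeholder)
    "-f", "calibration.txt",                # local calibration data (placeholder)
    "-o", "imatrix.dat",
], check=True)

# 2) Quantize the source GGUF using those statistics.
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "ERNIE-4.5-21B-A3B-PT-F16.gguf",
    "ERNIE-4.5-21B-A3B-PT-IQ4_NL.gguf",
    "IQ4_NL",                               # example target quant type
], check=True)
```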
NVIDIA-Nemotron-Nano-9B-v2-GGUF
imatrix GGUFs calibrated on the combined_all_small set from https://huggingface.co/datasets/eaddario/imatrix-calibration/tree/main. Note: due to the nonstandard tensor sizes, some quantization types do not make sense; for example, because of fallbacks, IQ2_M ends up just 300 MB smaller than IQ4_NL. I therefore only uploaded the quantizations that actually make sense.
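If you want to check which quants made the cut and compare their sizes, you can query the Hub directly; a small sketch using huggingface_hub, with the repo id assumed from this listing:

```python
# Quick size comparison of the uploaded quants via the Hub API
# (pip install huggingface_hub). The repo id is an assumption based on the listing.
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("ilintar/NVIDIA-Nemotron-Nano-9B-v2-GGUF", files_metadata=True)

for f in sorted(info.siblings, key=lambda s: s.size or 0):
    if f.rfilename.endswith(".gguf"):
        print(f"{f.rfilename:60s} {f.size / 2**30:6.2f} GiB")
```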
Dhanishtha-2.0-preview-0825-Q3_K_M-GGUF
ilintar/Dhanishtha-2.0-preview-0825-Q3_K_M-GGUF. This model was converted to GGUF format from `HelpingAI/Dhanishtha-2.0-preview-0825` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo. Step 1: Clone llama.cpp from GitHub. Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
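If you'd rather not build anything, the llama-cpp-python bindings can pull the quant straight from the Hub; a minimal sketch, with the repo id and exact filename assumed from this listing (check the repo for the actual file name):

```python
# Minimal sketch using llama-cpp-python instead of a manual llama.cpp build
# (pip install llama-cpp-python huggingface_hub). Repo id and filename are
# assumptions based on the listing above.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ilintar/Dhanishtha-2.0-preview-0825-Q3_K_M-GGUF",
    filename="dhanishtha-2.0-preview-0825-q3_k_m.gguf",  # adjust to the real file name
    n_ctx=8192,
)

print(llm("Hello, how are you?", max_tokens=64)["choices"][0]["text"])
```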
Apriel-Nemotron-15b-Thinker-iGGUF
THUDM_GLM-Z1-9B-0414_iGGUF
THUDM-GLM-4-32B-0414-IQ2_S.GGUF
IQ2_S quant done off an imatrix generated from a Q4_K quant, because I can't run anything higher on my potato PC. Use at your own risk.