ilintar
Qwen3-Next-80B-A3B-Instruct-GGUF
Preliminary quants for the model (Q2_K_S is an early quant and is not imatrixed; the rest are). IQ2_XXS: Final estimate: PPL = 10.2483 +/- 0.38654 (for the desperate, I'd guess).
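The "PPL = x +/- y" figures here and in the entries below are in the format printed by llama.cpp's perplexity tool. As a rough illustration only (not the tool's actual code), this is how such an estimate can be derived from per-token log-probabilities; the input list is made up:

```python
# Illustrative sketch: perplexity is the exponential of the mean negative
# log-likelihood over the evaluated tokens, and the "+/-" term is roughly the
# propagated standard error of that mean. Not llama.cpp's actual implementation.
import math

def perplexity_estimate(token_logprobs):
    """token_logprobs: natural-log probabilities the model assigned to each
    reference token (hypothetical values for this example)."""
    nlls = [-lp for lp in token_logprobs]
    n = len(nlls)
    mean_nll = sum(nlls) / n
    var = sum((x - mean_nll) ** 2 for x in nlls) / (n - 1)
    sem = math.sqrt(var / n)          # standard error of the mean NLL
    ppl = math.exp(mean_nll)
    return ppl, ppl * sem             # first-order error propagation for exp()

print(perplexity_estimate([-2.1, -0.4, -3.3, -1.7]))
```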
Qwen3-Nemotron-32B-160k-GGUF
YaRN extension of the context to 160k, since 40k for coding these days is pretty obsolete.

| Quant  | Perplexity           |
|--------|----------------------|
| Q8_0   | 5.6355 +/- 0.13322   |
| Q6_K_M | 5.6169 +/- 0.13250   |
| Q5_K_M | 5.6270 +/- 0.13270   |
| Q4_K_M | 5.6435 +/- 0.13298   |
| IQ4_NL | 5.6717 +/- 0.13443   |
| IQ3_XS | 5.8865 +/- 0.13868   |
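To actually use the extended context from Python, a minimal sketch with the llama-cpp-python bindings, assuming you've downloaded one of the quants from the table (the filename and settings are just placeholders):

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The GGUF filename is an assumption; substitute whichever quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-Nemotron-32B-160k-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=163840,      # request the full 160k context the YaRN extension enables
    n_gpu_layers=-1,   # offload all layers to GPU if it fits
)

out = llm("Summarize the following code:\n...", max_tokens=256)
print(out["choices"][0]["text"])
```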
MiniMax-M2-GGUF
ERNIE-4.5-21B-A3B-PT-gguf
GGUFs for ERNIE 4.5 MoE 21B-A3B, quantized with an imatrix from Bartowski's Qwen data (https://gist.github.com/bartowski1182/f003237f2e8612278a6d01622af1cb6f).
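For anyone wanting to reproduce this kind of imatrix quant, the general workflow with the llama.cpp tools looks roughly like the sketch below. This is an illustration only: all file names are placeholders, and you should check each tool's --help for the exact flags on your build.

```python
# Rough sketch of the imatrix quantization workflow, driving the llama.cpp CLI
# tools via subprocess. All paths and file names below are placeholders.
import subprocess

# 1) Collect importance-matrix statistics on a local calibration text file
#    (e.g. a copy of the calibration data linked above).
subprocess.run([
    "llama-imatrix",
    "-m", "ERNIE-4.5-21B-A3B-PT-F16.gguf",  # high-precision source GGUF (placeholder)
    "-f", "calibration.txt",                # local calibration data (placeholder)
    "-o", "imatrix.dat",
], check=True)

# 2) Quantize the source GGUF using those statistics.
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "ERNIE-4.5-21B-A3B-PT-F16.gguf",
    "ERNIE-4.5-21B-A3B-PT-IQ4_NL.gguf",
    "IQ4_NL",                               # example target quant type
], check=True)
```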
NVIDIA-Nemotron-Nano-9B-v2-GGUF
imatrix GGUFs calibrated on the combined_all_small set from https://huggingface.co/datasets/eaddario/imatrix-calibration/tree/main. Note: due to the nonstandard tensor sizes, some quantization types do not make sense; for example, because of fallbacks, IQ2_M ends up just 300 MB smaller than IQ4_NL. I therefore only uploaded the quantizations that actually make sense.
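If you want to check which quants made the cut and compare their sizes, you can query the Hub directly; a small sketch using huggingface_hub, with the repo id assumed from this listing:

```python
# Quick size comparison of the uploaded quants via the Hub API
# (pip install huggingface_hub). The repo id is an assumption based on the listing.
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("ilintar/NVIDIA-Nemotron-Nano-9B-v2-GGUF", files_metadata=True)

for f in sorted(info.siblings, key=lambda s: s.size or 0):
    if f.rfilename.endswith(".gguf"):
        print(f"{f.rfilename:60s} {f.size / 2**30:6.2f} GiB")
```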
Dhanishtha-2.0-preview-0825-Q3_K_M-GGUF
ilintar/Dhanishtha-2.0-preview-0825-Q3_K_M-GGUF. This model was converted to GGUF format from `HelpingAI/Dhanishtha-2.0-preview-0825` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo. Step 1: Clone llama.cpp from GitHub. Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
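If you'd rather not build anything, the llama-cpp-python bindings can pull the quant straight from the Hub; a minimal sketch, with the repo id and exact filename assumed from this listing (check the repo for the actual file name):

```python
# Minimal sketch using llama-cpp-python instead of a manual llama.cpp build
# (pip install llama-cpp-python huggingface_hub). Repo id and filename are
# assumptions based on the listing above.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ilintar/Dhanishtha-2.0-preview-0825-Q3_K_M-GGUF",
    filename="dhanishtha-2.0-preview-0825-q3_k_m.gguf",  # adjust to the real file name
    n_ctx=8192,
)

print(llm("Hello, how are you?", max_tokens=64)["choices"][0]["text"])
```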
Apriel-Nemotron-15b-Thinker-iGGUF
THUDM_GLM-Z1-9B-0414_iGGUF
THUDM-GLM-4-32B-0414-IQ2_S.GGUF
IQ2_S quant done off an imatrix generated from a Q4_K quant, because I can't run anything higher on my potato PC. Use at your own risk.