
jeffcookio

4 models

Mistral-Small-3.2-24B-Instruct-2506-awq-sym

Created with `llm-compressor`'s latest changes and quantized on a GH200. Works well for me with vLLM's `main` branch on my RTX 3090 Ti as of 2025-07-01.

Per https://vllm-dev.slack.com/archives/C07QP347J4D/p1751401629797809?thread_ts=1751399869.254259&cid=C07QP347J4D, there is currently no way to get tool calling with Mistral-HF formatted models. I've worked around this on a GitHub branch: https://github.com/sjuxax/vllm/tree/Mistral3.1-rebase. It includes code to remap the weights from HF-Mistral to Mistral, allowing use of `MistralTokenizer`. I've updated the `config.json` to be compatible with this approach, and I'm about to push the `tekken.json` tokenizer. With that, if you build the branch, you should be able to run this checkpoint with `MistralTokenizer` and get tool calling.

Note: I spoke a little too soon above. We also needed https://github.com/vllm-project/vllm/pull/20503 to get tool calling working properly. I've merged and pushed it to the Mistral3.1-rebase branch.
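Once the branch is built, a quick way to verify that tool calling actually works is to send a tool definition to the OpenAI-compatible server. This is a minimal sketch, not part of the card: the model ID is assumed from this profile, the `get_weather` tool is hypothetical, and the serve flags follow vLLM's documented Mistral tool-calling setup.

```python
# Minimal tool-calling smoke test against a local vLLM server.
# Assumes the patched Mistral3.1-rebase branch of vLLM is installed and the
# server was started along these lines (model ID assumed from this profile):
#   vllm serve jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym \
#     --tokenizer-mode mistral --enable-auto-tool-choice --tool-call-parser mistral
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool, only to confirm the model emits structured tool calls.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the MistralTokenizer path is working, this prints a structured
# get_weather tool call instead of None.
print(resp.choices[0].message.tool_calls)
```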

4,328 · 9

granite-3.3-8b-instruct-gptqmodel-4b-64g

52 · 1

Mistral-Small-3.1-24B-Instruct-2503-HF-gptqmodel-4b-128g

9 · 2

gemma-3-12b-it-abliterated-gptqmodel-4b-128g

7 · 1