
jeffcookio

4 models

Mistral-Small-3.2-24B-Instruct-2506-awq-sym

Created with `llm-compressor`'s latest changes and quantized on a GH200. Works well for me with vLLM's `main` branch on my RTX 3090 Ti as of 2025-07-01.

Per https://vllm-dev.slack.com/archives/C07QP347J4D/p1751401629797809?thread_ts=1751399869.254259&cid=C07QP347J4D, there is currently no way to get tool calling with Mistral-HF formatted models. I've worked around this on a GitHub branch: https://github.com/sjuxax/vllm/tree/Mistral3.1-rebase. It includes code to remap the weights from HF-Mistral to Mistral, allowing use of `MistralTokenizer`. I've updated the `config.json` to be compatible with this approach, and I'm about to push the `tekken.json` tokenizer. With that, if you build the branch, you should be able to run this checkpoint with `MistralTokenizer` and get tool calling.

Note: I spoke a little too soon above. We also needed https://github.com/vllm-project/vllm/pull/20503 to get tool calling working properly. I've merged and pushed it to the Mistral3.1-rebase branch.
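Once the branch is built, a quick way to verify that tool calling actually works is to send a tool definition to the OpenAI-compatible server. This is a minimal sketch, not part of the card: the model ID is assumed from this profile, the `get_weather` tool is hypothetical, and the serve flags follow vLLM's documented Mistral tool-calling setup.

```python
# Minimal tool-calling smoke test against a local vLLM server.
# Assumes the patched Mistral3.1-rebase branch of vLLM is installed and the
# server was started along these lines (model ID assumed from this profile):
#   vllm serve jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym \
#     --tokenizer-mode mistral --enable-auto-tool-choice --tool-call-parser mistral
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool, only to confirm the model emits structured tool calls.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the MistralTokenizer path is working, this prints a structured
# get_weather tool call instead of None.
print(resp.choices[0].message.tool_calls)
```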

4,328 · 9

granite-3.3-8b-instruct-gptqmodel-4b-64g

52 · 1

Mistral-Small-3.1-24B-Instruct-2503-HF-gptqmodel-4b-128g

9 · 2

gemma-3-12b-it-abliterated-gptqmodel-4b-128g

7 · 1