llmat
Qwen3-30B-A3B-Instruct-2507-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-30B-A3B-Instruct-2507` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
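A minimal offline-inference sketch with vLLM (the repo id `llmat/Qwen3-30B-A3B-Instruct-2507-NVFP4` is assumed from this listing; NVFP4 kernels require a recent vLLM build and supported NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# vLLM reads the quantization config stored in the model repo,
# so no extra quantization flags are needed here.
llm = LLM(model="llmat/Qwen3-30B-A3B-Instruct-2507-NVFP4")

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)
outputs = llm.generate(["Summarize NVFP4 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

For serving instead of offline inference, `vllm serve llmat/Qwen3-30B-A3B-Instruct-2507-NVFP4` exposes an OpenAI-compatible API.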
Qwen3-4B-Instruct-2507-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-4B-Instruct-2507` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
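A minimal vLLM deployment sketch (the repo id `llmat/Qwen3-4B-Instruct-2507-NVFP4` is assumed from this listing; NVFP4 support requires a recent vLLM build and compatible NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# The NVFP4 quantization config is picked up from the checkpoint.
llm = LLM(model="llmat/Qwen3-4B-Instruct-2507-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is post-training quantization?"], params)
print(outputs[0].outputs[0].text)
```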
Apertus-8B-Instruct-2509-NVFP4
Mistral-Small-24B-Instruct-2501-NVFP4
NVFP4-quantized version of `mistralai/Mistral-Small-24B-Instruct-2501` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
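A minimal vLLM deployment sketch (the repo id `llmat/Mistral-Small-24B-Instruct-2501-NVFP4` is assumed from this listing; NVFP4 kernels require a recent vLLM build and supported NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# Quantization settings are read from the NVFP4 checkpoint.
llm = LLM(model="llmat/Mistral-Small-24B-Instruct-2501-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain the benefit of 4-bit inference."], params)
print(outputs[0].outputs[0].text)
```

For an OpenAI-compatible endpoint, `vllm serve llmat/Mistral-Small-24B-Instruct-2501-NVFP4` starts the server.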
Qwen3-8B-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-8B` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048
Qwen3-0.6B-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-0.6B` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
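A minimal vLLM deployment sketch (the repo id `llmat/Qwen3-0.6B-NVFP4` is assumed from this listing; NVFP4 support requires a recent vLLM build and compatible NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# vLLM detects the NVFP4 scheme from the checkpoint's config.
llm = LLM(model="llmat/Qwen3-0.6B-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Write a haiku about model compression."], params)
print(outputs[0].outputs[0].text)
```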
Qwen3-4B-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-4B` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
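A minimal vLLM deployment sketch (the repo id `llmat/Qwen3-4B-NVFP4` is assumed from this listing; NVFP4 kernels require a recent vLLM build and supported NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# The quantization config ships with the checkpoint.
llm = LLM(model="llmat/Qwen3-4B-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What trade-offs does 4-bit quantization involve?"], params)
print(outputs[0].outputs[0].text)
```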
Mistral-Small-Instruct-2409-NVFP4
NVFP4-quantized version of `mistralai/Mistral-Small-Instruct-2409` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
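A minimal vLLM deployment sketch (the repo id `llmat/Mistral-Small-Instruct-2409-NVFP4` is assumed from this listing; NVFP4 support requires a recent vLLM build and compatible NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# NVFP4 quantization parameters are loaded from the model repo.
llm = LLM(model="llmat/Mistral-Small-Instruct-2409-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Describe calibration in one sentence."], params)
print(outputs[0].outputs[0].text)
```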
Qwen3-30B-A3B-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-30B-A3B` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
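A minimal vLLM deployment sketch (the repo id `llmat/Qwen3-30B-A3B-NVFP4` is assumed from this listing; this is a mixture-of-experts model, so ensure the GPU has enough memory for the 30B-parameter checkpoint even though only ~3B parameters are active per token):

```python
from vllm import LLM, SamplingParams

# vLLM applies the NVFP4 scheme recorded in the checkpoint.
llm = LLM(model="llmat/Qwen3-30B-A3B-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is a mixture-of-experts model?"], params)
print(outputs[0].outputs[0].text)
```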
Mistral-v0.3-7B-ORPO_Q4_K_M-GGUF
Qwen3-32B-NVFP4
Qwen3-14B-NVFP4
Qwen3-1.7B-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-1.7B` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
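A minimal vLLM deployment sketch (the repo id `llmat/Qwen3-1.7B-NVFP4` is assumed from this listing; NVFP4 kernels require a recent vLLM build and supported NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# The NVFP4 config is read from the checkpoint automatically.
llm = LLM(model="llmat/Qwen3-1.7B-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Name one advantage of small language models."], params)
print(outputs[0].outputs[0].text)
```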
nanoVLM
nanoVLM is a minimal, lightweight Vision-Language Model (VLM) designed for efficient training and experimentation. Built in pure PyTorch, the entire model architecture and training logic fit within ~750 lines of code. It combines a ViT-based image encoder (SigLIP-B/16-224-85M) with a lightweight causal language model (SmolLM2-135M), resulting in a compact 222M-parameter model. For more information, see the base model at https://huggingface.co/lusxvr/nanoVLM-222M. To get started, clone the nanoVLM repository (https://github.com/huggingface/nanoVLM), follow the install instructions, and run the following code:
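A minimal loading sketch, assuming the working directory is a clone of the nanoVLM repository (the `VisionLanguageModel` import path follows the repository's layout and may change between versions):

```python
# Run from inside a clone of https://github.com/huggingface/nanoVLM
from models.vision_language_model import VisionLanguageModel

# Load the pretrained 222M-parameter checkpoint from the Hugging Face Hub
model = VisionLanguageModel.from_pretrained("lusxvr/nanoVLM-222M")
model.eval()
```

The repository's `generate.py` script shows the full inference pipeline, including image preprocessing and tokenization.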
Mistral-7B-Instruct-v0.3-NVFP4
NVFP4-quantized version of `mistralai/Mistral-7B-Instruct-v0.3` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
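A minimal vLLM deployment sketch (the repo id `llmat/Mistral-7B-Instruct-v0.3-NVFP4` is assumed from this listing; NVFP4 support requires a recent vLLM build and compatible NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# Quantization settings come from the NVFP4 checkpoint itself.
llm = LLM(model="llmat/Mistral-7B-Instruct-v0.3-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Briefly explain weight-only quantization."], params)
print(outputs[0].outputs[0].text)
```

For an OpenAI-compatible endpoint, `vllm serve llmat/Mistral-7B-Instruct-v0.3-NVFP4` starts the server.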
Mistral-v0.3-7B-ORPO
- Developed by: llmat
- License: apache-2.0
- Finetuned from model: unsloth/mistral-7b-v0.3-bnb-4bit

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.