llmat
Qwen3-30B-A3B-Instruct-2507-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-30B-A3B-Instruct-2507` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
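A minimal offline-inference sketch with vLLM (the repo id `llmat/Qwen3-30B-A3B-Instruct-2507-NVFP4` is assumed from this listing; NVFP4 kernels require a recent vLLM build and supported NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# vLLM reads the quantization config stored in the model repo,
# so no extra quantization flags are needed here.
llm = LLM(model="llmat/Qwen3-30B-A3B-Instruct-2507-NVFP4")

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)
outputs = llm.generate(["Summarize NVFP4 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

For serving instead of offline inference, `vllm serve llmat/Qwen3-30B-A3B-Instruct-2507-NVFP4` exposes an OpenAI-compatible API.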
Qwen3-4B-Instruct-2507-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-4B-Instruct-2507` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
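A minimal vLLM deployment sketch (the repo id `llmat/Qwen3-4B-Instruct-2507-NVFP4` is assumed from this listing; NVFP4 support requires a recent vLLM build and compatible NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# The NVFP4 quantization config is picked up from the checkpoint.
llm = LLM(model="llmat/Qwen3-4B-Instruct-2507-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is post-training quantization?"], params)
print(outputs[0].outputs[0].text)
```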
Apertus-8B-Instruct-2509-NVFP4
Mistral-Small-24B-Instruct-2501-NVFP4
NVFP4-quantized version of `mistralai/Mistral-Small-24B-Instruct-2501` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
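A minimal vLLM deployment sketch (the repo id `llmat/Mistral-Small-24B-Instruct-2501-NVFP4` is assumed from this listing; NVFP4 kernels require a recent vLLM build and supported NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# Quantization settings are read from the NVFP4 checkpoint.
llm = LLM(model="llmat/Mistral-Small-24B-Instruct-2501-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain the benefit of 4-bit inference."], params)
print(outputs[0].outputs[0].text)
```

For an OpenAI-compatible endpoint, `vllm serve llmat/Mistral-Small-24B-Instruct-2501-NVFP4` starts the server.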
Qwen3-8B-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-8B` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048
Qwen3-0.6B-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-0.6B` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
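A minimal vLLM deployment sketch (the repo id `llmat/Qwen3-0.6B-NVFP4` is assumed from this listing; NVFP4 support requires a recent vLLM build and compatible NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# vLLM detects the NVFP4 scheme from the checkpoint's config.
llm = LLM(model="llmat/Qwen3-0.6B-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Write a haiku about model compression."], params)
print(outputs[0].outputs[0].text)
```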
Qwen3-4B-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-4B` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
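A minimal vLLM deployment sketch (the repo id `llmat/Qwen3-4B-NVFP4` is assumed from this listing; NVFP4 kernels require a recent vLLM build and supported NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# The quantization config ships with the checkpoint.
llm = LLM(model="llmat/Qwen3-4B-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What trade-offs does 4-bit quantization involve?"], params)
print(outputs[0].outputs[0].text)
```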
Mistral-Small-Instruct-2409-NVFP4
NVFP4-quantized version of `mistralai/Mistral-Small-Instruct-2409` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
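A minimal vLLM deployment sketch (the repo id `llmat/Mistral-Small-Instruct-2409-NVFP4` is assumed from this listing; NVFP4 support requires a recent vLLM build and compatible NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# NVFP4 quantization parameters are loaded from the model repo.
llm = LLM(model="llmat/Mistral-Small-Instruct-2409-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Describe calibration in one sentence."], params)
print(outputs[0].outputs[0].text)
```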
Qwen3-30B-A3B-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-30B-A3B` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
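A minimal vLLM deployment sketch (the repo id `llmat/Qwen3-30B-A3B-NVFP4` is assumed from this listing; this is a mixture-of-experts model, so ensure the GPU has enough memory for the 30B-parameter checkpoint even though only ~3B parameters are active per token):

```python
from vllm import LLM, SamplingParams

# vLLM applies the NVFP4 scheme recorded in the checkpoint.
llm = LLM(model="llmat/Qwen3-30B-A3B-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is a mixture-of-experts model?"], params)
print(outputs[0].outputs[0].text)
```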
Mistral-v0.3-7B-ORPO_Q4_K_M-GGUF
Qwen3-32B-NVFP4
Qwen3-14B-NVFP4
Qwen3-1.7B-NVFP4
NVFP4-quantized version of `Qwen/Qwen3-1.7B` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
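A minimal vLLM deployment sketch (the repo id `llmat/Qwen3-1.7B-NVFP4` is assumed from this listing; NVFP4 kernels require a recent vLLM build and supported NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# The NVFP4 config is read from the checkpoint automatically.
llm = LLM(model="llmat/Qwen3-1.7B-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Name one advantage of small language models."], params)
print(outputs[0].outputs[0].text)
```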
nanoVLM
nanoVLM is a minimal, lightweight Vision-Language Model (VLM) designed for efficient training and experimentation. Built in pure PyTorch, the entire model architecture and training logic fit within ~750 lines of code. It combines a ViT-based image encoder (SigLIP-B/16-224-85M) with a lightweight causal language model (SmolLM2-135M), resulting in a compact 222M-parameter model. For more information, see the base model at https://huggingface.co/lusxvr/nanoVLM-222M. To get started, clone the nanoVLM repository (https://github.com/huggingface/nanoVLM), follow the install instructions, and run the following code:
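A minimal loading sketch, assuming the working directory is a clone of the nanoVLM repository (the `VisionLanguageModel` import path follows the repository's layout and may change between versions):

```python
# Run from inside a clone of https://github.com/huggingface/nanoVLM
from models.vision_language_model import VisionLanguageModel

# Load the pretrained 222M-parameter checkpoint from the Hugging Face Hub
model = VisionLanguageModel.from_pretrained("lusxvr/nanoVLM-222M")
model.eval()
```

The repository's `generate.py` script shows the full inference pipeline, including image preprocessing and tokenization.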
Mistral-7B-Instruct-v0.3-NVFP4
NVFP4-quantized version of `mistralai/Mistral-7B-Instruct-v0.3` produced with llmcompressor.

Notes:
- Quantization scheme: NVFP4 (linear layers, `lm_head` excluded)
- Calibration samples: 512
- Max sequence length during calibration: 2048

This model can be deployed efficiently using the vLLM backend, as shown in the example below. vLLM also supports OpenAI-compatible serving; see the documentation for more details.
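A minimal vLLM deployment sketch (the repo id `llmat/Mistral-7B-Instruct-v0.3-NVFP4` is assumed from this listing; NVFP4 support requires a recent vLLM build and compatible NVIDIA hardware):

```python
from vllm import LLM, SamplingParams

# Quantization settings come from the NVFP4 checkpoint itself.
llm = LLM(model="llmat/Mistral-7B-Instruct-v0.3-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Briefly explain weight-only quantization."], params)
print(outputs[0].outputs[0].text)
```

For an OpenAI-compatible endpoint, `vllm serve llmat/Mistral-7B-Instruct-v0.3-NVFP4` starts the server.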
Mistral-v0.3-7B-ORPO
- Developed by: llmat
- License: apache-2.0
- Finetuned from model: unsloth/mistral-7b-v0.3-bnb-4bit

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.