NexaAI
DeepSeek-OCR-GGUF
Qwen3-VL-8B-Thinking-GGUF
> [!NOTE]
> Currently, only NexaSDK supports this model's GGUF.

Run Qwen3-VL-8B-Thinking optimized for CPU/GPU with NexaSDK.

1. Install NexaSDK and create a free account at NexaSDK.
2. Run the model locally with one line of code (a sketch is given at the end of this card).

## Model Description

Qwen3-VL-8B-Thinking is an 8-billion-parameter multimodal large language model from Alibaba Cloud's Qwen team. As part of the Qwen3-VL (Vision-Language) family, it is designed for deep multimodal reasoning, combining visual understanding, long-context comprehension, and structured chain-of-thought generation across text, images, and videos.

The Thinking variant focuses on advanced reasoning transparency and analytical precision. Compared to the Instruct version, it produces richer intermediate reasoning steps, enabling detailed explanation, planning, and multi-hop analysis across visual and textual inputs.

## Features

- Deep Visual Reasoning: Interprets complex scenes, charts, and documents with multi-step logic.
- Chain-of-Thought Generation: Produces structured reasoning traces for improved interpretability and insight.
- Extended Context Handling: Maintains coherence across longer multimodal sequences.
- Multilingual Competence: Understands and generates in multiple languages for global applicability.
- High Accuracy at 8B Scale: Achieves strong benchmark performance in multimodal reasoning and analysis tasks.

## Use Cases

- Research and analysis requiring visual reasoning transparency
- Complex multimodal QA and scientific problem solving
- Visual analytics and explanation generation
- Advanced agent systems needing structured thought or planning steps
- Educational tools requiring detailed, interpretable reasoning

## Inputs and Outputs

Input:
- Text, image(s), or multimodal combinations (including sequential frames or documents)
- Optional context for multi-turn or multimodal reasoning

Output:
- Structured reasoning outputs with intermediate steps
- Detailed answers, explanations, or JSON-formatted reasoning traces

## License

Refer to the official Qwen license for usage and redistribution details.
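A minimal quickstart sketch for step 2 above, assuming the NexaSDK CLI's `nexa infer` subcommand and this repository id (check sdk.nexa.ai for the exact syntax on your platform):

```bash
# pull the GGUF if needed and start an interactive multimodal chat;
# image paths can be typed or dragged into the prompt
nexa infer NexaAI/Qwen3-VL-8B-Thinking-GGUF
```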
Qwen3-VL-4B-Thinking-GGUF
> [!NOTE]
> Currently, only NexaSDK supports this model's GGUF.

Run Qwen3-VL-4B-Thinking optimized for CPU/GPU with NexaSDK.

1. Install NexaSDK
2. Run the model locally with one line of code (a sketch is given at the end of this card).

## Model Description

Qwen3-VL-4B-Thinking is a 4-billion-parameter multimodal large language model from the Qwen team at Alibaba Cloud. Part of the Qwen3-VL (Vision-Language) family, it is designed for advanced visual reasoning and chain-of-thought generation across image, text, and video inputs. Compared to the Instruct variant, the Thinking model emphasizes deeper multi-step reasoning, analysis, and planning. It produces detailed, structured outputs that reflect intermediate reasoning steps, making it well-suited for research, multimodal understanding, and agentic workflows.

## Features

- Vision-Language Understanding: Processes images, text, and videos for joint reasoning tasks.
- Structured Thinking Mode: Generates intermediate reasoning traces for better transparency and interpretability.
- High Accuracy on Visual QA: Performs strongly on visual question answering, chart reasoning, and document analysis benchmarks.
- Multilingual Support: Understands and responds in multiple languages.
- Optimized for Efficiency: Delivers strong performance at 4B scale for on-device or edge deployment.

## Use Cases

- Multimodal reasoning and visual question answering
- Scientific and analytical reasoning tasks involving charts, tables, and documents
- Step-by-step visual explanation or tutoring
- Research on interpretability and chain-of-thought modeling
- Integration into agent systems that require structured reasoning

## Inputs and Outputs

Input:
- Text, images, or combined multimodal prompts (e.g., image + question)

Output:
- Generated text, reasoning traces, or structured responses
- May include explicit thought steps or structured JSON reasoning sequences

## License

Check the official Qwen license for terms of use and redistribution.
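A sketch of the one-line run command, assuming the `nexa infer` subcommand and this repository id (syntax may differ across NexaSDK versions):

```bash
# interactive multimodal chat with the 4B Thinking GGUF build
nexa infer NexaAI/Qwen3-VL-4B-Thinking-GGUF
```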
Qwen2-Audio-7B-GGUF
We're bringing Qwen2-Audio to run locally on edge devices with Nexa-SDK, offering various GGUF quantization options. Qwen2-Audio is a SOTA small-scale multimodal model (AudioLM) that handles audio and text inputs, allowing you to have voice interactions without separate ASR modules. Qwen2-Audio supports English, Chinese, and major European languages, and provides voice chat and audio analysis capabilities for local use cases like:

- Speaker identification and response
- Speech translation and transcription
- Mixed audio and noise detection
- Music and sound analysis

In the following, we demonstrate how to run Qwen2-Audio locally on your device.

Step 1: Install Nexa-SDK (local on-device inference framework)

> Nexa-SDK is an open-source, local on-device inference framework supporting text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS) capabilities. It is installable via a Python package or an executable installer.

Step 2: Run the command in your terminal (a sketch is given at the end of this card). In the terminal:

1. Drag and drop your audio file into the terminal (or enter the file path on Linux).
2. Add a text prompt to guide the analysis, or leave it empty for direct voice input.

## Choose Quantizations for Your Device

Run different quantization versions and check RAM requirements in our list.

## Voice Chat

- Answer daily questions
- Offer suggestions
- Speaker identification and response
- Speech translation
- Detecting background noise and responding accordingly

## Audio Analysis

- Information extraction
- Audio summary
- Speech transcription and expansion
- Mixed audio and noise detection
- Music and sound analysis

Results demonstrate that Qwen2-Audio significantly outperforms both previous SOTAs and Qwen-Audio across all tasks.
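A sketch of the Step 2 command, assuming the Nexa-SDK CLI's `nexa run` syntax and a `qwen2audio` model alias (both are assumptions; see the Nexa-SDK docs for the exact model name and quantization tags):

```bash
# start a local voice-chat / audio-analysis session;
# drag an audio file into the prompt, then optionally add a text instruction
nexa run qwen2audio
```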
gemma-3n
Qwen3-VL-4B-Instruct-GGUF
Run Qwen3-VL-4B-Instruct optimized for CPU/GPU with NexaSDK.

1. Install NexaSDK
2. Run the model locally with one line of code:

## Model Description

Qwen3-VL-4B-Instruct is a 4-billion-parameter instr...
OmniVLM-968M
🔥 Latest Update

- [Dec 16, 2024] Our work "OmniVLM: A Token-Compressed, Sub-Billion-Parameter Vision-Language Model for Efficient On-Device Inference" is now live on arXiv! 🚀
- [Nov 27, 2024] Model improvements: the OmniVLM v3 model's GGUF file has been updated in this Hugging Face repo! ✨ 👉 Test these exciting changes in our Hugging Face Space
- [Nov 22, 2024] Model improvements: the OmniVLM v2 model's GGUF file has been updated in this Hugging Face repo! ✨

Key improvements include:

- Enhanced art descriptions
- Better complex image understanding
- Improved anime recognition
- More accurate color and detail detection
- Expanded world knowledge

We are continuously improving OmniVLM-968M based on your valuable feedback! More exciting updates are coming soon - stay tuned! ⭐

OmniVLM is a compact, sub-billion-parameter (968M) multimodal model for processing both visual and text inputs, optimized for edge devices. Improving on LLaVA's architecture, it features:

- 9x Token Reduction: Reduces image tokens from 729 to 81, aggressively cutting latency and computational cost. Note that the computation of the vision encoder and the projection part stays the same, but the computation of the language-model backbone is reduced thanks to the 9x shorter image-token span.
- Trustworthy Results: Reduces hallucinations using DPO training on trustworthy data.

Quick Links:
1. Interactive demo in our Hugging Face Space (updated Nov 21, 2024)
2. Quickstart for local setup
3. Learn more in our blogs

Feedback: Send questions or comments about the model in our Discord.

## Intended Use Cases

OmniVLM is intended for Visual Question Answering (answering questions about images) and Image Captioning (describing scenes in photos), making it ideal for on-device applications.

Example demo: generating captions for a 1046×1568 image on an M4 Pro MacBook.

Below we demonstrate a figure to show how OmniVLM performs against nanoLLAVA. In all tasks, OmniVLM outperforms the previous world's smallest vision-language model. We have conducted a series of experiments on benchmark datasets, including MM-VET, ChartQA, MMMU, ScienceQA, and POPE, to evaluate the performance of OmniVLM.

| Benchmark | Nexa AI OmniVLM v2 | Nexa AI OmniVLM v1 | nanoLLAVA |
|-------------------|------------------------|------------------------|-----------|
| ScienceQA (Eval) | 71.0 | 62.2 | 59.0 |
| ScienceQA (Test) | 71.0 | 64.5 | 59.0 |
| POPE | 93.3 | 89.4 | 84.1 |
| MM-VET | 30.9 | 27.5 | 23.9 |
| ChartQA (Test) | 61.9 | 59.2 | NA |
| MMMU (Test) | 42.1 | 41.8 | 28.6 |
| MMMU (Eval) | 40.0 | 39.9 | 30.4 |

## How to Use On Device

In the following, we demonstrate how to run OmniVLM locally on your device.

Step 1: Install Nexa-SDK (local on-device inference framework)

> Nexa-SDK is an open-source, local on-device inference framework supporting text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS) capabilities. It is installable via a Python package or an executable installer.

Step 2: Run the command in your terminal (a sketch is given at the end of this card).

## Model Architecture

OmniVLM's architecture consists of three key components:

- Base Language Model: Qwen2.5-0.5B-Instruct functions as the base model to process text inputs.
- Vision Encoder: SigLIP-400M operates at 384 resolution with a 14×14 patch size to generate image embeddings.
- Projection Layer: A Multi-Layer Perceptron (MLP) aligns the vision encoder's embeddings with the language model's token space. Compared to the vanilla LLaVA architecture, we designed a projector that reduces image tokens by 9x.
The vision encoder first transforms input images into embeddings, which are then processed by the projection layer to match the token space of Qwen2.5-0.5B-Instruct, enabling end-to-end visual-language understanding.

We developed OmniVLM through a three-stage training pipeline:

Pretraining: The initial stage focuses on establishing basic visual-linguistic alignments using image-caption pairs; during this stage only the projection-layer parameters are unfrozen to learn these fundamental relationships.

Supervised Fine-tuning (SFT): We enhance the model's contextual understanding using image-based question-answering datasets. This stage involves training on structured chat histories that incorporate images, so the model generates more contextually appropriate responses.

Direct Preference Optimization (DPO): The final stage implements DPO by first generating responses to images using the base model. A teacher model then produces minimally edited corrections while maintaining high semantic similarity with the original responses, focusing specifically on accuracy-critical elements. These original and corrected outputs form chosen-rejected pairs. The fine-tuning targets essential improvements in model output without altering the model's core response characteristics.

## What's next for OmniVLM?

OmniVLM is in early development and we are working to address current limitations:

- Expand DPO Training: Increase the scope of DPO (Direct Preference Optimization) training in an iterative process to continually improve model performance and response quality.
- Improve document and text understanding

In the long term, we aim to develop OmniVLM as a fully optimized, production-ready solution for edge AI multimodal applications.
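A sketch of the Step 2 terminal command from the "How to Use On Device" section above, assuming the Nexa-SDK CLI's `nexa run` syntax and an `omniVLM` model alias (both are assumptions; check the Nexa-SDK docs for the exact name):

```bash
# start local VQA / captioning; provide an image path and a question when prompted
nexa run omniVLM
```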
Qwen3-0.6B-GGUF
Run them directly with nexa-sdk installed. In the nexa-sdk CLI (a sketch is given at the end of this card):

## Available Quantizations

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| Qwen3-0.6B-Q8_0.gguf | Q8_0 | 805 MB | false | High-quality 8-bit quantization. Recommended for efficient inference. |
| Qwen3-0.6B-f16.gguf | f16 | 1.51 GB | false | Half-precision (FP16) format. Better accuracy, requires more memory. |

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:

- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significantly enhanced reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.

Qwen3-0.6B has the following features:

- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Number of Parameters: 0.6B
- Number of Parameters (Non-Embedding): 0.44B
- Number of Layers: 28
- Number of Attention Heads (GQA): 16 for Q and 8 for KV
- Context Length: 32,768

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and documentation.
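A sketch of the CLI invocation, assuming the `nexa infer` subcommand and this repository id (exact syntax may differ across NexaSDK versions):

```bash
# run the default quantization of this repo in an interactive chat
nexa infer NexaAI/Qwen3-0.6B-GGUF
```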
Qwen3-VL-2B-Thinking-GGUF
> [!NOTE]
> Currently, only NexaSDK supports this model's GGUF.

Quickstart:

- Download NexaSDK with one click
- One line of code to run in your terminal (a sketch is given at the end of this card)

## Model Description

Qwen3-VL-2B-Thinking is a 2-billion-parameter multimodal model from the Qwen3-VL family, optimized for explicit reasoning and step-by-step visual understanding. It builds upon Qwen3-VL-2B with additional "thinking" supervision, allowing the model to explain its reasoning process across both text and images, which is ideal for research, education, and agentic applications requiring transparent decision traces.

## Features

- Visual reasoning: Performs detailed, interpretable reasoning across images, diagrams, and UI elements.
- Step-by-step thought traces: Generates intermediate reasoning steps for transparency and debugging.
- Multimodal understanding: Supports text, image, and video inputs with consistent logical grounding.
- Compact yet capable: 2B parameters, optimized for low-latency inference and on-device deployment.
- Instruction-tuned: Enhanced alignment for "think-aloud" question answering and visual problem solving.

## Use Cases

- Visual question answering with reasoning chains
- Step-by-step image or chart analysis for education and tutoring
- Debuggable AI agents and reasoning assistants
- Research on interpretable multimodal reasoning
- On-device transparent AI inference for visual domains

## Inputs and Outputs

Inputs:
- Text prompts or questions
- Images, diagrams, or UI screenshots
- Optional multi-turn reasoning chains

Outputs:
- Natural language answers with explicit thought steps
- Detailed reasoning traces combining visual and textual logic

## License

This model is released under the Apache 2.0 License. Refer to the official Hugging Face page for license details and usage terms.

## References

- Qwen3-VL-2B-Thinking on Hugging Face
- Qwen3 Technical Report (arXiv)
- Qwen GitHub Repository
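A sketch of the one-line terminal command, assuming the `nexa infer` subcommand and this repository id (check sdk.nexa.ai for the exact syntax):

```bash
# interactive chat with explicit thinking traces; drag an image path into the prompt
nexa infer NexaAI/Qwen3-VL-2B-Thinking-GGUF
```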
Qwen3-4B-GGUF
Run them directly with nexa-sdk installed. In the nexa-sdk CLI (a sketch is given at the end of this card):

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:

- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significantly enhanced reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.

Qwen3-4B has the following features:

- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: 32,768 natively and 131,072 tokens with YaRN

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and documentation.
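A sketch of the CLI step, assuming the `nexa infer` subcommand and this repository id (exact syntax may differ across NexaSDK versions):

```bash
# interactive chat with Qwen3-4B GGUF
nexa infer NexaAI/Qwen3-4B-GGUF
```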
OmniAudio-2.6B
gpt-oss-20b-GGUF
octo-net-gguf
Qwen3-VL-8B-Instruct-GGUF
> [!NOTE]
> Currently, only NexaSDK supports this model's GGUF.

Run Qwen3-VL-8B-Instruct optimized for CPU/GPU with NexaSDK.

1. Install NexaSDK
2. Run the model locally with one line of code (see the sketch below):
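A sketch of step 2, assuming the `nexa infer` subcommand and this repository id (check sdk.nexa.ai for the exact syntax):

```bash
# interactive multimodal chat with the 8B Instruct GGUF build
nexa infer NexaAI/Qwen3-VL-8B-Instruct-GGUF
```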
Octopus-v2-gguf-awq
Qwen3-VL-2B-Instruct-GGUF
> [!NOTE]
> Currently, only NexaSDK supports this model's GGUF.

Quickstart:

- Download NexaSDK with one click
- One line of code to run in your terminal (a sketch is given at the end of this card)

## Model Description

Qwen3-VL-2B-Instruct is a 2-billion-parameter, instruction-tuned vision-language model in the Qwen3-VL family. It's designed for efficient multimodal understanding and generation, combining strong text skills with image and video perception, making it ideal for edge and on-device deployment. It supports long contexts (up to 256K tokens) and features an upgraded architecture for better spatial, visual, and temporal reasoning.

## Features

- Multimodal I/O: Understands images and long videos, performs OCR, and handles mixed image-text prompts.
- Long-context reasoning: Up to 256K context for books, documents, or extended visual analysis.
- Spatial & temporal understanding: Improved grounding and temporal event tracking for videos.
- Agentic capabilities: Recognizes UI elements and reasons about screen layouts for tool use.
- Lightweight footprint: 2B parameters for efficient inference across CPU, GPU, or NPU.

## Use Cases

- Visual question answering, captioning, and summarization
- OCR and document understanding (multi-page, multilingual)
- Video analysis and highlight detection
- On-device visual assistants and UI automation agents
- Edge analytics and lightweight IoT vision tasks

## Inputs and Outputs

Inputs:
- Text prompts
- Images (single or multiple)
- Videos or frame sequences
- Mixed multimodal chat turns

Outputs:
- Natural language answers, captions, and visual reasoning
- OCR text and structured visual information

## License

This model is released under the Apache 2.0 License. Please refer to the Hugging Face model card for detailed licensing and usage information.
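A sketch of the one-line terminal command, assuming the `nexa infer` subcommand and this repository id (check sdk.nexa.ai for the exact syntax):

```bash
# interactive chat; drag an image or screenshot path into the prompt for VQA/OCR
nexa infer NexaAI/Qwen3-VL-2B-Instruct-GGUF
```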
qwen3vl-30B-A3B-mlx
gemma-2-2b-it-GGUF
qwen2.5vl
Octopus-v2
Octopus V2: On-device language model for super agent

## Octopus V4 Release

We are excited to announce that Octopus v4 is now available! Octopus-V4-3B, an advanced open-source language model with 3 billion parameters, serves as the master node in Nexa AI's envisioned graph of language models. Tailored specifically for the MMLU benchmark topics, this model efficiently translates user queries into formats that specialized models can effectively process. It excels at directing these queries to the appropriate specialized model, ensuring precise and effective query handling.

Check our papers and repos:
- paper
- Octopus V4 model page
- Octopus V4 quantized model page
- Octopus V4 GitHub

Key features of Octopus v4:
- 📱 Compact Size: Octopus-V4-3B is compact, enabling it to operate on smart devices efficiently and swiftly.
- 🐙 Accuracy: Octopus-V4-3B accurately maps user queries to the specialized model using a functional token design, enhancing its precision.
- 💪 Reformat Query: Octopus-V4-3B assists in converting natural human language into a more professional format, improving query description and resulting in more accurate responses.

## Octopus V3 Release

We are excited to announce that Octopus v3 is now available! Check our technical report and the Octopus V3 tweet!

Key features of Octopus v3:
- Efficiency: Sub-billion parameters, making it less than half the size of its predecessor, Octopus v2.
- Multi-Modal Capabilities: Processes both text and image inputs.
- Speed and Accuracy: Incorporates our patented functional token technology, achieving function-calling accuracy on par with GPT-4V and GPT-4.
- Multilingual Support: Simultaneous support for English and Mandarin.

Check the Octopus V3 demo videos for Android and iOS.

## Octopus V2 Release

After open-sourcing our model, we received many requests to compare it with Apple's OpenELM and Microsoft's Phi-3; please see the Evaluation section. On our benchmark dataset, Microsoft's Phi-3 achieves 45.7% accuracy with an average inference latency of 10.2 s, while Apple's OpenELM fails to generate a function call (please see this screenshot). Our model, Octopus V2, achieves 99.5% accuracy with an average inference latency of 0.38 s.

We are a very small team with a lot of work. Please give us more time to prepare the code, and we will open-source it. We hope the Octopus v2 model will be helpful for you. Let's democratize AI agents for everyone. We've received many requests from the car industry, healthcare, the financial system, etc. The Octopus model can be applied to any function, and you can start to think about it now.

Octopus-V2-2B, an advanced open-source language model with 2 billion parameters, represents Nexa AI's research breakthrough in the application of large language models (LLMs) for function calling, specifically tailored for Android APIs. Unlike Retrieval-Augmented Generation (RAG) methods, which require detailed descriptions of potential function arguments (sometimes needing up to tens of thousands of input tokens), Octopus-V2-2B introduces a unique functional token strategy for both its training and inference stages. This approach not only allows it to achieve performance levels comparable to GPT-4 but also significantly enhances its inference speed beyond that of RAG-based methods, making it especially beneficial for edge computing devices.
📱 On-device Applications: Octopus-V2-2B is engineered to operate seamlessly on Android devices, extending its utility across a wide range of applications, from Android system management to the orchestration of multiple devices.

🚀 Inference Speed: When benchmarked, Octopus-V2-2B demonstrates a remarkable inference speed, outperforming the combination of "Llama7B + RAG solution" by a factor of 36x on a single A100 GPU. Furthermore, compared to GPT-4-turbo (gpt-4-0125-preview), which relies on clusters of A100/H100 GPUs, Octopus-V2-2B is 168% faster. This efficiency is attributed to our functional token design.

🐙 Accuracy: Octopus-V2-2B not only excels in speed but also in accuracy, surpassing the "Llama7B + RAG solution" in function-call accuracy by 31%. It achieves function-call accuracy comparable to GPT-4 and RAG + GPT-3.5, with scores ranging between 98% and 100% across benchmark datasets.

💪 Function Calling Capabilities: Octopus-V2-2B is capable of generating individual, nested, and parallel function calls across a variety of complex scenarios.

You can run the model on a GPU using the following code.

## Evaluation

The benchmark results can be viewed in this Excel sheet, which has been manually verified. Microsoft's Phi-3 model achieved an accuracy of 45.7%, with an average inference latency of 10.2 seconds. Meanwhile, Apple's OpenELM was unable to generate a function call, as shown in this screenshot. Additionally, OpenELM's score on the MMLU benchmark is quite low at 26.7, compared to Google's Gemma 2B, which scored 42.3.

Note: One may notice that the query includes all parameters needed by a function. The query is expected to include all parameters during inference as well.

## Training Data

We wrote 20 Android API descriptions to use for training the models; see this file for details. The Android API implementations for our demos, and our training data, will be published later. Below is one example Android API description.

## License

This model was trained on commercially viable data. For use of our model, refer to the license information.

## References

We thank the Google Gemma team for their amazing models!

## Contact

Please contact us with any issues or comments!
Qwen2.5-Omni-3B-GGUF
octo-net
sdxl-turbo
SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. A real-time demo is available here: http://clipdrop.co/stable-diffusion-turbo

Please note: For commercial use, please refer to https://stability.ai/license.

## Model Description

SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis. SDXL-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling large-scale foundational image diffusion models in 1 to 4 steps at high image quality. This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines it with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.

- Developed by: Stability AI
- Funded by: Stability AI
- Model type: Generative text-to-image model
- Finetuned from model: SDXL 1.0 Base

For research purposes, we recommend our `generative-models` GitHub repository (https://github.com/Stability-AI/generative-models), which implements the most popular diffusion frameworks (both training and inference).

- Repository: https://github.com/Stability-AI/generative-models
- Paper: https://stability.ai/research/adversarial-diffusion-distillation
- Demo: http://clipdrop.co/stable-diffusion-turbo

The charts above evaluate user preference for SDXL-Turbo over other single- and multi-step models. SDXL-Turbo evaluated at a single step is preferred by human voters in terms of image quality and prompt following over LCM-XL evaluated at four (or fewer) steps. In addition, we see that using four steps for SDXL-Turbo further improves performance. For details on the user study, we refer to the research paper.

The model is intended for both non-commercial and commercial usage. You can use this model for non-commercial or research purposes under this license. Possible research areas and tasks include:

- Research on generative models.
- Research on real-time applications of generative models.
- Research on the impact of real-time generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.

For commercial use, please refer to https://stability.ai/membership.

SDXL-Turbo does not make use of `guidance_scale` or `negative_prompt`; we disable them with `guidance_scale=0.0`. Preferably, the model generates images of size 512x512, but higher image sizes work as well. A single step is enough to generate high-quality images.

When using SDXL-Turbo for image-to-image generation, make sure that `num_inference_steps * strength` is larger than or equal to 1. The image-to-image pipeline will run for `int(num_inference_steps * strength)` steps, e.g. 0.5 * 2.0 = 1 step in our example below.

The model was not trained to produce factual or true representations of people or events, and therefore using the model to generate such content is out of scope for its abilities. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.

## Limitations

- The generated images are of a fixed resolution (512x512 px), and the model does not achieve perfect photorealism.
- The model cannot render legible text.
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.

The model is intended for both non-commercial and commercial usage. Check out https://github.com/Stability-AI/generative-models
whisper-large-v3-turbo-MLX
Run them directly with nexa-sdk installed. In the nexa-sdk CLI (a sketch is given at the end of this card):

## Overview

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting.

Whisper large-v3-turbo is a finetuned version of a pruned Whisper large-v3. In other words, it is the exact same model, except that the number of decoding layers has been reduced from 32 to 4. As a result, the model is much faster, at the expense of a minor quality degradation. You can find more details about it in this GitHub discussion.

## Reference

Original model card: openai/whisper-large-v3-turbo
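A sketch of the CLI step, assuming the `nexa infer` subcommand and this repository id (exact syntax may vary by NexaSDK version):

```bash
# local transcription on Apple Silicon; drag an audio file into the terminal
# or type its path when prompted
nexa infer NexaAI/whisper-large-v3-turbo-MLX
```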
granite-4.0-micro-GGUF
DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant
DeepSeek-OCR-GGUF-CUDA
> [!NOTE]
> Currently, only NexaSDK supports this model's GGUF.

1. Install NexaSDK
2. Run the model locally with one line of code (a sketch is given at the end of this card)
3. Then drag your image into the terminal, or type the image path

## Model Description

DeepSeek OCR is a high-accuracy optical character recognition model built for extracting text from complex visual inputs such as documents, screenshots, receipts, and natural scenes. It combines vision-language modeling with efficient visual encoders to achieve superior recognition of multi-language and multi-layout text while remaining lightweight enough for edge or on-device deployment.

## Features

- Multilingual OCR: recognizes printed and handwritten text across major global languages.
- Document Layout Understanding: preserves structure such as tables, paragraphs, and titles.
- Scene Text Recognition: robust against lighting, distortion, and low-quality captures.
- Lightweight & Fast: optimized for CPU and GPU acceleration.
- End-to-End Pipeline: supports image-to-text and structured JSON output.

## Use Cases

- Digitizing scanned documents or PDFs
- Extracting text from mobile camera inputs or screenshots
- Invoice and receipt parsing
- OCR-based search and indexing systems
- Visual question answering or document agents

## Inputs and Outputs

Input:
- Image file (JPEG, PNG, or tensor array)
- Optional parameters for language hints or layout detection

Output:
- Extracted text (plain text or structured format with bounding boxes)
- Confidence scores per word or region

## Integration

DeepSeek OCR can be integrated through:
- Python API (`pip install deepseek-ocr`)
- REST or gRPC endpoints for server deployment

## License

This model is released under the Apache 2.0 License, allowing commercial use, modification, and redistribution with attribution.
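A sketch of steps 2-3, assuming the `nexa infer` subcommand and this repository id (check sdk.nexa.ai for the exact syntax on CUDA-enabled machines):

```bash
# start OCR inference on the GPU; when prompted, drag an image into the
# terminal or type its path, then add an optional instruction
nexa infer NexaAI/DeepSeek-OCR-GGUF-CUDA
```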
qwen3vl-4B-Instruct-4bit-mlx
Qwen3-VL-4B-Instruct

Run Qwen3-VL-4B-Instruct optimized for Apple Silicon on MLX with NexaSDK.

1. Install NexaSDK
2. Run the model locally with one line of code (a sketch is given at the end of this card).

## Model Description

Qwen3-VL-4B-Instruct is a 4-billion-parameter instruction-tuned multimodal large language model from Alibaba Cloud's Qwen team. As part of the Qwen3-VL series, it fuses powerful vision-language understanding with conversational fine-tuning, optimized for real-world applications such as chat-based reasoning, document analysis, and visual dialogue. The Instruct variant is tuned to follow user prompts naturally and safely, producing concise, relevant, and user-aligned responses across text, image, and video contexts.

## Features

- Instruction-Following: Optimized for dialogue, explanation, and user-friendly task completion.
- Vision-Language Fusion: Understands and reasons across text, images, and video frames.
- Multilingual Capability: Handles multiple languages for diverse global use cases.
- Contextual Coherence: Balances reasoning ability with a natural, grounded conversational tone.
- Lightweight & Deployable: 4B parameters make it efficient for edge and device-level inference.

## Use Cases

- Visual chatbots and assistants
- Image captioning and scene understanding
- Chart, document, or screenshot analysis
- Educational or tutoring systems with visual inputs
- Multilingual, multimodal question answering

## Inputs and Outputs

Input:
- Text prompts, image(s), or mixed multimodal instructions.

Output:
- Natural-language responses or visual reasoning explanations.
- Can return structured text (summaries, captions, answers, etc.) depending on the prompt.

## License

Refer to the official Qwen license for terms of use and redistribution.
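A sketch of step 2 on Apple Silicon, assuming the `nexa infer` subcommand and this repository id (check sdk.nexa.ai for the exact syntax):

```bash
# interactive multimodal chat with the 4-bit MLX build
nexa infer NexaAI/qwen3vl-4B-Instruct-4bit-mlx
```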
octo-planner-gguf
qwen3vl-4B-Instruct-fp16-mlx
Qwen3-VL-4B-Instruct

Run Qwen3-VL-4B-Instruct optimized for Apple Silicon on MLX with NexaSDK.

1. Install NexaSDK
2. Run the model locally with one line of code.

## Model Description

Qwen3-VL-4B-Instruct is a 4-billion-parameter instruction-tuned multimodal large language model from Alibaba Cloud's Qwen team. As part of the Qwen3-VL series, it fuses powerful vision-language understanding with conversational fine-tuning, optimized for real-world applications such as chat-based reasoning, document analysis, and visual dialogue. The Instruct variant is tuned to follow user prompts naturally and safely, producing concise, relevant, and user-aligned responses across text, image, and video contexts.

## Features

- Instruction-Following: Optimized for dialogue, explanation, and user-friendly task completion.
- Vision-Language Fusion: Understands and reasons across text, images, and video frames.
- Multilingual Capability: Handles multiple languages for diverse global use cases.
- Contextual Coherence: Balances reasoning ability with a natural, grounded conversational tone.
- Lightweight & Deployable: 4B parameters make it efficient for edge and device-level inference.

## Use Cases

- Visual chatbots and assistants
- Image captioning and scene understanding
- Chart, document, or screenshot analysis
- Educational or tutoring systems with visual inputs
- Multilingual, multimodal question answering

## Inputs and Outputs

Input:
- Text prompts, image(s), or mixed multimodal instructions.

Output:
- Natural-language responses or visual reasoning explanations.
- Can return structured text (summaries, captions, answers, etc.) depending on the prompt.

## License

Refer to the official Qwen license for terms of use and redistribution.
Qwen3-VL-4B-Instruct-NPU
Qwen3-VL-4B-Instruct

Run Qwen3-VL-4B-Instruct optimized for Qualcomm NPUs with NexaSDK.

1. Install NexaSDK and create a free account at sdk.nexa.ai
2. Activate your device with your access token (a sketch is given at the end of this card).

## Model Description

Qwen3-VL-4B-Instruct is a 4-billion-parameter instruction-tuned multimodal large language model from Alibaba Cloud's Qwen team. As part of the Qwen3-VL series, it fuses powerful vision-language understanding with conversational fine-tuning, optimized for real-world applications such as chat-based reasoning, document analysis, and visual dialogue. The Instruct variant is tuned to follow user prompts naturally and safely, producing concise, relevant, and user-aligned responses across text, image, and video contexts.

## Features

- Instruction-Following: Optimized for dialogue, explanation, and user-friendly task completion.
- Vision-Language Fusion: Understands and reasons across text, images, and video frames.
- Multilingual Capability: Handles multiple languages for diverse global use cases.
- Contextual Coherence: Balances reasoning ability with a natural, grounded conversational tone.
- Lightweight & Deployable: 4B parameters make it efficient for edge and device-level inference.

## Use Cases

- Visual chatbots and assistants
- Image captioning and scene understanding
- Chart, document, or screenshot analysis
- Educational or tutoring systems with visual inputs
- Multilingual, multimodal question answering

## Inputs and Outputs

Input:
- Text prompts, image(s), or mixed multimodal instructions.

Output:
- Natural-language responses or visual reasoning explanations.
- Can return structured text (summaries, captions, answers, etc.) depending on the prompt.

## License

Refer to the official Qwen license for terms of use and redistribution.
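A sketch of the activation and run steps, assuming the `nexa config set license` and `nexa infer` commands and this repository id; the exact syntax is documented at sdk.nexa.ai:

```bash
# activate this device with the access token from your sdk.nexa.ai account (assumed command)
nexa config set license '<your-access-token>'

# run the NPU build interactively (assumed repo id)
nexa infer NexaAI/Qwen3-VL-4B-Instruct-NPU
```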
jina-v2-rerank-npu
Run Jina Reranker v2 optimized for Qualcomm NPUs with NexaSDK.

1. Install NexaSDK and create a free account at sdk.nexa.ai
2. Activate your device with your access token (a sketch is given at the end of this card).

## Description

Jina Reranker v2 Base Multilingual is a multilingual cross-encoder model for document reranking. Given a query-document pair, it outputs a relevance score to improve ranking in retrieval systems.

## Features

- Cross-encoder architecture for fine-grained relevance scoring
- Supports multilingual inputs
- Handles inputs up to 1024 tokens using sliding-window chunking
- Employs flash attention optimizations

## Use Cases

- Reranking candidate passages in multilingual search
- Enhancing retrieval in QA / RAG pipelines
- Improving semantic relevance in recommendation systems

## Inputs & Outputs

- Input: Query & document (text pair)
- Output: Scalar relevance score (for ranking)

## License

This model is licensed under CC BY-NC 4.0, intended for research and evaluation use. Commercial use requires a separate arrangement.

## References

- Model page on Hugging Face
- Jina AI documentation / model site
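A sketch of the activation and run steps, assuming the `nexa config set license` and `nexa infer` commands and this repository id (all assumed; see sdk.nexa.ai for the exact reranker workflow):

```bash
# activate this device with your sdk.nexa.ai access token (assumed command)
nexa config set license '<your-access-token>'

# load the reranker on the Qualcomm NPU; supply query/document pairs as prompted
nexa infer NexaAI/jina-v2-rerank-npu
```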
deepSeek-r1-distill-qwen-1.5B-intel-npu
Run DeepSeek-R1-Distill-Qwen-1.5B optimized for Intel NPUs with NexaSDK.

1. Install NexaSDK and create a free account at sdk.nexa.ai
2. Activate your device with your access token (a sketch is given at the end of this card).

## Model Description

DeepSeek-R1-Distill-Qwen-1.5B is a distilled variant of DeepSeek-R1, built on the Qwen 1.5B architecture. It compresses the reasoning and instruction-following capabilities of larger DeepSeek models into an ultra-lightweight 1.5B-parameter model, ideal for fast, efficient deployment on constrained devices while retaining strong performance for its size.

## Features

- Distilled from DeepSeek-R1: Maintains core reasoning and comprehension strengths in a smaller model.
- Instruction-tuned: Optimized for Q&A, task completion, and logical reasoning.
- Compact footprint: 1.5B parameters enable deployment in edge and mobile contexts.
- Multilingual support: Handles a wide range of global languages with efficiency.

## Use Cases

- Lightweight conversational agents and personal assistants
- Coding help and small-scale algorithmic reasoning
- Multilingual Q&A or translation in resource-limited environments
- Edge, mobile, and offline applications where compute or memory is limited

## Inputs and Outputs

Input: Text prompts including natural language queries, tasks, or code snippets.

Output: Direct responses (answers, explanations, or code) without extra reasoning annotations.

## References

- Model card: https://huggingface.co/deepseek-ai/deepseek-r1-distill-qwen-1.5b
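A sketch of the activation and run steps, assuming the `nexa config set license` and `nexa infer` commands and this repository id; the exact syntax is documented at sdk.nexa.ai:

```bash
# activate this device with your sdk.nexa.ai access token (assumed command)
nexa config set license '<your-access-token>'

# chat with the distilled 1.5B model on the Intel NPU (assumed repo id)
nexa infer NexaAI/deepSeek-r1-distill-qwen-1.5B-intel-npu
```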
OmniNeural-4B
OmniNeural: World's First NPU-aware Multimodal Model

## Overview

OmniNeural is the first fully multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands text, images, and audio, and runs across PCs, mobile devices, automobiles, IoT, and robotics.

📱 Mobile Phone NPU - Demo on Samsung S25 Ultra: the first-ever fully local, multimodal, and conversational AI assistant that hears you and sees what you see, running natively on the Snapdragon NPU for long battery life and low latency.

🖼️ Multi-Image Reasoning: spot the difference across two images in multi-round dialogue.

🤖 Image + Text → Function Call: snap a poster, add a text instruction, and the AI agent creates a calendar event.

🎶 Multi-Audio Comparison: tell the difference between two music clips locally.

## Key Features

- Multimodal Intelligence: processes text, image, and audio in a unified model for richer reasoning and perception.
- NPU-Optimized Architecture: uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput, 20% faster than non-NPU-aware models.
- Hardware-Aware Attention: attention patterns tuned for NPUs, lowering compute and memory demand.
- Native Static Graph: supports variable-length multimodal inputs with stable, predictable latency.
- Performance Gains: 9× faster audio processing and 3.5× faster image processing on NPUs compared to baseline encoders.
- Privacy-First Inference: all computation stays local, so it is private, offline-capable, and cost-efficient.

## Performance / Benchmarks

Human Evaluation (vs. baselines):
- Vision: wins/ties in ~75% of prompts against Apple Foundation, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
- Audio: clear lead over baselines, much better than Gemma-3n and the Apple foundation model.
- Text: matches or outperforms leading multimodal baselines.

Nexa Attention Speedups:
- 9× faster audio encoding (vs. the Whisper encoder).
- 3.5× faster image encoding (vs. the SigLIP encoder).

## Architecture Overview

OmniNeural's design is tightly coupled with NPU hardware:

- NPU-friendly ops (ReLU > GELU/SiLU).
- Sparse and small tensor multiplications for efficiency.
- Convolutional layers favored over linear layers for better NPU parallelization.
- Hardware-aware attention patterns to cut compute cost.
- Static graph execution for predictable latency.

Application scenarios:

- PC & Mobile: on-device AI agents combine voice, vision, and text for natural, accurate responses.
  - Examples: summarize slides into an email (PC), extract action items from chat (mobile).
  - Benefits: private, offline, battery-efficient.
- Automotive: in-car assistants handle voice control, cabin safety, and environment awareness.
  - Examples: detects risks (child unbuckled, pet left, loose objects) and road conditions (fog, construction).
  - Benefits: decisions run locally in milliseconds.
- IoT & Robotics: multimodal sensing for factories, AR/VR, drones, and robots.
  - Examples: defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
  - Benefits: works without network connectivity.

> ⚠️ Hardware requirement: OmniNeural-4B currently runs only on Qualcomm NPUs (e.g., Snapdragon-powered AIPC).
> Apple NPU support is planned next.

To deploy:

1. Download and follow the steps under the "Deploy" section on Nexa's model page: download the Windows arm64 SDK (other platforms coming soon).
2. Get an access token: create a token in the Model Hub, then log in.

Once the model is running, you can type /mic mode to record your voice directly in the terminal. For images and audio, simply drag your files into the command line.
Remember to leave a space between file paths.

- Issues / Feedback: use the HF Discussions tab, or submit an issue in our Discord or the nexa-sdk GitHub.
- Roadmap & updates: follow us on X and Discord.

> If you want to see more NPU-first, multimodal releases on HF, please give our model a like ❤️.

## Limitation

The current model is mainly optimized for English. We will optimize other languages as the next step.

## License

This model is released under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. Non-commercial use, modification, and redistribution are permitted with attribution. For commercial licensing, please contact [email protected].
qwen3vl-4B-Thinking-4bit-mlx
Qwen3-VL-4B-Thinking

Run Qwen3-VL-4B-Thinking optimized for Apple Silicon on MLX with NexaSDK.

1. Install NexaSDK
2. Run the model locally with one line of code.

## Model Description

Qwen3-VL-4B-Thinking is a 4-billion-parameter multimodal large language model from the Qwen team at Alibaba Cloud. Part of the Qwen3-VL (Vision-Language) family, it is designed for advanced visual reasoning and chain-of-thought generation across image, text, and video inputs. Compared to the Instruct variant, the Thinking model emphasizes deeper multi-step reasoning, analysis, and planning. It produces detailed, structured outputs that reflect intermediate reasoning steps, making it well-suited for research, multimodal understanding, and agentic workflows.

## Features

- Vision-Language Understanding: Processes images, text, and videos for joint reasoning tasks.
- Structured Thinking Mode: Generates intermediate reasoning traces for better transparency and interpretability.
- High Accuracy on Visual QA: Performs strongly on visual question answering, chart reasoning, and document analysis benchmarks.
- Multilingual Support: Understands and responds in multiple languages.
- Optimized for Efficiency: Delivers strong performance at 4B scale for on-device or edge deployment.

## Use Cases

- Multimodal reasoning and visual question answering
- Scientific and analytical reasoning tasks involving charts, tables, and documents
- Step-by-step visual explanation or tutoring
- Research on interpretability and chain-of-thought modeling
- Integration into agent systems that require structured reasoning

## Inputs and Outputs

Input:
- Text, images, or combined multimodal prompts (e.g., image + question)

Output:
- Generated text, reasoning traces, or structured responses
- May include explicit thought steps or structured JSON reasoning sequences

## License

Check the official Qwen license for terms of use and redistribution.
Llama3.2-3B-NPU-Turbo
Llama3.2-3B

Run Llama3.2-3B optimized for Qualcomm NPUs with NexaSDK.

1. Install NexaSDK and create a free account at sdk.nexa.ai
2. Activate your device with your access token (a sketch is given at the end of this card).

## Model Description

Llama3.2-3B is a 3-billion-parameter language model from Meta's Llama 3.2 series. It is designed to provide a balance of efficiency and capability, making it suitable for deployment on a wide range of devices while maintaining strong performance on core language understanding and generation tasks. Trained on diverse, high-quality datasets, Llama3.2-3B supports multiple languages and is optimized for scalability, fine-tuning, and real-world applications.

## Features

- Lightweight yet capable: delivers strong performance with a smaller memory footprint.
- Conversational AI: context-aware dialogue for assistants and agents.
- Content generation: text completion, summarization, code comments, and more.
- Reasoning & analysis: step-by-step problem solving and explanation.
- Multilingual: supports understanding and generation in multiple languages.
- Customizable: can be fine-tuned for domain-specific or enterprise use.

## Use Cases

- Personal and enterprise chatbots
- On-device AI applications
- Document and report summarization
- Education and tutoring tools
- Specialized models in verticals (e.g., healthcare, finance, legal)

## Inputs and Outputs

Input:
- Text prompts or conversation history (tokenized input sequences).

Output:
- Generated text: responses, explanations, or creative content.
- Optionally: raw logits/probabilities for advanced downstream tasks.

## License

This model is released under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. Non-commercial use, modification, and redistribution are permitted with attribution. For commercial licensing, please contact [email protected].

## References

- Meta AI Llama Models
- Hugging Face Model Card
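A sketch of the activation and run steps, assuming the `nexa config set license` and `nexa infer` commands and this repository id; the exact syntax is documented at sdk.nexa.ai:

```bash
# activate this device with your sdk.nexa.ai access token (assumed command)
nexa config set license '<your-access-token>'

# chat with the Llama 3.2 3B NPU Turbo build (assumed repo id)
nexa infer NexaAI/Llama3.2-3B-NPU-Turbo
```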
Kokoro-82M-bf16-MLX
gpt-oss-20b-MLX-4bit
qwen3vl-4B-Thinking-fp16-mlx
Qwen3-VL-4B-Thinking

Run Qwen3-VL-4B-Thinking optimized for Apple Silicon on MLX with NexaSDK.

1. Install NexaSDK
2. Run the model locally with one line of code.

## Model Description

Qwen3-VL-4B-Thinking is a 4-billion-parameter multimodal large language model from the Qwen team at Alibaba Cloud. Part of the Qwen3-VL (Vision-Language) family, it is designed for advanced visual reasoning and chain-of-thought generation across image, text, and video inputs. Compared to the Instruct variant, the Thinking model emphasizes deeper multi-step reasoning, analysis, and planning. It produces detailed, structured outputs that reflect intermediate reasoning steps, making it well-suited for research, multimodal understanding, and agentic workflows.

## Features

- Vision-Language Understanding: Processes images, text, and videos for joint reasoning tasks.
- Structured Thinking Mode: Generates intermediate reasoning traces for better transparency and interpretability.
- High Accuracy on Visual QA: Performs strongly on visual question answering, chart reasoning, and document analysis benchmarks.
- Multilingual Support: Understands and responds in multiple languages.
- Optimized for Efficiency: Delivers strong performance at 4B scale for on-device or edge deployment.

## Use Cases

- Multimodal reasoning and visual question answering
- Scientific and analytical reasoning tasks involving charts, tables, and documents
- Step-by-step visual explanation or tutoring
- Research on interpretability and chain-of-thought modeling
- Integration into agent systems that require structured reasoning

## Inputs and Outputs

Input:
- Text, images, or combined multimodal prompts (e.g., image + question)

Output:
- Generated text, reasoning traces, or structured responses
- May include explicit thought steps or structured JSON reasoning sequences

## License

Check the official Qwen license for terms of use and redistribution.
DeepSeek-R1-Distill-Llama-8B-NexaQuant
embeddinggemma-300m-npu
## Model Description

EmbeddingGemma is a 300M-parameter open embedding model developed by Google DeepMind. It is built from Gemma 3 (with T5Gemma initialization) and the same research and technology used in Gemini models. The model produces vector representations of text, making it well-suited for search, retrieval, classification, clustering, and semantic similarity tasks. It was trained on 100+ languages with ~320B tokens and optimized for on-device efficiency (mobile, laptops, desktops).

## Features

- Compact and efficient: 300M parameters, optimized for on-device use.
- Multilingual: trained on 100+ spoken languages.
- Flexible embeddings: default dimension 768, with support for 512, 256, and 128 via Matryoshka Representation Learning (MRL).
- Wide task coverage: retrieval, QA, fact-checking, classification, clustering, similarity.
- Commercial-friendly: open weights available for research and production.

## Use Cases

- Semantic similarity and recommendation systems
- Document, code, and web search
- Clustering for organization, research, and anomaly detection
- Classification (e.g., sentiment, spam detection)
- Fact verification and QA embeddings
- Code retrieval for programming assistance

## Inputs and Outputs

Input:
- Type: Text string (e.g., query, prompt, document)
- Max Length: 2048 tokens

Output:
- Type: Embedding vector (default 768-d)
- Options: 512 / 256 / 128 dimensions via truncation & re-normalization (MRL)

## Limitations & Responsible Use

This model has known limitations:

- Bias & coverage: quality depends on training data diversity.
- Nuance & ambiguity: may struggle with sarcasm and figurative language.
- Ethical concerns: risk of bias perpetuation, privacy leakage, or malicious misuse.

Mitigations:

- CSAM and sensitive-data filtering applied.
- Users should adhere to the Gemma Responsible AI guidelines and Prohibited Use Policy.

## License

This model is released under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. Non-commercial use, modification, and redistribution are permitted with attribution. For commercial licensing, please contact [email protected].

## Support

For SDK-related issues, visit sdk.nexa.ai. For model-specific questions, open an issue in this repository.
qwen3vl-8B-Instruct-4bit-mlx
Qwen3-VL-8B-Instruct

Run Qwen3-VL-8B-Instruct optimized for Apple Silicon on MLX with NexaSDK.

1. Install NexaSDK
2. Run the model locally with one line of code.

## Model Description

Qwen3-VL-8B-Instruct is an 8-billion-parameter instruction-tuned multimodal large language model developed by the Qwen team at Alibaba Cloud. It belongs to the Qwen3-VL series, designed for seamless understanding and reasoning across text, image, and video. This version combines the visual intelligence of Qwen3-VL with the instruction-following capabilities of Qwen3-LM, enabling natural, grounded conversations around complex visual content. Compared to the 4B variant, the 8B model delivers stronger reasoning, richer context retention, and improved performance on visual and multilingual benchmarks while maintaining efficiency for deployment.

## Features

- Enhanced Visual Understanding: Handles complex scenes, documents, and multi-image inputs.
- Instruction-Tuned Dialogue: Produces coherent and context-aware responses aligned with user intent.
- Multilingual Support: Capable of understanding and generating in multiple languages.
- Extended Context Window: Supports longer text and multimodal contexts for better reasoning continuity.
- Optimized Performance: Balances large-scale reasoning capability with deployability for high-end edge or server environments.

## Use Cases

- Visual chatbots and multimodal assistants
- Document and chart interpretation
- Image-grounded content generation and summarization
- Video frame reasoning and analysis
- Multilingual multimodal tutoring or knowledge assistants

## Inputs and Outputs

Input:
- Text, images, or combined multimodal prompts
- Optional video frames or sequential image sets

Output:
- Natural-language answers, summaries, captions, or structured reasoning outputs
- Can provide visual explanations or reasoning narratives when prompted

## License

See the official Qwen license for details on usage and redistribution.
Prefect-illustrious-XL-v2.0p
Granite-4.0-h-350M-NPU
Run Granite-4.0-h-350M optimized for Qualcomm Hexagon NPUs with NexaSDK.

1. Install NexaSDK
2. Activate your device with your access token for free at sdk.nexa.ai (a sketch is given at the end of this card)

## Model Description

Granite-4.0-h-350M is a 350-million-parameter transformer model from IBM's Granite 4.0 family, designed for efficient inference, low-latency edge deployment, and instruction following at compact scale. It shares the same data quality, architecture design, and alignment pipeline as larger Granite 4.0 models but is optimized for lightweight environments where performance per watt and model size are critical. Built on the Granite 4.0 foundation, this model continues IBM's commitment to open, responsible AI, offering transparency and adaptability for developers, researchers, and embedded AI applications.

## Features

- Compact yet capable: Delivers high-quality generation and reasoning with just 350M parameters.
- Instruction-tuned: Follows natural language instructions for diverse tasks.
- Low-latency performance: Ideal for CPU, GPU, and NPU inference.
- Efficient deployment: Runs smoothly on edge and resource-constrained devices.
- Open and transparent: Released under IBM's open model governance framework.

## Use Cases

- On-device assistants and chatbots
- Edge AI and IoT inference
- Document and text summarization
- Education and lightweight reasoning tasks
- Prototype fine-tuning for domain adaptation

## Inputs and Outputs

Input:
- Text prompt (instruction or question)

Output:
- Generated text response completing or following the input prompt

## License

This model is released under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. Non-commercial use, modification, and redistribution are permitted with attribution. For commercial licensing, please contact [email protected].
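A sketch of the activation and run steps, assuming the `nexa config set license` and `nexa infer` commands and this repository id; the exact syntax is documented at sdk.nexa.ai:

```bash
# activate this device with your sdk.nexa.ai access token (assumed command)
nexa config set license '<your-access-token>'

# run the Granite 350M build on the Hexagon NPU (assumed repo id)
nexa infer NexaAI/Granite-4.0-h-350M-NPU
```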
jan-v1-4B-npu
phi4-mini-npu-turbo
Run Phi-4-mini optimized for Qualcomm NPUs with NexaSDK.

1. Install NexaSDK and create a free account at sdk.nexa.ai
2. Activate your device with your access token (a sketch is given at the end of this card).

Phi-4-mini is a ~3.8B-parameter instruction-tuned model from Microsoft's Phi-4 family. Trained on a blend of synthetic "textbook-style" data, filtered public web content, curated books/Q&A, and high-quality supervised chat data, it emphasizes reasoning-dense capabilities while maintaining a compact footprint. This NPU Turbo build uses Nexa's Qualcomm backend (QNN/Hexagon) to deliver lower latency and higher throughput on-device, with support for 128K context and efficient long-context memory handling.

- Lightweight yet capable: strong reasoning (math/logic) in a compact 3.8B model.
- Instruction-following: enhanced SFT + DPO alignment for reliable chat.
- Content generation: drafting, completion, summarization, code comments, and more.
- Conversational AI: context-aware assistants/agents with long-context support (128K).
- NPU-Turbo path: INT8/INT4 quantization, op fusion, and KV-cache residency for Snapdragon® NPUs via NexaSDK.
- Customizable: fine-tune/adapt for domain-specific or enterprise use.

Use cases:
- Personal & enterprise chatbots
- On-device/offline assistants (latency-bound scenarios)
- Document/report/email summarization
- Education, tutoring, and STEM reasoning tools
- Vertical applications (e.g., healthcare, finance, legal) with appropriate safeguards

Input: text prompts or conversation history (chat-format, tokenized sequences).

Output:
- Generated text: responses, explanations, or creative content.
- Optionally: raw logits/probabilities for advanced downstream tasks.

## License

This model is released under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. Non-commercial use, modification, and redistribution are permitted with attribution. For commercial licensing, please contact [email protected].

## References

- 📰 Phi-4-mini Microsoft Blog
- 📖 Phi-4-mini Technical Report
- 👩🍳 Phi Cookbook
- 🚀 Model paper
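A sketch of the activation and run steps, assuming the `nexa config set license` and `nexa infer` commands and this repository id; the exact syntax is documented at sdk.nexa.ai:

```bash
# activate this device with your sdk.nexa.ai access token (assumed command)
nexa config set license '<your-access-token>'

# chat with the Phi-4-mini NPU Turbo build (assumed repo id)
nexa infer NexaAI/phi4-mini-npu-turbo
```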
parakeet-tdt-0.6b-v3-npu
Model Description parakeet-tdt-0.6b-v3 is a 600-million-parameter multilingual automatic speech recognition (ASR) model developed by NVIDIA. It extends the parakeet-tdt-0.6b-v2 model by expanding from English-only support to 25 European languages. The model automatically detects the spoken language and transcribes speech to text without requiring additional prompting. It was trained primarily on the Granary multilingual corpus [1,2] and is optimized for both research and production-grade deployment. Features - Multilingual ASR: Supports 25 European languages with automatic language detection. - Automatic punctuation & capitalization included. - Timestamps: Accurate word-level and segment-level timestamps. - Long audio support: - Up to 24 minutes with full attention (A100 80GB). - Up to 3 hours with local attention. - Permissively licensed upstream: NVIDIA’s original release uses the CC-BY-4.0 license (this NPU build is distributed under CC BY-NC 4.0; see License below). Supported Languages Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Italian (it), Latvian (lv), Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Ukrainian (uk) Use Cases - Conversational AI and multilingual chatbots - Voice assistants - Transcription services - Subtitles and caption generation - Voice analytics platforms - Academic and industry research on speech technologies Inputs and Outputs Input: - Type: 16kHz audio - Formats: `.wav`, `.flac` - Shape: 1D audio signal (mono channel) Output: - Type: Text string - Properties: Includes punctuation and capitalization This model may produce transcription or generation errors. Evaluate carefully in production, especially in sensitive domains (healthcare, legal, finance). License This model is released under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license. Non-commercial use, modification, and redistribution are permitted with attribution. For commercial licensing, please contact [email protected]. For SDK-related issues, visit sdk.nexa.ai. For model-specific questions, open an issue in this repository.
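The Inputs section above specifies 16kHz mono audio in `.wav` or `.flac`. A minimal preprocessing sketch for getting arbitrary recordings into that shape before transcription, using the widely available librosa and soundfile packages (file names are illustrative):

```python
# Resample and downmix an arbitrary recording to 16 kHz mono, the input
# format this model card specifies. The ASR call itself is runtime-specific
# and not shown here.
import librosa
import soundfile as sf

audio, sr = librosa.load("meeting_recording.flac", sr=16000, mono=True)  # resample + downmix to mono
sf.write("meeting_recording_16k.wav", audio, 16000)                      # 1-D signal at 16 kHz
print(f"{len(audio) / 16000:.1f} seconds of audio ready for transcription")
```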
qwen3vl-8B-Instruct-fp16-mlx
Qwen3-VL-8B-Instruct Run Qwen3-VL-8B-Instruct optimized for Apple Silicon on MLX with NexaSDK. 1. Install NexaSDK 2. Run the model locally with one line of code: Model Description Qwen3-VL-8B-Instruct is an 8-billion-parameter instruction-tuned multimodal large language model developed by the Qwen team at Alibaba Cloud. It belongs to the Qwen3-VL series, designed for seamless understanding and reasoning across text, image, and video. This version combines the visual intelligence of Qwen3-VL with the instruction-following capabilities of Qwen3-LM, enabling natural, grounded conversations around complex visual content. Compared to the 4B variant, the 8B model delivers stronger reasoning, richer context retention, and improved performance on visual and multilingual benchmarks while maintaining efficiency for deployment. Features - Enhanced Visual Understanding: Handles complex scenes, documents, and multi-image inputs. - Instruction-Tuned Dialogue: Produces coherent and context-aware responses aligned with user intent. - Multilingual Support: Capable of understanding and generating in multiple languages. - Extended Context Window: Supports longer text and multimodal contexts for better reasoning continuity. - Optimized Performance: Balances large-scale reasoning capability with deployability for high-end edge or server environments. Use Cases - Visual chatbots and multimodal assistants - Document and chart interpretation - Image-grounded content generation and summarization - Video frame reasoning and analysis - Multilingual multimodal tutoring or knowledge assistants Inputs and Outputs Input: - Text, images, or combined multimodal prompts - Optional video frames or sequential image sets Output: - Natural-language answers, summaries, captions, or structured reasoning outputs - Can provide visual explanations or reasoning narratives when prompted License See the official Qwen license for details on usage and redistribution.
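As a concrete illustration of the combined multimodal prompts listed above, the sketch below expresses an image-plus-question turn as structured data. The field names follow a common content-parts convention and are hypothetical; the exact request format expected by the MLX runtime may differ:

```python
# Illustrative multimodal chat turn: an image plus a grounded question.
# Field names are hypothetical, not a confirmed NexaSDK/MLX schema.
multimodal_turn = {
    "role": "user",
    "content": [
        {"type": "image", "path": "quarterly_revenue_chart.png"},  # local image to reason over
        {"type": "text", "text": "Summarize the trend in this chart and flag any anomalies."},
    ],
}
print(multimodal_turn)
```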
llama3.2-1B-intel-npu
Run Llama-3.2-1B optimized for Intel NPUs with nexaSDK. 1. Install nexaSDK and create a free account at sdk.nexa.ai 2. Activate your device with your access token: Model Description Llama-3.2-1B is the smallest model in the Llama 3.2 family, optimized for efficiency and ultra-lightweight deployment. With just 1B parameters, it enables fast inference on resource-constrained environments while retaining strong instruction-following and multilingual capabilities for its size. Features - Ultra-compact design: 1B parameters for minimal memory and compute requirements. - Instruction-tuned: Capable of following prompts and answering questions reliably. - Multilingual support: Handles a wide set of languages despite small scale. - Edge-ready: Runs efficiently on laptops, mobile devices, and other constrained hardware. Use Cases - On-device conversational agents and personal assistants. - Educational apps or lightweight tutoring systems. - Prototyping with LLMs in environments where compute or cost is heavily constrained. - Offline or embedded applications where larger models are impractical. Inputs and Outputs Input: Text prompts such as questions, instructions, or code snippets. Output: Concise natural language responses, answers, or explanations. License - Licensed under Meta Llama 3.2 Community License References - Model card: https://huggingface.co/meta-llama/Llama-3.2-1B
gemma-3n-E4B-it-4bit-MLX
deepSeek-r1-distill-qwen-7B-intel-npu
Run DeepSeek-r1-distill-qwen-7B optimized for Intel NPUs with nexaSDK. 1. Install nexaSDK and create a free account at sdk.nexa.ai 2. Activate your device with your access token: Model Description deepSeek-r1-distill-qwen-7B is a distilled variant of DeepSeek-R1, built on the Qwen-7B architecture. It is designed for efficient reasoning and instruction-following while maintaining strong performance across coding, logic, and multilingual tasks. Distillation compresses the capabilities of larger DeepSeek models into a lighter 7B parameter model, making it more practical for edge deployment and resource-constrained environments. Features - Distilled from DeepSeek-R1: Retains core reasoning strengths in a smaller, faster footprint. - Instruction-tuned: Optimized for comprehension, logic, and task completion. - Multilingual coverage: Handles diverse language inputs with improved efficiency. - Compact yet capable: Balances performance with deployability on a wide range of hardware. Use Cases - Conversational AI and instruction-following assistants. - Coding support, debugging, and algorithmic reasoning. - Multilingual content generation and translation. - Lightweight deployment on edge or limited-resource devices. Inputs and Outputs Input: Text prompts including natural language queries, instructions, or code snippets. Output: Direct responses—answers, explanations, code, or translations—without extra reasoning annotations. References - Model card: https://huggingface.co/deepseek-ai/deepseek-r1-distill-qwen-7b
llama-3.1-8B-intel-npu
Llama-3.1-8B Run Llama-3.1-8B optimized for Intel NPUs with nexaSDK. 1. Install nexaSDK and create a free account at sdk.nexa.ai 2. Activate your device with your access token: Model Description Llama-3.1-8B is a mid-sized model in the Llama 3.1 family, balancing strong reasoning and language understanding with efficient deployment. At 8B parameters, it offers significantly higher accuracy and fluency than smaller Llama models, while remaining practical for fine-tuning and inference on modern GPUs. Features - Balanced scale: 8B parameters provide a strong trade-off between performance and efficiency. - Instruction-tuned: Optimized for following prompts, Q&A, and detailed reasoning. - Multilingual capabilities: Broad support across global languages. - Developer-friendly: Available for fine-tuning, domain adaptation, and integration into custom applications. Use Cases - Conversational AI and digital assistants requiring stronger reasoning. - Content generation, summarization, and analysis. - Coding help and structured problem solving. - Research and prototyping in environments where very large models are impractical. Inputs and Outputs Input: Text prompts—questions, instructions, or code snippets. Output: Natural language responses including answers, explanations, structured outputs, or code. License - Licensed under Meta Llama 3.1 Community License References - Model card: https://huggingface.co/meta-llama/Llama-3.1-8B
parakeet-tdt-0.6b-v2-MLX
LFM2.5-1.2B-GGUF
qwen3vl-8B-Thinking-4bit-mlx
Qwen3-VL-8B-Thinking Run Qwen3-VL-8B-Thinking optimized for Apple Silicon on MLX with NexaSDK. 1. Install NexaSDK 2. Run the model locally with one line of code: Model Description Qwen3-VL-8B-Thinking is an 8-billion-parameter multimodal large language model from Alibaba Cloud’s Qwen team. As part of the Qwen3-VL (Vision-Language) family, it is designed for deep multimodal reasoning — combining visual understanding, long-context comprehension, and structured chain-of-thought generation across text, images, and videos. The Thinking variant focuses on advanced reasoning transparency and analytical precision. Compared to the Instruct version, it produces richer intermediate reasoning steps, enabling detailed explanation, planning, and multi-hop analysis across visual and textual inputs. Features - Deep Visual Reasoning: Interprets complex scenes, charts, and documents with multi-step logic. - Chain-of-Thought Generation: Produces structured reasoning traces for improved interpretability and insight. - Extended Context Handling: Maintains coherence across longer multimodal sequences. - Multilingual Competence: Understands and generates in multiple languages for global applicability. - High Accuracy at 8B Scale: Achieves strong benchmark performance in multimodal reasoning and analysis tasks. Use Cases - Research and analysis requiring visual reasoning transparency - Complex multimodal QA and scientific problem solving - Visual analytics and explanation generation - Advanced agent systems needing structured thought or planning steps - Educational tools requiring detailed, interpretable reasoning Inputs and Outputs Input: - Text, image(s), or multimodal combinations (including sequential frames or documents) - Optional context for multi-turn or multi-modal reasoning Output: - Structured reasoning outputs with intermediate steps - Detailed answers, explanations, or JSON-formatted reasoning traces License Refer to the official Qwen license for usage and redistribution details.
gpt-oss-20b-MLX-8bit
llama3.2-3B-intel-npu
Llama-3.2-3B Run Llama-3.2-3B optimized for Intel NPUs with nexaSDK. 1. Install nexaSDK and create a free account at sdk.nexa.ai 2. Activate your device with your access token: Model Description Llama-3.2-3B is a compact member of the Llama 3.2 family, designed to provide strong general-purpose language modeling in a lightweight 3B parameter footprint. It balances efficiency with capability, making it well-suited for edge devices, prototyping, and applications where latency and resource constraints are critical. Features - Lightweight architecture: 3B parameters optimized for fast inference and low memory usage. - Instruction-following: Tuned for prompts, Q&A, and step-by-step reasoning. - Multilingual capabilities: Covers a wide range of global languages at smaller scale. - Deployment flexibility: Runs efficiently on consumer hardware and server environments. Use Cases - Conversational assistants and chatbots. - Educational tools and lightweight tutoring systems. - Prototyping and experimentation with large language models on limited resources. - Applications where cost or latency is a priority over sheer scale. Inputs and Outputs Input: Text prompts—questions, commands, or code snippets. Output: Natural language responses including answers, explanations, or structured outputs. License - Licensed under Meta Llama 3.2 Community License References - Model card: https://huggingface.co/meta-llama/Llama-3.2-3B
qwen3vl-8B-Thinking-fp16-mlx
Qwen3-VL-8B-Thinking Run Qwen3-VL-8B-Thinking optimized for Apple Silicon on MLX with NexaSDK. 1. Install NexaSDK 2. Run the model locally with one line of code: Model Description Qwen3-VL-8B-Thinking is an 8-billion-parameter multimodal large language model from Alibaba Cloud’s Qwen team. As part of the Qwen3-VL (Vision-Language) family, it is designed for deep multimodal reasoning — combining visual understanding, long-context comprehension, and structured chain-of-thought generation across text, images, and videos. The Thinking variant focuses on advanced reasoning transparency and analytical precision. Compared to the Instruct version, it produces richer intermediate reasoning steps, enabling detailed explanation, planning, and multi-hop analysis across visual and textual inputs. Features - Deep Visual Reasoning: Interprets complex scenes, charts, and documents with multi-step logic. - Chain-of-Thought Generation: Produces structured reasoning traces for improved interpretability and insight. - Extended Context Handling: Maintains coherence across longer multimodal sequences. - Multilingual Competence: Understands and generates in multiple languages for global applicability. - High Accuracy at 8B Scale: Achieves strong benchmark performance in multimodal reasoning and analysis tasks. Use Cases - Research and analysis requiring visual reasoning transparency - Complex multimodal QA and scientific problem solving - Visual analytics and explanation generation - Advanced agent systems needing structured thought or planning steps - Educational tools requiring detailed, interpretable reasoning Inputs and Outputs Input: - Text, image(s), or multimodal combinations (including sequential frames or documents) - Optional context for multi-turn or multi-modal reasoning Output: - Structured reasoning outputs with intermediate steps - Detailed answers, explanations, or JSON-formatted reasoning traces License Refer to the official Qwen license for usage and redistribution details.
Qwen3-4B-4bit-MLX
Granite-4-Micro-NPU
Granite-4.0-Micro Run Granite-4.0-Micro optimized for Qualcomm NPUs with nexaSDK. 1. Install NexaSDK and create a free account at sdk.nexa.ai 2. Activate your device with your access token: Model Description Granite-4.0-Micro is a 3B parameter instruction-tuned model in the Granite 4.0 family, developed by IBM. It’s optimized for long-context reasoning (128K tokens), efficient inference, and enterprise-ready capabilities such as tool calling and retrieval-augmented generation. The model balances compact size with strong performance across general NLP tasks, making it suitable for both experimentation and production workloads. Features - Compact transformer architecture: 3B parameters with GQA, RoPE, SwiGLU, and RMSNorm layers. - Instruction-following & tool calling: Tuned with supervised finetuning, alignment (RLHF), and model merging for robust enterprise tasks. - Multilingual support: Covers 12+ languages including English, German, Spanish, French, Japanese, Korean, Arabic, and Chinese. - Extended context window: Supports sequences up to 128K tokens for long-form reasoning. Use Cases - Conversational AI and virtual assistants. - Enterprise applications needing tool/API calling and structured outputs. - Long-document summarization, classification, and extraction. - Retrieval-augmented generation (RAG) for knowledge-intensive workflows. - Lightweight coding assistants and multilingual dialog systems. Inputs and Outputs Input: Natural language text prompts, chat conversations, or tool-augmented requests. Output: Natural language responses—answers, explanations, summaries, structured JSON for function calls, or code snippets. License This model is released under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license. Non-commercial use, modification, and redistribution are permitted with attribution. For commercial licensing, please contact [email protected].
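The tool-calling and structured-output capabilities above are easiest to see with a concrete round trip. The sketch below uses a generic JSON-schema tool definition; Granite's exact function-calling format is set by its chat template, so the field names and the `get_weather` tool are illustrative assumptions:

```python
# Illustrative tool-calling round trip with generic, JSON-schema style field
# names; not a confirmed Granite or NexaSDK request format.
import json

weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# A structured function call the model could emit instead of free-form text:
model_tool_call = {"name": "get_weather", "arguments": {"city": "Zurich"}}

# The application runs the tool, then feeds the result back for the final answer.
tool_result = {"city": "Zurich", "temperature_c": 7, "conditions": "light rain"}
print(json.dumps({"call": model_tool_call, "result": tool_result}, indent=2))
```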
qwen3-4B-npu
phi3.5-mini-npu
Run Phi-3.5-Mini optimized for Qualcomm NPUs with NexaSDK. 1. Install NexaSDK and create a free account at sdk.nexa.ai 2. Activate your device with your access token: Model Description Phi-3.5-Mini is a ~3.8B-parameter instruction-tuned language model from Microsoft’s Phi family. It’s designed to deliver strong reasoning and instruction-following quality within a compact footprint, making it ideal for on-device and latency-sensitive applications. This Turbo build uses Nexa’s Qualcomm NPU path for faster inference and higher throughput while preserving model quality. Features - Lightweight yet capable: strong performance with small memory and compute budgets. - Conversational AI: context-aware dialogue for assistants and agents. - Content generation: drafting, completion, summarization, code comments, and more. - Reasoning & analysis: math/logic step-by-step problem solving. - Multilingual: supports understanding and generation across multiple languages. - Customizable: fine-tune or apply adapters for domain-specific use. Use Cases - Personal and enterprise chatbots - On-device AI applications and offline assistants - Document/report/email summarization - Education and tutoring tools - Vertical solutions (e.g., healthcare, finance, legal), with proper guardrails Inputs and Outputs Input: - Text prompts or conversation history (tokenized input sequences). Output: - Generated text: responses, explanations, or creative content. - Optionally: raw logits/probabilities for advanced downstream tasks. References - Microsoft – Phi Models - Hugging Face Model Card (Phi-3.5-Mini-Instruct) - Phi-3 Technical Report (blog/overview)
sdxl-base
Qwen3-0.6B-ANE
Gemma3-1B-ANE
OmniNeural-4B-mobile
OmniNeural — World’s First NPU-aware Multimodal Model (Mobile Version) Overview OmniNeural is the first fully multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands text, images, and audio, and runs across PCs, mobile devices, automobiles, IoT, and robotics. 📱 Mobile Phone NPU - Demo on Samsung S25 Ultra The first-ever fully local, multimodal, and conversational AI assistant that hears you and sees what you see, running natively on the Snapdragon NPU for long battery life and low latency. Key Features - Multimodal Intelligence – Processes text, image, and audio in a unified model for richer reasoning and perception. - NPU-Optimized Architecture – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput — 20% faster than non-NPU-aware models. - Hardware-Aware Attention – Attention patterns tuned for NPU hardware, lowering compute and memory demand. - Native Static Graph – Supports variable-length multimodal inputs with stable, predictable latency. - Performance Gains – 9× faster audio processing and 3.5× faster image processing on NPUs compared to baseline encoders. - Privacy-First Inference – All computation stays local: private, offline-capable, and cost-efficient. Performance / Benchmarks Human Evaluation (vs baselines) - Vision: Wins/ties in ~75% of prompts against Apple Foundation, Gemma-3n-E4B, Qwen2.5-Omni-3B. - Audio: Clear lead over baselines, well ahead of Gemma-3n and the Apple foundation model. - Text: Matches or outperforms leading multimodal baselines. Nexa Attention Speedups - 9× faster audio encoding (vs Whisper encoder). - 3.5× faster image encoding (vs SigLIP encoder). Architecture Overview OmniNeural’s design is tightly coupled with NPU hardware: - NPU-friendly ops (ReLU preferred over GELU/SiLU). - Sparse + small tensor multiplications for efficiency. - Convolutional layers favored over linear for better NPU parallelization. - Hardware-aware attention patterns to cut compute cost. - Static graph execution for predictable latency. (An illustrative code sketch of these preferences follows this card.) Use Cases - PC & Mobile – On-device AI agents combine voice, vision, and text for natural, accurate responses. - Examples: Summarize slides into an email (PC), extract action items from chat (mobile). - Benefits: Private, offline, battery-efficient. - Automotive – In-car assistants handle voice control, cabin safety, and environment awareness. - Examples: Detects risks (child unbuckled, pet left, loose objects) and road conditions (fog, construction). - Benefits: Decisions run locally in milliseconds. - IoT & Robotics – Multimodal sensing for factories, AR/VR, drones, and robots. - Examples: Defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction. - Benefits: Works without network connectivity. Note: this version is for mobile (Android) only; see the documentation for how to use it. - Issues / Feedback: Use the HF Discussions tab or submit an issue in our Discord or the nexa-sdk GitHub. - Roadmap & updates: Follow us on X and Discord. > If you want to see more NPU-first, multimodal releases on HF, please give our model a like ❤️. Limitation The current model is mainly optimized for English. We will optimize for other languages as a next step. License This model is released under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license. Non-commercial use, modification, and redistribution are permitted with attribution. For commercial licensing, please contact [email protected].
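The architecture bullets above describe preferences (ReLU over GELU, convolution over wide linear layers, static shapes for graph capture) rather than code. Below is a toy PyTorch block that makes those preferences concrete; it is a conceptual sketch for intuition only, not OmniNeural's actual layers:

```python
# Toy block illustrating NPU-oriented design choices: ReLU activation,
# a convolution instead of a wide linear projection, and a fixed input
# shape so the graph can be traced statically. Not OmniNeural's real code.
import torch
import torch.nn as nn

class NpuFriendlyBlock(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)  # convs parallelize well on NPUs
        self.act = nn.ReLU()                                                 # cheaper than GELU/SiLU on most NPUs

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, seq_len)
        return self.act(self.conv(x))

block = NpuFriendlyBlock().eval()
example = torch.randn(1, 256, 128)         # fixed shape, as a static-graph runtime expects
traced = torch.jit.trace(block, example)   # static graph capture for predictable latency
print(traced(example).shape)               # torch.Size([1, 256, 128])
```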
yolov12-npu
Qwen3-1.7B-4bit-MLX
SmolVLM-500M-Instruct-8bit-MLX
Pyannote-NPU
gemma-3-4b-it-8bit-MLX
LFM2-1.2B-npu
LFM2-1.2B Run LFM2-1.2B on Qualcomm NPU with NexaSDK. 1. Install NexaSDK and create a free account at sdk.nexa.ai 2. Activate your device with your access token: Model Description LFM2-1.2B is part of Liquid AI’s second-generation LFM2 family, designed specifically for on-device and edge AI deployment. With 1.2 billion parameters, it strikes a balance between compact size, strong reasoning, and efficient compute utilization—ideal for running on CPUs, GPUs, or NPUs. LFM2 introduces a hybrid Liquid architecture with multiplicative gates and short convolutions, enabling faster convergence and improved contextual reasoning. It demonstrates up to 3× faster training and 2× faster inference on CPU compared to Qwen3, while maintaining superior accuracy across multilingual and instruction-following benchmarks. Features - ⚡ Speed & Efficiency – 2× faster inference and prefill on CPU compared to Qwen3. - 🧠 Hybrid Liquid Architecture – Combines multiplicative gating with convolutional layers for better reasoning and token reuse (a conceptual sketch follows this card). - 🌍 Multilingual Competence – Supports diverse languages for global use cases. - 🛠 Flexible Deployment – Runs efficiently on CPU, GPU, and NPU hardware. - 📈 Benchmark Performance – Outperforms similarly-sized models in math, knowledge, and reasoning tasks. Use Cases - Edge AI assistants and voice agents - Offline reasoning and summarization on mobile or automotive devices - Local code and text generation tools - Lightweight multimodal or RAG pipelines - Domain-specific fine-tuning for vertical applications (e.g., finance, robotics) Inputs and Outputs Input - Text prompts or structured instructions (tokenized sequences for API use). Output - Natural-language or structured text generations. - Optionally: logits or embeddings for advanced downstream integration. License This model is released under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license. Non-commercial use, modification, and redistribution are permitted with attribution. For commercial licensing, please contact [email protected].
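To make the "multiplicative gates and short convolutions" idea concrete, here is a toy block combining a short causal depthwise convolution with a multiplicative (sigmoid) gate. It is a conceptual sketch with assumed dimensions, not Liquid AI's implementation:

```python
# Conceptual "gate + short conv" block: a short causal depthwise convolution on
# one branch, multiplied by a learned sigmoid gate from the other branch.
# Dimensions are illustrative; this is not LFM2's actual architecture.
import torch
import torch.nn as nn

class GatedShortConv(nn.Module):
    def __init__(self, dim: int = 512, kernel_size: int = 3):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)                               # value branch + gate branch
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)           # short depthwise conv
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:                      # x: (batch, seq_len, dim)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        v = self.conv(v.transpose(1, 2))[..., : x.shape[1]].transpose(1, 2)  # trim right pad -> causal
        return self.out_proj(v * torch.sigmoid(g))                           # multiplicative gate

x = torch.randn(2, 16, 512)
print(GatedShortConv()(x).shape)  # torch.Size([2, 16, 512])
```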
jina-v2-fp16-mlx
parakeet-tdt-0.6b-v3-ane
Model Description parakeet-tdt-0.6b-v3 is a 600M-parameter multilingual automatic speech recognition (ASR) model from NVIDIA. It extends parakeet-tdt-0.6b-v2 by moving beyond English-only to support 25 European languages with automatic language detection. The model was primarily trained on the Granary multilingual corpus and is optimized for both research exploration and production deployment. This build is integrated with NexaSDK and optimized for modern NPUs, including Apple’s Neural Engine (ANE), for efficient on-device inference. Features - Multilingual ASR: 25 European languages with built-in language detection. - Text formatting: Outputs text with punctuation and capitalization. - Timestamps: Provides both word-level and segment-level timestamps. - Long audio transcription: Up to 24 minutes with full attention (A100 80GB). Up to 3 hours with local attention. - Optimized for NPUs: Runs efficiently on Apple ANE, Qualcomm Hexagon, and other dedicated accelerators. - Commercial-friendly: Released under CC-BY-4.0 license. The Apple Neural Engine (ANE) is a specialized NPU in Apple silicon designed to accelerate AI and ML workloads [3]. By offloading heavy ASR computations to the ANE, parakeet-tdt-0.6b-v3 achieves: - Lower-latency speech transcription on iPhone, iPad, and Mac. - Energy-efficient inference, extending battery life during real-time ASR tasks. - On-device privacy, keeping voice data local while maintaining production-grade accuracy. Supported Languages Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Italian (it), Latvian (lv), Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Ukrainian (uk) Use Cases - Conversational AI and multilingual chatbots - Voice assistants and smart devices - Real-time transcription services - Subtitles and caption generation - Voice analytics platforms - Research in speech technology Inputs and Outputs Input: - Type: 16kHz audio - Formats: `.wav`, `.mp3` - Shape: 1D mono audio Output: - Type: Text string - Properties: Punctuation + capitalization included The model may produce transcription errors, particularly with code-switching or noisy input. Evaluate thoroughly before deploying in sensitive domains (e.g., healthcare, finance, or legal). License Licensed under the original Parakeet license terms. See: Parakeet Model License
gemma-3n-E2B-it-4bit-MLX
Qwen3-0.6B-bf16-MLX
Qwen2.5-VL-7B-Instruct-4bit-MLX
SmolVLM-Instruct-8bit-MLX
paddleocr-npu
AutoNeural
Qwen3-0.6B-8bit-MLX
Squid
LFM2-24B-A2B-GGUF
convnext-tiny-npu-IoT
sdxl-turbo-amd-npu
HY-MT1.5-1.8B-npu
qwen3-0.6b-ane
octo-planner-2b
convnext-tiny-npu
1. Install nexaSDK and create a free account at sdk.nexa.ai 2. Activate your device with your access token: Model Description ConvNeXt-Tiny is a lightweight convolutional neural network (CNN) developed by Meta AI, designed to modernize traditional ConvNet architectures with design principles inspired by Vision Transformers (ViTs). With around 28 million parameters, it achieves competitive ImageNet performance while remaining efficient for on-device and edge inference. ConvNeXt-Tiny brings transformer-like accuracy to a purely convolutional design — combining modern architectural updates with the efficiency of classical CNNs. Features - High-accuracy Image Classification: Pretrained on ImageNet-1K with strong top-1 accuracy. - Flexible Backbone: Commonly used as a feature extractor for detection, segmentation, and multimodal systems. - Optimized for Efficiency: Compact model size enables fast inference and low latency on CPUs, GPUs, and NPUs. - Modernized CNN Design: Adopts ViT-inspired improvements such as layer normalization, larger kernels, and inverted bottlenecks. - Scalable Family: Part of the ConvNeXt suite (Tiny, Small, Base, Large, XLarge) for different compute and accuracy trade-offs. Use Cases - Real-time image recognition on edge or mobile devices - Vision backbone for multimodal and perception models - Visual search, tagging, and recommendation systems - Transfer learning and fine-tuning for domain-specific tasks - Efficient deployment in production or research environments Inputs and Outputs Input: - RGB image tensor (usually `3 × 224 × 224`) - Normalized using ImageNet mean and standard deviation Output: - 1000-dimensional logits for ImageNet class probabilities - Optional intermediate feature maps when used as a backbone License - All NPU-related components of this project — including code, models, runtimes, and configuration files under the src/npu/ and models/npu/ directories — are licensed under the Creative Commons Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) license. - Commercial licensing or usage rights must be obtained through a separate agreement. For inquiries regarding commercial use, please contact `[email protected]`
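The ConvNeXt-Tiny input/output contract above (a normalized 3 × 224 × 224 RGB tensor in, 1000 ImageNet logits out) can be exercised with the upstream torchvision weights. This is a reference sketch against torchvision, not the NPU build, and the image file name is illustrative:

```python
# Reference run with torchvision's ConvNeXt-Tiny: resize, center-crop to 224,
# normalize with ImageNet mean/std, and read the 1000-way logits.
import torch
from PIL import Image
from torchvision import models

weights = models.ConvNeXt_Tiny_Weights.IMAGENET1K_V1
model = models.convnext_tiny(weights=weights).eval()
preprocess = weights.transforms()                      # resize, crop, ImageNet normalization

image = Image.open("example.jpg").convert("RGB")       # illustrative file name
batch = preprocess(image).unsqueeze(0)                 # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)                              # shape: (1, 1000)

print(weights.meta["categories"][logits.argmax(dim=1).item()])
```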