second-state
Gemma-2b-it-GGUF
stable-diffusion-v1-5-GGUF
StarCoder2-15B-GGUF
stable-diffusion-3.5-large-GGUF
stable-diffusion-3.5-medium-GGUF
All-MiniLM-L6-v2-Embedding-GGUF
StarCoder2-7B-GGUF
Llava-v1.5-7B-GGUF
3dAnimationDiffusion_v10-GGUF
StarCoder2-3B-GGUF
MiniCPM-V-4_5-GGUF
FinGPT-MT-Llama-3-8B-LoRA-GGUF
Mistral-Nemo-Instruct-2407-GGUF
FLUX.1-dev-GGUF
Deepseek-Coder-6.7B-Instruct-GGUF
embeddinggemma-300m-GGUF
E5-Mistral-7B-Instruct-Embedding-GGUF
DeepSeek-Coder-V2-Lite-Instruct-GGUF
DeepSeek-V2-Lite-Chat-GGUF
Llama-3.2-1B-Instruct-GGUF
C4AI-Command-R-v01-GGUF
FLUX.1-schnell-GGUF
dolphin-2.6-mistral-7B-GGUF
Wizard-Vicuna-13B-Uncensored-GGUF
Nomic-embed-text-v1.5-Embedding-GGUF
SmolVLM2-2.2B-Instruct-GGUF
Gemma-7b-it-GGUF
gemma-3-4b-it-GGUF
CodeQwen1.5-7B-Chat-GGUF
Phi-3-mini-4k-instruct-GGUF
Llama-2-13B-Chat-GGUF
Llama-4-Scout-17B-16E-Instruct-GGUF
Mistral-7B-Instruct-v0.3-GGUF
Mistral-7B-Instruct-v0.2-GGUF
Meta-Llama-3.1-8B-Instruct-GGUF
Llama-3.2-3B-Instruct-GGUF
Meta-Llama-3.1-70B-Instruct-GGUF
Deepseek-LLM-7B-Chat-GGUF
moxin-instruct-7b-GGUF
CodeLlama-13B-Instruct-GGUF
jina-embeddings-v3-GGUF
- Embedding size: `32, 64, 128, 256, 512, 768, 1024`
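The multiple embedding sizes listed above come from Matryoshka-style training: a full-size vector can be truncated to any of the smaller dimensions and L2-renormalized. A minimal sketch in Python (the 1024-dimension input here is random dummy data, not real model output):

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Truncate a Matryoshka embedding to `dim` dimensions and L2-renormalize."""
    head = np.asarray(vec, dtype=np.float64)[:dim]
    norm = np.linalg.norm(head)
    return head / norm if norm > 0 else head

# Stand-in for a full 1024-d embedding returned by the model.
full = np.random.default_rng(0).standard_normal(1024)
small = truncate_embedding(full, 256)

print(small.shape)  # (256,)
```

Cosine similarity between truncated vectors approximates similarity between the full vectors, which is what makes the smaller sizes usable as a storage/speed trade-off.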
Realistic_Vision_V6.0_B1-GGUF
Qwen2.5-0.5B-Instruct-GGUF
Nemotron-Mini-4B-Instruct-GGUF
Dolphin-2.7-mixtral-8x7b-GGUF
Qwen2-VL-7B-Instruct-GGUF
Yi-1.5-9B-Chat-GGUF
Llava-v1.6-Vicuna-7B-GGUF
Mistral-Large-Instruct-2407-GGUF
Falcon3-1B-Instruct-GGUF
gemma-3n-E4B-it-GGUF
Phi-4-mini-instruct-GGUF
Qwen2.5-Coder-32B-Instruct-GGUF
stable-diffusion-2-1-GGUF
Orca-2-13B-GGUF
gte-Qwen2-1.5B-instruct-GGUF
nomic-embed-text-v1.5-GGUF
gemma-3n-E2B-it-GGUF
stablelm-2-zephyr-1.6b-GGUF
medgemma-4b-it-GGUF
stable-diffusion-v-1-4-GGUF
Qwen1.5-0.5B-Chat-GGUF
Llama3-8B-Chinese-Chat-GGUF
Qwen3-4B-GGUF
Qwen2-Math-72B-Instruct-GGUF
Qwen1.5-14B-Chat-GGUF
Neural-Chat-7B-v3-1-GGUF
Llama-3-8B-Japanese-Instruct-GGUF
Llama-3-Taiwan-8B-Instruct-GGUF
ELYZA-japanese-Llama-2-13b-fast-instruct-GGUF
Yi-1.5-9B-Chat-16K-GGUF
Qwen2-7B-Instruct-GGUF
TinyLlama-1.1B-Chat-v1.0-GGUF
OuteTTS-0.2-500M-GGUF
Hermes-2-Pro-Llama-3-8B-GGUF
gemma-2-2b-it-GGUF
Mixtral-8x7B-Instruct-v0.1-GGUF
Seed-OSS-36B-Instruct-GGUF
jina-embeddings-v2-base-code-GGUF
FLUX.1-Fill-dev-GGUF
DeepSeek-R1-Distill-Llama-70B-GGUF
Qwen2-VL-2B-Instruct-GGUF
Qwen2.5-3B-Instruct-GGUF
DeepSeek-R1-Distill-Qwen-14B-GGUF
gemma-2-9b-it-GGUF
Qwen2.5-14B-Instruct-GGUF
Nous-Hermes-2-Mixtral-8x7B-SFT-GGUF
SmolVLM2-500M-Video-Instruct-GGUF
gemma-2-27b-it-GGUF
Qwen2-1.5B-Instruct-GGUF
DeepSeek-Coder-V2-Instruct-GGUF
MiniCPM-o-2_6-GGUF
Codestral-22B-v0.1-GGUF
stable-diffusion-3-medium-GGUF
Gemma-2-9B-Chinese-Chat-GGUF
Qwen1.5-1.8B-Chat-GGUF
DeepSeek-R1-Distill-Qwen-1.5B-GGUF
MiniCPM-V-4-GGUF
Qwen2-VL-72B-Instruct-GGUF
Qwen3-32B-GGUF
- Thinking: v0.17.0 and above
- No Thinking: v0.18.2
c4ai-command-r-plus-08-2024-GGUF
Llama-3-Groq-8B-Tool-Use-GGUF
Qwen2-0.5B-Instruct-GGUF
Yi-1.5-34B-Chat-GGUF
Yi-34B-Chat-GGUF
MiniCPM-V-2_6-GGUF
Qwen2.5-Coder-7B-Instruct-GGUF
gemma-3-12b-it-GGUF
Qwen2.5-Coder-0.5B-Instruct-GGUF
gemma-3-1b-it-GGUF
Mistral-7B-Instruct-v0.1-GGUF
Phi-3-medium-4k-instruct-GGUF
DeepSeek-R1-Distill-Qwen-32B-GGUF
SmolLM3-3B-GGUF
FLUX.1-Canny-dev-GGUF
gemma-3-27b-it-GGUF
EXAONE-Deep-2.4B-GGUF
Qwen3-8B-GGUF
Starling-LM-7B-alpha-GGUF
Falcon3-10B-Instruct-GGUF
Qwen3-14B-GGUF
FLUX.1-Redux-dev-GGUF
Qwen2.5-VL-32B-Instruct-GGUF
Yi-Coder-9B-Chat-GGUF
Yi-1.5-6B-Chat-GGUF
Qwen3-30B-A3B-Instruct-2507-GGUF
Qwen3-0.6B-GGUF
Llama-3.3-70B-Instruct-GGUF
SmolVLM2-256M-Video-Instruct-GGUF
gpt-oss-120b-GGUF
- LlamaEdge version: v0.25.0 and above (0.25.1+ with tool call support)
Qwen1.5-7B-Chat-GGUF
Meta-Llama-3-70B-Instruct-GGUF
CodeLlama-70b-Instruct-hf-GGUF
Qwen2.5-72B-Instruct-GGUF
WizardLM-13B-V1.0-Uncensored-GGUF
EXAONE-3.5-2.4B-Instruct-GGUF
glm-4-9b-chat-GGUF
MiniCPM-Llama3-V-2_5-GGUF
DeepSeek-R1-Distill-Qwen-7B-GGUF
Llama-3.1-Nemotron-70B-Instruct-HF-GGUF
OpenHermes-2.5-Mistral-7B-GGUF
WizardCoder-Python-7B-v1.0-GGUF
QwQ-32B-Preview-GGUF
Qwen2.5-Coder-3B-Instruct-GGUF
glm-4-9b-chat-1m-GGUF
OrionStar-Yi-34B-Chat-Llama-GGUF
Phi-3-mini-128k-instruct-GGUF
moxin-chat-7b-GGUF
internlm2_5-20b-chat-GGUF
Octopus-v2-GGUF
Llama-3-Groq-70B-Tool-Use-GGUF
jina-embeddings-v2-base-de-GGUF
Qwen2-Audio-7B-Instruct-GGUF
Yi-6B-Chat-GGUF
OpenChat-3.5-GGUF
Qwen3-Coder-30B-A3B-Instruct-GGUF
Prompt template (chatml):

```text
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message_1}<|im_end|>
<|im_start|>assistant
{assistant_message_1}<|im_end|>
<|im_start|>user
{user_message_2}<|im_end|>
<|im_start|>assistant
```

Run as LlamaEdge service:

```bash
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf \
  llama-api-server.wasm \
  --model-name Qwen3-Coder-30B-A3B-Instruct \
  --prompt-template chatml \
  --ctx-size 256000
```
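The chatml layout can be sketched as a small Python helper that renders a message list into the template string, ending with an open assistant turn for the model to complete (an illustrative helper, not part of LlamaEdge itself):

```python
def build_chatml_prompt(messages):
    """Render {role, content} messages in ChatML layout, leaving the final
    assistant turn open for the model to complete."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a hello-world program in Rust."},
])
print(prompt)
```

Passing `--prompt-template chatml` makes the API server apply this same layout to incoming chat messages, so clients only send plain role/content pairs.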
StarCoder2-15B-Instruct-v0.1-GGUF
llm-compiler-13b-GGUF
Yi-1.5-34B-Chat-16K-GGUF
llm-compiler-7b-ftd-GGUF
EXAONE-3.5-32B-Instruct-GGUF
DeepSeek-R1-Distill-Llama-8B-GGUF
OpenChat-3.5-0106-GGUF
NVIDIA-Nemotron-Nano-9B-v2-GGUF
gemma-1.1-2b-it-GGUF
Qwen2.5-Math-1.5B-Instruct-GGUF
gemma-1.1-7b-it-GGUF
Llama-3-8B-Instruct-GGUF
Llama-2-7B-Chat-GGUF
EXAONE-3.0-7.8B-Instruct-GGUF
Qwen2-72B-Instruct-GGUF
Bielik-4.5B-v3.0-Instruct-GGUF
c4ai-command-r-08-2024-GGUF
datagemma-rig-27b-it-GGUF
Qwen2.5-7B-Instruct-GGUF
internlm3-8b-instruct-GGUF
Nous-Hermes-2-Mixtral-8x7B-DPO-GGUF
openchat-3.5-1210-GGUF
Qwen1.5-4B-Chat-GGUF
NeuralBeagle14-7B-GGUF
Qwen3-1.7B-GGUF
internlm2_5-7b-chat-GGUF
gpt-oss-20b-GGUF
Qwen2.5-1.5B-Instruct-GGUF
Llama-3_1-Nemotron-51B-Instruct-GGUF
Qwen2.5-VL-7B-Instruct-GGUF
Infinity-Instruct-7M-Gen-Llama3_1-70B-GGUF
jina-embeddings-v2-small-en-GGUF
OuteTTS-0.3-500M-GGUF
SmolLM-135M-Instruct-GGUF
Zephyr-7B-Beta-GGUF
Llama-3-Instruct-8B-SimPO-GGUF
Llama-3.1-Nemotron-70B-Reward-HF-GGUF
Phi-3.5-mini-instruct-GGUF
Palmyra-Fin-70B-32K-GGUF
Qwen2.5-32B-Instruct-GGUF
Falcon3-3B-Instruct-GGUF
ChatAllInOne-Yi-34B-200K-V1-GGUF
Palmyra-Med-70B-32K-GGUF
functionary-small-v3.2-GGUF
Llama-3-Taiwan-70B-Instruct-GGUF
Qwen2.5-Coder-14B-Instruct-GGUF
SOLAR-10.7B-Instruct-v1.0-GGUF
CodeGemma-7b-it-GGUF
Seed-Coder-8B-Reasoning-GGUF
internlm2_5-1_8b-chat-GGUF
Tessa-T1-3B-GGUF
Llama-3-8B-Instruct-Gradient-1048k-GGUF
Neural-Chat-7B-v3-3-GGUF
MistralLite-7B-GGUF
Tessa-T1-32B-GGUF
Qwen1.5-110B-Chat-GGUF
Osmosis-Structure-0.6B-GGUF
gemma-3-270m-it-GGUF
Reflection-Llama-3.1-70B-GGUF
Mistral-Small-24B-Instruct-2501-GGUF
Qwen2.5-Math-7B-Instruct-GGUF
Triplex-GGUF
Qwen2-Math-1.5B-Instruct-GGUF
xLAM-8x7b-r-GGUF
Samantha-1.2-Mistral-7B-GGUF
Qwen3-30B-A3B-Thinking-2507-GGUF
Prompt template (chatml):

```text
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message_1}<|im_end|>
<|im_start|>assistant
{assistant_message_1}<|im_end|>
<|im_start|>user
{user_message_2}<|im_end|>
<|im_start|>assistant
```

Run as LlamaEdge service:

```bash
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Qwen3-30B-A3B-Thinking-2507-Q5_K_M.gguf \
  llama-api-server.wasm \
  --model-name Qwen3-30B-A3B-Thinking-2507 \
  --prompt-template chatml \
  --ctx-size 256000
```