
40+ AI models run 4+ tok/s on Raspberry Pi 5 (4GB)

Tested all 2,499 GGUF models. Here's what actually works.

🟢

EXCELLENT PERFORMANCE

Real-time capable • 4-6.5 tok/s

Llama 3.2-3B: 6.5 tok/s
Phi-3-mini: 6 tok/s
Qwen 2.5-3B: 5.5 tok/s

+ 40 more models

Cost vs Cloud AI

Raspberry Pi 4: $60 USD (one-time)
Cloud AI: $20-200 USD per month

ROI in 3 months
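The break-even claim above can be checked directly (a sketch assuming the $60 one-time Pi cost and the low end of the $20-200/month cloud range):

```python
import math

# Figures from the cost comparison above; $20/month is the cheapest cloud tier.
PI_COST_USD = 60          # Raspberry Pi 4, one-time purchase
CLOUD_USD_PER_MONTH = 20  # low end of the cloud AI range

months_to_break_even = math.ceil(PI_COST_USD / CLOUD_USD_PER_MONTH)
print(months_to_break_even)  # 3
```

At the high end of the cloud range ($200/month), the Pi pays for itself within the first month.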

Fast enough for:

✓

Real-time chat

Llama 3.2-3B at 6.5 tok/s

✓

Code review

Phi-3-mini at 6 tok/s

✓

Document Q&A

Qwen 2.5-3B at 5.5 tok/s
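To see why ~6 tok/s is "fast enough" for real-time chat, convert the speeds above to words per minute using the common ~0.75-words-per-token heuristic (an assumption, not a figure measured on this page; the exact ratio varies by tokenizer):

```python
# Rough words-per-minute for the three picks above, using the common
# ~0.75 words/token heuristic (an assumption; varies by tokenizer and text).
WORDS_PER_TOKEN = 0.75

for model, tok_per_s in [("Llama 3.2-3B", 6.5), ("Phi-3-mini", 6.0), ("Qwen 2.5-3B", 5.5)]:
    wpm = tok_per_s * WORDS_PER_TOKEN * 60
    print(f"{model}: ~{wpm:.0f} words/min")

# Typical silent reading speed is ~200-300 words/min, so even the slowest
# pick generates text about as fast as most people can read it.
```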

ℹ️

Raspberry Pi 4 vs Pi 5

Raspberry Pi 4 (4GB)

  • 6.5 tok/s peak (small models)
  • $60 USD (best value)
  • 2.5 tok/s for 7B models
  • Great for learning

Raspberry Pi 5 (8GB)

  • 7 tok/s peak (~8% faster)
  • $80 USD
  • 4 tok/s for 7B models (60% faster)
  • Better for production

Recommendation: Pi 4 offers excellent value for experimentation. Upgrade to Pi 5 if you need faster 7B model inference or plan production deployments.

Top Picks for Raspberry Pi 5 (4GB)

Curated models optimized for your hardware - ready to run

Best Performance

Fastest

Llama 3.2-3B

Speed on Raspberry Pi 5 (4GB)
7 tok/s
Memory
3.2 GB
Best For

Real-time chat, voice assistants

Quick Setup
ollama run llama3.2:3b
Best Quality

Most Accurate

Qwen 2.5 (7B Q4)

Speed on Raspberry Pi 5 (4GB)
4 tok/s
Memory
5.1 GB
Best For

Document Q&A, analysis

Quick Setup
ollama run qwen2.5:7b-instruct-q4_K_M
Coding Expert

Best for Code

Phi-3-mini

Speed on Raspberry Pi 5 (4GB)
6.5 tok/s
Memory
2.8 GB
Best For

Code completion, debugging

Quick Setup
ollama run phi3:mini
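Once a model is pulled with one of the `ollama run` commands above, you can measure tok/s on your own board through Ollama's local HTTP API (a sketch assuming the default `localhost:11434` endpoint; the network call is wrapped in a function so nothing contacts the daemon unless you call it, and the helper at the top is pure arithmetic):

```python
import json
import urllib.request

def tokens_per_second(eval_count, eval_duration_ns):
    """Convert Ollama's generation counters into tok/s."""
    return eval_count / (eval_duration_ns / 1e9)

def benchmark(model="llama3.2:3b", prompt="Explain GGUF in one sentence."):
    """POST to a running Ollama daemon and report generation speed.
    Requires `ollama serve`; not invoked at import time."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Ollama reports eval_count (tokens generated) and eval_duration (nanoseconds).
    return tokens_per_second(body["eval_count"], body["eval_duration"])

# Example: 130 tokens generated in 20 s of eval time -> 6.5 tok/s
print(tokens_per_second(130, 20e9))  # 6.5
```

Run `benchmark()` on the Pi itself to compare your numbers against the table figures; cold-start runs include model load time, so benchmark a second request.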

Get Started with Edge AI

Hardware and cloud options for running models locally

RunPod

Rent GPU starting at $0.34/hour

Best Value

Deploy on cloud GPU or serverless. 70% cheaper than AWS.

Start from $0.34/hr

Amazon

Hardware for edge AI

Hardware

Get the devices you need to run models locally.

Shop Hardware

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.

All Compatible Models

Grouped by performance on Raspberry Pi 5 (4GB) • 200 total models

Llama-3.2-1B-Instruct-GGUF

MaziyarPanahi

MaziyarPanahi/Llama-3.2-1B-Instruct-GGUF - Model creator: meta-llama - Original model: meta-llama/Llama-3.2-1B-Instruct

Description: MaziyarPanahi/Llama-3.2-1B-Instruct-GGUF contains GGUF format model files for meta-llama/Llama-3.2-1B-Instruct.

GGUF is a format introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. Here is an incomplete list of clients and libraries that are known to support GGUF:

  • llama.cpp - the source project for GGUF; offers a CLI and a server option.
  • llama-cpp-python - a Python library with GPU accel, LangChain support, and an OpenAI-compatible API server.
  • LM Studio - an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration; Linux available, in beta as of 27/11/2023.
  • text-generation-webui - the most widely used web UI, with many features and powerful extensions; supports GPU acceleration.
  • KoboldCpp - a fully featured web UI, with GPU accel across all platforms and GPU architectures; especially good for storytelling.
  • GPT4All - a free and open source locally running GUI, supporting Windows, Linux and macOS with full GPU accel.
  • LoLLMS Web UI - a great web UI with many interesting and unique features, including a full model library for easy model selection.
  • Faraday.dev - an attractive and easy-to-use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
  • candle - a Rust ML framework with a focus on performance, including GPU support, and ease of use.
  • ctransformers - a Python library with GPU accel, LangChain support, and an OpenAI-compatible AI server. Note: as of the time of writing (November 27th, 2023), ctransformers has not been updated in a long time and does not support many recent models.

🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.

gemma-3-1b-it-GGUF

MaziyarPanahi

MaziyarPanahi/gemma-3-1b-it-GGUF - Model creator: google - Original model: google/gemma-3-1b-it

Description: MaziyarPanahi/gemma-3-1b-it-GGUF contains GGUF format model files for google/gemma-3-1b-it. See the Llama-3.2-1B-Instruct-GGUF entry above for the standard list of clients and libraries that support GGUF.

Yi-Coder-1.5B-Chat-GGUF

MaziyarPanahi

MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF - Model creator: 01-ai - Original model: 01-ai/Yi-Coder-1.5B-Chat

Description: MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF contains GGUF format model files for 01-ai/Yi-Coder-1.5B-Chat. See the Llama-3.2-1B-Instruct-GGUF entry above for the standard list of clients and libraries that support GGUF.

INTELLECT-2-GGUF

MaziyarPanahi

MaziyarPanahi/INTELLECT-2-GGUF - Model creator: PrimeIntellect - Original model: PrimeIntellect/INTELLECT-2

Description: MaziyarPanahi/INTELLECT-2-GGUF contains GGUF format model files for PrimeIntellect/INTELLECT-2. See the Llama-3.2-1B-Instruct-GGUF entry above for the standard list of clients and libraries that support GGUF.

Osmosis-Structure-0.6B

osmosis-ai

`Osmosis-Structure-0.6B`: Small Language Model for Structured Outputs

`Osmosis-Structure-0.6B` is a specialized small language model (SLM) designed to excel at structured output generation. Despite its compact 0.6B parameter size, this model demonstrates remarkable performance on extracting structured information when paired with supported frameworks. Our approach leverages structured output during training, forcing our model to only focus on the value for each key declared by the inference engine, which significantly improves the accuracy of the model's ability to produce well-formatted, structured responses across various domains, particularly in mathematical reasoning and problem-solving tasks.

We evaluate the effectiveness of osmosis-enhanced structured generation on challenging mathematical reasoning benchmarks. The following results demonstrate the dramatic performance improvements achieved through structured outputs with osmosis enhancement across different model families - the same technique that powers `Osmosis-Structure-0.6B`.

First benchmark:

| Model | Structured Output | Structured w/ Osmosis | Performance Gain |
|-------|:-----------------:|:---------------------:|:----------------:|
| Claude 4 Sonnet | 15.52% | 69.40% | +347% |
| Claude 4 Opus | 15.28% | 69.91% | +357% |
| GPT-4.1 | 10.53% | 70.03% | +565% |
| OpenAI o3 | 91.14% | 94.05% | +2.9% |

Second benchmark:

| Model | Structured Output | Structured w/ Osmosis | Performance Gain |
|-------|:-----------------:|:---------------------:|:----------------:|
| Claude 4 Sonnet | 16.29% | 62.59% | +284% |
| Claude 4 Opus | 22.94% | 65.06% | +184% |
| GPT-4.1 | 2.79% | 39.66% | +1322% |
| OpenAI o3 | 92.05% | 93.24% | +1.3% |

> Key Insight: These results demonstrate that by allowing models to think freely and leverage test-time compute, we are able to increase performance and still maintain the structured guarantee after the fact with a SLM. `Osmosis-Structure-0.6B` is specifically designed and optimized to maximize these benefits in a compact 0.6B parameter model.
`Osmosis-Structure-0.6B` is built on top of `Qwen3-0.6B`. We first established a baseline format using 10 samples of randomly generated text and their JSON interpretations. We then applied reinforcement learning to approximately 500,000 examples of JSON-to-natural-language pairs, consisting of either reasoning traces with their final outputs, or natural language reports with their expected structured formats. We used verl as the training framework and SGLang as the rollout backend. To enable structured training, we modified parts of the verl codebase to allow a per-sample schema to be passed into the training data. We recommend an engine like SGLang for serving the model; to serve, run the following:

python3 -m sglang.launch_server --model-path osmosis-ai/Osmosis-Structure-0.6B --host 0.0.0.0 --api-key osmosis

You can also use Ollama as an inference provider on local machines.
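Since SGLang exposes an OpenAI-compatible API, a structured-output request against a server started as described above can be sketched like this (an illustration: the `weather_report` schema, field names, and the default port are assumptions, and the HTTP call is wrapped in a function so nothing runs without a live server):

```python
import json
import urllib.request

# Hypothetical JSON schema for the structured output we want the SLM to emit.
WEATHER_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
}

def build_request(text):
    """Build an OpenAI-style chat payload with a json_schema response format."""
    return {
        "model": "osmosis-ai/Osmosis-Structure-0.6B",
        "messages": [{"role": "user", "content": text}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "weather_report", "schema": WEATHER_SCHEMA},
        },
    }

def extract(text, base_url="http://localhost:30000/v1", api_key="osmosis"):
    """POST to the SGLang server launched above (requires it to be running)."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_request(text)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # The schema-constrained answer arrives as a JSON string in the message content.
    return json.loads(reply["choices"][0]["message"]["content"])

payload = build_request("It was 31 degrees Celsius in Cairo today.")
print(payload["response_format"]["type"])  # json_schema
```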

Qwen3-1.7B-GGUF

unsloth

See our collection for all versions of Qwen3 including GGUF, 4-bit & 16-bit formats. Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

- Fine-tune Qwen3 (14B) for free using our Google Colab notebook!
- Read our blog about Qwen3 support: unsloth.ai/blog/qwen3
- View the rest of our notebooks in our docs.
- Run & export your fine-tuned model to Ollama, llama.cpp or HF.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|------------------|----------------|-------------|------------|
| Qwen3 (14B) | ▶️ Start on Colab | 3x faster | 70% less |
| GRPO with Qwen3 (8B) | ▶️ Start on Colab | 3x faster | 80% less |
| Llama-3.2 (3B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Llama-3.2 (11B vision) | ▶️ Start on Colab | 2x faster | 60% less |
| Qwen2.5 (7B) | ▶️ Start on Colab | 2x faster | 60% less |
| Phi-4 (14B) | ▶️ Start on Colab | 2x faster | 50% less |

To switch between thinking and non-thinking: if you are using llama.cpp, Ollama, Open WebUI etc., you can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:

- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significant enhancement of its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- Support for 100+ languages and dialects with strong capabilities for multilingual instruction following and translation.

Qwen3-1.7B has the following features:

- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 1.7B
- Number of Parameters (Non-Embedding): 1.4B
- Number of Layers: 28
- Number of Attention Heads (GQA): 16 for Q and 8 for KV
- Context Length: 32,768

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation. The code for Qwen3 has been merged into the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
With `transformers`, after generation you can split the thinking content from the final reply by locating token 151668 (`</think>`) in the output IDs:

```python
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

For deployment, you can serve the model with vLLM or SGLang:

```shell
vllm serve Qwen/Qwen3-1.7B --enable-reasoning --reasoning-parser deepseek_r1
```

```shell
python -m sglang.launch_server --model-path Qwen/Qwen3-1.7B --reasoning-parser deepseek-r1
```

Thinking mode is controlled through `enable_thinking` in the chat template:

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # True is the default value for enable_thinking
)
```

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # Setting enable_thinking=False disables thinking mode
)
```

A multi-turn chatbot that toggles thinking with `/think` and `/no_think`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-1.7B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]
        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        inputs = self.tokenizer(text, return_tensors="pt")
        response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)

        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})
        return response

# Example usage
if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First input (without /think or /no_think tags, thinking mode is enabled by default)
    user_input_1 = "How many r's in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # Second input with /no_think
    user_input_2 = "Then, how many r's in blueberries? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}")
    print("----------------------")

    # Third input with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")
```

For tool use, Qwen3 integrates with Qwen-Agent:

```python
from qwen_agent.agents import Assistant

# Define LLM
llm_cfg = {
    'model': 'Qwen3-1.7B',

    # Use the endpoint provided by Alibaba Model Studio:
    # 'model_type': 'qwen_dashscope',
    # 'api_key': os.getenv('DASHSCOPE_API_KEY'),

    # Use a custom endpoint compatible with OpenAI API:
    'model_server': 'http://localhost:8000/v1',  # api_base
    'api_key': 'EMPTY',

    # Other parameters:
    # 'generate_cfg': {
    #     # Add: When the response content is `<think>this is the thought</think>this is the answer`;
    #     # Do not add: When the response has been separated by reasoning_content and content.
    #     'thought_in_content': True,
    # },
}

# Define Tools
tools = [
    {'mcpServers': {  # You can specify the MCP configuration file
        'time': {
            'command': 'uvx',
            'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
        },
        "fetch": {
            "command": "uvx",
            "args": ["mcp-server-fetch"]
        }
    }},
    'code_interpreter',  # Built-in tools
]

# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Streaming generation
messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
for responses in bot.run(messages=messages):
    pass
print(responses)
```

```bibtex
@misc{qwen3,
    title  = {Qwen3},
    url    = {https://qwenlm.github.io/blog/qwen3/},
    author = {Qwen Team},
    month  = {April},
    year   = {2025}
}
```

phi-2-GGUF

TheBloke

TheBloke's LLM work is generously supported by a grant from andreessen horowitz (a16z).

Phi 2 - GGUF - Model creator: Microsoft - Original model: Phi 2

This repo contains GGUF format model files for Microsoft's Phi 2. GGUF is a format introduced by the llama.cpp team on August 21st, 2023, replacing GGML, which is no longer supported by llama.cpp. See the Llama-3.2-1B-Instruct-GGUF entry above for the standard list of clients and libraries that support GGUF.

GPTQ models for GPU inference, with multiple quantisation parameter options.
2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference; Microsoft's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions. These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit d0cee0d. They are also compatible with many third-party UIs and libraries - please see the list at the top of this README.

- GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
- GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
- GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
- GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
- GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.

Refer to the Provided Files table below to see what files use which methods, and how.
| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| phi-2.Q2_K.gguf | Q2_K | 2 | 1.17 GB | 3.67 GB | smallest, significant quality loss - not recommended for most purposes |
| phi-2.Q3_K_S.gguf | Q3_K_S | 3 | 1.25 GB | 3.75 GB | very small, high quality loss |
| phi-2.Q3_K_M.gguf | Q3_K_M | 3 | 1.48 GB | 3.98 GB | very small, high quality loss |
| phi-2.Q4_0.gguf | Q4_0 | 4 | 1.60 GB | 4.10 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| phi-2.Q3_K_L.gguf | Q3_K_L | 3 | 1.60 GB | 4.10 GB | small, substantial quality loss |
| phi-2.Q4_K_S.gguf | Q4_K_S | 4 | 1.62 GB | 4.12 GB | small, greater quality loss |
| phi-2.Q4_K_M.gguf | Q4_K_M | 4 | 1.79 GB | 4.29 GB | medium, balanced quality - recommended |
| phi-2.Q5_0.gguf | Q5_0 | 5 | 1.93 GB | 4.43 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| phi-2.Q5_K_S.gguf | Q5_K_S | 5 | 1.93 GB | 4.43 GB | large, low quality loss - recommended |
| phi-2.Q5_K_M.gguf | Q5_K_M | 5 | 2.07 GB | 4.57 GB | large, very low quality loss - recommended |
| phi-2.Q6_K.gguf | Q6_K | 6 | 2.29 GB | 4.79 GB | very large, extremely low quality loss |
| phi-2.Q8_0.gguf | Q8_0 | 8 | 2.96 GB | 5.46 GB | very large, extremely low quality loss - not recommended |

Note: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.

Note for manual downloaders: you almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file. The following clients/libraries will automatically download models for you, providing a list of available models to choose from. Under Download Model, you can enter the model repo: TheBloke/phi-2-GGUF and below it, a specific filename to download, such as: phi-2.Q4_K_M.gguf.
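A pattern worth noting in the table above: every "Max RAM required" figure is the file size plus roughly 2.5 GB of working memory (with no GPU offloading). That yields a quick estimator for quants not listed (an approximation derived from this table, not an official formula):

```python
# Estimate max RAM for a GGUF file from its on-disk size, using the
# +2.5 GB overhead implied by the phi-2 table above (no GPU offloading).
OVERHEAD_GB = 2.50

def est_max_ram_gb(file_size_gb):
    return round(file_size_gb + OVERHEAD_GB, 2)

print(est_max_ram_gb(1.79))  # 4.29 -> matches the Q4_K_M row
print(est_max_ram_gb(2.96))  # 5.46 -> matches the Q8_0 row
```

On a 4GB Raspberry Pi this rule of thumb suggests staying at or below the Q2_K/Q3_K_S rows for a model this size, or using smaller models at higher quant quality.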
On the command line, including for multiple files at once, I recommend using the `huggingface-hub` Python library. You can then download any individual model file to the current directory at high speed, and you can also download multiple files at once with a pattern. For more documentation on downloading with `huggingface-cli`, please see: HF -> Hub Python Library -> Download files -> Download from the CLI. To accelerate downloads on fast connections (1Gbit/s or higher), install `hf_transfer` and set the environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`. Windows Command Line users: you can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.

Make sure you are using `llama.cpp` from commit d0cee0d or later. Change `-ngl 32` to the number of layers to offload to GPU; remove it if you don't have GPU acceleration. Change `-c 2048` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value. If you want to have a chat-style conversation, replace the `-p ` argument with `-i -ins`. For other parameters and how to use them, please refer to the llama.cpp documentation. Further instructions can be found in the text-generation-webui documentation: text-generation-webui/docs/04 ‐ Model Tab.md.

You can use GGUF models from Python using the llama-cpp-python or ctransformers libraries. Note that at the time of writing (Nov 27th 2023), ctransformers had not been updated for some time and is not compatible with some recent models; therefore I recommend you use llama-cpp-python. For full documentation, please see: llama-cpp-python docs.
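The download step above can also be done programmatically with the `huggingface_hub` library's real `hf_hub_download` API (a sketch: the wrapper names `download_args`/`download_quant` are mine, and the actual fetch is kept inside a function so nothing downloads unless you call it):

```python
def download_args(repo_id="TheBloke/phi-2-GGUF", filename="phi-2.Q4_K_M.gguf"):
    """Pure helper: the arguments we would pass to hf_hub_download.
    Q4_K_M is the 'recommended' row in the table above."""
    return {"repo_id": repo_id, "filename": filename, "local_dir": "."}

def download_quant(**overrides):
    """Fetch a single quant file (requires `pip install huggingface-hub` and network)."""
    from huggingface_hub import hf_hub_download
    return hf_hub_download(**download_args(**overrides))

print(download_args()["filename"])  # phi-2.Q4_K_M.gguf
```

Calling `download_quant(filename="phi-2.Q5_K_M.gguf")` would grab a different quant from the same repo; only the one file is downloaded, not the whole repository.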
Run one of the following commands, according to your system. Here are guides on using llama-cpp-python and ctransformers with LangChain: LangChain + llama-cpp-python, LangChain + ctransformers.

Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased nearly state-of-the-art performance among models with less than 13 billion parameters. The model hasn't been fine-tuned through reinforcement learning from human feedback. The intention behind crafting this open-source model is to provide the research community with a non-restricted small model to explore vital safety challenges, such as reducing toxicity, understanding societal biases, enhancing controllability, and more. Phi-2 is intended for research purposes only.

Given the nature of the training data, the Phi-2 model is best suited for prompts using the QA format, the chat format, and the code format. You can provide the prompt as a standalone question, where the model generates the text that follows. To encourage the model to write more concise answers, you can also try the QA format "Instruct: <prompt>\nOutput:", where the model generates the text after "Output:". In the chat format, the model generates the text after the first "Bob:"; in the code format, it generates the text after the comments.

Notes: Phi-2 is intended for research purposes. The model-generated text/code should be treated as a starting point rather than a definitive solution for potential use cases.
Users should be cautious when employing these models in their applications. Direct adoption for production tasks is out of the scope of this research project. As a result, the Phi-2 model has not been tested to ensure that it performs adequately for any production-level application. Please refer to the limitation sections of this document for more details. If you are using `transformers>=4.36.0`, always load the model with `trust_remote_code=True` to prevent side-effects. To ensure maximum compatibility, we recommend using the second execution mode (FP16 / CUDA). Remark: in the generation function, our model currently does not support beam search (`num_beams > 1`). Furthermore, in the forward pass of the model, we currently do not support outputting hidden states or attention values, or using custom input embeddings. Generate Inaccurate Code and Facts: the model may produce incorrect code snippets and statements. Users should treat these outputs as suggestions or starting points, not as definitive or accurate solutions. Limited Scope for Code: the majority of Phi-2 training data is based in Python and uses common packages such as "typing, math, random, collections, datetime, itertools". If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses. Unreliable Responses to Instruction: the model has not undergone instruction fine-tuning. As a result, it may struggle or fail to adhere to intricate or nuanced instructions provided by users. Language Limitations: the model is primarily designed to understand standard English. Informal English, slang, or any other languages might pose challenges to its comprehension, leading to potential misinterpretations or errors in response. Potential Societal Biases: Phi-2 is not entirely free from societal biases despite efforts in assuring training data safety.
There's a possibility it may generate content that mirrors these societal biases, particularly if prompted or instructed to do so. We urge users to be aware of this and to exercise caution and critical thinking when interpreting model outputs.
- Toxicity: Despite being trained with carefully selected data, the model can still produce harmful content if explicitly prompted or instructed to do so. We chose to release the model for research purposes only; we hope to help the open-source community develop the most effective ways to reduce the toxicity of a model directly after pretraining.
- Verbosity: Phi-2, being a base model, often produces irrelevant or extra text and responses following its first answer to user prompts within a single turn. This is due to its training dataset being primarily textbooks, which results in textbook-like responses.

Training details:
- Architecture: a Transformer-based model with a next-word prediction objective
- Dataset size: 250B tokens, a combination of NLP synthetic data created by AOAI GPT-3.5 and filtered web data from Falcon RefinedWeb and SlimPajama, which was assessed by AOAI GPT-4

The model is licensed under the microsoft-research-license. This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Qwen3-1.7B-GGUF

Qwen

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:
- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significant enhancement of its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, delivering a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- Support for 100+ languages and dialects with strong capabilities for multilingual instruction following and translation.

Qwen3-1.7B has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 1.7B
- Number of Parameters (Non-Embedding): 1.4B
- Number of Layers: 28
- Number of Attention Heads (GQA): 16 for Q and 8 for KV

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation. Check out our llama.cpp documentation for a more detailed usage guide. We advise you to clone `llama.cpp` and install it following the official guide. We follow the latest version of llama.cpp.
In the following demonstration, we assume that you are running commands under the repository `llama.cpp`. Check out our ollama documentation for a more detailed usage guide.

You can add `/think` and `/nothink` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.

To achieve optimal performance, we recommend the following settings:
1. Sampling Parameters:
   - For thinking mode (`enable_thinking=True`), use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`, and `PresencePenalty=1.5`. DO NOT use greedy decoding, as it can lead to performance degradation and endless repetitions.
   - For non-thinking mode (`enable_thinking=False`), we suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`, and `PresencePenalty=1.5`.
   - We recommend setting `presence_penalty` to 1.5 for quantized models to suppress repetitive outputs. You can adjust the `presence_penalty` parameter between 0 and 2. A higher value may occasionally lead to language mixing and a slight reduction in model performance.
2. Adequate Output Length: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 38,912 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance.
3. Standardize Output Format: We recommend using prompts to standardize model outputs when benchmarking.
   - Math Problems: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
   - Multiple-Choice Questions: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`."
4. No Thinking Content in History: In multi-turn conversations, the historical model output should include only the final output part and does not need to include the thinking content. This is implemented in the provided Jinja2 chat template. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that this best practice is followed.

If you find our work helpful, feel free to give us a cite.
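A minimal sketch of that history-stripping step, assuming the thinking content is delimited by `<think>...</think>` in the model's raw output (the helper name below is hypothetical):

```python
import re

# Match a <think>...</think> block plus any trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(assistant_text: str) -> str:
    """Remove the thinking block so only the final answer enters the history."""
    return THINK_BLOCK.sub("", assistant_text).strip()

raw = "<think>Count the r's one by one.</think>\nThere are three r's in \"strawberry\"."
history_entry = {"role": "assistant", "content": strip_thinking(raw)}
print(history_entry["content"])  # only the final answer remains
```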

Qwen2.5-0.5B-Instruct-GGUF

Qwen

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

This repo contains the instruction-tuned 0.5B Qwen2.5 model in the GGUF format, which has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
- Number of Parameters: 0.49B
- Number of Parameters (Non-Embedding): 0.36B
- Number of Layers: 24
- Number of Attention Heads (GQA): 14 for Q and 2 for KV
- Context Length: Full 32,768 tokens and generation 8,192 tokens
- Quantization: q2_K, q3_K_M, q4_0, q4_K_M, q5_0, q5_K_M, q6_K, q8_0

For more details, please refer to our blog, GitHub, and Documentation. Check out our llama.cpp documentation for a more detailed usage guide. We advise you to clone `llama.cpp` and install it following the official guide. We follow the latest version of llama.cpp. In the following demonstration, we assume that you are running commands under the repository `llama.cpp`.
Since cloning the entire repo may be inefficient, you can manually download the GGUF file that you need or use `huggingface-cli` to fetch it. To achieve a chatbot-like experience, it is recommended to start llama.cpp in conversation mode.

Detailed evaluation results are reported in this πŸ“‘ blog. For quantized models, the benchmark results against the original bfloat16 models can be found here. For requirements on GPU memory and the respective throughput, see the results here. If you find our work helpful, feel free to give us a cite.
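The GQA head counts above (14 query heads sharing 2 KV heads) directly shrink the KV cache. A back-of-the-envelope sketch, assuming a head dimension of 64 and FP16 cache entries (both are assumptions for illustration, not figures from this card):

```python
# Qwen2.5-0.5B figures from the card
n_layers   = 24
n_q_heads  = 14
n_kv_heads = 2        # GQA: 14 query heads share 2 KV heads
ctx        = 32_768   # full context length

# Assumptions for illustration only
head_dim       = 64
bytes_per_elem = 2    # FP16

def kv_cache_bytes(kv_heads):
    # 2x for keys and values, per layer, per position
    return 2 * n_layers * kv_heads * head_dim * ctx * bytes_per_elem

mha_cache = kv_cache_bytes(n_q_heads)   # hypothetical cache with one KV head per query head
gqa_cache = kv_cache_bytes(n_kv_heads)  # cache with the card's GQA layout

print(f"GQA cache: {gqa_cache / 2**20:.0f} MiB vs per-query-head cache: {mha_cache / 2**20:.0f} MiB")
```

With these assumptions, the GQA cache is 2/14 the size it would be with a KV head per query head.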

Qwen2.5-1.5B-Instruct-GGUF

Qwen

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

This repo contains the instruction-tuned 1.5B Qwen2.5 model in the GGUF format, which has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
- Number of Parameters: 1.54B
- Number of Parameters (Non-Embedding): 1.31B
- Number of Layers: 28
- Number of Attention Heads (GQA): 12 for Q and 2 for KV
- Context Length: Full 32,768 tokens and generation 8,192 tokens
- Quantization: q2_K, q3_K_M, q4_0, q4_K_M, q5_0, q5_K_M, q6_K, q8_0

For more details, please refer to our blog, GitHub, and Documentation. Check out our llama.cpp documentation for a more detailed usage guide. We advise you to clone `llama.cpp` and install it following the official guide. We follow the latest version of llama.cpp. In the following demonstration, we assume that you are running commands under the repository `llama.cpp`.
Since cloning the entire repo may be inefficient, you can manually download the GGUF file that you need or use `huggingface-cli` to fetch it. To achieve a chatbot-like experience, it is recommended to start llama.cpp in conversation mode.

Detailed evaluation results are reported in this πŸ“‘ blog. For quantized models, the benchmark results against the original bfloat16 models can be found here. For requirements on GPU memory and the respective throughput, see the results here. If you find our work helpful, feel free to give us a cite.
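Given the card's 32,768-token context and 8,192-token generation cap, the longest prompt that still leaves room for a full-length reply is simple arithmetic (the helper name is hypothetical):

```python
CONTEXT_LEN = 32_768   # full context length from the card
MAX_GEN     = 8_192    # maximum generation length from the card

def max_prompt_tokens(reserved_for_generation: int = MAX_GEN) -> int:
    """Tokens left for the prompt once the reply budget is reserved."""
    return CONTEXT_LEN - reserved_for_generation

print(max_prompt_tokens())  # tokens available for the prompt at the full generation budget
```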

Qwen3-0.6B-GGUF

unsloth

See our collection for all versions of Qwen3 including GGUF, 4-bit & 16-bit formats. Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.
- Fine-tune Qwen3 (14B) for free using our Google Colab notebook here!
- Read our Blog about Qwen3 support: unsloth.ai/blog/qwen3
- View the rest of our notebooks in our docs here.
- Run & export your fine-tuned model to Ollama, llama.cpp or HF.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|-----------------|----------------|-------------|------------|
| Qwen3 (14B) | ▢️ Start on Colab | 3x faster | 70% less |
| GRPO with Qwen3 (8B) | ▢️ Start on Colab | 3x faster | 80% less |
| Llama-3.2 (3B) | ▢️ Start on Colab | 2.4x faster | 58% less |
| Llama-3.2 (11B vision) | ▢️ Start on Colab | 2x faster | 60% less |
| Qwen2.5 (7B) | ▢️ Start on Colab | 2x faster | 60% less |
| Phi-4 (14B) | ▢️ Start on Colab | 2x faster | 50% less |

To Switch Between Thinking and Non-Thinking: If you are using llama.cpp, Ollama, Open WebUI etc., you can add `/think` and `/nothink` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:
- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significant enhancement of its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, delivering a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- Support for 100+ languages and dialects with strong capabilities for multilingual instruction following and translation.

Qwen3-0.6B has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 0.6B
- Number of Parameters (Non-Embedding): 0.44B
- Number of Layers: 28
- Number of Attention Heads (GQA): 16 for Q and 8 for KV
- Context Length: 32,768

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation. The code of Qwen3 has been in the latest Hugging Face `transformers` and we advise you to use the latest version of `transformers`.
With `transformers<4.51.0`, you will encounter a `KeyError: 'qwen3'` error. After generation, the thinking content can be split from the final content by locating the last `</think>` token (ID 151668) in the generated IDs:

```python
# `output_ids` is the list of generated token IDs; `tokenizer` is the model's tokenizer.
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

For deployment, you can serve the model with vLLM or SGLang:

```shell
vllm serve Qwen/Qwen3-0.6B --enable-reasoning --reasoning-parser deepseek_r1
```

```shell
python -m sglang.launch_server --model-path Qwen/Qwen3-0.6B --reasoning-parser deepseek-r1
```

Thinking mode is enabled by default when applying the chat template:

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # True is the default value for enable_thinking
)
```

To disable it:

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # Setting enable_thinking=False disables thinking mode
)
```

Here is an example of a multi-turn conversation that switches thinking mode with `/think` and `/nothink`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-0.6B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]
        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        inputs = self.tokenizer(text, return_tensors="pt")
        response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)

        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})

        return response

# Example Usage
if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First input (without /think or /nothink tags, thinking mode is enabled by default)
    user_input_1 = "How many r's in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # Second input with /nothink
    user_input_2 = "Then, how many r's in blueberries? /nothink"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}")
    print("----------------------")

    # Third input with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")
```

For agentic use, Qwen-Agent can wire the model to external tools:

```python
from qwen_agent.agents import Assistant

# Define LLM
llm_cfg = {
    'model': 'Qwen3-0.6B',

    # Use the endpoint provided by Alibaba Model Studio:
    # 'model_type': 'qwen_dashscope',
    # 'api_key': os.getenv('DASHSCOPE_API_KEY'),

    # Use a custom endpoint compatible with OpenAI API:
    'model_server': 'http://localhost:8000/v1',  # api_base
    'api_key': 'EMPTY',

    # Other parameters:
    # 'generate_cfg': {
    #     # Add: when the response content is `<think>this is the thought</think>this is the answer`;
    #     # Do not add: when the response has been separated by reasoning_content and content.
    #     'thought_in_content': True,
    # },
}

# Define Tools
tools = [
    {'mcpServers': {  # You can specify the MCP configuration file
        'time': {
            'command': 'uvx',
            'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
        },
        "fetch": {
            "command": "uvx",
            "args": ["mcp-server-fetch"]
        }
    }},
    'code_interpreter',  # Built-in tools
]

# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Streaming generation
messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
for responses in bot.run(messages=messages):
    pass
print(responses)
```

```
@misc{qwen3,
    title  = {Qwen3},
    url    = {https://qwenlm.github.io/blog/qwen3/},
    author = {Qwen Team},
    month  = {April},
    year   = {2025}
}
```

Llama-3.2-1B-Instruct-GGUF

unsloth

See our collection for all versions of Llama 3.2 including GGUF, 4-bit and original 16-bit formats. 16-bit, 8-bit, 6-bit, 5-bit, 4-bit, 3-bit and 2-bit uploads available. Finetune Llama 3.2, Gemma 2, and Mistral 2-5x faster with 70% less memory via Unsloth! We have a free Google Colab Tesla T4 notebook for Llama 3.2 (3B) here: https://colab.research.google.com/drive/1T5-zKWM5OD21QHwXHiV9ixTRR7k3iB9?usp=sharing

unsloth/Llama-3.2-1B-Instruct: for more details on the model, please go to Meta's original model card.

All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|-----------------|----------------|-------------|------------|
| Llama-3.2 (3B) | ▢️ Start on Colab | 2.4x faster | 58% less |
| Llama-3.1 (11B vision) | ▢️ Start on Colab | 2.4x faster | 58% less |
| Llama-3.1 (8B) | ▢️ Start on Colab | 2.4x faster | 58% less |
| Phi-3.5 (mini) | ▢️ Start on Colab | 2x faster | 50% less |
| Gemma 2 (9B) | ▢️ Start on Colab | 2.4x faster | 58% less |
| Mistral (7B) | ▢️ Start on Colab | 2.2x faster | 62% less |
| DPO - Zephyr | ▢️ Start on Colab | 1.9x faster | 19% less |

- This conversational notebook is useful for ShareGPT ChatML / Vicuna templates.
- This text completion notebook is for raw text. This DPO notebook replicates Zephyr.
- Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.

Special Thanks: A huge thank you to the Meta and Llama team for creating and releasing these models.

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out).
The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open-source and closed chat models on common industry benchmarks.

Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.

Llama 3.2 family of models: token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.

Status: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.

Where to send questions or comments about the model: instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here.

gemma-3n-E2B-it-GGUF

unsloth

Learn how to run & fine-tune Gemma 3n correctly: read our Guide. See our collection for all versions of Gemma 3n including GGUF, 4-bit & 16-bit formats. Unsloth Dynamic 2.0 achieves SOTA accuracy & performance versus other quants.
- Currently only text is supported.
- Ollama: `ollama run hf.co/unsloth/gemma-3n-E4B-it-GGUF:Q4_K_XL` auto-sets the correct chat template and settings.
- Set `temperature = 1.0`, `top_k = 64`, `top_p = 0.95`, `min_p = 0.0`.
- Gemma 3n max tokens (context length): 32K.
- For the Gemma 3n chat template and complete detailed instructions, see our step-by-step guide.
- Fine-tune Gemma 3n (4B) for free using our Google Colab notebook here!
- Read our Blog about Gemma 3n support: unsloth.ai/blog/gemma-3n
- View the rest of our notebooks in our docs here.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|-----------------|----------------|-------------|------------|
| Gemma-3n-E4B | ▢️ Start on Colab | 2x faster | 60% less |
| GRPO with Gemma 3 (1B) | ▢️ Start on Colab | 2x faster | 80% less |
| Gemma 3 (4B) Vision | ▢️ Start on Colab | 2x faster | 60% less |
| Qwen3 (14B) | ▢️ Start on Colab | 2x faster | 60% less |
| DeepSeek-R1-0528-Qwen3-8B (14B) | ▢️ Start on Colab | 2x faster | 80% less |
| Llama-3.2 (3B) | ▢️ Start on Colab | 2.4x faster | 58% less |

- Responsible Generative AI Toolkit
- Gemma on Kaggle
- Gemma on HuggingFace
- Gemma on Vertex Model Garden

Summary description and brief definition of inputs and outputs. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3n models are designed for efficient execution on low-resource devices.
They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for pre-trained and instruction-tuned variants. These models were trained with data in over 140 spoken languages.

Gemma 3n models use selective parameter activation technology to reduce resource requirements. This technique allows the models to operate at an effective size of 2B and 4B parameters, which is lower than the total number of parameters they contain. For more information on Gemma 3n's efficient parameter management technology, see the Gemma 3n page.

- Input:
  - Text string, such as a question, a prompt, or a document to be summarized
  - Images, normalized to 256x256, 512x512, or 768x768 resolution and encoded to 256 tokens each
  - Audio data encoded to 6.25 tokens per second from a single channel
  - Total input context of 32K tokens
- Output:
  - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
  - Total output length up to 32K tokens, minus the request's input tokens

Usage: Below are some code snippets to help you get started quickly with running the model. First, install the Transformers library; Gemma 3n is supported starting from transformers 4.53.0. Then, copy the snippet from the section that is relevant for your use case. You can initialize the model and processor for inference with `pipeline`. With instruction-tuned models, you need to use chat templates to process your inputs first; then you can pass them to the pipeline.

Data used for model training and how the data was processed: these models were trained on a dataset that includes a wide variety of sources totalling approximately 11 trillion tokens. The knowledge cutoff date for the training data was June 2024.
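The input/output token budget above can be sanity-checked with simple arithmetic (a sketch; the request mix below is hypothetical, and "32K" is assumed to mean 32,768 tokens):

```python
CONTEXT = 32_768  # total input context ("32K" tokens, assumed to be 32,768)

# Per the card: each image costs 256 tokens; audio costs 6.25 tokens per second.
def input_tokens(text_tokens, n_images=0, audio_seconds=0.0):
    """Total input tokens consumed by a multimodal request."""
    return text_tokens + 256 * n_images + 6.25 * audio_seconds

# Hypothetical request: 1,000 text tokens, 2 images, 60 s of audio
used = input_tokens(1_000, n_images=2, audio_seconds=60)
remaining_output_budget = CONTEXT - used
print(used, remaining_output_budget)
```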
Here are the key components: - Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages. - Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions. - Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries. - Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks. - Audio: A diverse set of sound samples enables the model to recognize speech, transcribe text from recordings, and identify information in audio data. The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. Here are the key data cleaning and filtering methods applied to the training data: - CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content. - Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets. - Additional methods: Filtering based on content quality and safety in line with our policies. Implementation Information Gemma was trained using Tensor Processing Unit (TPU) hardware (TPUv4p, TPUv5p and TPUv5e). Training generative models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain: - Performance: TPUs are specifically designed to handle the massive computations involved in training generative models. 
They can speed up training considerably compared to CPUs. - Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality. - Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing. - Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training. These advantages are aligned with Google's commitments to operate sustainably. Training was done using JAX and ML Pathways. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is specially suitable for foundation models, including large language models like these ones. Together, JAX and ML Pathways are used as described in the paper about the Gemini family of models: "the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow." These models were evaluated at full precision (float32) against a large collection of different datasets and metrics to cover different aspects of content generation. Evaluation results marked with IT are for instruction-tuned models. Evaluation results marked with PT are for pre-trained models. 
| Benchmark | Metric | n-shot | E2B PT | E4B PT |
| ------------------------------ |----------------|----------|:--------:|:--------:|
| [HellaSwag][hellaswag] | Accuracy | 10-shot | 72.2 | 78.6 |
| [BoolQ][boolq] | Accuracy | 0-shot | 76.4 | 81.6 |
| [PIQA][piqa] | Accuracy | 0-shot | 78.9 | 81.0 |
| [SocialIQA][socialiqa] | Accuracy | 0-shot | 48.8 | 50.0 |
| [TriviaQA][triviaqa] | Accuracy | 5-shot | 60.8 | 70.2 |
| [Natural Questions][naturalq] | Accuracy | 5-shot | 15.5 | 20.9 |
| [ARC-c][arc] | Accuracy | 25-shot | 51.7 | 61.6 |
| [ARC-e][arc] | Accuracy | 0-shot | 75.8 | 81.6 |
| [WinoGrande][winogrande] | Accuracy | 5-shot | 66.8 | 71.7 |
| [BIG-Bench Hard][bbh] | Accuracy | few-shot | 44.3 | 52.9 |
| [DROP][drop] | Token F1 score | 1-shot | 53.9 | 60.8 |

[hellaswag]: https://arxiv.org/abs/1905.07830
[boolq]: https://arxiv.org/abs/1905.10044
[piqa]: https://arxiv.org/abs/1911.11641
[socialiqa]: https://arxiv.org/abs/1904.09728
[triviaqa]: https://arxiv.org/abs/1705.03551
[naturalq]: https://github.com/google-research-datasets/natural-questions
[arc]: https://arxiv.org/abs/1911.01547
[winogrande]: https://arxiv.org/abs/1907.10641
[bbh]: https://paperswithcode.com/dataset/bbh
[drop]: https://arxiv.org/abs/1903.00161

| Benchmark | Metric | n-shot | E2B IT | E4B IT |
| ------------------------------------|-------------------------|----------|:--------:|:--------:|
| [MGSM][mgsm] | Accuracy | 0-shot | 53.1 | 60.7 |
| [WMT24++][wmt24pp] (ChrF) | Character-level F-score | 0-shot | 42.7 | 50.1 |
| [Include][include] | Accuracy | 0-shot | 38.6 | 57.2 |
| [MMLU][mmlu] (ProX) | Accuracy | 0-shot | 8.1 | 19.9 |
| [OpenAI MMLU][openai-mmlu] | Accuracy | 0-shot | 22.3 | 35.6 |
| [Global-MMLU][global-mmlu] | Accuracy | 0-shot | 55.1 | 60.3 |
| [ECLeKTic][eclektic] | ECLeKTic score | 0-shot | 2.5 | 1.9 |

[mgsm]: https://arxiv.org/abs/2210.03057
[wmt24pp]: https://arxiv.org/abs/2502.12404v1
[include]: https://arxiv.org/abs/2411.19799
[mmlu]: https://arxiv.org/abs/2009.03300
[openai-mmlu]: https://huggingface.co/datasets/openai/MMMLU
[global-mmlu]: https://huggingface.co/datasets/CohereLabs/Global-MMLU
[eclektic]: https://arxiv.org/abs/2502.21228

| Benchmark | Metric | n-shot | E2B IT | E4B IT |
| ------------------------------------|--------------------------|----------|:--------:|:--------:|
| [GPQA][gpqa] Diamond | RelaxedAccuracy/accuracy | 0-shot | 24.8 | 23.7 |
| [LiveCodeBench][lcb] v5 | pass@1 | 0-shot | 18.6 | 25.7 |
| Codegolf v2.2 | pass@1 | 0-shot | 11.0 | 16.8 |
| [AIME 2025][aime-2025] | Accuracy | 0-shot | 6.7 | 11.6 |

[gpqa]: https://arxiv.org/abs/2311.12022
[lcb]: https://arxiv.org/abs/2403.07974
[aime-2025]: https://www.vals.ai/benchmarks/aime-2025-05-09

| Benchmark | Metric | n-shot | E2B IT | E4B IT |
| ------------------------------------ |------------|----------|:--------:|:--------:|
| [MMLU][mmlu] | Accuracy | 0-shot | 60.1 | 64.9 |
| [MBPP][mbpp] | pass@1 | 3-shot | 56.6 | 63.6 |
| [HumanEval][humaneval] | pass@1 | 0-shot | 66.5 | 75.0 |
| [LiveCodeBench][lcb] | pass@1 | 0-shot | 13.2 | 13.2 |
| HiddenMath | Accuracy | 0-shot | 27.7 | 37.7 |
| [Global-MMLU-Lite][global-mmlu-lite] | Accuracy | 0-shot | 59.0 | 64.5 |
| [MMLU][mmlu] (Pro) | Accuracy | 0-shot | 40.5 | 50.6 |

[mbpp]: https://arxiv.org/abs/2108.07732
[humaneval]: https://arxiv.org/abs/2107.03374
[global-mmlu-lite]: https://huggingface.co/datasets/CohereForAI/Global-MMLU-Lite

Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including:
- Child Safety: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation.
- Content Safety: Evaluation of text-to-text and image-to-text prompts covering safety policies, including harassment, violence and gore, and hate speech.
- Representational Harms: Evaluation of text-to-text and image-to-text prompts covering safety policies, including bias, stereotyping, and harmful associations or inaccuracies.

In addition to development-level evaluations, we conduct "assurance evaluations", which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making. Notable assurance evaluation results are reported to our Responsibility & Safety Council as part of release review.

For all areas of safety testing, we saw safe levels of performance across the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model's capabilities and behaviors. For text-to-text, image-to-text, and audio-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance with respect to high-severity violations. A limitation of our evaluations was that they included primarily English-language prompts.

These models have certain limitations that users should be aware of.

Open generative models have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use cases that the model creators considered as part of model training and development.
- Content Creation and Communication
  - Text Generation: Generate creative text formats such as poems, scripts, code, marketing copy, and email drafts.
  - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications.
  - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports.
  - Image Data Extraction: Extract, interpret, and summarize visual data for text communications.
  - Audio Data Extraction: Transcribe spoken language, translate speech to text in other languages, and analyze sound-based data.
- Research and Education
  - Natural Language Processing (NLP) and Generative Model Research: These models can serve as a foundation for researchers to experiment with generative models and NLP techniques, develop algorithms, and contribute to the advancement of the field.
  - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
  - Knowledge Exploration: Assist researchers in exploring large bodies of data by generating summaries or answering questions about specific topics.

Limitations

- Training Data
  - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses.
  - The scope of the training dataset determines the subject areas the model can handle effectively.
- Context and Task Complexity
  - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging.
  - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point).
- Language Ambiguity and Nuance
  - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language.
- Factual Accuracy
  - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements.
- Common Sense
  - Models rely on statistical patterns in language. They might lack the ability to apply common-sense reasoning in certain situations.

Ethical Considerations and Risks

The development of generative models raises several ethical concerns. In creating an open model, we have carefully considered the following:

- Bias and Fairness
  - Generative models trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny; input data pre-processing is described, and posterior evaluations are reported, in this card.
- Misinformation and Misuse
  - Generative models can be misused to generate text that is false, misleading, or harmful.
  - Guidelines are provided for responsible use with the model; see the Responsible Generative AI Toolkit.
- Transparency and Accountability:
  - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes.
  - A responsibly developed open model offers the opportunity to share innovation by making generative model technology accessible to developers and researchers across the AI ecosystem.

Risks identified and mitigations:

- Perpetuation of biases: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged.
- Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases.
- Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate against malicious applications of generative models. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the Gemma Prohibited Use Policy.
- Privacy violations: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.

Benefits

At the time of release, this family of models provides high-performance open generative model implementations designed from the ground up for responsible AI development. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably sized open model alternatives.

Kimi-K2-Thinking-GGUF

unsloth

Nov 8: We collaborated with the Kimi team on a system prompt fix. Unsloth Dynamic 2.0 achieves superior accuracy and outperforms other leading quants. We recommend 247 GB of RAM to run the 1-bit Dynamic GGUF. To run the model closer to full precision, you can use the 'UD-Q4_K_XL' quant, which requires 646 GB of RAM.

Kimi K2 Thinking is the latest, most capable version of the open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with a 256k context window, achieving lossless reductions in inference latency and GPU memory usage.

Key Features

- Deep Thinking & Tool Orchestration: End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift.
- Native INT4 Quantization: Quantization-Aware Training (QAT) is employed in the post-training stage to achieve a lossless 2x speed-up in low-latency mode.
- Stable Long-Horizon Agency: Maintains coherent goal-directed behavior across up to 200–300 consecutive tool invocations, surpassing prior models that degrade after 30–50 steps.
| | |
|:---:|:---:|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |

Reasoning Tasks

| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 | Grok-4 |
|:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|:-------:|
| HLE (Text-only) | no tools | 23.9 | 26.3 | 19.8 | 7.9 | 19.8 | 25.4 |
| | w/ tools | 44.9 | 41.7 | 32.0 | 21.7 | 20.3 | 41.0 |
| | heavy | 51.0 | 42.0 | - | - | - | 50.7 |
| AIME25 | no tools | 94.5 | 94.6 | 87.0 | 51.0 | 89.3 | 91.7 |
| | w/ python | 99.1 | 99.6 | 100.0 | 75.2 | 58.1 | 98.8 |
| | heavy | 100.0 | 100.0 | - | - | - | 100.0 |
| HMMT25 | no tools | 89.4 | 93.3 | 74.6 | 38.8 | 83.6 | 90.0 |
| | w/ python | 95.1 | 96.7 | 88.8 | 70.4 | 49.5 | 93.9 |
| | heavy | 97.5 | 100.0 | - | - | - | 96.7 |
| IMO-AnswerBench | no tools | 78.6 | 76.0 | 65.9 | 45.8 | 76.0 | 73.1 |
| GPQA | no tools | 84.5 | 85.7 | 83.4 | 74.2 | 79.9 | 87.5 |

General Tasks

| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 |
|:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
| MMLU-Pro | no tools | 84.6 | 87.1 | 87.5 | 81.9 | 85.0 |
| MMLU-Redux | no tools | 94.4 | 95.3 | 95.6 | 92.7 | 93.7 |
| Longform Writing | no tools | 73.8 | 71.4 | 79.8 | 62.8 | 72.5 |
| HealthBench | no tools | 58.0 | 67.2 | 44.2 | 43.8 | 46.9 |

Agentic Search Tasks

| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 |
|:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
| BrowseComp | w/ tools | 60.2 | 54.9 | 24.1 | 7.4 | 40.1 |
| BrowseComp-ZH | w/ tools | 62.3 | 63.0 | 42.4 | 22.2 | 47.9 |
| Seal-0 | w/ tools | 56.3 | 51.4 | 53.4 | 25.2 | 38.5 |
| FinSearchComp-T3 | w/ tools | 47.4 | 48.5 | 44.0 | 10.4 | 27.0 |
| Frames | w/ tools | 87.0 | 86.0 | 85.0 | 58.1 | 80.2 |

Coding Tasks

| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 |
|:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
| SWE-bench Verified | w/ tools | 71.3 | 74.9 | 77.2 | 69.2 | 67.8 |
| SWE-bench Multilingual | w/ tools | 61.1 | 55.3 | 68.0 | 55.9 | 57.9 |
| Multi-SWE-bench | w/ tools | 41.9 | 39.3 | 44.3 | 33.5 | 30.6 |
| SciCode | no tools | 44.8 | 42.9 | 44.7 | 30.7 | 37.7 |
| LiveCodeBenchV6 | no tools | 83.1 | 87.0 | 64.0 | 56.1 | 74.1 |
| OJ-Bench (cpp) | no tools | 48.7 | 56.2 | 30.4 | 25.5 | 38.2 |
| Terminal-Bench | w/ simulated tools (JSON) | 47.1 | 43.8 | 51.0 | 44.5 | 37.7 |

1. To ensure a fast, lightweight experience, we selectively employ a subset of tools and reduce the number of tool call steps under the chat mode on kimi.com. As a result, chatting on kimi.com may not reproduce our benchmark scores. Our agentic mode will be updated soon to reflect the full capabilities of K2 Thinking.
2. Testing Details:
   2.1. All benchmarks were evaluated at temperature = 1.0 and a 256k context length for K2 Thinking, except for SciCode, for which we followed the official temperature setting of 0.0.
   2.2. HLE (no tools), AIME25, HMMT25, and GPQA were capped at a 96k thinking-token budget, while IMO-AnswerBench, LiveCodeBench, and OJ-Bench were capped at a 128k thinking-token budget. Longform Writing was capped at a 32k completion-token budget.
   2.3.
For AIME and HMMT (no tools), we report the average of 32 runs (avg@32). For AIME and HMMT (with Python), we report the average of 16 runs (avg@16). For IMO-AnswerBench, we report the average of 8 runs (avg@8).
3. Baselines:
   3.1. GPT-5, Claude Sonnet 4.5, Grok-4, and DeepSeek-V3.2 results are quoted from the GPT-5 post, the GPT-5 for Developers post, the GPT-5 system card, the Claude Sonnet 4.5 post, the Grok-4 post, the DeepSeek-V3.2 post, the public Terminal-Bench leaderboard (Terminus-2), the public Vals AI leaderboard, and Artificial Analysis. Benchmarks for which no public scores were available were re-tested under the same conditions used for K2 Thinking and are marked with an asterisk (*). For the GPT-5 tests, we set the reasoning effort to high.
   3.2. The GPT-5 and Grok-4 scores on the HLE full set with tools are 35.2 and 38.6 from the official posts. In our internal evaluation on the HLE text-only subset, GPT-5 scores 41.7 and Grok-4 scores 38.6 (Grok-4's launch cited 41.0 on the text-only subset). For GPT-5's HLE text-only score without tools, we use the score from Scale.ai. The official GPT-5 HLE full-set score without tools is 24.8.
   3.3. For IMO-AnswerBench: GPT-5 scored 65.6 in the benchmark paper. We re-evaluated GPT-5 with the official API and obtained a score of 76.
4. For HLE (w/ tools) and the agentic-search benchmarks:
   4.1. K2 Thinking was equipped with search, code-interpreter, and web-browsing tools.
   4.2. BrowseComp-ZH, Seal-0, and FinSearchComp-T3 were run 4 times independently and the average is reported (avg@4).
   4.3. The evaluation used o3-mini as judge, configured identically to the official HLE setting; judge prompts were taken verbatim from the official repository.
   4.4. On HLE, the maximum step limit was 120, with a 48k-token reasoning budget per step; on agentic-search tasks, the limit was 300 steps with a 24k-token reasoning budget per step.
   4.5. When tool execution results cause the accumulated input to exceed the model's context limit (256k), we employ a simple context-management strategy that hides all previous tool outputs.
   4.6. Web access to Hugging Face may lead to data leakage in certain benchmark tests, such as HLE. K2 Thinking can achieve a score of 51.3 on HLE without blocking Hugging Face. To ensure a fair and rigorous comparison, we blocked access to Hugging Face during testing.
5. For Coding Tasks:
   5.1. Terminal-Bench scores were obtained with the default agent framework (Terminus-2) and the provided JSON parser.
   5.2. For other coding tasks, results were produced with our in-house evaluation harness. The harness is derived from SWE-agent, but we clamp the context windows of the Bash and Edit tools and rewrite the system prompt to match the task semantics.
   5.3. All reported coding-task scores are averaged over 5 independent runs.
6. Heavy Mode: K2 Thinking Heavy Mode employs an efficient parallel strategy: it first rolls out eight trajectories simultaneously, then reflectively aggregates all outputs to generate the final result. Heavy mode for GPT-5 denotes the official GPT-5 Pro score.

Low-bit quantization is an effective way to reduce inference latency and GPU memory usage on large-scale inference servers. However, thinking models use long decoding lengths, so quantization often results in substantial performance drops. To overcome this challenge, we adopt Quantization-Aware Training (QAT) during the post-training phase, applying INT4 weight-only quantization to the MoE components. This allows K2 Thinking to support native INT4 inference with a roughly 2x generation-speed improvement while achieving state-of-the-art performance. All benchmark results are reported under INT4 precision. The checkpoints are saved in compressed-tensors format, which is supported by most mainstream inference engines.
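To make the idea of INT4 weight-only quantization concrete, here is a minimal sketch of a generic symmetric group-wise scheme. This is illustrative only: the group size and rounding rule are assumptions, not Moonshot's actual QAT recipe or the compressed-tensors layout.

```python
# Generic symmetric group-wise INT4 quantization sketch (NOT Moonshot's
# exact recipe; group_size and rounding are illustrative assumptions).

def quantize_int4(weights, group_size=32):
    """Map each group of floats to int4 codes in [-8, 7] plus one scale."""
    quantized, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        quantized.extend(max(-8, min(7, round(w / scale))) for w in group)
    return quantized, scales

def dequantize_int4(quantized, scales, group_size=32):
    """Recover approximate floats by rescaling each group's codes."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]

weights = [0.12, -0.07, 0.33, -0.5, 0.01, 0.25, -0.18, 0.09]
q, s = quantize_int4(weights, group_size=8)
restored = dequantize_int4(q, s, group_size=8)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The round trip shows why INT4 is "lossy but bounded": within a group, the reconstruction error never exceeds half the group's scale, which QAT then teaches the model to tolerate.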
If you need the checkpoints in higher precision such as FP8 or BF16, you can refer to the official compressed-tensors repo to unpack the INT4 weights and convert them to any higher precision.

Deployment

> [!Note]
> You can access K2 Thinking's API at https://platform.moonshot.ai; we provide an OpenAI/Anthropic-compatible API.

Currently, Kimi-K2-Thinking is recommended to run on the following inference engines:

Deployment examples can be found in the Model Deployment Guide. Once the local inference service is up, you can interact with it through the chat endpoint:

> [!NOTE]
> The recommended temperature for Kimi-K2-Thinking is `temperature = 1.0`.
> If no special instructions are required, the system prompt above is a good default.

Kimi-K2-Thinking has the same tool-calling settings as Kimi-K2-Instruct. To enable them, pass the list of available tools in each request; the model will then autonomously decide when and how to invoke them. The following example demonstrates calling a weather tool end-to-end:

The `tool_call_with_client` function implements the pipeline from user query to tool execution. This pipeline requires the inference engine to support Kimi-K2's native tool-parsing logic. For more information, see the Tool Calling Guide.

Both the code repository and the model weights are released under the Modified MIT License. If you have any questions, please reach out at [email protected].
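The client side of such a pipeline can be sketched as follows. The tool schema follows the common OpenAI-style function-calling format, and `get_weather`, the message shapes, and the stub result are invented for illustration; they are not Kimi's actual API surface, so consult the Tool Calling Guide for the real interface.

```python
import json

# Sketch of a client-side tool-call dispatch step. The schema mirrors the
# widely used OpenAI-style format; get_weather is a hypothetical example
# tool, and the message dicts are illustrative, not Kimi's exact API.

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city):
    return {"city": city, "condition": "sunny"}  # stub implementation

AVAILABLE = {"get_weather": get_weather}

def dispatch_tool_calls(message):
    """Run every tool call in an assistant message; return tool messages."""
    results = []
    for call in message.get("tool_calls", []):
        fn = AVAILABLE[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results

# Simulated assistant turn requesting a tool invocation:
assistant_msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_0",
        "function": {"name": "get_weather",
                     "arguments": '{"city": "Beijing"}'},
    }],
}
tool_msgs = dispatch_tool_calls(assistant_msg)
```

In a real loop, the returned tool messages are appended to the conversation and the model is called again, repeating until it produces a final answer instead of another tool call.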

DeepSeek-R1-Distill-Qwen-1.5B-GGUF

bartowski

Llamacpp imatrix Quantizations of DeepSeek-R1-Distill-Qwen-1.5B

Original model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

All quants made using the imatrix option with the dataset from here.

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| DeepSeek-R1-Distill-Qwen-1.5B-f32.gguf | f32 | 7.11GB | false | Full F32 weights. |
| DeepSeek-R1-Distill-Qwen-1.5B-f16.gguf | f16 | 3.56GB | false | Full F16 weights. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q8_0.gguf | Q8_0 | 1.89GB | false | Extremely high quality, generally unneeded but max available quant. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q6_K_L.gguf | Q6_K_L | 1.58GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, recommended. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q6_K.gguf | Q6_K | 1.46GB | false | Very high quality, near perfect, recommended. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_L.gguf | Q5_K_L | 1.43GB | false | Uses Q8_0 for embed and output weights. High quality, recommended. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf | Q5_K_M | 1.29GB | false | High quality, recommended. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_L.gguf | Q4_K_L | 1.29GB | false | Uses Q8_0 for embed and output weights. Good quality, recommended. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_S.gguf | Q5_K_S | 1.26GB | false | High quality, recommended. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_XL.gguf | Q3_K_XL | 1.18GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q4_1.gguf | Q4_1 | 1.16GB | false | Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf | Q4_K_M | 1.12GB | false | Good quality, default size for most use cases, recommended. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_S.gguf | Q4_K_S | 1.07GB | false | Slightly lower quality with more space savings, recommended. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf | Q4_0 | 1.07GB | false | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| DeepSeek-R1-Distill-Qwen-1.5B-IQ4_NL.gguf | IQ4_NL | 1.07GB | false | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| DeepSeek-R1-Distill-Qwen-1.5B-IQ4_XS.gguf | IQ4_XS | 1.02GB | false | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_L.gguf | Q3_K_L | 0.98GB | false | Lower quality but usable, good for low RAM availability. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q2_K_L.gguf | Q2_K_L | 0.98GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf | Q3_K_M | 0.92GB | false | Low quality. |
| DeepSeek-R1-Distill-Qwen-1.5B-IQ3_M.gguf | IQ3_M | 0.88GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_S.gguf | Q3_K_S | 0.86GB | false | Low quality, not recommended. |
| DeepSeek-R1-Distill-Qwen-1.5B-IQ3_XS.gguf | IQ3_XS | 0.83GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf | Q2_K | 0.75GB | false | Very low quality but surprisingly usable. |
| DeepSeek-R1-Distill-Qwen-1.5B-IQ2_M.gguf | IQ2_M | 0.70GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |

Some of these quants (Q3_K_XL, Q4_K_L, etc.) are the standard quantization method with the embeddings and output weights quantized to Q8_0 instead of what they would normally default to.

First, make sure you have huggingface-cli installed:

If the model is bigger than 50GB, it will have been split into multiple files.
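The same multi-file download can also be done programmatically with the `huggingface_hub` library's `snapshot_download`, which accepts `allow_patterns` globs. The split-file suffix pattern (`-*-of-*.gguf`) below is an assumption about how shards are named; adjust it to the actual filenames in the repo.

```python
# Sketch: build allow_patterns globs that match both a single-file quant
# and its split shards, then fetch them with huggingface_hub.
# The "-*-of-*.gguf" shard suffix is an assumed naming convention.

def quant_patterns(model, quant):
    """Globs matching a single-file GGUF and any split shards for one quant."""
    return [f"{model}-{quant}.gguf", f"{model}-{quant}-*-of-*.gguf"]

patterns = quant_patterns("DeepSeek-R1-Distill-Qwen-1.5B", "Q8_0")

def download(patterns, local_dir):
    # Third-party dependency: pip install huggingface_hub
    from huggingface_hub import snapshot_download
    snapshot_download(
        repo_id="bartowski/DeepSeek-R1-Distill-Qwen-1.5B-GGUF",
        allow_patterns=patterns,
        local_dir=local_dir,
    )

# download(patterns, "DeepSeek-R1-Distill-Qwen-1.5B-Q8_0")  # network call
```

The `download` call is left commented out because it hits the network; run it once you have verified the patterns against the repo's file list.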
In order to download them all to a local folder, run:

You can either specify a new local-dir (DeepSeek-R1-Distill-Qwen-1.5B-Q8_0) or download them all in place (./).

Previously, you would download Q4_0_4_4/4_8/8_8, and these would have their weights interleaved in memory in order to improve performance on ARM and AVX machines by loading up more data in one pass. Now, however, there is "online repacking" for weights; details are in this PR. If you use Q4_0 and your hardware would benefit from repacking weights, it will do it automatically on the fly. As of llama.cpp build b4282, you will not be able to run the Q4_0_X_X files and will instead need to use Q4_0. Additionally, if you want slightly better quality, you can use IQ4_NL thanks to this PR, which will also repack the weights for ARM, though only the 4_4 variant for now. The loading time may be slower, but it will result in an overall speed increase.

I'm keeping this section to show the potential theoretical uplift in performance from using Q4_0 with online repacking.
Click to view benchmarks on an AVX2 system (EPYC 7702)

| model | size | params | backend | threads | test | t/s | % (vs Q4_0) |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |-------------: |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp512 | 204.03 ± 1.03 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp1024 | 282.92 ± 0.19 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp2048 | 259.49 ± 0.44 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg128 | 39.12 ± 0.27 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg256 | 39.31 ± 0.69 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg512 | 40.52 ± 0.03 | 100% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp512 | 301.02 ± 1.74 | 147% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp1024 | 287.23 ± 0.20 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp2048 | 262.77 ± 1.81 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg128 | 18.80 ± 0.99 | 48% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg256 | 24.46 ± 3.04 | 83% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg512 | 36.32 ± 3.59 | 90% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp512 | 271.71 ± 3.53 | 133% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp1024 | 279.86 ± 45.63 | 100% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp2048 | 320.77 ± 5.00 | 124% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg128 | 43.51 ± 0.05 | 111% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg256 | 43.35 ± 0.09 | 110% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg512 | 42.60 ± 0.31 | 105% |

Q4_0_8_8 offers a nice bump to prompt processing and a small bump to text generation.

A great write-up with charts showing various performance comparisons is provided by Artefact2 here.

The first thing to figure out is how big a model you can run.
To do this, you'll need to figure out how much RAM and/or VRAM you have.

If you want your model running as FAST as possible, you'll want to fit the whole thing in your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.

If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.

Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.

If you don't want to think too much, grab one of the K-quants. These are in the format 'QX_K_X', like Q5_K_M.

If you want to get more into the weeds, you can check out this extremely useful feature chart:

But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in the format IQX_X, like IQ3_M. These are newer and offer better performance for their size.

These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalents, so speed vs. performance is a tradeoff you'll have to decide on. The I-quants are not compatible with Vulkan, which is also available for AMD, so if you have an AMD card, double check whether you're using the rocBLAS build or the Vulkan build. At the time of writing, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.

Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset. Thank you ZeroWw for the inspiration to experiment with embed/output.

Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
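The sizing rule above (pick the largest quant whose file is 1-2GB smaller than your available memory) can be sketched as a small helper. The 1.5GB headroom default and the size table are taken from this card; treat the headroom as a rough rule of thumb, not a guarantee, since KV cache and context length also eat memory.

```python
# Sketch of the quant-selection rule above: largest quant whose file size
# fits in (V)RAM minus headroom. Sizes are from the table on this card;
# the 1.5 GB headroom default is a rule-of-thumb assumption.

QUANT_SIZES_GB = {
    "Q8_0": 1.89, "Q6_K": 1.46, "Q5_K_M": 1.29, "Q4_K_M": 1.12,
    "IQ4_XS": 1.02, "Q3_K_M": 0.92, "IQ3_M": 0.88, "Q2_K": 0.75,
}

def pick_quant(vram_gb, headroom_gb=1.5):
    """Return the largest quant fitting in vram_gb minus headroom, or None."""
    budget = vram_gb - headroom_gb
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget}
    return max(fitting, key=fitting.get) if fitting else None

choice = pick_quant(4.0)  # e.g. a 4 GB GPU
```

For this 1.5B model even a 4 GB card fits the largest quant; the helper matters more for 7B+ models where the table spans several gigabytes.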

Llama-OuteTTS-1.0-1B-GGUF

OuteAI

> [!IMPORTANT]
> Important Sampling Considerations
>
> When using OuteTTS version 1.0, it is crucial to use the settings specified in the Sampling Configuration section.
> The repetition penalty implementation is particularly important - this model requires penalization applied to a 64-token recent window,
> rather than across the entire context window. Penalizing the entire context will cause the model to produce broken or low-quality output.
>
> To address this limitation, all necessary samplers and patches for all backends are set up automatically in the outetts library.
> If using a custom implementation, ensure you correctly implement these requirements.

This update brings significant improvements in speech synthesis and voice cloning, delivering a more powerful, accurate, and user-friendly experience in a compact size.

1. Prompt Revamp & Dependency Removal
   - Automatic Word Alignment: The model now performs word alignment internally. Simply input raw text, with no pre-processing required, and the model handles the rest, streamlining your workflow. For optimal results, use normalized, readable text without newlines (light normalization is applied automatically in the outetts library).
   - Native Multilingual Text Support: Direct support for native text across multiple languages eliminates the need for romanization.
   - Enhanced Metadata Integration: The updated prompt system incorporates additional metadata (time, energy, spectral centroid, pitch) at both global and word levels, improving speaker flow and synthesis quality.
   - Special Tokens for Audio Codebooks: New tokens for c1 (codebook 1) and c2 (codebook 2).
2. New Audio Encoder Model
   - DAC Encoder: Integrates a DAC audio encoder from ibm-research/DAC.speech.v1.0, utilizing two codebooks for high-quality audio reconstruction.
   - Performance Trade-off: Improved audio fidelity increases the token generation rate from 75 to 150 tokens per second. This trade-off prioritizes quality, especially for multilingual applications.
3. Voice Cloning
   - One-Shot Voice Cloning: To achieve one-shot cloning, the model typically requires only around 10 seconds of reference audio to produce an accurate voice representation.
   - Improved Accuracy: Enhanced by the new encoder and additional training metadata, voice cloning is now more natural and precise.
4. Auto Text Alignment & Numerical Support
   - Automatic Text Alignment: Aligns raw text at the word level, even for languages without clear boundaries (e.g., Japanese, Chinese), using insights from pre-processed training data.
   - Direct Numerical Input: Built-in multilingual numerical support allows direct use of numbers in prompts, with no textual conversion needed. (The model typically chooses the dominant language present. Mixing languages in a single prompt may lead to mistakes.)
   - Supported Languages: OuteTTS offers varying proficiency levels across languages, based on training data exposure.
     - High Training Data Languages: These languages feature extensive training: English, Arabic, Chinese, Dutch, French, German, Italian, Japanese, Korean, Lithuanian, Russian, Spanish
     - Moderate Training Data Languages: These languages received moderate training, offering good performance with occasional limitations: Portuguese, Belarusian, Bengali, Georgian, Hungarian, Latvian, Persian/Farsi, Polish, Swahili, Tamil, Ukrainian
     - Beyond Supported Languages: The model can generate speech in untrained languages with varying success. Experiment with unlisted languages, though results may not be optimal.

More Configuration Options

For advanced settings and customization, visit the official repository: 🔗 interface_usage.md

Speaker Reference

The model is designed to be used with a speaker reference. Without one, it generates random vocal characteristics, often leading to lower-quality outputs. The model inherits the referenced speaker's emotion, style, and accent.
When generating speech in other languages with the same speaker, you may observe the model retaining the original accent.

Multilingual Application

It is recommended to create a speaker profile in the language you intend to use. This helps achieve the best results in that specific language, including tone, accent, and linguistic features. While the model supports cross-lingual speech, it still relies on the reference speaker. If the speaker has a distinct accent, such as British English, other languages may carry that accent as well.

Optimal Audio Length

- Best Performance: Generate audio around 42 seconds in a single run (approximately 8,192 tokens). It is recommended not to approach the limits of this window when generating. Usually, the best results are up to 7,000 tokens.
- Context Reduction with Speaker Reference: If the speaker reference is 10 seconds long, the effective context is reduced to approximately 32 seconds.

Temperature Setting Recommendations

Testing shows that a temperature of 0.4 is an ideal starting point for accuracy (with the sampling settings below). However, some voice references may benefit from higher temperatures for enhanced expressiveness or slightly lower temperatures for more precise voice replication.

Verifying Speaker Encoding

If the cloned voice quality is subpar, check the encoded speaker sample. The DAC audio reconstruction model is lossy, and samples with clipping, excessive loudness, or unusual vocal features may introduce encoding issues that impact output quality.

Sampling Configuration

For optimal results with this TTS model, use the following sampling settings.
| Parameter | Value |
|-------------------|----------|
| Temperature | 0.4 |
| Repetition Penalty | 1.1 |
| Repetition Range | 64 |
| Top-k | 40 |
| Top-p | 0.9 |
| Min-p | 0.05 |

- Training Data: Trained on ~60k hours of audio
- Context Length: Supports a maximum context window of 8,192 tokens

Pre-Training

- Optimizer: AdamW
- Batch Size: 1 million tokens
- Max Learning Rate: 3e-4
- Min Learning Rate: 3e-5
- Context Length: 8192

Fine-Tuning

- Optimizer: AdamW
- Max Learning Rate: 1e-5
- Min Learning Rate: 5e-6
- Data: 10,000 diverse, high-quality examples

- Initial Llama3.2 Components: Llama 3.2 Community License Agreement
- Our Continued Pre-Training, Fine-Tuning, and Additional Components: CC-BY-NC-SA-4.0

- Big thanks to Hugging Face for their continued resource support through their grant program!
- Audio encoding and decoding utilize ibm-research/DAC.speech.v1.0
- OuteTTS is built with Llama3.2-1B as the base model, with continued pre-training and fine-tuning.

Ethical Use Guidelines

This text-to-speech model is intended for legitimate applications that enhance accessibility, creativity, and communication. Prohibited uses include impersonation without consent, creation of deliberately misleading content, generation of harmful or harassing material, distribution of synthetic audio without proper disclosure, voice cloning without permission, and any uses that violate applicable laws, regulations, or copyrights.
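The 64-token windowed repetition penalty required by this model can be sketched as follows. This is a generic logit-penalty implementation using the common CTRL-style rule (divide positive logits, multiply negative ones); the outetts library sets this up for you, so treat this purely as an illustration of the windowing.

```python
# Sketch of a windowed repetition penalty: only tokens seen in the last
# `window` generated tokens are penalized, matching the 64-token window
# this model requires. CTRL-style penalty rule assumed for illustration.

def apply_windowed_repetition_penalty(logits, recent_tokens,
                                      penalty=1.1, window=64):
    """Penalize only token ids appearing in the recent `window` tokens."""
    out = list(logits)
    for tok in set(recent_tokens[-window:]):
        # Shrink positive logits, push negative logits further down.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [2.0, -1.0, 0.5, 3.0]
history = [0] + [3] * 63 + [1]  # token 0 is older than the 64-token window
new = apply_windowed_repetition_penalty(logits, history)
```

Here token 0 falls outside the window and keeps its logit, while tokens 1 and 3 are penalized; penalizing the full history instead is exactly the failure mode the note above warns about.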

Qwen3-Embedding-0.6B-GGUF

Qwen

The Qwen3 Embedding model series is the latest embedding model series of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

Exceptional Versatility: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks No. 1 on the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58), while the reranking model excels in various text retrieval scenarios.

Comprehensive Flexibility: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

Multilingual Capability: The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.
Qwen3-Embedding-0.6B-GGUF has the following features:

- Model Type: Text Embedding
- Supported Languages: 100+ Languages
- Number of Parameters: 0.6B
- Context Length: 32k
- Embedding Dimension: Up to 1024, supports user-defined output dimensions ranging from 32 to 1024
- Quantization: q8_0, f16

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog and GitHub.

| Model Type     | Models               | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruction Aware |
|----------------|----------------------|------|--------|-----------------|---------------------|-------------|-------------------|
| Text Embedding | Qwen3-Embedding-0.6B | 0.6B | 28     | 32K             | 1024                | Yes         | Yes               |
| Text Embedding | Qwen3-Embedding-4B   | 4B   | 36     | 32K             | 2560                | Yes         | Yes               |
| Text Embedding | Qwen3-Embedding-8B   | 8B   | 36     | 32K             | 4096                | Yes         | Yes               |
| Text Reranking | Qwen3-Reranker-0.6B  | 0.6B | 28     | 32K             | -                   | -           | Yes               |
| Text Reranking | Qwen3-Reranker-4B    | 4B   | 36     | 32K             | -                   | -           | Yes               |
| Text Reranking | Qwen3-Reranker-8B    | 8B   | 36     | 32K             | -                   | -           | Yes               |

> Note:
> - `MRL Support` indicates whether the embedding model supports custom dimensions for the final embedding.
> - `Instruction Aware` notes whether the embedding or reranking model supports customizing the input instruction according to different tasks.
> - Our evaluation indicates that, for most downstream tasks, using instructions (instruct) typically yields an improvement of 1% to 5% compared to not using them. Therefore, we recommend that developers create tailored instructions specific to their tasks and scenarios. In multilingual contexts, we also advise users to write their instructions in English, as most instructions used during model training were originally written in English.

📌 Tip: We recommend that developers customize the `instruct` according to their specific scenarios, tasks, and languages.
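The user-defined output dimensions described above (MRL support) work by truncating the leading dimensions of the full 1024-dimensional vector and re-normalizing. A minimal sketch in plain Python, using a dummy vector in place of real model output:

```python
import math

def truncate_embedding(vec, dim):
    """Truncate an MRL-style embedding to `dim` dimensions and L2-normalize.

    Assumes the model was trained with Matryoshka Representation Learning,
    so the leading dimensions carry the most information (32 <= dim <= 1024
    for Qwen3-Embedding-0.6B).
    """
    if not 32 <= dim <= len(vec):
        raise ValueError("unsupported output dimension")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Dummy 1024-dim embedding standing in for model output.
full = [1.0] * 1024
small = truncate_embedding(full, 64)
print(len(small))  # 64
```

The renormalization step matters: cosine similarity assumes unit-length vectors, and truncation alone breaks that invariant.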
Our tests have shown that, in most retrieval scenarios, not using an `instruct` on the query side can lead to a drop in retrieval performance of approximately 1% to 5%.

llama.cpp

Check out our llama.cpp documentation for a more detailed usage guide. We advise you to clone `llama.cpp` and install it following the official guide; we follow the latest version of llama.cpp. In the following demonstration, we assume that you are running commands inside the `llama.cpp` repository.

| Model                          | Size | Mean (Task) | Mean (Type) | Bitext Mining | Class. | Clust. | Inst. Retri. | Multi. Class. | Pair. Class. | Rerank | Retri. | STS   |
|--------------------------------|:----:|:-----------:|:-----------:|:-------------:|:------:|:------:|:------------:|:-------------:|:------------:|:------:|:------:|:-----:|
| NV-Embed-v2                    | 7B   | 56.29       | 49.58       | 57.84         | 57.29  | 40.80  | 1.04         | 18.63         | 78.94        | 63.82  | 56.72  | 71.10 |
| GritLM-7B                      | 7B   | 60.92       | 53.74       | 70.53         | 61.83  | 49.75  | 3.45         | 22.77         | 79.94        | 63.78  | 58.31  | 73.33 |
| BGE-M3                         | 0.6B | 59.56       | 52.18       | 79.11         | 60.35  | 40.88  | -3.11        | 20.1          | 80.76        | 62.79  | 54.60  | 74.12 |
| multilingual-e5-large-instruct | 0.6B | 63.22       | 55.08       | 80.13         | 64.94  | 50.75  | -0.40        | 22.91         | 80.86        | 62.61  | 57.12  | 76.81 |
| gte-Qwen2-1.5B-instruct        | 1.5B | 59.45       | 52.69       | 62.51         | 58.32  | 52.05  | 0.74         | 24.02         | 81.58        | 62.58  | 60.78  | 71.61 |
| gte-Qwen2-7B-instruct          | 7B   | 62.51       | 55.93       | 73.92         | 61.55  | 52.77  | 4.94         | 25.48         | 85.13        | 65.55  | 60.08  | 73.98 |
| text-embedding-3-large         | -    | 58.93       | 51.41       | 62.17         | 60.27  | 46.89  | -2.68        | 22.03         | 79.17        | 63.89  | 59.27  | 71.68 |
| Cohere-embed-multilingual-v3.0 | -    | 61.12       | 53.23       | 70.50         | 62.95  | 46.89  | -1.89        | 22.74         | 79.88        | 64.07  | 59.16  | 74.80 |
| Gemini Embedding               | -    | 68.37       | 59.59       | 79.28         | 71.82  | 54.59  | 5.18         | 29.16         | 83.63        | 65.58  | 67.71  | 79.40 |
| Qwen3-Embedding-0.6B           | 0.6B | 64.33       | 56.00       | 72.22         | 66.83  | 52.33  | 5.09         | 24.59         | 80.83        | 61.41  | 64.64  | 76.17 |
| Qwen3-Embedding-4B             | 4B   | 69.45       | 60.86       | 79.36         | 72.33  | 57.15  | 11.56        | 26.77         | 85.05        | 65.08  | 69.60  | 80.86 |
| Qwen3-Embedding-8B             | 8B   | 70.58       | 61.69       | 80.89         | 74.00  | 57.65  | 10.06        | 28.66         | 86.40        | 65.63  | 70.88  | 81.08 |

> Note: For compared models, the scores are retrieved from the MTEB online leaderboard on May 24th, 2025.

| MTEB English / Models          | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS   | Summ. |
|--------------------------------|:------:|:----------:|:----------:|:------:|:------:|:-----------:|:-------:|:------:|:-----:|:-----:|
| multilingual-e5-large-instruct | 0.6B   | 65.53      | 61.21      | 75.54  | 49.89  | 86.24       | 48.74   | 53.47  | 84.72 | 29.89 |
| NV-Embed-v2                    | 7.8B   | 69.81      | 65.00      | 87.19  | 47.66  | 88.69       | 49.61   | 62.84  | 83.82 | 35.21 |
| GritLM-7B                      | 7.2B   | 67.07      | 63.22      | 81.25  | 50.82  | 87.29       | 49.59   | 54.95  | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct        | 1.5B   | 67.20      | 63.26      | 85.84  | 53.54  | 87.52       | 49.25   | 50.25  | 82.51 | 33.94 |
| stella_en_1.5B_v5              | 1.5B   | 69.43      | 65.32      | 89.38  | 57.06  | 88.02       | 50.19   | 52.42  | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct          | 7.6B   | 70.72      | 65.77      | 88.52  | 58.97  | 85.9        | 50.47   | 58.09  | 82.69 | 35.74 |
| gemini-embedding-exp-03-07     | -      | 73.3       | 67.67      | 90.05  | 59.39  | 87.7        | 48.59   | 64.35  | 85.29 | 38.28 |
| Qwen3-Embedding-0.6B           | 0.6B   | 70.70      | 64.88      | 85.76  | 54.05  | 84.37       | 48.18   | 61.83  | 86.57 | 33.43 |
| Qwen3-Embedding-4B             | 4B     | 74.60      | 68.10      | 89.84  | 57.51  | 87.01       | 50.76   | 68.46  | 88.72 | 34.39 |
| Qwen3-Embedding-8B             | 8B     | 75.22      | 68.71      | 90.43  | 58.57  | 87.52       | 51.56   | 69.44  | 88.58 | 34.83 |

| C-MTEB                         | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS   |
|--------------------------------|:------:|:----------:|:----------:|:------:|:------:|:-----------:|:-------:|:-----:|:-----:|
| multilingual-e5-large-instruct | 0.6B   | 58.08      | 58.24      | 69.80  | 48.23  | 64.52       | 57.45   | 63.65 | 45.81 |
| bge-multilingual-gemma2        | 9B     | 67.64      | 75.31      | 59.30  | 86.67  | 68.28       | 73.73   | 55.19 | -     |
| gte-Qwen2-1.5B-instruct        | 1.5B   | 67.12      | 67.79      | 72.53  | 54.61  | 79.5        | 68.21   | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct          | 7.6B   | 71.62      | 72.19      | 75.77  | 66.06  | 81.16       | 69.24   | 75.70 | 65.20 |
| ritrieve_zh_v1                 | 0.3B   | 72.71      | 73.85      | 76.88  | 66.5   | 85.98       | 72.86   | 76.97 | 63.92 |
| Qwen3-Embedding-0.6B           | 0.6B   | 66.33      | 67.45      | 71.40  | 68.74  | 76.42       | 62.58   | 71.03 | 54.52 |
| Qwen3-Embedding-4B             | 4B     | 72.27      | 73.51      | 75.46  | 77.89  | 83.34       | 66.05   | 77.03 | 61.26 |
| Qwen3-Embedding-8B             | 8B     | 73.84      | 75.00      | 76.97  | 80.08  | 84.23       | 66.99   | 78.21 | 63.53 |

If you find our work helpful, feel free to cite us.
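Instruction-aware retrieval, as described in the notes above, can be sketched in plain Python: prefix the query with a task instruction before embedding, then score query/document vectors by cosine similarity. The `Instruct:`/`Query:` template is an assumption based on common instruct-embedding conventions (check the model card for the exact prompt format), and the vectors below are dummy data standing in for model output:

```python
import math

def format_query(task: str, query: str) -> str:
    # Assumed instruct template; verify against the official Qwen3-Embedding docs.
    return f"Instruct: {task}\nQuery: {query}"

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

prompt = format_query(
    "Given a web search query, retrieve relevant passages that answer the query",
    "What is the capital of China?",
)
print(prompt)

# Dummy embeddings; in practice these come from embedding `prompt` and the document.
q, d = [0.1, 0.3, 0.6], [0.1, 0.3, 0.5]
print(round(cosine(q, d), 4))
```

Per the note above, only the query side carries the instruction; documents are embedded as-is.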

gemma-3-1B-it-qat-GGUF

lmstudio-community

👾 LM Studio Community models highlights program. Highlighting new & noteworthy models by the community. Join the conversation on Discord.

Model creator: google
Original model: gemma-3-1b-it
GGUF quantization: provided by Google

Optimized with Quantization Aware Training for improved 4-bit performance. Supports a context length of 32k tokens, with a max output of 8192. Gemma 3 models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.

🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.

LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated such Model. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for any such Model. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, virus-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or use of any other Community Model provided by or through LM Studio.

Huihui-Ling-flash-2.0-abliterated-GGUF

huihui-ai

This is an uncensored version of inclusionAI/Ling-flash-2.0 created with abliteration (see remove-refusals-with-transformers to learn more about it). ggml-org/llama.cpp and im0qianqian/llama.cpp now support conversion to GGUF format, and the result can be tested using llama-cli.

- Risk of Sensitive or Controversial Outputs: This model's safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
- Not Suitable for All Audiences: Due to limited content filtering, the model's outputs may be inappropriate for public settings, underage users, or applications requiring high security.
- Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
- Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.
- Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
- No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

Donation

Your donation helps us continue our development and improvement; a cup of coffee can do it.

- bitcoin:

granite-4.0-h-1b-GGUF

unsloth

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

Model Summary: Granite-4.0-H-1B is a lightweight instruct model finetuned from Granite-4.0-H-1B-Base using a combination of open-source instruction datasets with permissive licenses and internally collected synthetic datasets. This model is developed using a diverse set of techniques, including supervised finetuning, reinforcement learning, and model merging.

- Developers: Granite Team, IBM
- HF Collection: Granite 4.0 Nano Language Models HF Collection
- GitHub Repository: ibm-granite/granite-4.0-nano-language-models
- Website: Granite Docs
- Release Date: October 28, 2025
- License: Apache 2.0

Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond this list.

Intended use: Granite 4.0 Nano instruct models feature strong instruction-following capabilities, bringing advanced AI capabilities within reach for on-device deployments and research use cases. Additionally, their compact size makes them well-suited for fine-tuning on specialized domains without requiring massive compute resources.

Capabilities

- Summarization
- Text classification
- Text extraction
- Question-answering
- Retrieval Augmented Generation (RAG)
- Code related tasks
- Function-calling tasks
- Multilingual dialog use cases
- Fill-In-the-Middle (FIM) code completions

Generation: This is a simple example of how to use the Granite-4.0-H-1B model. Copy the snippet from the section that is relevant for your use case.

Tool-calling: Granite-4.0-H-1B comes with enhanced tool-calling capabilities, enabling seamless integration with external functions and APIs. To define a list of tools, please follow OpenAI's function definition schema.
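A minimal sketch of a tool list following OpenAI's function definition schema, as referenced above. The `get_weather` function and its parameters are hypothetical placeholders for illustration, not part of the Granite release:

```python
# A single tool described with OpenAI's function definition schema.
# "get_weather" and its parameters are hypothetical examples.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# A chat template (e.g. tokenizer.apply_chat_template(messages, tools=tools))
# would serialize this list into the model's tool-calling prompt format.
print(tools[0]["function"]["name"])  # get_weather
```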
Benchmarks

Multilingual benchmarks and the included languages:

| Benchmark | # Langs | Languages |
|-----------|---------|-----------|
| MMMLU     | 11      | ar, de, en, es, fr, ja, ko, pt, zh, bn, hi |
| INCLUDE   | 14      | hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh |

Model Architecture: Granite-4.0-H-1B baseline is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, Mamba2, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.

| Metric | 350M Dense | H 350M Dense | 1B Dense | H 1B Dense |
|--------|------------|--------------|----------|------------|
| Number of layers | 28 attention | 4 attention / 28 Mamba2 | 40 attention | 4 attention / 36 Mamba2 |
| MLP / Shared expert hidden size | 2048 | 2048 | 4096 | 4096 |

Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) a select set of human-curated data.

Infrastructure: We trained the Granite 4.0 Nano Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.

Ethical Considerations and Limitations: Granite 4.0 Nano Instruct Models are primarily finetuned using instruction-response pairs, mostly in English, but also multilingual data covering multiple languages. Although this model can handle multilingual dialog use cases, its performance might not match that on English tasks. In such cases, introducing a small number of examples (few-shot) can help the model generate more accurate outputs. While this model has been aligned with safety in mind, it may in some cases produce inaccurate, biased, or unsafe responses to user prompts.
So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks.

Resources

- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources

Ling-flash-2.0-i1-GGUF

mradermacher

weighted/imatrix quants of https://huggingface.co/inclusionAI/Ling-flash-2.0

For a convenient overview and download list, visit our model page for this model. Static quants are available at https://huggingface.co/mradermacher/Ling-flash-2.0-GGUF

Usage

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)

| Link | Type | Size/GB | Notes |
|:-----|:-----|--------:|:------|
| GGUF | imatrix | 0.4 | imatrix file (for creating your own quants) |
| GGUF | i1-IQ1_S | 21.1 | for the desperate |
| GGUF | i1-IQ1_M | 23.4 | mostly desperate |
| GGUF | i1-IQ2_XXS | 27.2 | |
| GGUF | i1-IQ2_XS | 30.3 | |
| GGUF | i1-IQ2_S | 30.7 | |
| GGUF | i1-IQ2_M | 33.8 | |
| GGUF | i1-Q2_K_S | 35.1 | very low quality |
| GGUF | i1-Q2_K | 37.8 | IQ3_XXS probably better |
| GGUF | i1-IQ3_XXS | 39.9 | lower quality |
| GGUF | i1-IQ3_XS | 42.3 | |
| GGUF | i1-IQ3_S | 44.7 | beats Q3_K |
| GGUF | i1-Q3_K_S | 44.7 | IQ3_XS probably better |
| GGUF | i1-IQ3_M | 45.3 | |
| GGUF | i1-Q3_K_M | 49.4 | IQ3_S probably better |
| PART 1 PART 2 | i1-Q3_K_L | 53.5 | IQ3_M probably better |
| PART 1 PART 2 | i1-IQ4_XS | 55.1 | |
| PART 1 PART 2 | i1-Q4_0 | 58.5 | fast, low quality |
| PART 1 PART 2 | i1-Q4_K_S | 58.7 | optimal size/speed/quality |
| PART 1 PART 2 | i1-Q4_K_M | 62.5 | fast, recommended |
| PART 1 PART 2 | i1-Q4_1 | 64.6 | |
| PART 1 PART 2 | i1-Q5_K_S | 71.0 | |
| PART 1 PART 2 | i1-Q5_K_M | 73.3 | |
| PART 1 PART 2 | i1-Q6_K | 84.6 | practically like static Q6_K |

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9

See https://huggingface.co/mradermacher/model_requests for some answers to questions you might have and/or if you want some other model quantized.
I thank my company, nethype GmbH, for letting me use its servers and providing upgrades to my workstation to enable this work in my free time. Additional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more imatrix quants, at much higher quality, than I would otherwise be able to.

Huihui-Ling-mini-2.0-abliterated

huihui-ai

This is an uncensored version of inclusionAI/Ling-mini-2.0 created with abliteration (see remove-refusals-with-transformers to learn more about it). ggml-org/llama.cpp and im0qianqian/llama.cpp now support conversion to GGUF format, and the result can be tested using llama-cli. Q4_K_M may sometimes refuse to respond; it is recommended to use Q8_0 or f16 instead.

- Risk of Sensitive or Controversial Outputs: This model's safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
- Not Suitable for All Audiences: Due to limited content filtering, the model's outputs may be inappropriate for public settings, underage users, or applications requiring high security.
- Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
- Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.
- Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
- No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

Donation

Your donation helps us continue our development and improvement; a cup of coffee can do it.

- bitcoin:

Showing 200 of 200 compatible models

Getting Started

Recommended Sizes

1-3B params
6.5 tok/s • Excellent
3-7B params
2.5 tok/s • Good
7-10B params
1-2 tok/s • Batch only

Popular Models

→ Llama 3.2-3B
→ Phi-3-mini
→ Qwen 2.5 (3B)
→ TinyLlama 1.1B

Quick Setup

# Install llama.cpp for GGUF models (recent versions build with CMake)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build --config Release

# Run Llama 3.2-3B Q4 GGUF (download llama-3.2-3B.Q4_K_M.gguf first)
./build/bin/llama-cli -m llama-3.2-3B.Q4_K_M.gguf -p "Hello"

# Expected: 6-6.5 tokens/sec on Raspberry Pi 5 (4GB)
