onmydevice.ai
All Models
75 local AI models across 8 types. Click a model to see which devices can run it.
BROWSE BY TIER
Low tier: 4 GB
Mid tier: 8 GB
High tier: 16 GB
Ultra tier: 32 GB
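The tier labels on each card can be read as minimum memory classes. A minimal sketch, assuming the GB figures refer to a device's available memory (the page does not say whether that is RAM or VRAM) and that a label such as "Mid+" means "Mid tier or higher"; the function names are hypothetical and not part of the site:

```python
# Tier thresholds as listed above, in GB. Treating them as minimum
# device memory is an assumption; the page does not define them further.
TIERS = [("Low", 4), ("Mid", 8), ("High", 16), ("Ultra", 32)]


def device_tier(memory_gb):
    """Return the highest tier whose threshold the device meets, or None."""
    tier = None
    for name, min_gb in TIERS:
        if memory_gb >= min_gb:
            tier = name
    return tier


def can_run(memory_gb, model_label):
    """Check a device against a card label such as 'Mid+ tier'.

    Assumes 'X+' means 'tier X or any higher tier'.
    """
    required = model_label.split("+")[0].strip()
    order = [name for name, _ in TIERS]
    tier = device_tier(memory_gb)
    return tier is not None and order.index(tier) >= order.index(required)
```

Under these assumptions, `can_run(8, "High+ tier")` is false while `can_run(16, "Low+ tier")` is true.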
Chat
32 models
SmolLM2 135M
99 MB
Ultra-tiny language model — runs anywhere
8K ctx
Low+ tier
SmolLM2 360M
267 MB
Ultra-small language model
8K ctx
Low+ tier
Qwen 2.5 0.5B
477 MB
Lightweight chat — surprisingly capable
32K ctx
Low+ tier
Gemma 3 1B
786 MB
Google's compact model — great reasoning for its size
32K ctx
Low+ tier
LFM2.5 1.2B
731 MB
Liquid AI's hybrid model — blazing fast CPU inference
32K ctx
Low+ tier
TinyLlama 1.1B
780 MB
Popular tiny model trained on 3T tokens
2K ctx
Low+ tier
Llama 3.2 1B
791 MB
Meta's solid all-rounder
128K ctx
Mid+ tier
StableLM 2 1.6B
1.0 GB
Stability AI's efficient small model
4K ctx
Mid+ tier
SmolLM2 1.7B
1.0 GB
Largest SmolLM2 — punches above its weight
8K ctx
Mid+ tier
Gemma 3n E2B
3.0 GB
Google's on-device multimodal — 2B effective params with vision and audio
32K ctx
Mid+ tier
Qwen 2.5 3B
2.0 GB
Sweet-spot model — great quality-to-size ratio
32K ctx
Mid+ tier
Llama 3.2 3B
2.0 GB
Meta's 3B — best small model for many tasks
128K ctx
Mid+ tier
Gemma 3n E4B
4.5 GB
Google's most capable on-device model — 4B effective, multimodal
32K ctx
High+ tier
Gemma 3 4B
2.8 GB
Google's 4B — strong reasoning and instruction following
128K ctx
High+ tier
Qwen 2.5 7B
4.4 GB
Top-tier 7B chat — rivals much larger models
128K ctx
High+ tier
Llama 3.1 8B
4.9 GB
Meta's workhorse 8B — excellent all-around
128K ctx
High+ tier
Bonsai 8B (1-bit)
1.16 GB
PrismML's native 1-bit model — 8B params in 1.16 GB, needs forked llama.cpp
32K ctx
Mid+ tier
Ternary Bonsai 1.7B
380 MB
PrismML's 1.58-bit ternary model — 1.7B params in under 0.4 GB, runs anywhere
32K ctx
Low+ tier
Ternary Bonsai 4B
860 MB
PrismML's 1.58-bit ternary model — 4B-class intelligence in ~0.86 GB, ~9x smaller than fp16
32K ctx
Low+ tier
Ternary Bonsai 8B
1.6 GB
PrismML's 1.58-bit ternary model — top intelligence at 8B in ~1.6 GB
32K ctx
Mid+ tier
Gemma 3 12B
7.3 GB
Google's 12B Gemma 3 model, near-frontier quality
128K ctx
Ultra+ tier
Mistral Nemo 12B
7.1 GB
Mistral's 12B — Tekken tokenizer, 128K context
128K ctx
Ultra+ tier
Gemma 4 E2B
1.5 GB
Google's 2026 on-device model — 2.3B active / 5.1B total, multimodal vision and audio
128K ctx
Mid+ tier
Gemma 4 E4B
3.0 GB
Google's most capable edge model — 4B effective, multimodal, 128K context
128K ctx
High+ tier
Gemma 4 26B A4B
16 GB
Google's sparse MoE — 26B total, 3.8B active per token, 256K context
128K ctx
Ultra+ tier
Gemma 4 31B
18 GB
Google's dense flagship open model — near server-grade quality, 256K context
128K ctx
Ultra+ tier
Qwen 3.5 0.8B
560 MB
Alibaba's tiny multimodal model — vision-capable, 262K context, runs on phones
128K ctx
Low+ tier
Qwen 3.5 2B
1.3 GB
Compact multimodal chat — native vision, 262K context
128K ctx
Mid+ tier
Qwen 3.5 4B
2.5 GB
Multimodal sweet spot — vision, strong reasoning, 262K context
128K ctx
Mid+ tier
Qwen 3.5 9B
5.5 GB
Flagship small Qwen — native multimodal, 262K context, rivals far larger models
128K ctx
High+ tier
Qwen 3.5 35B A3B
20 GB
Alibaba's sparse MoE — 35B total, 3B active per token, multimodal, 262K context
128K ctx
Ultra+ tier
LFM2.5 8B A1B
4.7 GB
Liquid AI's on-device MoE — 8.3B total, 1.5B active, reasoning + tool calling at 3–4B quality
128K ctx
Mid+ tier
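The size claims on the low-bit cards in the Chat list above can be checked with simple arithmetic. A rough sketch, assuming nominal parameter counts (exactly 4e9 and 8e9), decimal gigabytes, and 16 bits per weight for fp16; real model files also carry embeddings and metadata, so these are estimates only:

```python
def fp16_size_gb(params):
    """Size of a model stored at 16 bits (2 bytes) per weight, in decimal GB."""
    return params * 2 / 1e9


def bits_per_weight(file_gb, params):
    """Average bits per parameter implied by a file size in decimal GB."""
    return file_gb * 1e9 * 8 / params


# Ternary Bonsai 4B: listed at ~0.86 GB, described as ~9x smaller than fp16.
ratio_4b = fp16_size_gb(4e9) / 0.86  # 8 GB / 0.86 GB

# Bonsai 8B (1-bit): listed at 1.16 GB for 8B parameters.
bpw_8b = bits_per_weight(1.16, 8e9)

print(f"Ternary Bonsai 4B vs fp16: {ratio_4b:.1f}x smaller")
print(f"Bonsai 8B average bits per weight: {bpw_8b:.2f}")
```

Under these assumptions the ratio comes out near 9.3, consistent with the "~9x" on the card, and the 1-bit file averages about 1.16 bits per weight.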
Reasoning
10 models
DeepSeek R1 Distill 1.5B
1.1 GB
Distilled reasoning — chain-of-thought in a tiny package
128K ctx
Mid+ tier
Phi-4 Mini
8.7 GB
Microsoft's reasoning model — exceptional for its size
128K ctx
High+ tier
DeepSeek R1 Distill 7B
4.7 GB
Distilled from DeepSeek R1 — strong step-by-step reasoning
128K ctx
High+ tier
Mistral 7B
4.3 GB
High-quality reasoning and analysis
32K ctx
High+ tier
DeepSeek R1 Distill 8B
4.9 GB
Llama-based R1 distill — best open reasoning at 8B
128K ctx
High+ tier
Qwen3 8B
5.2 GB
Alibaba's Qwen3 with a thinking mode and strong reasoning
128K ctx
High+ tier
Phi-4 Medium
8.0 GB
Microsoft's 14B reasoning model — frontier-class performance
128K ctx
Ultra+ tier
Llama 4 Scout
63 GB
Meta's MoE model — 109B total params, 17B active per token
128K ctx
Ultra+ tier
DeepSeek V4 Flash
156 GB
DeepSeek's V4 Flash MoE — 284B total, 13B active, 1M context (server-grade hardware)
128K ctx
Ultra+ tier
Kimi K2.6
594 GB
Moonshot's 1T MoE — 32B active, frontier agentic coding; needs a multi-GPU workstation
128K ctx
Ultra+ tier
Code
8 models
Qwen 2.5 Coder 0.5B
477 MB
Tiny code completion model — autocomplete on any device
32K ctx
Low+ tier
Qwen 2.5 Coder 1.5B
1.1 GB
Best small code model — great for autocomplete
32K ctx
Mid+ tier
StarCoder2 3B
1.9 GB
BigCode's multilingual code model, trained on 17 programming languages
8K ctx
Mid+ tier
Qwen 2.5 Coder 3B
2.0 GB
Strong code generation and editing at 3B
32K ctx
Mid+ tier
DeepSeek Coder 6.7B
3.8 GB
DeepSeek's code model — strong at generation and debugging
8K ctx
High+ tier
Qwen 2.5 Coder 7B
4.4 GB
Best open 7B code model — rivals GPT-4 on coding benchmarks
128K ctx
High+ tier
Laguna XS.2
19 GB
Poolside's open agentic coding MoE: 33B total, 3B active, runs locally on 36 GB Macs
128K ctx
Ultra+ tier
Qwen3 Coder Next
46 GB
Alibaba's local coding agent MoE — 80B total, 3B active, agentic long-horizon coding, 256K context
128K ctx
Ultra+ tier
Vision
4 models
SmolVLM 500M
490 MB
Tiny vision-language model — describe images on any device
4K ctx
Low+ tier
Moondream 2B
1.3 GB
Small but capable vision model — image Q&A and captioning
2K ctx
Mid+ tier
LLaVA 1.6 7B
4.5 GB
Leading open vision model — image understanding and reasoning
4K ctx
High+ tier
Qwen 2.5 VL 7B
4.6 GB
State-of-the-art vision-language — image, video, document understanding
32K ctx
High+ tier
Embedding
4 models
all-MiniLM-L6-v2
23 MB
Fast sentence embeddings — ideal for semantic search
256 ctx
Low+ tier
BGE Small
34 MB
Compact BAAI embedding — great for RAG pipelines
512 ctx
Low+ tier
Nomic Embed Text
137 MB
Open-source embedding with 8K context — long document search
8K ctx
Low+ tier
GTE Large
335 MB
High-quality embeddings — top of MTEB benchmark at its size
512 ctx
Low+ tier
Image
5 models
Stable Diffusion Turbo
2.2 GB
1-step image generation — instant results
77 ctx
Mid+ tier
SDXL Turbo
6.5 GB
High-res 1-step generation — 1024×1024 in seconds
77 ctx
High+ tier
Stable Diffusion 3.5 Medium
4.6 GB
Latest SD architecture — excellent image quality
77 ctx
High+ tier
FLUX.1 Schnell
12.1 GB
Black Forest Labs' fast model — stunning quality in 4 steps
256 ctx
Ultra+ tier
Bonsai Image 4B
1.8 GB
PrismML's 1.58-bit ternary diffusion: first 4B image model to run on iPhone, ~3 GB in-browser
256 ctx
Mid+ tier
Voice
6 models
KittenTTS Nano
19 MB
Tiny, fast voice synthesis
256 ctx
Low+ tier
Kokoro 82M
183 MB
24 natural voices — instant TTS
512 ctx
Low+ tier
OuteTTS 0.3 500M
500 MB
Voice cloning and natural TTS — zero-shot voice synthesis
4K ctx
Mid+ tier
Dia 1.6B
1.6 GB
Nari Labs dialogue TTS — multi-speaker with emotion
4K ctx
High+ tier
NeuTTS Air
450 MB
Neuphonic's on-device TTS — 748M, instant voice cloning, runs on CPU via llama.cpp
2K ctx
Low+ tier
KittenTTS Mini
80 MB
KittenML's expressive 80M TTS — 8 voices, quantization-aware, runs on any CPU
512 ctx
Low+ tier
Transcription
6 models
Whisper Tiny
89 MB
Real-time speech recognition — runs on anything
448 ctx
Low+ tier
Wav2Vec2 Base
231 MB
High-accuracy English ASR
512 ctx
Low+ tier
Whisper Small
488 MB
Good accuracy-speed balance — 99 languages
448 ctx
Low+ tier
Whisper Medium
1.5 GB
Strong multilingual transcription
448 ctx
Mid+ tier
Whisper Large V3
3.1 GB
Best open transcription — near-human accuracy
448 ctx
High+ tier
Distil-Whisper Large V3
1.5 GB
6x faster than Whisper Large, with nearly the same accuracy
448 ctx
Mid+ tier