onmydevice.ai
All Models
55 local AI models across 8 types. Click a model to see which devices can run it.
BROWSE BY TIER
Low tier: 4 GB
Mid tier: 8 GB
High tier: 16 GB
Ultra tier: 32 GB
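A minimal sketch of how the tier labels compose, assuming (as the table above suggests) that each tier is named for a minimum RAM threshold and that a requirement like "Mid+ tier" means that tier or any higher one. The function names and thresholds below mirror the table; nothing here is an official API.

```python
from typing import Optional

# Tiers in ascending order, with the RAM thresholds from the table above.
TIERS = ["Low", "Mid", "High", "Ultra"]
TIER_MIN_GB = {"Low": 4, "Mid": 8, "High": 16, "Ultra": 32}

def device_tier(ram_gb: float) -> Optional[str]:
    """Highest tier whose RAM threshold the device meets, or None if below Low."""
    tier = None
    for name in TIERS:
        if ram_gb >= TIER_MIN_GB[name]:
            tier = name
    return tier

def can_run(ram_gb: float, requirement: str) -> bool:
    """requirement looks like 'Mid+ tier': the model needs that tier or above."""
    needed = requirement.split("+")[0]
    tier = device_tier(ram_gb)
    return tier is not None and TIERS.index(tier) >= TIERS.index(needed)

print(device_tier(16))           # High
print(can_run(8, "Mid+ tier"))   # True
print(can_run(8, "High+ tier"))  # False
```

So an 8 GB device lands in the Mid tier and can run every "Low+" and "Mid+" model on this page, but nothing marked "High+" or "Ultra+".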
Chat (19 models)
SmolLM2 135M (99 MB): Ultra-tiny language model — runs anywhere. 8K ctx, Low+ tier.
SmolLM2 360M (267 MB): Ultra-small language model. 8K ctx, Low+ tier.
Qwen 2.5 0.5B (477 MB): Lightweight chat — surprisingly capable. 32K ctx, Low+ tier.
Gemma 3 1B (786 MB): Google's compact model — great reasoning for its size. 32K ctx, Low+ tier.
LFM2.5 1.2B (731 MB): Liquid AI's hybrid model — blazing-fast CPU inference. 32K ctx, Low+ tier.
TinyLlama 1.1B (780 MB): Popular tiny model trained on 3T tokens. 2K ctx, Low+ tier.
Llama 3.2 1B (791 MB): Meta's solid all-rounder. 128K ctx, Mid+ tier.
StableLM 2 1.6B (1.0 GB): Stability AI's efficient small model. 4K ctx, Mid+ tier.
SmolLM2 1.7B (1.0 GB): Largest SmolLM2 — punches above its weight. 8K ctx, Mid+ tier.
Gemma 3n E2B (3.0 GB): Google's on-device multimodal — 2B effective params with vision and audio. 32K ctx, Mid+ tier.
Qwen 2.5 3B (2.0 GB): Sweet-spot model — great quality-to-size ratio. 32K ctx, Mid+ tier.
Llama 3.2 3B (2.0 GB): Meta's 3B — best small model for many tasks. 128K ctx, Mid+ tier.
Gemma 3n E4B (4.5 GB): Google's most capable on-device model — 4B effective, multimodal. 32K ctx, High+ tier.
Gemma 3 4B (2.8 GB): Google's 4B — strong reasoning and instruction following. 128K ctx, High+ tier.
Qwen 2.5 7B (4.4 GB): Top-tier 7B chat — rivals much larger models. 128K ctx, High+ tier.
Llama 3.1 8B (4.9 GB): Meta's workhorse 8B — excellent all-around. 128K ctx, High+ tier.
Bonsai 8B (1-bit) (1.16 GB): PrismML's native 1-bit model — 8B params in 1.16 GB, needs a forked llama.cpp. 32K ctx, Mid+ tier.
Gemma 3 12B (7.3 GB): Largest Gemma 3 in this list — near-frontier quality. 128K ctx, Ultra+ tier.
Mistral Nemo 12B (7.1 GB): Mistral's 12B — Tekken tokenizer, 128K context. 128K ctx, Ultra+ tier.
Reasoning (8 models)
DeepSeek R1 Distill 1.5B (1.1 GB): Distilled reasoning — chain-of-thought in a tiny package. 128K ctx, Mid+ tier.
Phi-4 Mini (8.7 GB): Microsoft's reasoning model — exceptional for its size. 128K ctx, High+ tier.
DeepSeek R1 Distill 7B (4.7 GB): Distilled from DeepSeek R1 — strong step-by-step reasoning. 128K ctx, High+ tier.
Mistral 7B (4.3 GB): High-quality reasoning and analysis. 32K ctx, High+ tier.
DeepSeek R1 Distill 8B (4.9 GB): Llama-based R1 distill — best open reasoning at 8B. 128K ctx, High+ tier.
Qwen3 8B (5.2 GB): Alibaba's latest — thinking mode with strong reasoning. 128K ctx, High+ tier.
Phi-4 Medium (8.0 GB): Microsoft's 14B reasoning model — frontier-class performance. 128K ctx, Ultra+ tier.
Llama 4 Scout (63 GB): Meta's MoE model — 109B total params, 17B active per token. 128K ctx, Ultra+ tier.
Code (6 models)
Qwen 2.5 Coder 0.5B (477 MB): Tiny code-completion model — autocomplete on any device. 32K ctx, Low+ tier.
Qwen 2.5 Coder 1.5B (1.1 GB): Best small code model — great for autocomplete. 32K ctx, Mid+ tier.
StarCoder2 3B (1.9 GB): BigCode's multilingual code model — 600+ languages. 8K ctx, Mid+ tier.
Qwen 2.5 Coder 3B (2.0 GB): Strong code generation and editing at 3B. 32K ctx, Mid+ tier.
DeepSeek Coder 6.7B (3.8 GB): DeepSeek's code model — strong at generation and debugging. 8K ctx, High+ tier.
Qwen 2.5 Coder 7B (4.4 GB): Best open 7B code model — rivals GPT-4 on coding benchmarks. 128K ctx, High+ tier.
Vision (4 models)
SmolVLM 500M (490 MB): Tiny vision-language model — describe images on any device. 4K ctx, Low+ tier.
Moondream 2B (1.3 GB): Small but capable vision model — image Q&A and captioning. 2K ctx, Mid+ tier.
LLaVA 1.6 7B (4.5 GB): Leading open vision model — image understanding and reasoning. 4K ctx, High+ tier.
Qwen 2.5 VL 7B (4.6 GB): State-of-the-art vision-language — image, video, and document understanding. 32K ctx, High+ tier.
Embedding (4 models)
all-MiniLM-L6-v2 (23 MB): Fast sentence embeddings — ideal for semantic search. 256 ctx, Low+ tier.
BGE Small (34 MB): Compact BAAI embedding — great for RAG pipelines. 512 ctx, Low+ tier.
Nomic Embed Text (137 MB): Open-source embedding with 8K context — long-document search. 8K ctx, Low+ tier.
GTE Large (335 MB): High-quality embeddings — top of the MTEB leaderboard for its size. 512 ctx, Low+ tier.
Image (4 models)
Stable Diffusion Turbo (2.2 GB): 1-step image generation — instant results. 77 ctx, Mid+ tier.
SDXL Turbo (6.5 GB): High-res 1-step generation — 1024×1024 in seconds. 77 ctx, High+ tier.
Stable Diffusion 3.5 Medium (4.6 GB): Latest SD architecture — excellent image quality. 77 ctx, High+ tier.
FLUX.1 Schnell (12.1 GB): Black Forest Labs' fast model — stunning quality in 4 steps. 256 ctx, Ultra+ tier.
Voice (4 models)
KittenTTS Nano (19 MB): Tiny, fast voice synthesis. 256 ctx, Low+ tier.
Kokoro 82M (183 MB): 24 natural voices — instant TTS. 512 ctx, Low+ tier.
OuteTTS 0.3 500M (500 MB): Voice cloning and natural TTS — zero-shot voice synthesis. 4K ctx, Mid+ tier.
Dia 1.6B (1.6 GB): Nari Labs' dialogue TTS — multi-speaker with emotion. 4K ctx, High+ tier.
Transcription (6 models)
Whisper Tiny (89 MB): Real-time speech recognition — runs on anything. 448 ctx, Low+ tier.
Wav2Vec2 Base (231 MB): High-accuracy English ASR. 512 ctx, Low+ tier.
Whisper Small (488 MB): Good accuracy-speed balance — 99 languages. 448 ctx, Low+ tier.
Whisper Medium (1.5 GB): Strong multilingual transcription. 448 ctx, Mid+ tier.
Whisper Large V3 (3.1 GB): Best open transcription — near-human accuracy. 448 ctx, High+ tier.
Distil-Whisper Large V3 (1.5 GB): 6x faster than Whisper Large — nearly the same accuracy. 448 ctx, Mid+ tier.
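One practical way to read a catalog like this is "what is the largest model that fits my machine?" A hypothetical sketch using the Whisper variants and download sizes listed above; note these are file sizes, so the real runtime footprint is somewhat larger once activations and context are loaded:

```python
from typing import Optional, Tuple

# (name, download size) pairs taken from the Transcription list above.
WHISPER = [
    ("Whisper Tiny", "89 MB"),
    ("Whisper Small", "488 MB"),
    ("Distil-Whisper Large V3", "1.5 GB"),
    ("Whisper Medium", "1.5 GB"),
    ("Whisper Large V3", "3.1 GB"),
]

def size_mb(size: str) -> float:
    """Parse a size string like '488 MB' or '1.5 GB' into megabytes."""
    value, unit = size.split()
    return float(value) * (1024 if unit == "GB" else 1)

def best_fit(budget_mb: float) -> Optional[Tuple[str, str]]:
    """Largest listed model whose download size fits the budget, else None."""
    fitting = [m for m in WHISPER if size_mb(m[1]) <= budget_mb]
    return max(fitting, key=lambda m: size_mb(m[1]), default=None)

print(best_fit(600))  # ('Whisper Small', '488 MB')
print(best_fit(50))   # None
```

The same pattern extends to any category on this page: parse the size column, filter by your budget (or by the tier check from the top of the page), and take the largest survivor.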