onmydevice.ai
📱 iPhone 15 (iOS, $699)

| Spec | Detail |
|---|---|
| Chip | Apple A16 |
| RAM | 6GB |
| GPU | 5-core GPU |
| AI accelerator | 16-core Neural Engine |
| AI rating | 2/5 |

Best for: Transcription and basic TTS on a budget iPhone
Upgrade pick: 📱 iPhone 16 Pro · $999 (+16 models). The iPhone 16 Pro has 8GB and a much faster Neural Engine, a significant upgrade for AI tasks.
PICK A TOOL

💬 Chat · 🎙️ Transcription

LocallyAI (Beginner, Free): Run AI models privately on your iPhone and iPad.
MODELS YOU CAN RUN
| Model | Download | Description | Quant | Speed | RAM | Apps | Fit |
|---|---|---|---|---|---|---|---|
| SmolLM2 135M | 99 MB | Ultra-tiny language model — runs anywhere | FP16 | ~109 tok/s | 0.3 GB | LM Studio, Ollama, +2 | Runs great |
| 🔢 Nomic Embed Text | 137 MB | Open-source embedding with 8K context — long document search | FP16 | ~109 tok/s | 0.3 GB | Ollama | Runs great |
| 🗣️ Kokoro 82M | 183 MB | 24 natural voices — instant TTS | FP16 | ~82 tok/s | 0.4 GB | Piper, Xybrid CLI | Runs great |
| Wav2Vec2 Base | 231 MB | High-accuracy English ASR | FP16 | ~128 tok/s | 0.2 GB | MacWhisper, Whisper Transcription | Runs great |
| Whisper Small | 488 MB | Good accuracy-speed balance — 99 languages | FP16 | ~60 tok/s | 0.5 GB | MacWhisper, Whisper Transcription | Runs great |
| 🔢 all-MiniLM-L6-v2 | 23 MB | Fast sentence embeddings — ideal for semantic search | FP16 | ~200 tok/s | 0.0 GB | Ollama | Runs great |
| 🔢 BGE Small | 34 MB | Compact BAAI embedding — great for RAG pipelines | FP16 | ~200 tok/s | 0.1 GB | Ollama | Runs great |
| 🗣️ KittenTTS Nano | 19 MB | Tiny, fast voice synthesis | FP16 | ~200 tok/s | 0.0 GB | Piper, Xybrid CLI | Runs great |
| Whisper Tiny | 89 MB | Real-time speech recognition — runs on anything | FP16 | ~200 tok/s | 0.1 GB | MacWhisper, Whisper Transcription | Runs great |
| 🔢 GTE Large | 335 MB | High-quality embeddings — top of MTEB benchmark at its size | FP16 | ~44 tok/s | 0.7 GB | Ollama | Runs great |
| SmolLM2 360M | 267 MB | Ultra-small language model | FP16 | ~32 tok/s | 0.9 GB | LM Studio, Ollama, +2 | Runs great |
| Qwen 2.5 0.5B | 477 MB | Lightweight chat — surprisingly capable | FP16 | ~25 tok/s | 1.2 GB | LM Studio, Ollama, +2 | Runs great |
| Qwen 2.5 Coder 0.5B | 477 MB | Tiny code completion model — autocomplete on any device | FP16 | ~25 tok/s | 1.2 GB | LM Studio, Ollama, +1 | Runs great |
| 🗣️ OuteTTS 0.3 500M | 500 MB | Voice cloning and natural TTS — zero-shot voice synthesis | FP16 | ~29 tok/s | 1.0 GB | Xybrid CLI | Runs great |
| 👁️ SmolVLM 500M | 490 MB | Tiny vision-language model — describe images on any device | FP16 | ~27 tok/s | 1.1 GB | LM Studio, Ollama | Runs great |
| Whisper Medium | 1.5 GB | Strong multilingual transcription | FP16 | ~20 tok/s | 1.5 GB | MacWhisper, Whisper Transcription | Runs great |
| Whisper Large V3 | 3.1 GB | Best open transcription — near-human accuracy | Q8 | ~18 tok/s | 1.6 GB | MacWhisper, Whisper Transcription | Runs great |
| 🎙️ Distil-Whisper Large V3 | 1.5 GB | 6x faster than Whisper Large — nearly same accuracy | FP16 | ~20 tok/s | 1.5 GB | MacWhisper, Whisper Transcription | Runs great |
| 🧠 DeepSeek R1 Distill 1.5B | 1.1 GB | Distilled reasoning — chain-of-thought in a tiny package | Q8 | ~16 tok/s | 1.8 GB | LM Studio, Ollama, +1 | Runs well |
| Qwen 2.5 Coder 1.5B | 1.1 GB | Best small code model — great for autocomplete | Q8 | ~17 tok/s | 1.7 GB | LM Studio, Ollama, +1 | Runs well |
| SmolLM2 1.7B | 1.0 GB | Largest SmolLM2 — punches above its weight | Q8 | ~16 tok/s | 1.8 GB | LM Studio, Ollama, +2 | Runs well |
| 👁️ Moondream 2B | 1.3 GB | Small but capable vision model — image Q&A and captioning | Q8 | ~15 tok/s | 2.0 GB | LM Studio, Ollama | Runs well |
| StableLM 2 1.6B | 1.0 GB | Stability AI's efficient small model | Q8 | ~16 tok/s | 1.8 GB | LM Studio, Ollama, +2 | Runs well |
| 🗣️ Dia 1.6B | 1.6 GB | Nari Labs dialogue TTS — multi-speaker with emotion | Q8 | ~16 tok/s | 1.8 GB | Xybrid CLI | Runs well |
| 🎨 Stable Diffusion Turbo | 2.2 GB | 1-step image generation — instant results | FP16 | ~14 tok/s | 2.1 GB | | Runs well |
| Bonsai 8B (1-bit) | 1.16 GB | PrismML's native 1-bit model — 8B params in 1.16 GB, needs forked llama.cpp | Q1 | ~25 tok/s | 1.2 GB | | Tight fit |
| Qwen 2.5 7B | 4.4 GB | Top-tier 7B chat — rivals much larger models | Q2 | ~12 tok/s | 2.5 GB | LM Studio, Ollama, +1 | Tight fit |
| Llama 3.1 8B | 4.9 GB | Meta's workhorse 8B — excellent all-around | Q2 | ~11 tok/s | 2.7 GB | LM Studio, Ollama, +1 | Tight fit |
| 🧠 DeepSeek R1 Distill 7B | 4.7 GB | Distilled from DeepSeek R1 — strong step-by-step reasoning | Q2 | ~12 tok/s | 2.5 GB | LM Studio, Ollama, +1 | Tight fit |
| 🧠 DeepSeek R1 Distill 8B | 4.9 GB | Llama-based R1 distill — best open reasoning at 8B | Q2 | ~11 tok/s | 2.7 GB | LM Studio, Ollama, +1 | Tight fit |
| 🧠 Qwen3 8B | 5.2 GB | Alibaba's latest — thinking mode with strong reasoning | Q2 | ~11 tok/s | 2.7 GB | LM Studio, Ollama, +1 | Tight fit |
| Qwen 2.5 Coder 7B | 4.4 GB | Best open 7B code model — rivals GPT-4 on coding benchmarks | Q2 | ~12 tok/s | 2.5 GB | LM Studio, Ollama, +1 | Tight fit |
| Llama 3.2 3B | 2.0 GB | Meta's 3B — best small model for many tasks | Q6 | ~11 tok/s | 2.6 GB | LM Studio, Ollama, +2 | Tight fit |
| Gemma 3 4B | 2.8 GB | Google's 4B — strong reasoning and instruction following | Q4 | ~12 tok/s | 2.5 GB | LM Studio, Ollama, +1 | Tight fit |
| Phi-4 Mini | 8.7 GB | Microsoft's reasoning model — exceptional for its size | Q5 | ~11 tok/s | 2.7 GB | LM Studio, Ollama, +1 | Tight fit |
| Llama 3.2 1B | 791 MB | Meta's solid all-rounder | FP16 | ~11 tok/s | 2.7 GB | LM Studio, Ollama, +2 | Tight fit |
| Mistral 7B | 4.3 GB | High-quality reasoning and analysis | Q2 | ~13 tok/s | 2.3 GB | LM Studio, Ollama, +1 | Tight fit |
| Qwen 2.5 VL 7B | 4.6 GB | State-of-the-art vision-language — image, video, document understanding | Q2 | ~11 tok/s | 2.6 GB | LM Studio, Ollama | Tight fit |
| Qwen 2.5 3B | 2.0 GB | Sweet-spot model — great quality-to-size ratio | Q6 | ~12 tok/s | 2.5 GB | LM Studio, Ollama, +2 | Tight fit |
| Qwen 2.5 Coder 3B | 2.0 GB | Strong code generation and editing at 3B | Q6 | ~12 tok/s | 2.5 GB | LM Studio, Ollama, +1 | Tight fit |
| 💬 LFM2.5 1.2B | 731 MB | Liquid AI's hybrid model — blazing fast CPU inference | FP16 | ~13 tok/s | 2.3 GB | LM Studio, Ollama, +1 | Tight fit |
| Gemma 3 1B | 786 MB | Google's compact model — great reasoning for its size | FP16 | ~13 tok/s | 2.2 GB | LM Studio, Ollama, +2 | Tight fit |
| 👨‍💻 DeepSeek Coder 6.7B | 3.8 GB | DeepSeek's code model — strong at generation and debugging | Q2 | ~13 tok/s | 2.2 GB | LM Studio, Ollama, +1 | Tight fit |
| 👨‍💻 StarCoder2 3B | 1.9 GB | BigCode's multilingual code model — 600+ languages | Q6 | ~12 tok/s | 2.4 GB | LM Studio, Ollama, +1 | Tight fit |
| 👁️ LLaVA 1.6 7B | 4.5 GB | Leading open vision model — image understanding and reasoning | Q2 | ~13 tok/s | 2.3 GB | LM Studio, Ollama | Tight fit |
| 🎨 Stable Diffusion 3.5 Medium | 4.6 GB | Latest SD architecture — excellent image quality | Q8 | ~11 tok/s | 2.7 GB | | Tight fit |
| TinyLlama 1.1B | 780 MB | Popular tiny model trained on 3T tokens | FP16 | ~13 tok/s | 2.2 GB | LM Studio, Ollama, +2 | Tight fit |
| 🎨 SDXL Turbo | 6.5 GB | High-res 1-step generation — 1024×1024 in seconds | Q4 | ~13 tok/s | 2.2 GB | | Tight fit |
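The RAM column above scales roughly with parameter count times the bytes per weight implied by the quantization level, plus runtime overhead for activations and KV cache. A back-of-the-envelope estimate can be sketched in Python; the bits-per-weight values and the 0.5 GB overhead constant are illustrative assumptions, not figures published on this page:

```python
# Rough RAM estimate for a quantized model:
#   params * bytes-per-weight + fixed runtime overhead.
# Effective bits per weight for each quant level are approximations
# (e.g. Q2-style quants carry some metadata above 2 bits).
BITS = {"FP16": 16, "Q8": 8, "Q6": 6, "Q5": 5, "Q4": 4, "Q2": 2.5, "Q1": 1.6}

def est_ram_gb(params_b: float, quant: str, overhead_gb: float = 0.5) -> float:
    """Estimate resident RAM in GB for a model with params_b billion weights."""
    weights_gb = params_b * BITS[quant] / 8
    return round(weights_gb + overhead_gb, 1)

print(est_ram_gb(8.0, "Q2"))                      # Llama 3.1 8B at Q2 -> 3.0
print(est_ram_gb(0.135, "FP16", overhead_gb=0.0)) # SmolLM2 135M -> 0.3
```

With zero overhead the second call roughly reproduces the 0.3 GB listed for SmolLM2 135M; for the larger models the listed figures come in a little under this estimate, so treat it as an upper-bound sketch.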
Too heavy for this device:

| Model | Download | Description | RAM needed | Fit |
|---|---|---|---|---|
| Gemma 3n E2B | 3.0 GB | Google's on-device multimodal — 2B effective params with vision and audio | 3.0 GB | Too heavy |
| Gemma 3n E4B | 4.5 GB | Google's most capable on-device model — 4B effective, multimodal | 4.5 GB | Too heavy |
| Gemma 3 12B | 7.3 GB | Google's largest open model — near-frontier quality | 4.1 GB | Too heavy |
| Mistral Nemo 12B | 7.1 GB | Mistral's 12B — Tekken tokenizer, 128K context | 4.0 GB | Too heavy |
| Phi-4 Medium | 8.0 GB | Microsoft's 14B reasoning model — frontier-class performance | 4.7 GB | Too heavy |
| Llama 4 Scout | 63 GB | Meta's MoE model — 109B total params, 17B active per token | 42.0 GB | Too heavy |
| 🎨 FLUX.1 Schnell | 12.1 GB | Black Forest Labs' fast model — stunning quality in 4 steps | 4.0 GB | Too heavy |
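The fit labels used throughout (Runs great, Runs well, Tight fit, Too heavy) track the RAM column. The cutoffs below are inferred from the data on this page for a 6GB device; they are my reading of the listings, not the site's published rule:

```python
# Fit classification for a 6GB iPhone 15, with thresholds inferred from
# the tables on this page (assumption: labels depend only on RAM needed).
def fit_label(ram_needed_gb: float) -> str:
    if ram_needed_gb <= 1.6:
        return "Runs great"
    if ram_needed_gb <= 2.1:
        return "Runs well"
    if ram_needed_gb <= 2.7:
        return "Tight fit"
    return "Too heavy"

print(fit_label(0.3))  # SmolLM2 135M -> Runs great
print(fit_label(2.5))  # Qwen 2.5 7B at Q2 -> Tight fit
print(fit_label(4.0))  # Mistral Nemo 12B -> Too heavy
```

These cutoffs are consistent with every entry above (e.g. 2.1 GB for Stable Diffusion Turbo still "runs well", while 2.2 GB for Gemma 3 1B is already a "tight fit"), but the real thresholds presumably scale with each device's RAM.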
Your iPhone 15 can run 17 AI models locally.