onmydevice.ai
📱 Galaxy S24 · 8GB RAM · Android · $799
Chip: Snapdragon 8 Gen 3
RAM: 8GB
GPU: Adreno 750
AI accelerator: Hexagon NPU
Local AI rating: 2/5
Best for: Entry-level Android local AI — transcription focus
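A useful sanity check on the RAM figures below: a model's working set is roughly its parameter count times bytes per weight, plus runtime overhead for activations and the KV cache. A minimal sketch (the flat 0.7 GB overhead is an assumption for illustration, not this site's formula):

```python
def est_ram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 0.7) -> float:
    """Estimate inference RAM in GB: weight storage plus a flat runtime overhead.

    params_b:        parameter count in billions
    bits_per_weight: e.g. 16 for FP16, 8 for Q8, 4 for Q4
    overhead_gb:     assumed allowance for activations/KV cache (illustrative)
    """
    weights_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
    return round(weights_gb + overhead_gb, 1)

print(est_ram_gb(8, 2))   # 2.7  -- an 8B model at ~2 bits per weight
print(est_ram_gb(8, 16))  # 16.7 -- the same model at FP16
```

At roughly 2 bits per weight an 8B model squeezes into about 2.7 GB, which is consistent with the 7B and 8B entries below appearing only at aggressive Q2/Q3 quantizations on an 8GB phone.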
Upgrade pick: Galaxy S25 Ultra · $1,299 · +16 models
The S25 Ultra has 12GB and a much faster NPU, enabling proper on-device chat models.
PICK A TOOL
No apps available for this task on Android.
MODELS YOU CAN RUN
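To read the speed figures below: throughput is listed in tokens per second, and for English text one token is roughly 0.75 words (a common rule of thumb assumed here, not a figure from this site). A quick converter:

```python
def words_per_minute(tok_per_s: float, words_per_token: float = 0.75) -> int:
    """Convert decode speed in tokens/s to approximate English words/minute."""
    return round(tok_per_s * words_per_token * 60)

# ~16 tok/s (a 3B-class model on this phone) is ~720 words/minute,
# several times faster than typical reading speed.
print(words_per_minute(16))  # 720
```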
RUNS GREAT

SmolLM2 135M (99 MB): Ultra-tiny language model — runs anywhere
  FP16 · ~158 tok/s · 0.3 GB RAM · LM Studio, Ollama, +2

🔢 Nomic Embed Text (137 MB): Open-source embedding with 8K context — long document search
  FP16 · ~158 tok/s · 0.3 GB RAM · Ollama

SmolLM2 360M (267 MB): Ultra-small language model
  FP16 · ~46 tok/s · 0.9 GB RAM · LM Studio, Ollama, +2

Qwen 2.5 0.5B (477 MB): Lightweight chat — surprisingly capable
  FP16 · ~36 tok/s · 1.2 GB RAM · LM Studio, Ollama, +2

Qwen 2.5 Coder 0.5B (477 MB): Tiny code completion model — autocomplete on any device
  FP16 · ~36 tok/s · 1.2 GB RAM · LM Studio, Ollama, +1

🔢 GTE Large (335 MB): High-quality embeddings — top of MTEB benchmark at its size
  FP16 · ~64 tok/s · 0.7 GB RAM · Ollama

🗣️ Kokoro 82M (183 MB): 24 natural voices — instant TTS
  FP16 · ~119 tok/s · 0.4 GB RAM · Piper, Xybrid CLI

Wav2Vec2 Base (231 MB): High-accuracy English ASR
  FP16 · ~186 tok/s · 0.2 GB RAM · MacWhisper, Whisper Transcription

Whisper Small (488 MB): Good accuracy-speed balance — 99 languages
  FP16 · ~87 tok/s · 0.5 GB RAM · MacWhisper, Whisper Transcription

🔢 all-MiniLM-L6-v2 (23 MB): Fast sentence embeddings — ideal for semantic search
  FP16 · ~200 tok/s · <0.1 GB RAM · Ollama

🔢 BGE Small (34 MB): Compact BAAI embedding — great for RAG pipelines
  FP16 · ~200 tok/s · 0.1 GB RAM · Ollama

🗣️ KittenTTS Nano (19 MB): Tiny, fast voice synthesis
  FP16 · ~200 tok/s · <0.1 GB RAM · Piper, Xybrid CLI

Whisper Tiny (89 MB): Real-time speech recognition — runs on anything
  FP16 · ~200 tok/s · 0.1 GB RAM · MacWhisper, Whisper Transcription

🗣️ OuteTTS 0.3 500M (500 MB): Voice cloning and natural TTS — zero-shot voice synthesis
  FP16 · ~43 tok/s · 1.0 GB RAM · Xybrid CLI

👁️ SmolVLM 500M (490 MB): Tiny vision-language model — describe images on any device
  FP16 · ~39 tok/s · 1.1 GB RAM · LM Studio, Ollama

SmolLM2 1.7B (1.0 GB): Largest SmolLM2 — punches above its weight
  Q8 · ~24 tok/s · 1.8 GB RAM · LM Studio, Ollama, +2

Whisper Medium (1.5 GB): Strong multilingual transcription
  FP16 · ~28 tok/s · 1.5 GB RAM · MacWhisper, Whisper Transcription

🎙️ Distil-Whisper Large V3 (1.5 GB): 6x faster than Whisper Large — nearly same accuracy
  FP16 · ~28 tok/s · 1.5 GB RAM · MacWhisper, Whisper Transcription

👁️ Moondream 2B (1.3 GB): Small but capable vision model — image Q&A and captioning
  Q8 · ~21 tok/s · 2.0 GB RAM · LM Studio, Ollama
RUNS WELL

Llama 3.2 3B (2.0 GB): Meta's 3B — best small model for many tasks
  Q6 · ~16 tok/s · 2.6 GB RAM · LM Studio, Ollama, +2

Qwen 2.5 3B (2.0 GB): Sweet-spot model — great quality-to-size ratio
  Q6 · ~17 tok/s · 2.5 GB RAM · LM Studio, Ollama, +2

Qwen 2.5 Coder 3B (2.0 GB): Strong code generation and editing at 3B
  Q6 · ~17 tok/s · 2.5 GB RAM · LM Studio, Ollama, +1

Gemma 3 1B (786 MB): Google's compact model — great reasoning for its size
  FP16 · ~19 tok/s · 2.2 GB RAM · LM Studio, Ollama, +2

💬 LFM2.5 1.2B (731 MB): Liquid AI's hybrid model — blazing fast CPU inference
  FP16 · ~18 tok/s · 2.3 GB RAM · LM Studio, Ollama, +1

TinyLlama 1.1B (780 MB): Popular tiny model trained on 3T tokens
  FP16 · ~19 tok/s · 2.2 GB RAM · LM Studio, Ollama, +2

🎨 Stable Diffusion Turbo (2.2 GB): 1-step image generation — instant results
  FP16 · ~20 tok/s · 2.1 GB RAM

🎨 SDXL Turbo (6.5 GB): High-res 1-step generation — 1024×1024 in seconds
  Q4 · ~19 tok/s · 2.2 GB RAM
TIGHT FIT

Bonsai 8B (1-bit) (1.16 GB): PrismML's native 1-bit model — 8B params in 1.16 GB, needs forked llama.cpp
  Q1 · ~37 tok/s · 1.2 GB RAM

Qwen 2.5 VL 7B (4.6 GB): State-of-the-art vision-language — image, video, document understanding
  Q2 · ~16 tok/s · 2.6 GB RAM · LM Studio, Ollama

Qwen 2.5 7B (4.4 GB): Top-tier 7B chat — rivals much larger models
  Q3 · ~13 tok/s · 3.3 GB RAM · LM Studio, Ollama, +1

Llama 3.1 8B (4.9 GB): Meta's workhorse 8B — excellent all-around
  Q2 · ~16 tok/s · 2.7 GB RAM · LM Studio, Ollama, +1

🧠 DeepSeek R1 Distill 7B (4.7 GB): Distilled from DeepSeek R1 — strong step-by-step reasoning
  Q3 · ~13 tok/s · 3.3 GB RAM · LM Studio, Ollama, +1

🧠 DeepSeek R1 Distill 8B (4.9 GB): Llama-based R1 distill — best open reasoning at 8B
  Q2 · ~16 tok/s · 2.7 GB RAM · LM Studio, Ollama, +1

🧠 Qwen3 8B (5.2 GB): Alibaba's latest — thinking mode with strong reasoning
  Q2 · ~16 tok/s · 2.7 GB RAM · LM Studio, Ollama, +1

Qwen 2.5 Coder 7B (4.4 GB): Best open 7B code model — rivals GPT-4 on coding benchmarks
  Q3 · ~13 tok/s · 3.3 GB RAM · LM Studio, Ollama, +1

Gemma 3 4B (2.8 GB): Google's 4B — strong reasoning and instruction following
  Q5 · ~14 tok/s · 3.1 GB RAM · LM Studio, Ollama, +1

Phi-4 Mini (8.7 GB): Microsoft's reasoning model — exceptional for its size
  Q6 · ~14 tok/s · 3.1 GB RAM · LM Studio, Ollama, +1

Mistral 7B (4.3 GB): High-quality reasoning and analysis
  Q3 · ~14 tok/s · 3.1 GB RAM · LM Studio, Ollama, +1

Llama 3.2 1B (791 MB): Meta's solid all-rounder
  FP16 · ~16 tok/s · 2.7 GB RAM · LM Studio, Ollama, +2

🧠 DeepSeek R1 Distill 1.5B (1.1 GB): Distilled reasoning — chain-of-thought in a tiny package
  FP16 · ~13 tok/s · 3.3 GB RAM · LM Studio, Ollama, +1

👨‍💻 DeepSeek Coder 6.7B (3.8 GB): DeepSeek's code model — strong at generation and debugging
  Q3 · ~15 tok/s · 2.9 GB RAM · LM Studio, Ollama, +1

Qwen 2.5 Coder 1.5B (1.1 GB): Best small code model — great for autocomplete
  FP16 · ~13 tok/s · 3.2 GB RAM · LM Studio, Ollama, +1

👨‍💻 StarCoder2 3B (1.9 GB): BigCode's multilingual code model — 600+ languages
  Q8 · ~13 tok/s · 3.3 GB RAM · LM Studio, Ollama, +1

Gemma 3n E2B (3.0 GB): Google's on-device multimodal — 2B effective params with vision and audio
  Q5 · ~13 tok/s · 3.3 GB RAM · LM Studio, Ollama, +1

👁️ LLaVA 1.6 7B (4.5 GB): Leading open vision model — image understanding and reasoning
  Q3 · ~14 tok/s · 3.0 GB RAM · LM Studio, Ollama

🎨 Stable Diffusion 3.5 Medium (4.6 GB): Latest SD architecture — excellent image quality
  Q8 · ~16 tok/s · 2.7 GB RAM

Whisper Large V3 (3.1 GB): Best open transcription — near-human accuracy
  FP16 · ~14 tok/s · 3.1 GB RAM · MacWhisper, Whisper Transcription

StableLM 2 1.6B (1.0 GB): Stability AI's efficient small model
  FP16 · ~13 tok/s · 3.3 GB RAM · LM Studio, Ollama, +2

🗣️ Dia 1.6B (1.6 GB): Nari Labs dialogue TTS — multi-speaker with emotion
  FP16 · ~13 tok/s · 3.3 GB RAM · Xybrid CLI
TOO HEAVY

Gemma 3n E4B (4.5 GB): Google's most capable on-device model — 4B effective, multimodal
  needs 4.5 GB RAM

Gemma 3 12B (7.3 GB): Google's largest open model — near-frontier quality
  needs 4.1 GB RAM

Mistral Nemo 12B (7.1 GB): Mistral's 12B — Tekken tokenizer, 128K context
  needs 4.0 GB RAM

Phi-4 Medium (8.0 GB): Microsoft's 14B reasoning model — frontier-class performance
  needs 4.7 GB RAM

Llama 4 Scout (63 GB): Meta's MoE model — 109B total params, 17B active per token
  needs 42.0 GB RAM

🎨 FLUX.1 Schnell (12.1 GB): Black Forest Labs' fast model — stunning quality in 4 steps
  needs 4.0 GB RAM
Your Galaxy S24 can run 17 AI models locally.