onmydevice.ai
Galaxy S25 Ultra
12GB RAM · Android · $1,299
Chip: Snapdragon 8 Elite
RAM: 12GB
GPU: Adreno 830
AI accelerator: Hexagon NPU
Rating: 3/5
Best for: Android power users who want the best mobile AI experience
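The per-model RAM figures on this page roughly follow from parameter count and quantization. A minimal sketch of that arithmetic, under my own assumptions (the bytes-per-parameter table, the ~1.2x runtime overhead factor, and the tier thresholds are illustrative guesses, not this site's actual methodology):

```python
# Rough fit estimate for a 12 GB phone. All constants are illustrative
# assumptions, not the site's real formula.

BYTES_PER_PARAM = {  # approximate bytes per weight at each quantization level
    "FP16": 2.0, "Q8": 1.0, "Q5": 0.625, "Q4": 0.5, "Q2": 0.25, "Q1": 0.125,
}

def est_ram_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Estimated resident RAM: weights plus ~20% for KV cache and runtime."""
    return params_billion * BYTES_PER_PARAM[quant] * overhead

def fit_tier(ram_gb: float, device_gb: float = 12.0) -> str:
    """Classify the fit; thresholds are hypothetical, chosen to mimic this page."""
    if ram_gb <= 0.25 * device_gb:
        return "Runs great"
    if ram_gb <= 0.33 * device_gb:
        return "Runs well"
    if ram_gb <= 0.45 * device_gb:
        return "Tight fit"
    return "Too heavy"

print(est_ram_gb(0.5, "FP16"))        # 0.5B at FP16 -> 1.2 GB
print(fit_tier(est_ram_gb(8, "Q4")))  # 8B at Q4 -> 4.8 GB -> Tight fit
```

With these guessed constants, a 0.5B FP16 model lands at 1.2 GB and an 8B Q4 model near 4.8 GB, in line with the listings below; real usage also depends on context length and the runtime used.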
MODELS YOU CAN RUN
Qwen 2.5 0.5B · 477 MB · Lightweight chat — surprisingly capable
  FP16 · ~45 tok/s · 1.2 GB RAM · LM Studio, Ollama, +2 · Runs great

Qwen 2.5 Coder 0.5B · 477 MB · Tiny code completion model — autocomplete on any device
  FP16 · ~45 tok/s · 1.2 GB RAM · LM Studio, Ollama, +1 · Runs great

SmolLM2 360M · 267 MB · Ultra-small language model
  FP16 · ~59 tok/s · 0.9 GB RAM · LM Studio, Ollama, +2 · Runs great

SmolLM2 135M · 99 MB · Ultra-tiny language model — runs anywhere
  FP16 · ~200 tok/s · 0.3 GB RAM · LM Studio, Ollama, +2 · Runs great

🔢 Nomic Embed Text · 137 MB · Open-source embedding with 8K context — long document search
  FP16 · ~200 tok/s · 0.3 GB RAM · Ollama · Runs great

👁️ SmolVLM 500M · 490 MB · Tiny vision-language model — describe images on any device
  FP16 · ~49 tok/s · 1.1 GB RAM · LM Studio, Ollama · Runs great

🔢 GTE Large · 335 MB · High-quality embeddings — top of MTEB benchmark at its size
  FP16 · ~80 tok/s · 0.7 GB RAM · Ollama · Runs great

🗣️ Kokoro 82M · 183 MB · 24 natural voices — instant TTS
  FP16 · ~150 tok/s · 0.4 GB RAM · Piper, Xybrid CLI · Runs great

🗣️ OuteTTS 0.3 500M · 500 MB · Voice cloning and natural TTS — zero-shot voice synthesis
  FP16 · ~54 tok/s · 1.0 GB RAM · Xybrid CLI · Runs great

Wav2Vec2 Base · 231 MB · High-accuracy English ASR
  FP16 · ~200 tok/s · 0.2 GB RAM · MacWhisper, Whisper Transcription · Runs great

Whisper Small · 488 MB · Good accuracy-speed balance — 99 languages
  FP16 · ~110 tok/s · 0.5 GB RAM · MacWhisper, Whisper Transcription · Runs great

🔢 all-MiniLM-L6-v2 · 23 MB · Fast sentence embeddings — ideal for semantic search
  FP16 · ~200 tok/s · 0.0 GB RAM · Ollama · Runs great

🔢 BGE Small · 34 MB · Compact BAAI embedding — great for RAG pipelines
  FP16 · ~200 tok/s · 0.1 GB RAM · Ollama · Runs great

🗣️ KittenTTS Nano · 19 MB · Tiny, fast voice synthesis
  FP16 · ~200 tok/s · 0.0 GB RAM · Piper, Xybrid CLI · Runs great

Whisper Tiny · 89 MB · Real-time speech recognition — runs on anything
  FP16 · ~200 tok/s · 0.1 GB RAM · MacWhisper, Whisper Transcription · Runs great

Llama 3.2 1B · 791 MB · Meta's solid all-rounder
  FP16 · ~20 tok/s · 2.7 GB RAM · LM Studio, Ollama, +2 · Runs great

Gemma 3 1B · 786 MB · Google's compact model — great reasoning for its size
  FP16 · ~24 tok/s · 2.2 GB RAM · LM Studio, Ollama, +2 · Runs great

💬 LFM2.5 1.2B · 731 MB · Liquid AI's hybrid model — blazing fast CPU inference
  FP16 · ~23 tok/s · 2.3 GB RAM · LM Studio, Ollama, +1 · Runs great

Whisper Medium · 1.5 GB · Strong multilingual transcription
  FP16 · ~36 tok/s · 1.5 GB RAM · MacWhisper, Whisper Transcription · Runs great

🎙️ Distil-Whisper Large V3 · 1.5 GB · 6x faster than Whisper Large — nearly same accuracy
  FP16 · ~36 tok/s · 1.5 GB RAM · MacWhisper, Whisper Transcription · Runs great

TinyLlama 1.1B · 780 MB · Popular tiny model trained on 3T tokens
  FP16 · ~24 tok/s · 2.2 GB RAM · LM Studio, Ollama, +2 · Runs great

🎨 Stable Diffusion Turbo · 2.2 GB · 1-step image generation — instant results
  FP16 · ~26 tok/s · 2.1 GB RAM · Runs great
Llama 3.2 3B · 2.0 GB · Meta's 3B — best small model for many tasks
  Q8 · ~15 tok/s · 3.6 GB RAM · LM Studio, Ollama, +2 · Runs well

🧠 DeepSeek R1 Distill 1.5B · 1.1 GB · Distilled reasoning — chain-of-thought in a tiny package
  FP16 · ~16 tok/s · 3.3 GB RAM · LM Studio, Ollama, +1 · Runs well

Qwen 2.5 3B · 2.0 GB · Sweet-spot model — great quality-to-size ratio
  Q8 · ~15 tok/s · 3.5 GB RAM · LM Studio, Ollama, +2 · Runs well

Qwen 2.5 Coder 3B · 2.0 GB · Strong code generation and editing at 3B
  Q8 · ~15 tok/s · 3.5 GB RAM · LM Studio, Ollama, +1 · Runs well

Qwen 2.5 Coder 1.5B · 1.1 GB · Best small code model — great for autocomplete
  FP16 · ~17 tok/s · 3.2 GB RAM · LM Studio, Ollama, +1 · Runs well

👨‍💻 StarCoder2 3B · 1.9 GB · BigCode's multilingual code model — 600+ languages
  Q8 · ~16 tok/s · 3.3 GB RAM · LM Studio, Ollama, +1 · Runs well

SmolLM2 1.7B · 1.0 GB · Largest SmolLM2 — punches above its weight
  FP16 · ~16 tok/s · 3.4 GB RAM · LM Studio, Ollama, +2 · Runs well

StableLM 2 1.6B · 1.0 GB · Stability AI's efficient small model
  FP16 · ~16 tok/s · 3.3 GB RAM · LM Studio, Ollama, +2 · Runs well

👁️ Moondream 2B · 1.3 GB · Small but capable vision model — image Q&A and captioning
  FP16 · ~15 tok/s · 3.7 GB RAM · LM Studio, Ollama · Runs well

🎨 SDXL Turbo · 6.5 GB · High-res 1-step generation — 1024×1024 in seconds
  Q8 · ~14 tok/s · 3.8 GB RAM · Runs well

🗣️ Dia 1.6B · 1.6 GB · Nari Labs dialogue TTS — multi-speaker with emotion
  FP16 · ~16 tok/s · 3.3 GB RAM · Xybrid CLI · Runs well

Whisper Large V3 · 3.1 GB · Best open transcription — near-human accuracy
  FP16 · ~17 tok/s · 3.1 GB RAM · MacWhisper, Whisper Transcription · Runs well
Bonsai 8B (1-bit) · 1.16 GB · PrismML's native 1-bit model — 8B params in 1.16 GB, needs forked llama.cpp
  Q1 · ~46 tok/s · 1.2 GB RAM · Tight fit

Mistral Nemo 12B · 7.1 GB · Mistral's 12B — Tekken tokenizer, 128K context
  Q2 · ~13 tok/s · 4.0 GB RAM · LM Studio, Ollama, +1 · Tight fit

Llama 3.1 8B · 4.9 GB · Meta's workhorse 8B — excellent all-around
  Q4 · ~12 tok/s · 4.6 GB RAM · LM Studio, Ollama, +1 · Tight fit

🧠 DeepSeek R1 Distill 8B · 4.9 GB · Llama-based R1 distill — best open reasoning at 8B
  Q4 · ~12 tok/s · 4.6 GB RAM · LM Studio, Ollama, +1 · Tight fit

Qwen 2.5 7B · 4.4 GB · Top-tier 7B chat — rivals much larger models
  Q4 · ~12 tok/s · 4.4 GB RAM · LM Studio, Ollama, +1 · Tight fit

🧠 DeepSeek R1 Distill 7B · 4.7 GB · Distilled from DeepSeek R1 — strong step-by-step reasoning
  Q4 · ~12 tok/s · 4.4 GB RAM · LM Studio, Ollama, +1 · Tight fit

🧠 Qwen3 8B · 5.2 GB · Alibaba's latest — thinking mode with strong reasoning
  Q4 · ~11 tok/s · 4.7 GB RAM · LM Studio, Ollama, +1 · Tight fit

Qwen 2.5 Coder 7B · 4.4 GB · Best open 7B code model — rivals GPT-4 on coding benchmarks
  Q4 · ~12 tok/s · 4.4 GB RAM · LM Studio, Ollama, +1 · Tight fit

Gemma 3 4B · 2.8 GB · Google's 4B — strong reasoning and instruction following
  Q8 · ~11 tok/s · 4.9 GB RAM · LM Studio, Ollama, +1 · Tight fit

Gemma 3 12B · 7.3 GB · Google's largest open model — near-frontier quality
  Q2 · ~13 tok/s · 4.1 GB RAM · LM Studio, Ollama, +1 · Tight fit

Phi-4 Mini · 8.7 GB · Microsoft's reasoning model — exceptional for its size
  Q8 · ~12 tok/s · 4.4 GB RAM · LM Studio, Ollama, +1 · Tight fit

Mistral 7B · 4.3 GB · High-quality reasoning and analysis
  Q5 · ~11 tok/s · 4.9 GB RAM · LM Studio, Ollama, +1 · Tight fit

Qwen 2.5 VL 7B · 4.6 GB · State-of-the-art vision-language — image, video, document understanding
  Q4 · ~12 tok/s · 4.5 GB RAM · LM Studio, Ollama · Tight fit

Phi-4 Medium · 8.0 GB · Microsoft's 14B reasoning model — frontier-class performance
  Q2 · ~11 tok/s · 4.7 GB RAM · LM Studio, Ollama, +1 · Tight fit

🎨 FLUX.1 Schnell · 12.1 GB · Black Forest Labs' fast model — stunning quality in 4 steps
  Q2 · ~13 tok/s · 4.0 GB RAM · Tight fit

👨‍💻 DeepSeek Coder 6.7B · 3.8 GB · DeepSeek's code model — strong at generation and debugging
  Q5 · ~12 tok/s · 4.5 GB RAM · LM Studio, Ollama, +1 · Tight fit

Gemma 3n E4B · 4.5 GB · Google's most capable on-device model — 4B effective, multimodal
  Q5 · ~11 tok/s · 5.0 GB RAM · LM Studio, Ollama, +1 · Tight fit

Gemma 3n E2B · 3.0 GB · Google's on-device multimodal — 2B effective params with vision and audio
  Q8 · ~11 tok/s · 4.8 GB RAM · LM Studio, Ollama, +1 · Tight fit

👁️ LLaVA 1.6 7B · 4.5 GB · Leading open vision model — image understanding and reasoning
  Q5 · ~11 tok/s · 4.7 GB RAM · LM Studio, Ollama · Tight fit

🎨 Stable Diffusion 3.5 Medium · 4.6 GB · Latest SD architecture — excellent image quality
  FP16 · ~12 tok/s · 4.6 GB RAM · Tight fit
Llama 4 Scout · 63 GB · Meta's MoE model — 109B total params, 17B active per token
  needs 42.0 GB RAM · Too heavy
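Reading the ~tok/s figures above: decode speed translates directly into how long a reply takes to generate. A back-of-envelope sketch (generation time only; it ignores prompt processing and model load, and the 300-token reply length is an arbitrary example):

```python
# Time to generate a reply at a given decode speed, ignoring prompt
# processing and model load time.

def reply_seconds(tokens: int, tok_per_s: float) -> float:
    """Seconds to decode `tokens` output tokens at `tok_per_s`."""
    return tokens / tok_per_s

# A ~300-token reply at the speeds listed above:
print(round(reply_seconds(300, 45), 1))  # Qwen 2.5 0.5B at ~45 tok/s -> 6.7 s
print(round(reply_seconds(300, 12), 1))  # Llama 3.1 8B at ~12 tok/s -> 25.0 s
```

So the "Runs great" tier is not just about fitting in RAM: a small FP16 model can answer in seconds while a 7B-8B quantized model takes several times longer for the same reply.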
BENCHMARKS
View 2 real-world benchmarks: measured tok/s, RAM usage, and more from community tests.
Your Galaxy S25 Ultra can run 33 AI models locally.