Chat (19 models)

| Model | Size | Description | Context | Tier |
| --- | --- | --- | --- | --- |
| SmolLM2 135M | 99 MB | Ultra-tiny language model; runs anywhere | 8K | Low+ |
| SmolLM2 360M | 267 MB | Ultra-small language model | 8K | Low+ |
| Qwen 2.5 0.5B | 477 MB | Lightweight chat model; surprisingly capable | 32K | Low+ |
| Gemma 3 1B | 786 MB | Google's compact model; great reasoning for its size | 32K | Low+ |
| LFM2.5 1.2B | 731 MB | Liquid AI's hybrid model; blazing-fast CPU inference | 32K | Low+ |
| TinyLlama 1.1B | 780 MB | Popular tiny model trained on 3T tokens | 2K | Low+ |
| Llama 3.2 1B | 791 MB | Meta's solid all-rounder | 128K | Mid+ |
| StableLM 2 1.6B | 1.0 GB | Stability AI's efficient small model | 4K | Mid+ |
| SmolLM2 1.7B | 1.0 GB | Largest SmolLM2; punches above its weight | 8K | Mid+ |
| Gemma 3n E2B | 3.0 GB | Google's on-device multimodal; 2B effective params with vision and audio | 32K | Mid+ |
| Qwen 2.5 3B | 2.0 GB | Sweet-spot model; great quality-to-size ratio | 32K | Mid+ |
| Llama 3.2 3B | 2.0 GB | Meta's 3B; best small model for many tasks | 128K | Mid+ |
| Gemma 3n E4B | 4.5 GB | Google's most capable on-device model; 4B effective, multimodal | 32K | High+ |
| Gemma 3 4B | 2.8 GB | Google's 4B; strong reasoning and instruction following | 128K | High+ |
| Qwen 2.5 7B | 4.4 GB | Top-tier 7B chat; rivals much larger models | 128K | High+ |
| Llama 3.1 8B | 4.9 GB | Meta's workhorse 8B; excellent all-around | 128K | High+ |
| Bonsai 8B (1-bit) | 1.16 GB | PrismML's native 1-bit model; 8B params in 1.16 GB; requires a forked llama.cpp | 32K | Mid+ |
| Gemma 3 12B | 7.3 GB | Largest Gemma 3 in this catalog; near-frontier quality | 128K | Ultra+ |
| Mistral Nemo 12B | 7.1 GB | Mistral's 12B; uses the Tekken tokenizer | 128K | Ultra+ |
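The "X+" tier labels express a minimum device class per model: "Mid+" means the model is offered on Mid, High, and Ultra devices. Below is a minimal sketch of how a picker might encode and filter this catalog by tier; the `ChatModel` shape, the tier names without the "+", and the truncated entry list are illustrative assumptions, not an API from the source.

```ts
// Device tiers, ordered weakest to strongest, mirroring the table's
// "Low+ / Mid+ / High+ / Ultra+" labels.
type Tier = "Low" | "Mid" | "High" | "Ultra";

const TIER_ORDER: Tier[] = ["Low", "Mid", "High", "Ultra"];

interface ChatModel {
  name: string;
  sizeMB: number;        // approximate download size, from the table
  contextTokens: number; // maximum context window
  minTier: Tier;         // the "Tier" column: minimum device class
}

const CATALOG: ChatModel[] = [
  { name: "SmolLM2 135M",  sizeMB: 99,   contextTokens: 8_192,   minTier: "Low" },
  { name: "Qwen 2.5 0.5B", sizeMB: 477,  contextTokens: 32_768,  minTier: "Low" },
  { name: "Llama 3.2 3B",  sizeMB: 2000, contextTokens: 131_072, minTier: "Mid" },
  { name: "Gemma 3 12B",   sizeMB: 7300, contextTokens: 131_072, minTier: "Ultra" },
  // ...remaining entries elided for brevity
];

// A model is available when the device's tier ranks at or above the
// model's minimum tier ("Mid+" matches Mid, High, and Ultra).
function availableModels(deviceTier: Tier): ChatModel[] {
  const rank = TIER_ORDER.indexOf(deviceTier);
  return CATALOG.filter(m => TIER_ORDER.indexOf(m.minTier) <= rank);
}

console.log(availableModels("Mid").map(m => m.name));
```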
Other categories: Reasoning (8 models), Code (6), Vision (4), Embedding (4), Image (4), Voice (4), Transcription (6).
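One caveat on the Context column above: it is the model's maximum window, not free capacity. The KV cache grows linearly with tokens in context and at long windows can dwarf the download size. A back-of-the-envelope sketch, assuming Llama 3.1 8B's published architecture (32 transformer layers, 8 KV heads under grouped-query attention, head dim 128) and an unquantized fp16 cache; real runtimes often quantize the cache to shrink this.

```ts
// Rough KV-cache size for a transformer with grouped-query attention.
// Per token: 2 (K and V) * layers * kvHeads * headDim * bytesPerElem.
function kvCacheBytes(
  layers: number,
  kvHeads: number,
  headDim: number,
  ctxTokens: number,
  bytesPerElem = 2, // fp16
): number {
  return 2 * layers * kvHeads * headDim * bytesPerElem * ctxTokens;
}

// Llama 3.1 8B: 32 layers, 8 KV heads, head dim 128.
const perToken = kvCacheBytes(32, 8, 128, 1);        // 131,072 B = 128 KiB/token
const full128k = kvCacheBytes(32, 8, 128, 131_072);  // 16 GiB at the full window

console.log(`${perToken / 1024} KiB per token, ${full128k / 2 ** 30} GiB at 128K ctx`);
```

So a 4.9 GB download can still need roughly 16 GiB of cache at the full 128K window in fp16, which is why the tier label alone does not guarantee long-context headroom.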