Chat (19 models)

| Model | Size | Description | Context | Tier |
| --- | --- | --- | --- | --- |
| SmolLM2 135M | 99 MB | Ultra-tiny language model; runs anywhere | 8K | Low+ |
| SmolLM2 360M | 267 MB | Ultra-small language model | 8K | Low+ |
| Qwen 2.5 0.5B | 477 MB | Lightweight chat model; surprisingly capable | 32K | Low+ |
| Gemma 3 1B | 786 MB | Google's compact model; great reasoning for its size | 32K | Low+ |
| LFM2.5 1.2B | 731 MB | Liquid AI's hybrid model; blazing-fast CPU inference | 32K | Low+ |
| TinyLlama 1.1B | 780 MB | Popular tiny model trained on 3T tokens | 2K | Low+ |
| Llama 3.2 1B | 791 MB | Meta's solid all-rounder | 128K | Mid+ |
| StableLM 2 1.6B | 1.0 GB | Stability AI's efficient small model | 4K | Mid+ |
| SmolLM2 1.7B | 1.0 GB | Largest SmolLM2; punches above its weight | 8K | Mid+ |
| Gemma 3n E2B | 3.0 GB | Google's on-device multimodal; 2B effective params with vision and audio | 32K | Mid+ |
| Qwen 2.5 3B | 2.0 GB | Sweet-spot model; great quality-to-size ratio | 32K | Mid+ |
| Llama 3.2 3B | 2.0 GB | Meta's 3B; best small model for many tasks | 128K | Mid+ |
| Gemma 3n E4B | 4.5 GB | Google's most capable on-device model; 4B effective, multimodal | 32K | High+ |
| Gemma 3 4B | 2.8 GB | Google's 4B; strong reasoning and instruction following | 128K | High+ |
| Qwen 2.5 7B | 4.4 GB | Top-tier 7B chat; rivals much larger models | 128K | High+ |
| Llama 3.1 8B | 4.9 GB | Meta's workhorse 8B; excellent all-around | 128K | High+ |
| Bonsai 8B (1-bit) | 1.16 GB | PrismML's native 1-bit model; 8B params in 1.16 GB; requires a forked llama.cpp | 32K | Mid+ |
| Gemma 3 12B | 7.3 GB | Largest Gemma 3 in this catalog; near-frontier quality | 128K | Ultra+ |
| Mistral Nemo 12B | 7.1 GB | Mistral's 12B; uses the Tekken tokenizer | 128K | Ultra+ |
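The "X+" tier labels express a minimum device class per model: "Mid+" means the model is offered on Mid, High, and Ultra devices. Below is a minimal sketch of how a picker might encode and filter this catalog by tier; the `ChatModel` shape, the tier names without the "+", and the truncated entry list are illustrative assumptions, not an API from the source.

```ts
// Device tiers, ordered weakest to strongest, mirroring the table's
// "Low+ / Mid+ / High+ / Ultra+" labels.
type Tier = "Low" | "Mid" | "High" | "Ultra";

const TIER_ORDER: Tier[] = ["Low", "Mid", "High", "Ultra"];

interface ChatModel {
  name: string;
  sizeMB: number;        // approximate download size, from the table
  contextTokens: number; // maximum context window
  minTier: Tier;         // the "Tier" column: minimum device class
}

const CATALOG: ChatModel[] = [
  { name: "SmolLM2 135M",  sizeMB: 99,   contextTokens: 8_192,   minTier: "Low" },
  { name: "Qwen 2.5 0.5B", sizeMB: 477,  contextTokens: 32_768,  minTier: "Low" },
  { name: "Llama 3.2 3B",  sizeMB: 2000, contextTokens: 131_072, minTier: "Mid" },
  { name: "Gemma 3 12B",   sizeMB: 7300, contextTokens: 131_072, minTier: "Ultra" },
  // ...remaining entries elided for brevity
];

// A model is available when the device's tier ranks at or above the
// model's minimum tier ("Mid+" matches Mid, High, and Ultra).
function availableModels(deviceTier: Tier): ChatModel[] {
  const rank = TIER_ORDER.indexOf(deviceTier);
  return CATALOG.filter(m => TIER_ORDER.indexOf(m.minTier) <= rank);
}

console.log(availableModels("Mid").map(m => m.name));
```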
Other categories: Reasoning (8 models), Code (6), Vision (4), Embedding (4), Image (4), Voice (4), Transcription (6).
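One caveat on the Context column above: it is the model's maximum window, not free capacity. The KV cache grows linearly with tokens in context and at long windows can dwarf the download size. A back-of-the-envelope sketch, assuming Llama 3.1 8B's published architecture (32 transformer layers, 8 KV heads under grouped-query attention, head dim 128) and an unquantized fp16 cache; real runtimes often quantize the cache to shrink this.

```ts
// Rough KV-cache size for a transformer with grouped-query attention.
// Per token: 2 (K and V) * layers * kvHeads * headDim * bytesPerElem.
function kvCacheBytes(
  layers: number,
  kvHeads: number,
  headDim: number,
  ctxTokens: number,
  bytesPerElem = 2, // fp16
): number {
  return 2 * layers * kvHeads * headDim * bytesPerElem * ctxTokens;
}

// Llama 3.1 8B: 32 layers, 8 KV heads, head dim 128.
const perToken = kvCacheBytes(32, 8, 128, 1);        // 131,072 B = 128 KiB/token
const full128k = kvCacheBytes(32, 8, 128, 131_072);  // 16 GiB at the full window

console.log(`${perToken / 1024} KiB per token, ${full128k / 2 ** 30} GiB at 128K ctx`);
```

So a 4.9 GB download can still need roughly 16 GiB of cache at the full 128K window in fp16, which is why the tier label alone does not guarantee long-context headroom.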