            Llama 4 Scout             Qwen 2.5 7B
Type        Reasoning                 Chat
Parameters  17B active (109B total)   7.62B
Context     128K                      128K
Min tier    Ultra                     High
Runs on     8 / 47 devices            47 / 47 devices
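The 17B figure for Llama 4 Scout counts only the parameters active per token. Memory, however, scales with the total parameter count, because a mixture-of-experts model has to keep every expert loaded. A minimal Python sketch, assuming a weights-only estimate (parameters × bits per weight ÷ 8, in decimal GB, ignoring KV cache and runtime overhead), shows that the 218 GB FP16 size listed for Scout matches its roughly 109B total parameters, not its 17B active ones. The helper name `weight_gb` is hypothetical.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only footprint in decimal GB: parameters x bits / 8.

    Ignores KV cache, activations and runtime overhead, so real usage is higher.
    """
    return params_billion * bits_per_weight / 8


# Llama 4 Scout is a mixture-of-experts model: about 17B parameters are active
# per token, but all of its roughly 109B parameters must be held in memory.
print(weight_gb(17, 16))              # 34.0  -> FP16 size if only active params counted
print(weight_gb(109, 16))             # 218.0 -> matches Scout's listed FP16 size
print(round(weight_gb(7.62, 16), 2))  # 15.24 -> close to Qwen 2.5 7B's listed 15.5 GB
```

The listed quantized sizes run above this naive estimate (Scout at Q4 is listed at 63 GB, versus 54.5 GB at exactly 4 bits per weight), presumably because quantized formats also store scaling data.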
Quant   Llama 4 Scout   Qwen 2.5 7B   Smaller model
FP16    218.0 GB        15.5 GB       Qwen 2.5 7B
Q8      115.0 GB        8.1 GB        Qwen 2.5 7B
Q6      84.0 GB         5.9 GB        Qwen 2.5 7B
Q5      75.0 GB         5.1 GB        Qwen 2.5 7B
Q4      63.0 GB         4.4 GB        Qwen 2.5 7B
Q3      52.0 GB         3.3 GB        Qwen 2.5 7B
Q2      42.0 GB         2.5 GB        Qwen 2.5 7B
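To read the quantization table against a particular machine, one approach is to pick the highest-precision quant whose weights fit in the available memory. The Python sketch below uses the sizes from the table; the `best_quant` helper and its `usable_fraction` headroom of 0.75 are hypothetical choices for illustration, not the thresholds this comparison uses to assign its fit ratings.

```python
from typing import Optional

# Weight sizes in GB, copied from the quantization table.
QUANT_SIZES_GB = {
    "Llama 4 Scout": {"FP16": 218.0, "Q8": 115.0, "Q6": 84.0, "Q5": 75.0,
                      "Q4": 63.0, "Q3": 52.0, "Q2": 42.0},
    "Qwen 2.5 7B": {"FP16": 15.5, "Q8": 8.1, "Q6": 5.9, "Q5": 5.1,
                    "Q4": 4.4, "Q3": 3.3, "Q2": 2.5},
}

# Highest precision first, so the first match is the best quant that fits.
QUANT_ORDER = ["FP16", "Q8", "Q6", "Q5", "Q4", "Q3", "Q2"]


def best_quant(model: str, device_memory_gb: float,
               usable_fraction: float = 0.75) -> Optional[str]:
    """Return the highest-precision quant whose weights fit the memory budget.

    usable_fraction is a hypothetical allowance for the OS, KV cache and
    runtime overhead; it is an assumption, not this site's rule.
    """
    budget = device_memory_gb * usable_fraction
    for quant in QUANT_ORDER:
        if QUANT_SIZES_GB[model][quant] <= budget:
            return quant
    return None  # even Q2 does not fit: "Too heavy"


print(best_quant("Qwen 2.5 7B", 16))     # Q8
print(best_quant("Llama 4 Scout", 16))   # None
print(best_quant("Llama 4 Scout", 128))  # Q6
```

With a 16 GB budget, Qwen 2.5 7B lands on Q8 while Llama 4 Scout does not fit at any quant; at 128 GB, Scout fits at Q6 under this assumed headroom.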
Device                     OS       Llama 4 Scout              Qwen 2.5 7B
MacBook Air M4             macOS    Too heavy                  Runs well, Q8 · ~15 tok/s
MacBook Air M3             macOS    Too heavy                  Runs well, Q8 · ~12 tok/s
MacBook Air M2             macOS    Too heavy                  Tight fit, Q5 · ~20 tok/s
MacBook Pro M4 Pro         macOS    Too heavy                  Runs great, FP16 · ~18 tok/s
MacBook Air M1             macOS    Too heavy                  Tight fit, Q5 · ~13 tok/s
MacBook Air M1             macOS    Too heavy                  Runs well, Q8 · ~8 tok/s
MacBook Pro M1             macOS    Too heavy                  Runs well, Q8 · ~8 tok/s
MacBook Pro M1 Pro         macOS    Too heavy                  Runs well, Q8 · ~25 tok/s
MacBook Pro M1 Pro         macOS    Too heavy                  Runs well, FP16 · ~13 tok/s
MacBook Pro M1 Max         macOS    Too heavy                  Runs well, FP16 · ~26 tok/s
MacBook Pro M1 Max         macOS    Tight fit, Q2 · ~10 tok/s  Runs great, FP16 · ~26 tok/s
MacBook Pro M2 Pro         macOS    Too heavy                  Runs well, Q8 · ~25 tok/s
MacBook Pro M2 Pro         macOS    Too heavy                  Runs well, FP16 · ~13 tok/s
MacBook Pro M2 Max         macOS    Too heavy                  Runs well, FP16 · ~26 tok/s
MacBook Pro M2 Max         macOS    Tight fit, Q2 · ~10 tok/s  Runs great, FP16 · ~26 tok/s
MacBook Pro M3 Pro         macOS    Too heavy                  Runs well, Q8 · ~19 tok/s
MacBook Pro M3 Pro         macOS    Too heavy                  Runs great, FP16 · ~10 tok/s
MacBook Pro M3 Max         macOS    Too heavy                  Runs great, FP16 · ~26 tok/s
MacBook Pro M3 Max         macOS    Tight fit, Q4 · ~6 tok/s   Runs great, FP16 · ~26 tok/s
iPhone 16 Pro              iOS      Too heavy                  Tight fit, Q3 · ~14 tok/s
iPhone 15                  iOS      Too heavy                  Tight fit, Q2 · ~12 tok/s
Galaxy S25 Ultra           Android  Too heavy                  Tight fit, Q4 · ~12 tok/s
Galaxy S24                 Android  Too heavy                  Tight fit, Q3 · ~13 tok/s
Pixel 9 Pro                Android  Too heavy                  Tight fit, Q6 · ~8 tok/s
Steam Deck OLED            Linux    Too heavy                  Runs well, Q8 · ~11 tok/s
Gaming PC (RTX 4070)       Windows  Too heavy                  Runs well, Q8 · ~62 tok/s
Gaming PC (RTX 3060)       Windows  Too heavy                  Runs well, Q8 · ~44 tok/s
Gaming PC (RTX 4080)       Windows  Too heavy                  Runs great, Q8 · ~89 tok/s
Gaming PC (RTX 4090)       Windows  Tight fit, Q2 · ~1 tok/s   Runs well, FP16 · ~65 tok/s
Atom 1                     Linux    Too heavy                  Runs well, FP16 · ~13 tok/s
Atom 1                     Linux    Tight fit, Q2 · ~7 tok/s   Runs great, FP16 · ~18 tok/s
Atom 1                     Linux    Tight fit, Q6 · ~3 tok/s   Runs great, FP16 · ~18 tok/s
iPad Pro M4                iOS      Too heavy                  Tight fit, Q6 · ~14 tok/s
Mac Mini M1                macOS    Too heavy                  Tight fit, Q5 · ~13 tok/s
Mac Mini M1                macOS    Too heavy                  Runs well, Q8 · ~8 tok/s
Mac Mini M2                macOS    Too heavy                  Tight fit, Q5 · ~20 tok/s
Mac Mini M2 Pro            macOS    Too heavy                  Runs well, Q8 · ~25 tok/s
Mac Mini M2 Pro            macOS    Too heavy                  Runs well, FP16 · ~13 tok/s
Mac Mini M4                macOS    Too heavy                  Runs well, Q8 · ~15 tok/s
Mac Mini M4                macOS    Too heavy                  Runs well, FP16 · ~8 tok/s
Mac Mini M4 Pro            macOS    Too heavy                  Tight fit, FP16 · ~18 tok/s
Mac Mini M4 Pro            macOS    Too heavy                  Runs great, FP16 · ~18 tok/s
Mac Studio M4 Max          macOS    Tight fit, Q2 · ~13 tok/s  Runs great, FP16 · ~35 tok/s
Mac Pro M2 Ultra           macOS    Tight fit, Q8 · ~7 tok/s   Runs great, FP16 · ~52 tok/s
Snapdragon X Elite Laptop  Windows  Too heavy                  Runs well, Q8 · ~17 tok/s
OnePlus 13                 Android  Too heavy                  Tight fit, Q6 · ~9 tok/s
Raspberry Pi 5             Linux    Too heavy                  Tight fit, Q6 · ~5 tok/s

Qwen 2.5 7B runs on all 47 listed devices; Llama 4 Scout runs on only 8 of them. Llama 4 Scout is the much larger model and may produce higher-quality output, while Qwen 2.5 7B needs far less memory. On memory-constrained devices Qwen 2.5 7B is the practical choice: at its smallest quant (Q2) it needs 2.5 GB, versus 42.0 GB for Llama 4 Scout at Q2.
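The "Runs on" counts appear to be the number of device rows whose verdict is anything other than "Too heavy" (for Llama 4 Scout, these are exactly its eight "Tight fit" rows). A small Python sketch of that tally, applied to a five-row excerpt of the device table; the `Row` type and `runs_on` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    device: str
    scout: str  # verdict for Llama 4 Scout
    qwen: str   # verdict for Qwen 2.5 7B


# Five rows excerpted from the device table (second MacBook Pro M1 Max entry).
ROWS = [
    Row("MacBook Air M4", "Too heavy", "Runs well"),
    Row("MacBook Pro M1 Max", "Tight fit", "Runs great"),
    Row("Gaming PC (RTX 4090)", "Tight fit", "Runs well"),
    Row("iPhone 15", "Too heavy", "Tight fit"),
    Row("Raspberry Pi 5", "Too heavy", "Tight fit"),
]


def runs_on(verdicts: list[str]) -> str:
    """Count devices whose verdict is anything other than 'Too heavy'."""
    fits = sum(1 for v in verdicts if v != "Too heavy")
    return f"{fits} / {len(verdicts)} devices"


print("Llama 4 Scout:", runs_on([r.scout for r in ROWS]))  # Llama 4 Scout: 2 / 5 devices
print("Qwen 2.5 7B:", runs_on([r.qwen for r in ROWS]))     # Qwen 2.5 7B: 5 / 5 devices
```

Run over all 47 rows, the same rule reproduces the 8 / 47 and 47 / 47 figures in the summary table.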