Which runs better on your device? Side-by-side comparison of specs, quantization sizes, and device compatibility.
| Quant | Qwen 2.5 3B | Llama 3.2 3B | Notes |
|---|---|---|---|
| FP16 | 6.4 GB | 6.6 GB | Qwen 2.5 3B smaller |
| Q8 | 3.5 GB | 3.6 GB | Qwen 2.5 3B smaller |
| Q6 | 2.5 GB | 2.6 GB | Qwen 2.5 3B smaller |
| Q5 | 2.2 GB | 2.3 GB | Qwen 2.5 3B smaller |
| Q4 | 1.8 GB | 1.9 GB | Qwen 2.5 3B smaller |
| Q3 | 1.4 GB | 1.4 GB | |
| Q2 | 1.1 GB | 1.1 GB | |

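The sizes above follow from a simple rule: parameter count times bits per weight. As a rough sanity check, here's a sketch; the effective bits-per-weight values are approximations assumed for common GGUF-style quants (they sit slightly above the nominal bit width because of quantization scales and metadata), not official figures.

```python
# Rough model file-size estimate: parameters x effective bits per weight / 8.
# The effective-bit values below are assumptions, not published specs.
EFFECTIVE_BITS = {
    "FP16": 16.0,
    "Q8": 8.5,
    "Q6": 6.6,
    "Q5": 5.7,
    "Q4": 4.9,
    "Q3": 3.9,
    "Q2": 3.0,
}

def approx_size_gb(n_params: float, quant: str) -> float:
    """Approximate model file size in GB for a given quantization level."""
    return n_params * EFFECTIVE_BITS[quant] / 8 / 1e9

# Qwen 2.5 3B has roughly 3.1B parameters.
for quant in EFFECTIVE_BITS:
    print(f"{quant}: ~{approx_size_gb(3.1e9, quant):.1f} GB")
```

The estimates land within a few hundred megabytes of the table; real files add tokenizer and metadata overhead on top.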
| Device | Qwen 2.5 3B | Llama 3.2 3B |
|---|---|---|
| 💻 MacBook Air M4 macOS | Runs great FP16 · ~19 tok/s | Runs great FP16 · ~18 tok/s |
| 💻 MacBook Air M3 macOS | Runs great FP16 · ~16 tok/s | Runs great FP16 · ~15 tok/s |
| 💻 MacBook Air M2 macOS | Runs well Q8 · ~29 tok/s | Runs well Q8 · ~28 tok/s |
| 💻 MacBook Pro M4 Pro macOS | Runs great FP16 · ~43 tok/s | Runs great FP16 · ~41 tok/s |
| 💻 MacBook Air M1 macOS | Runs well Q8 · ~20 tok/s | Runs well Q8 · ~19 tok/s |
| 💻 MacBook Air M1 macOS | Runs great FP16 · ~11 tok/s | Runs great FP16 · ~10 tok/s |
| 💻 MacBook Pro M1 macOS | Runs great FP16 · ~11 tok/s | Runs great FP16 · ~10 tok/s |
| 💻 MacBook Pro M1 Pro macOS | Runs great FP16 · ~31 tok/s | Runs great FP16 · ~30 tok/s |
| 💻 MacBook Pro M1 Max macOS | Runs great FP16 · ~63 tok/s | Runs great FP16 · ~61 tok/s |
| 💻 MacBook Pro M2 Pro macOS | Runs great FP16 · ~31 tok/s | Runs great FP16 · ~30 tok/s |
| 💻 MacBook Pro M2 Max macOS | Runs great FP16 · ~63 tok/s | Runs great FP16 · ~61 tok/s |
| 💻 MacBook Pro M3 Pro macOS | Runs great FP16 · ~23 tok/s | Runs great FP16 · ~23 tok/s |
| 💻 MacBook Pro M3 Max macOS | Runs great FP16 · ~63 tok/s | Runs great FP16 · ~61 tok/s |
| 📱 iPhone 16 Pro iOS | Tight fit Q8 · ~14 tok/s | Tight fit Q8 · ~13 tok/s |
| 📱 iPhone 15 iOS | Tight fit Q6 · ~12 tok/s | Tight fit Q6 · ~11 tok/s |
| 📱 Galaxy S25 Ultra Android | Runs well Q8 · ~15 tok/s | Runs well Q8 · ~15 tok/s |
| 📱 Galaxy S24 Android | Runs well Q6 · ~17 tok/s | Runs well Q6 · ~16 tok/s |
| 📱 Pixel 9 Pro Android | Tight fit FP16 · ~7 tok/s | Tight fit FP16 · ~7 tok/s |
| 🎮 Steam Deck OLED Linux | Runs great FP16 · ~14 tok/s | Runs great FP16 · ~13 tok/s |
| 🖥️ Gaming PC (RTX 4070) Windows | Runs great FP16 · ~79 tok/s | Runs well FP16 · ~76 tok/s |
| 🖥️ Gaming PC (RTX 3060) Windows | Runs great FP16 · ~56 tok/s | Runs well FP16 · ~55 tok/s |
| 🖥️ Gaming PC (RTX 4080) Windows | Runs great FP16 · ~112 tok/s | Runs great FP16 · ~109 tok/s |
| 🖥️ Gaming PC (RTX 4090) Windows | Runs great FP16 · ~158 tok/s | Runs great FP16 · ~153 tok/s |
| 🤖 Atom 1 Linux | Runs great FP16 · ~32 tok/s | Runs great FP16 · ~31 tok/s |
| 🤖 Atom 1 Linux | Runs great FP16 · ~43 tok/s | Runs great FP16 · ~41 tok/s |
| 📱 iPad Pro M4 iOS | Tight fit FP16 · ~13 tok/s | Tight fit FP16 · ~13 tok/s |
| 🖥️ Mac Mini M1 macOS | Runs well Q8 · ~20 tok/s | Runs well Q8 · ~19 tok/s |
| 🖥️ Mac Mini M1 macOS | Runs great FP16 · ~11 tok/s | Runs great FP16 · ~10 tok/s |
| 🖥️ Mac Mini M2 macOS | Runs well Q8 · ~29 tok/s | Runs well Q8 · ~28 tok/s |
| 🖥️ Mac Mini M2 Pro macOS | Runs great FP16 · ~31 tok/s | Runs great FP16 · ~30 tok/s |
| 🖥️ Mac Mini M4 macOS | Runs great FP16 · ~19 tok/s | Runs great FP16 · ~18 tok/s |
| 🖥️ Mac Mini M4 Pro macOS | Runs great FP16 · ~43 tok/s | Runs great FP16 · ~41 tok/s |
| 🖥️ Mac Studio M4 Max macOS | Runs great FP16 · ~85 tok/s | Runs great FP16 · ~83 tok/s |
| 🖥️ Mac Pro M2 Ultra macOS | Runs great FP16 · ~125 tok/s | Runs great FP16 · ~121 tok/s |
| 💻 Snapdragon X Elite Laptop Windows | Runs great FP16 · ~21 tok/s | Runs well FP16 · ~21 tok/s |
| 📱 OnePlus 13 Android | Tight fit FP16 · ~8 tok/s | Tight fit FP16 · ~8 tok/s |
| 🍓 Raspberry Pi 5 Linux | Runs great Q8 · ~9 tok/s | Runs great Q8 · ~9 tok/s |
Both models run on every device listed. Llama 3.2 3B has the larger context window (128K tokens vs 32K). Llama 3.2 3B is also marginally larger at every quantization level and may produce slightly better outputs, while Qwen 2.5 3B is a bit lighter on resources.