Which runs better on your device? Side-by-side comparison of specs, quantization sizes, and device compatibility.

| Quant | Qwen 2.5 7B | Llama 3.1 8B | Smaller model |
|---|---|---|---|
| FP16 | 15.5 GB | 16.1 GB | Qwen 2.5 7B smaller |
| Q8 | 8.1 GB | 8.5 GB | Qwen 2.5 7B smaller |
| Q6 | 5.9 GB | 6.2 GB | Qwen 2.5 7B smaller |
| Q5 | 5.1 GB | 5.3 GB | Qwen 2.5 7B smaller |
| Q4 | 4.4 GB | 4.6 GB | Qwen 2.5 7B smaller |
| Q3 | 3.3 GB | 3.4 GB | Qwen 2.5 7B smaller |
| Q2 | 2.5 GB | 2.7 GB | Qwen 2.5 7B smaller |
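
One way to use the sizes above is to pick the highest-precision quant that still fits a given memory budget. The sketch below hard-codes the table values; `best_quant` is a hypothetical helper, and the 1.5 GB `headroom_gb` default (an allowance for the runtime and context cache) is an illustrative assumption, not a measured figure.

```python
# Sizes in GB, copied from the quantization table above.
SIZES_GB = {
    "Qwen 2.5 7B": {"FP16": 15.5, "Q8": 8.1, "Q6": 5.9, "Q5": 5.1,
                    "Q4": 4.4, "Q3": 3.3, "Q2": 2.5},
    "Llama 3.1 8B": {"FP16": 16.1, "Q8": 8.5, "Q6": 6.2, "Q5": 5.3,
                     "Q4": 4.6, "Q3": 3.4, "Q2": 2.7},
}


def best_quant(model, memory_gb, headroom_gb=1.5):
    """Return the highest-precision quant that fits, or None if none does.

    headroom_gb is an assumed allowance for the runtime and context cache.
    """
    budget = memory_gb - headroom_gb
    # Dicts keep insertion order, so this walks from FP16 down to Q2.
    for quant, size in SIZES_GB[model].items():
        if size <= budget:
            return quant
    return None
```

Under these assumptions, `best_quant("Qwen 2.5 7B", 4)` returns `"Q2"` while `best_quant("Llama 3.1 8B", 4)` returns `None`, because 2.7 GB exceeds the 2.5 GB budget. The device table below may rest on different headroom rules, so its ratings need not match this estimate.
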

| Device | Qwen 2.5 7B | Llama 3.1 8B |
|---|---|---|
| 💻 MacBook Air M4 macOS | Runs well Q8 · ~15 tok/s | Runs well Q8 · ~14 tok/s |
| 💻 MacBook Air M3 macOS | Runs well Q8 · ~12 tok/s | Runs well Q8 · ~12 tok/s |
| 💻 MacBook Air M2 macOS | Tight fit Q5 · ~20 tok/s | Tight fit Q5 · ~19 tok/s |
| 💻 MacBook Pro M4 Pro macOS | Runs great FP16 · ~18 tok/s | Runs well FP16 · ~17 tok/s |
| 💻 MacBook Air M1 macOS | Tight fit Q5 · ~13 tok/s | Tight fit Q5 · ~13 tok/s |
| 💻 MacBook Air M1 macOS | Runs well Q8 · ~8 tok/s | Runs well Q8 · ~8 tok/s |
| 💻 MacBook Pro M1 macOS | Runs well Q8 · ~8 tok/s | Runs well Q8 · ~8 tok/s |
| 💻 MacBook Pro M1 Pro macOS | Runs well Q8 · ~25 tok/s | Runs well Q8 · ~24 tok/s |
| 💻 MacBook Pro M1 Pro macOS | Runs well FP16 · ~13 tok/s | Runs well FP16 · ~12 tok/s |
| 💻 MacBook Pro M1 Max macOS | Runs well FP16 · ~26 tok/s | Runs well FP16 · ~25 tok/s |
| 💻 MacBook Pro M1 Max macOS | Runs great FP16 · ~26 tok/s | Runs great FP16 · ~25 tok/s |
| 💻 MacBook Pro M2 Pro macOS | Runs well Q8 · ~25 tok/s | Runs well Q8 · ~24 tok/s |
| 💻 MacBook Pro M2 Pro macOS | Runs well FP16 · ~13 tok/s | Runs well FP16 · ~12 tok/s |
| 💻 MacBook Pro M2 Max macOS | Runs well FP16 · ~26 tok/s | Runs well FP16 · ~25 tok/s |
| 💻 MacBook Pro M2 Max macOS | Runs great FP16 · ~26 tok/s | Runs great FP16 · ~25 tok/s |
| 💻 MacBook Pro M3 Pro macOS | Runs well Q8 · ~19 tok/s | Runs well Q8 · ~18 tok/s |
| 💻 MacBook Pro M3 Pro macOS | Runs great FP16 · ~10 tok/s | Runs well FP16 · ~9 tok/s |
| 💻 MacBook Pro M3 Max macOS | Runs great FP16 · ~26 tok/s | Runs well FP16 · ~25 tok/s |
| 💻 MacBook Pro M3 Max macOS | Runs great FP16 · ~26 tok/s | Runs great FP16 · ~25 tok/s |
| 📱 iPhone 16 Pro iOS | Tight fit Q3 · ~14 tok/s | Tight fit Q3 · ~14 tok/s |
| 📱 iPhone 15 iOS | Tight fit Q2 · ~12 tok/s | Tight fit Q2 · ~11 tok/s |
| 📱 Galaxy S25 Ultra Android | Tight fit Q4 · ~12 tok/s | Tight fit Q4 · ~12 tok/s |
| 📱 Galaxy S24 Android | Tight fit Q3 · ~13 tok/s | Tight fit Q2 · ~16 tok/s |
| 📱 Pixel 9 Pro Android | Tight fit Q6 · ~8 tok/s | Tight fit Q6 · ~8 tok/s |
| 🎮 Steam Deck OLED Linux | Runs well Q8 · ~11 tok/s | Runs well Q8 · ~10 tok/s |
| 🖥️ Gaming PC (RTX 4070) Windows | Runs well Q8 · ~62 tok/s | Runs well Q8 · ~59 tok/s |
| 🖥️ Gaming PC (RTX 3060) Windows | Runs well Q8 · ~44 tok/s | Runs well Q8 · ~42 tok/s |
| 🖥️ Gaming PC (RTX 4080) Windows | Runs great Q8 · ~89 tok/s | Runs great Q8 · ~84 tok/s |
| 🖥️ Gaming PC (RTX 4090) Windows | Runs well FP16 · ~65 tok/s | Runs well FP16 · ~63 tok/s |
| 🤖 Atom 1 Linux | Runs well FP16 · ~13 tok/s | Runs well FP16 · ~13 tok/s |
| 🤖 Atom 1 Linux | Runs great FP16 · ~18 tok/s | Runs great FP16 · ~17 tok/s |
| 🤖 Atom 1 Linux | Runs great FP16 · ~18 tok/s | Runs great FP16 · ~17 tok/s |
| 📱 iPad Pro M4 iOS | Tight fit Q6 · ~14 tok/s | Tight fit Q6 · ~14 tok/s |
| 🖥️ Mac Mini M1 macOS | Tight fit Q5 · ~13 tok/s | Tight fit Q5 · ~13 tok/s |
| 🖥️ Mac Mini M1 macOS | Runs well Q8 · ~8 tok/s | Runs well Q8 · ~8 tok/s |
| 🖥️ Mac Mini M2 macOS | Tight fit Q5 · ~20 tok/s | Tight fit Q5 · ~19 tok/s |
| 🖥️ Mac Mini M2 Pro macOS | Runs well Q8 · ~25 tok/s | Runs well Q8 · ~24 tok/s |
| 🖥️ Mac Mini M2 Pro macOS | Runs well FP16 · ~13 tok/s | Runs well FP16 · ~12 tok/s |
| 🖥️ Mac Mini M4 macOS | Runs well Q8 · ~15 tok/s | Runs well Q8 · ~14 tok/s |
| 🖥️ Mac Mini M4 macOS | Runs well FP16 · ~8 tok/s | Runs well FP16 · ~7 tok/s |
| 🖥️ Mac Mini M4 Pro macOS | Tight fit FP16 · ~18 tok/s | Tight fit FP16 · ~17 tok/s |
| 🖥️ Mac Mini M4 Pro macOS | Runs great FP16 · ~18 tok/s | Runs great FP16 · ~17 tok/s |
| 🖥️ Mac Studio M4 Max macOS | Runs great FP16 · ~35 tok/s | Runs great FP16 · ~34 tok/s |
| 🖥️ Mac Pro M2 Ultra macOS | Runs great FP16 · ~52 tok/s | Runs great FP16 · ~50 tok/s |
| 💻 Snapdragon X Elite Laptop Windows | Runs well Q8 · ~17 tok/s | Runs well Q8 · ~16 tok/s |
| 📱 OnePlus 13 Android | Tight fit Q6 · ~9 tok/s | Tight fit Q6 · ~9 tok/s |
| 🍓 Raspberry Pi 5 Linux | Tight fit Q6 · ~5 tok/s | Tight fit Q5 · ~6 tok/s |
| 💻 MacBook Air M2 macOS | Runs well Q8 · ~12 tok/s | Runs well Q8 · ~12 tok/s |
| 💻 MacBook Air M3 macOS | Tight fit Q5 · ~20 tok/s | Tight fit Q5 · ~19 tok/s |
| 🖥️ Mac Studio M1 Ultra macOS | Runs great FP16 · ~52 tok/s | Runs great FP16 · ~50 tok/s |
| 🖥️ Mac Studio M2 Ultra macOS | Runs great FP16 · ~52 tok/s | Runs great FP16 · ~50 tok/s |
| 🖥️ Mac Studio M3 Ultra macOS | Runs great FP16 · ~53 tok/s | Runs great FP16 · ~51 tok/s |
| 💻 MacBook Pro M4 Max macOS | Runs great FP16 · ~35 tok/s | Runs great FP16 · ~34 tok/s |
| 💻 MacBook Pro M5 macOS | Runs well Q8 · ~19 tok/s | Runs well Q8 · ~18 tok/s |
| 💻 MacBook Pro M5 Pro macOS | Tight fit FP16 · ~19 tok/s | Tight fit FP16 · ~19 tok/s |
| 💻 MacBook Pro M5 Max macOS | Runs great FP16 · ~39 tok/s | Runs great FP16 · ~37 tok/s |
| 🖥️ Gaming PC (RTX 4060) Windows | Tight fit Q6 · ~46 tok/s | Tight fit Q6 · ~44 tok/s |
| 🖥️ Gaming PC (RTX 3070) Windows | Tight fit Q6 · ~76 tok/s | Tight fit Q6 · ~72 tok/s |
| 🖥️ Gaming PC (RTX 3080) Windows | Tight fit Q8 · ~94 tok/s | Tight fit Q8 · ~89 tok/s |
| 🖥️ Gaming PC (RTX 3090) Windows | Runs well FP16 · ~60 tok/s | Runs well FP16 · ~58 tok/s |
| 🖥️ Gaming PC (RTX 5070) Windows | Runs well Q8 · ~83 tok/s | Runs well Q8 · ~79 tok/s |
| 🖥️ Gaming PC (RTX 5080) Windows | Runs great Q8 · ~119 tok/s | Runs great Q8 · ~113 tok/s |
| 🖥️ Gaming PC (RTX 5090) Windows | Runs great FP16 · ~116 tok/s | Runs great FP16 · ~111 tok/s |
| 🖥️ Gaming PC (RX 7800 XT) Windows | Runs great Q8 · ~77 tok/s | Runs great Q8 · ~73 tok/s |
| 🖥️ Gaming PC (RX 7900 XTX) Windows | Runs well FP16 · ~62 tok/s | Runs well FP16 · ~60 tok/s |
| 🖥️ Gaming PC (Arc B580) Windows | Runs well Q8 · ~56 tok/s | Runs well Q8 · ~54 tok/s |
| 🖥️ Gaming PC (Arc A770) Windows | Runs great Q8 · ~69 tok/s | Runs great Q8 · ~66 tok/s |
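
Both models run on all 67 listed devices. Llama 3.1 8B has more parameters, which may translate into better output quality, though this comparison does not measure quality. Qwen 2.5 7B is smaller at every quantization level (for example, 2.5 GB vs 2.7 GB at Q2), and it fits a higher quant than Llama 3.1 8B on two devices: Galaxy S24 (Q3 vs Q2) and Raspberry Pi 5 (Q6 vs Q5).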
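
Each cell in the device table packs three fields: a fit rating, a quant level, and an approximate speed. To compare rows programmatically, a minimal parsing sketch follows. It assumes every cell matches the `<rating> <quant> · ~<n> tok/s` pattern seen above, and `parse_cell` is a hypothetical helper name.

```python
import re

# Pattern for cells such as "Runs well Q8 · ~15 tok/s".
CELL = re.compile(
    r"^(?P<rating>.+?) (?P<quant>FP16|Q\d) · ~(?P<tps>\d+) tok/s$"
)


def parse_cell(cell):
    """Split one device-table cell into rating, quant, and tokens per second."""
    match = CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return {
        "rating": match["rating"],
        "quant": match["quant"],
        "tok_per_s": int(match["tps"]),
    }


# Galaxy S24 row: the two models are listed at different quants.
qwen = parse_cell("Tight fit Q3 · ~13 tok/s")
llama = parse_cell("Tight fit Q2 · ~16 tok/s")
same_quant = qwen["quant"] == llama["quant"]
```

Here `same_quant` is `False`, which flags that the Galaxy S24 speeds are not a like-for-like comparison: the two figures are listed at different quants (Q2 for Llama 3.1 8B, Q3 for Qwen 2.5 7B).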