Ollama's model library has hundreds of options, and most articles recommend models without telling you whether they'll fit on your machine. This ranking is grouped by MacBook RAM (8 GB, 16 GB, 24 GB, 32 GB+) so you can scan to your exact hardware and pick a model that actually runs well, not one that technically loads and then pins Activity Monitor's memory pressure in the red. All 12 are available via ollama pull.
Short version: on 8 GB use llama3.2:3b. On 16 GB use qwen2.5:7b. On 24 GB step up to deepseek-coder-v2:16b for code or mistral-nemo:12b for general. On 32 GB use qwen2.5:32b, and on 64 GB+ llama3.3:70b. Jump to the table. And if you want these same models on your iPhone, PocketLLM has you covered.
How to read this list
Ollama's default model tags resolve to Q4-quantized builds (typically Q4_K_M) unless you explicitly pull a different precision tag; it downloads pre-quantized weights rather than quantizing anything on your Mac. The file sizes and memory footprints below reflect Q4, the default. On a tight memory budget you can sometimes drop to Q3 or Q2 at a meaningful quality cost, or, if you have headroom, bump to Q5 or Q8 for marginal quality gains. The sweet spot is Q4 on Apple Silicon, full stop.
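For example, here's how pulling a higher-precision build looks next to the default. This is a sketch: exact tag names vary by model, so confirm the tag on the model's page at ollama.com/library before pulling.

```bash
# Default tag resolves to the Q4 build
ollama pull qwen2.5:7b

# Higher-precision Q8 build of the same model (tag name is an example;
# check it exists on the model's library page first)
ollama pull qwen2.5:7b-instruct-q8_0

# Compare what you have locally, with on-disk sizes
ollama list
```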
For 8 GB MacBooks (Air M1/M2, older models)
1. llama3.2:3b — Best 8 GB pick
Meta's Llama 3.2 3B is the single best model for 8 GB MacBooks. ~2 GB on disk at Q4, uses about 3 GB of RAM while running, and hits 28-35 tok/s on an M2 Air. Quality is competitive with 7B models from 2024. Pull: ollama pull llama3.2:3b.
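Don't take the throughput numbers on faith. ollama run has a --verbose flag that prints timing stats after each response, including eval rate in tokens per second, so you can measure your own machine:

```bash
# Prints load time, prompt eval rate, and eval rate (tok/s) after the reply
ollama run llama3.2:3b --verbose "Explain unified memory in two sentences."
```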
2. phi3.5:3.8b-mini-instruct-q4_0 — Best reasoning on 8 GB
Microsoft's Phi-3.5 Mini. ~2.4 GB on disk. Better than Llama 3.2 3B at structured reasoning, code, and math; slightly worse at general conversation. Run both, use whichever feels right for the task.
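The cheapest way to compare the two is to send the same prompt through each non-interactively. A minimal sketch:

```bash
PROMPT="Write a Python function that merges two sorted lists."
ollama run llama3.2:3b "$PROMPT"
ollama run phi3.5:3.8b-mini-instruct-q4_0 "$PROMPT"
```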
3. gemma2:2b — Fastest on 8 GB
Google's Gemma 2 2B. ~1.6 GB on disk. Highest tok/s of any model on this list (40-50 on an M2 Air) at the cost of some quality. Best multilingual performance in the small-model category. Great default for chat UIs that care about latency.
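Latency-sensitive UIs usually talk to Ollama's local REST API, which streams tokens as they're generated (one JSON chunk per line), so the first words render almost immediately. A minimal request against the default server on port 11434:

```bash
# Streaming is on by default; add "stream": false for one consolidated reply
curl http://localhost:11434/api/chat -d '{
  "model": "gemma2:2b",
  "messages": [{"role": "user", "content": "Say hello in three languages."}]
}'
```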
For 16 GB MacBooks (Pro, Air, Mini)
4. qwen2.5:7b — Best 16 GB pick
Alibaba's Qwen 2.5 7B is the best general-purpose model in this range. Apache 2.0 license, ~4.5 GB on disk, beats Llama 3.1 8B on most benchmarks. Pull: ollama pull qwen2.5:7b. On an M3 Pro, expect 18-25 tok/s.
5. llama3.1:8b — Most battle-tested 16 GB pick
The workhorse of the current Llama generation. ~5 GB on disk. Slightly behind Qwen 2.5 on benchmarks, ahead on fine-tune ecosystem and tool integration. Pick this if you're using LangChain, LiteLLM, or another tool with Llama-specific optimizations.
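Much of that tooling connects to Ollama through its OpenAI-compatible endpoint rather than anything model-specific: any client that lets you override the OpenAI base URL can point at the local server. A quick way to sanity-check that path before wiring up a framework:

```bash
# Ollama exposes an OpenAI-compatible API under /v1 on the same port
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "What does a KV cache store?"}]
  }'
```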
6. qwen2.5-coder:7b — Best 16 GB coding pick
The coding variant of Qwen 2.5 7B. HumanEval in the high 70s from a 4.5 GB file. If you use Continue.dev, Cody, or Cursor with Ollama, this is the current best option for "fits on a laptop and writes good code." See our best local LLMs for coding post for the full coding-focused ranking.
For 24 GB MacBooks (Pro M3/M4 base, Air with upgrades)
7. mistral-nemo:12b — Best 24 GB general pick
Mistral and NVIDIA's jointly developed Mistral Nemo. 128K context window, Apache 2.0, ~7.5 GB at Q4. The best open model for long-context tasks that runs comfortably on 24 GB. Excellent at summarization and document Q&A because of that context window.
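One caveat before you count on the 128K window: Ollama loads models with a small default context (a few thousand tokens, depending on your version), so you have to raise num_ctx yourself, and every increase grows the KV cache and eats into your 24 GB. Two ways to set it; 32768 here is an example value, not a recommendation, and the full 128K costs far more memory:

```bash
# Per-request, via the API's options field
curl http://localhost:11434/api/generate -d '{
  "model": "mistral-nemo:12b",
  "prompt": "Summarize the following document: ...",
  "options": {"num_ctx": 32768}
}'

# Or bake it into a derived model with a Modelfile
cat > Modelfile <<'EOF'
FROM mistral-nemo:12b
PARAMETER num_ctx 32768
EOF
ollama create mistral-nemo-32k -f Modelfile
```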
8. deepseek-coder-v2:16b — Best 24 GB coding pick
Mixture-of-experts coding model with only ~2.4B of its 16B parameters active per token, so it runs fast relative to its file size. ~10 GB on disk. The best open coding model that fits comfortably in 24 GB of RAM. If your laptop is primarily a development machine, this is the top recommendation.
9. gemma2:9b — Solid 24 GB alternative
Google's mid-tier Gemma 2. ~5.5 GB. Competitive with Llama 3.1 8B, better multilingual. Good choice if you want Google's safety tuning and particularly good non-English performance.
For 32 GB+ MacBooks (Pro M3 Max, Studio, Max-spec Pros)
10. qwen2.5:32b — Best 32 GB general pick
Qwen 2.5's 32B variant. ~20 GB on disk. This is where local open-weight models get genuinely competitive with hosted frontier models on general tasks. Apache 2.0. If your workflow tolerates slower big-model inference, this is the top non-coding choice.
11. qwen2.5-coder:32b — Best 32 GB coding pick
The coding variant of #10. HumanEval in the high 80s from a ~20 GB file. This is the closest you'll get to Claude 3.5 Sonnet on local hardware without a workstation-class setup. Covered in depth in our 15 Best LLMs for Coding roundup.
12. llama3.3:70b — Best 64 GB+ frontier pick
Meta's "compact" 70B flagship. ~42 GB on disk at Q4. Needs 48+ GB of RAM to run comfortably, so this one is for Max-spec and Studio Macs only. Quality approaches 2024's Llama 3.1 405B at a fraction of the footprint. Only pick it if you have the RAM and want the biggest model available.
The summary table
| # | Model | Pull command | Size (disk, Q4) | Best for | MacBook RAM |
|---|---|---|---|---|---|
| 1 | Llama 3.2 3B | llama3.2:3b | 2.0 GB | General chat | 8 GB |
| 2 | Phi-3.5 Mini | phi3.5:3.8b-mini-instruct-q4_0 | 2.4 GB | Reasoning, code | 8 GB |
| 3 | Gemma 2 2B | gemma2:2b | 1.6 GB | Speed, multilingual | 8 GB |
| 4 | Qwen 2.5 7B | qwen2.5:7b | 4.5 GB | Best general-purpose | 16 GB |
| 5 | Llama 3.1 8B | llama3.1:8b | 5.0 GB | Ecosystem-heavy | 16 GB |
| 6 | Qwen 2.5 Coder 7B | qwen2.5-coder:7b | 4.5 GB | Coding | 16 GB |
| 7 | Mistral Nemo 12B | mistral-nemo:12b | 7.5 GB | Long context | 24 GB |
| 8 | DeepSeek Coder V2 16B | deepseek-coder-v2:16b | 10 GB | Coding | 24 GB |
| 9 | Gemma 2 9B | gemma2:9b | 5.5 GB | Multilingual | 24 GB |
| 10 | Qwen 2.5 32B | qwen2.5:32b | 20 GB | Frontier-adjacent general | 32 GB+ |
| 11 | Qwen 2.5 Coder 32B | qwen2.5-coder:32b | 20 GB | Frontier-adjacent coding | 32 GB+ |
| 12 | Llama 3.3 70B | llama3.3:70b | 42 GB | Biggest available | 64 GB+ |
How to pick one quickly
Figure out your RAM (Apple menu → About This Mac). Look at the matching row. Pull the recommended model. That's it. Don't run models larger than your RAM supports: Ollama will technically load them using swap, but throughput will drop by 10x or more, and you'll blame the model when you should be blaming the file size.
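If you'd rather check from the terminal, both the hardware question and the "what is actually loaded" question are one command each:

```bash
# How much RAM this Mac has
system_profiler SPHardwareDataType | grep "Memory:"

# What Ollama has loaded right now, and its real memory footprint
ollama ps
```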
One edge case: if you have 16 GB and you want to occasionally run a 12B model, you can — just close every other application first. 12B at Q4 needs ~9 GB of RAM for the model plus overhead for the KV cache, and it will work on a 16 GB machine with nothing else running. It's not a daily-driver configuration.
Want the same models on your iPhone?
Ollama doesn't run on iOS — see What Is Ollama? 8 Things iPhone Users Should Know for why. On iPhone, the same models (Llama 3.2, Phi-3.5 Mini, Qwen 2.5 Coder 1.5B, etc.) run through native on-device apps. PocketLLM is the easiest one to install — join the waitlist. Our iPhone app roundup covers all the native alternatives.
The quick answer
The best Ollama model for your MacBook depends only on how much RAM you have. On 8 GB, llama3.2:3b. On 16 GB, qwen2.5:7b. On 24 GB, mistral-nemo or deepseek-coder-v2. On 32 GB+, qwen2.5:32b. Don't overthink it — the default Q4 quantization is fine, and the right model for your machine is the biggest one that comfortably fits.