
12 Best Ollama Models to Run on a MacBook in 2026

Ollama's model library has hundreds of options. Most articles recommend models without telling you whether they'll fit on your machine. This ranking is grouped by MacBook RAM — 8 GB, 16 GB, 24 GB, 32 GB+ — so you can scan to your exact hardware and pick a model that actually runs well, not one that technically loads and then grinds your Mac into swap. All 12 are available via ollama pull.

Short version: on 8 GB use llama3.2:3b. On 16 GB use qwen2.5:7b. On 24 GB step up to deepseek-coder-v2:16b for code or mistral-nemo:12b for general use. On 32 GB+ use qwen2.5:32b — or llama3.3:70b if you have 64 GB. Jump to the table. And if you want these same models on your iPhone, PocketLLM has you covered.

How to read this list

For most models, the default tag in Ollama's library points to a Q4 quantization (typically Q4_K_M); you get a different precision only by pulling a tag that specifies one. The file sizes and memory footprints below reflect that Q4 default. On a tight memory budget you can sometimes drop to a Q3 or Q2 tag, at a meaningful quality cost; with headroom to spare, a Q5 or Q8 tag buys marginal quality gains. On Apple Silicon, Q4 is the sweet spot, full stop.
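A rough way to estimate whether a model fits before you pull it — this is a rule of thumb, not an Ollama guarantee, and the exact quant tag names vary per model (check the model's Tags page on ollama.com):

```shell
# Rule of thumb (approximation): Q4 weights take roughly 0.55 GB per
# billion parameters, plus ~0.5 GB of overhead for metadata and buffers.
awk 'BEGIN { printf "%.2f GB\n", 7 * 0.55 + 0.5 }'   # 7B at Q4 — prints "4.35 GB"

# To change precision, pull an explicit quant tag instead of the default.
# Tag names below are examples of the library's naming scheme; verify the
# exact tag on the model's page before pulling:
# ollama pull qwen2.5:7b-instruct-q8_0   # higher precision, ~2x the size
```

If the estimate lands within a couple of gigabytes of your total RAM, step down a size class — the OS and your other apps need room too.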

For 8 GB MacBooks (Air M1/M2, older models)

1. llama3.2:3b — Best 8 GB pick

Meta's Llama 3.2 3B is the single best model for 8 GB MacBooks. ~2 GB on disk at Q4, uses about 3 GB of RAM while running, and hits 28-35 tok/s on an M2 Air. Quality is competitive with 7B models from 2024. Pull: ollama pull llama3.2:3b.
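To check throughput on your own machine rather than trusting the numbers above, ollama run accepts a --verbose flag that prints timing stats after each response. A minimal sketch, assuming Ollama is installed and running:

```shell
ollama pull llama3.2:3b
ollama run llama3.2:3b --verbose "Explain unified memory in two sentences."
# The stats printed after the response include a line like:
#   eval rate:     31.42 tokens/s
# That eval rate is the number to compare against the figures in this post.
```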

2. phi3.5:3.8b-mini-instruct-q4_0 — Best reasoning on 8 GB

Microsoft's Phi-3.5 Mini. ~2.4 GB on disk. Better than Llama 3.2 3B at structured reasoning, code, and math; slightly worse at general conversation. Run both, use whichever feels right for the task.

3. gemma2:2b — Fastest on 8 GB

Google's Gemma 2 2B. ~1.6 GB on disk. Highest tok/s of any model on this list (40-50 on an M2 Air) at the cost of some quality. Best multilingual performance in the small-model category. Great default for chat UIs that care about latency.

For 16 GB MacBooks (Pro, Air, Mini)

4. qwen2.5:7b — Best 16 GB pick

Alibaba's Qwen 2.5 7B is the best general-purpose model in this range. Apache 2.0 license, ~4.5 GB on disk, beats Llama 3.1 8B on most benchmarks. Pull: ollama pull qwen2.5:7b. On an M3 Pro, expect 18-25 tok/s.

5. llama3.1:8b — Most battle-tested 16 GB pick

The workhorse of the current Llama generation. ~5 GB on disk. Slightly behind Qwen 2.5 on benchmarks, ahead on fine-tune ecosystem and tool integration. Pick this if you're using LangChain, LiteLLM, or another tool with Llama-specific optimizations.

6. qwen2.5-coder:7b — Best 16 GB coding pick

The coding variant of Qwen 2.5 7B. HumanEval in the high 70s from a 4.5 GB file. If you use Continue.dev, Cody, or Cursor with Ollama, this is the current best option for "fits on a laptop and writes good code." See our best local LLMs for coding post for the full coding-focused ranking.
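Editor plugins like Continue.dev talk to Ollama over its local HTTP API, which listens on port 11434 by default. Before wiring up an editor, it's worth a quick sanity check that the coding model responds — a sketch assuming the model is already pulled:

```shell
# Hit Ollama's generate endpoint directly; "stream": false returns
# one JSON object instead of a stream of partial responses.
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python function that reverses a string.",
  "stream": false
}'
```

If this returns JSON with a "response" field, any editor integration pointed at the same host and model name should work.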

For 24 GB MacBooks (Pro M3/M4 base, Air with upgrades)

7. mistral-nemo:12b — Best 24 GB general pick

Mistral Nemo, built by Mistral in partnership with NVIDIA. 128K context window, Apache 2.0, ~7.5 GB at Q4. The best open model for long-context tasks that will run comfortably on 24 GB. Excellent at summarization and document Q&A because of the context window.

8. deepseek-coder-v2:16b — Best 24 GB coding pick

Mixture-of-experts coding model with only ~2.4B active parameters per token, so it runs fast relative to its file size. ~10 GB on disk. The best open coding model that fits comfortably on 24 GB of RAM. For a laptop used primarily for development work, this is the top recommendation.

9. gemma2:9b — Solid 24 GB alternative

Google's mid-tier Gemma 2. ~5.5 GB. Competitive with Llama 3.1 8B, better multilingual. Good choice if you want Google's safety tuning and particularly good non-English performance.

For 32 GB+ MacBooks (Pro M3 Max, Studio, Max-spec Pros)

10. qwen2.5:32b — Best 32 GB general pick

Qwen 2.5's 32B variant. ~20 GB on disk. This is where local open weights get genuinely competitive with hosted frontier models on general tasks. Apache 2.0. If your workflow can tolerate the slower inference of a 32B model, this is the top non-coding choice.

11. qwen2.5-coder:32b — Best 32 GB coding pick

The coding variant of #10. HumanEval in the high 80s from a file that fits in 20 GB of RAM. This is the closest you'll get to Claude 3.5 Sonnet on local hardware without a workstation-class setup. Covered in depth in our 15 Best LLMs for Coding roundup.

12. llama3.3:70b — Best 64 GB+ frontier pick

Meta's 70B "compact" model. ~42 GB on disk at Q4. Needs 48+ GB of RAM to run comfortably — this one is for Max-spec and Studio Macs only. Meta's benchmarks put it roughly on par with 2024's Llama 3.1 405B at a fraction of the compute. Only pick this if you have the RAM and want the biggest model available.

The summary table

| # | Model | Pull command | Size | Best for | MacBook RAM |
|---|-------|--------------|------|----------|-------------|
| 1 | Llama 3.2 3B | llama3.2:3b | 2.0 GB | General chat | 8 GB |
| 2 | Phi-3.5 Mini | phi3.5 | 2.4 GB | Reasoning, code | 8 GB |
| 3 | Gemma 2 2B | gemma2:2b | 1.6 GB | Speed, multilingual | 8 GB |
| 4 | Qwen 2.5 7B | qwen2.5:7b | 4.5 GB | Best general-purpose | 16 GB |
| 5 | Llama 3.1 8B | llama3.1:8b | 5.0 GB | Ecosystem-heavy | 16 GB |
| 6 | Qwen 2.5 Coder 7B | qwen2.5-coder:7b | 4.5 GB | Coding | 16 GB |
| 7 | Mistral Nemo 12B | mistral-nemo | 7.5 GB | Long context | 24 GB |
| 8 | DeepSeek Coder V2 16B | deepseek-coder-v2 | 10 GB | Coding | 24 GB |
| 9 | Gemma 2 9B | gemma2:9b | 5.5 GB | Multilingual | 24 GB |
| 10 | Qwen 2.5 32B | qwen2.5:32b | 20 GB | Frontier-adjacent general | 32 GB+ |
| 11 | Qwen 2.5 Coder 32B | qwen2.5-coder:32b | 20 GB | Frontier-adjacent coding | 32 GB+ |
| 12 | Llama 3.3 70B | llama3.3:70b | 42 GB | Biggest available | 64 GB+ |

How to pick one quickly

Figure out your RAM (Apple menu → About This Mac). Look at the matching row. Pull the recommended model. That's it. Don't run larger models than your RAM supports — Ollama will technically load them using swap, but the speed will drop 10x and you'll blame the model when you should be blaming the file size.
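The same RAM check works from the terminal, and once a model is loaded, ollama ps shows what it actually occupies. The first line is macOS-only (hw.memsize is a Darwin sysctl):

```shell
# Installed RAM in GB (hw.memsize reports bytes):
sysctl -n hw.memsize | awk '{ printf "%d GB\n", $1 / (1024 * 1024 * 1024) }'

# With a model loaded, see its real memory footprint:
ollama ps
```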

One edge case: if you have 16 GB and you want to occasionally run a 12B model, you can — just close every other application first. 12B at Q4 needs ~9 GB of RAM for the model plus overhead for the KV cache, and it will work on a 16 GB machine with nothing else running. It's not a daily-driver configuration.
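For occasional oversized runs like this, recent Ollama versions can unload a model explicitly, freeing the RAM immediately instead of waiting for the keep-alive timeout to expire — assuming your version is new enough to have the stop subcommand:

```shell
ollama run mistral-nemo "Summarize the tradeoffs of Q4 quantization."  # one-off 12B run
ollama stop mistral-nemo   # unload and reclaim the ~9 GB right away
ollama ps                  # should now list no loaded models
```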

Want the same models on your iPhone?

Ollama doesn't run on iOS — see What Is Ollama? 8 Things iPhone Users Should Know for why. On iPhone, the same models (Llama 3.2, Phi-3.5 Mini, Qwen 2.5 Coder 1.5B, etc.) run through native on-device apps. PocketLLM is the easiest one to install — join the waitlist. Our iPhone app roundup covers all the native alternatives.

The quick answer

The best Ollama model for your MacBook depends only on how much RAM you have. On 8 GB, llama3.2:3b. On 16 GB, qwen2.5:7b. On 24 GB, mistral-nemo or deepseek-coder-v2. On 32 GB+, qwen2.5:32b. Don't overthink it — the default Q4 quantization is fine, and the right model for your machine is the biggest one that comfortably fits.

The same Ollama-friendly models, on your iPhone.

PocketLLM runs Llama 3.2, Phi-3.5 Mini, and Qwen 2.5 Coder natively on iOS. No account, no cloud, no telemetry. Join the waitlist.
