
12 Best Ollama Models to Run on a MacBook in 2026

Ollama's model library has hundreds of options. Most articles recommend models without telling you whether they'll fit on your machine. This ranking is grouped by MacBook RAM — 8 GB, 16 GB, 24 GB, 32 GB+ — so you can scan to your exact hardware and pick a model that actually runs well, not one that technically loads and then grinds your Mac into swap. All 12 are available via ollama pull.

Short version: on 8 GB use llama3.2:3b. On 16 GB use qwen2.5:7b. On 24 GB step up to deepseek-coder-v2:16b for code or mistral-nemo:12b for general use. On 32 GB+ use qwen2.5:32b — or llama3.3:70b if you have 64 GB. Jump to the table. And if you want these same models on your iPhone, PocketLLM has you covered.

How to read this list

For most models, the default tag in Ollama's library points to a Q4 quantization (typically Q4_K_M); you get a different precision only by pulling a tag that specifies one. The file sizes and memory footprints below reflect that Q4 default. On a tight memory budget you can sometimes drop to a Q3 or Q2 tag, at a meaningful quality cost; with headroom to spare, a Q5 or Q8 tag buys marginal quality gains. On Apple Silicon, Q4 is the sweet spot, full stop.
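A rough way to estimate whether a model fits before you pull it — this is a rule of thumb, not an Ollama guarantee, and the exact quant tag names vary per model (check the model's Tags page on ollama.com):

```shell
# Rule of thumb (approximation): Q4 weights take roughly 0.55 GB per
# billion parameters, plus ~0.5 GB of overhead for metadata and buffers.
awk 'BEGIN { printf "%.2f GB\n", 7 * 0.55 + 0.5 }'   # 7B at Q4 — prints "4.35 GB"

# To change precision, pull an explicit quant tag instead of the default.
# Tag names below are examples of the library's naming scheme; verify the
# exact tag on the model's page before pulling:
# ollama pull qwen2.5:7b-instruct-q8_0   # higher precision, ~2x the size
```

If the estimate lands within a couple of gigabytes of your total RAM, step down a size class — the OS and your other apps need room too.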

For 8 GB MacBooks (Air M1/M2, older models)

1. llama3.2:3b — Best 8 GB pick

Meta's Llama 3.2 3B is the single best model for 8 GB MacBooks. ~2 GB on disk at Q4, uses about 3 GB of RAM while running, and hits 28-35 tok/s on an M2 Air. Quality is competitive with 7B models from 2024. Pull: ollama pull llama3.2:3b.
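To check throughput on your own machine rather than trusting the numbers above, ollama run accepts a --verbose flag that prints timing stats after each response. A minimal sketch, assuming Ollama is installed and running:

```shell
ollama pull llama3.2:3b
ollama run llama3.2:3b --verbose "Explain unified memory in two sentences."
# The stats printed after the response include a line like:
#   eval rate:     31.42 tokens/s
# That eval rate is the number to compare against the figures in this post.
```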

2. phi3.5:3.8b-mini-instruct-q4_0 — Best reasoning on 8 GB

Microsoft's Phi-3.5 Mini. ~2.4 GB on disk. Better than Llama 3.2 3B at structured reasoning, code, and math; slightly worse at general conversation. Run both, use whichever feels right for the task.

3. gemma2:2b — Fastest on 8 GB

Google's Gemma 2 2B. ~1.6 GB on disk. Highest tok/s of any model on this list (40-50 on an M2 Air) at the cost of some quality. Best multilingual performance in the small-model category. Great default for chat UIs that care about latency.

For 16 GB MacBooks (Pro, Air, Mini)

4. qwen2.5:7b — Best 16 GB pick

Alibaba's Qwen 2.5 7B is the best general-purpose model in this range. Apache 2.0 license, ~4.5 GB on disk, beats Llama 3.1 8B on most benchmarks. Pull: ollama pull qwen2.5:7b. On an M3 Pro, expect 18-25 tok/s.

5. llama3.1:8b — Most battle-tested 16 GB pick

The workhorse of the current Llama generation. ~5 GB on disk. Slightly behind Qwen 2.5 on benchmarks, ahead on fine-tune ecosystem and tool integration. Pick this if you're using LangChain, LiteLLM, or another tool with Llama-specific optimizations.

6. qwen2.5-coder:7b — Best 16 GB coding pick

The coding variant of Qwen 2.5 7B. HumanEval in the high 70s from a 4.5 GB file. If you use Continue.dev, Cody, or Cursor with Ollama, this is the current best option for "fits on a laptop and writes good code." See our best local LLMs for coding post for the full coding-focused ranking.
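Editor plugins like Continue.dev talk to Ollama over its local HTTP API, which listens on port 11434 by default. Before wiring up an editor, it's worth a quick sanity check that the coding model responds — a sketch assuming the model is already pulled:

```shell
# Hit Ollama's generate endpoint directly; "stream": false returns
# one JSON object instead of a stream of partial responses.
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python function that reverses a string.",
  "stream": false
}'
```

If this returns JSON with a "response" field, any editor integration pointed at the same host and model name should work.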

For 24 GB MacBooks (Pro M3/M4 base, Air with upgrades)

7. mistral-nemo:12b — Best 24 GB general pick

Mistral Nemo, built by Mistral in partnership with NVIDIA. 128K context window, Apache 2.0, ~7.5 GB at Q4. The best open model for long-context tasks that will run comfortably on 24 GB. Excellent at summarization and document Q&A because of the context window.

8. deepseek-coder-v2:16b — Best 24 GB coding pick

Mixture-of-experts coding model with only ~2.4B active parameters per token, so it runs fast relative to its file size. ~10 GB on disk. The best open coding model that fits comfortably on 24 GB of RAM. For a laptop used primarily for development work, this is the top recommendation.

9. gemma2:9b — Solid 24 GB alternative

Google's mid-tier Gemma 2. ~5.5 GB. Competitive with Llama 3.1 8B, better multilingual. Good choice if you want Google's safety tuning and particularly good non-English performance.

For 32 GB+ MacBooks (Pro M3 Max, Studio, Max-spec Pros)

10. qwen2.5:32b — Best 32 GB general pick

Qwen 2.5's 32B variant. ~20 GB on disk. This is where local open weights get genuinely competitive with hosted frontier models on general tasks. Apache 2.0. If your workflow can tolerate the slower inference of a 32B model, this is the top non-coding choice.

11. qwen2.5-coder:32b — Best 32 GB coding pick

The coding variant of #10. HumanEval in the high 80s from a file that fits in 20 GB of RAM. This is the closest you'll get to Claude 3.5 Sonnet on local hardware without a workstation-class setup. Covered in depth in our 15 Best LLMs for Coding roundup.

12. llama3.3:70b — Best 64 GB+ frontier pick

Meta's 70B "compact" model. ~42 GB on disk at Q4. Needs 48+ GB of RAM to run comfortably — this one is for Max-spec and Studio Macs only. Meta's benchmarks put it roughly on par with 2024's Llama 3.1 405B at a fraction of the compute. Only pick this if you have the RAM and want the biggest model available.

The summary table

| # | Model | Pull command | Size | Best for | MacBook RAM |
|---|-------|--------------|------|----------|-------------|
| 1 | Llama 3.2 3B | llama3.2:3b | 2.0 GB | General chat | 8 GB |
| 2 | Phi-3.5 Mini | phi3.5 | 2.4 GB | Reasoning, code | 8 GB |
| 3 | Gemma 2 2B | gemma2:2b | 1.6 GB | Speed, multilingual | 8 GB |
| 4 | Qwen 2.5 7B | qwen2.5:7b | 4.5 GB | Best general-purpose | 16 GB |
| 5 | Llama 3.1 8B | llama3.1:8b | 5.0 GB | Ecosystem-heavy | 16 GB |
| 6 | Qwen 2.5 Coder 7B | qwen2.5-coder:7b | 4.5 GB | Coding | 16 GB |
| 7 | Mistral Nemo 12B | mistral-nemo | 7.5 GB | Long context | 24 GB |
| 8 | DeepSeek Coder V2 16B | deepseek-coder-v2 | 10 GB | Coding | 24 GB |
| 9 | Gemma 2 9B | gemma2:9b | 5.5 GB | Multilingual | 24 GB |
| 10 | Qwen 2.5 32B | qwen2.5:32b | 20 GB | Frontier-adjacent general | 32 GB+ |
| 11 | Qwen 2.5 Coder 32B | qwen2.5-coder:32b | 20 GB | Frontier-adjacent coding | 32 GB+ |
| 12 | Llama 3.3 70B | llama3.3:70b | 42 GB | Biggest available | 64 GB+ |

How to pick one quickly

Figure out your RAM (Apple menu → About This Mac). Look at the matching row. Pull the recommended model. That's it. Don't run larger models than your RAM supports — Ollama will technically load them using swap, but the speed will drop 10x and you'll blame the model when you should be blaming the file size.
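The same RAM check works from the terminal, and once a model is loaded, ollama ps shows what it actually occupies. The first line is macOS-only (hw.memsize is a Darwin sysctl):

```shell
# Installed RAM in GB (hw.memsize reports bytes):
sysctl -n hw.memsize | awk '{ printf "%d GB\n", $1 / (1024 * 1024 * 1024) }'

# With a model loaded, see its real memory footprint:
ollama ps
```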

One edge case: if you have 16 GB and you want to occasionally run a 12B model, you can — just close every other application first. 12B at Q4 needs ~9 GB of RAM for the model plus overhead for the KV cache, and it will work on a 16 GB machine with nothing else running. It's not a daily-driver configuration.
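For occasional oversized runs like this, recent Ollama versions can unload a model explicitly, freeing the RAM immediately instead of waiting for the keep-alive timeout to expire — assuming your version is new enough to have the stop subcommand:

```shell
ollama run mistral-nemo "Summarize the tradeoffs of Q4 quantization."  # one-off 12B run
ollama stop mistral-nemo   # unload and reclaim the ~9 GB right away
ollama ps                  # should now list no loaded models
```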

Want the same models on your iPhone?

Ollama doesn't run on iOS — see What Is Ollama? 8 Things iPhone Users Should Know for why. On iPhone, the same models (Llama 3.2, Phi-3.5 Mini, Qwen 2.5 Coder 1.5B, etc.) run through native on-device apps. PocketLLM is the easiest one to install — join the waitlist. Our iPhone app roundup covers all the native alternatives.

The quick answer

The best Ollama model for your MacBook depends only on how much RAM you have. On 8 GB, llama3.2:3b. On 16 GB, qwen2.5:7b. On 24 GB, mistral-nemo or deepseek-coder-v2. On 32 GB+, qwen2.5:32b. Don't overthink it — the default Q4 quantization is fine, and the right model for your machine is the biggest one that comfortably fits.

The same Ollama-friendly models, on your iPhone.

PocketLLM runs Llama 3.2, Phi-3.5 Mini, and Qwen 2.5 Coder natively on iOS. No account, no cloud, no telemetry. Join the waitlist.
