How to Run a Local LLM on a MacBook Air or Pro

A MacBook is a surprisingly good place to run AI privately. The same unified memory that makes Apple Silicon fast also lets a laptop hold and run real language models with no cloud, no account, and no data leaving the machine. This is a step-by-step tutorial: install a runtime, download a model that fits, and chat fully offline. We ran every step on both a fanless MacBook Air and a MacBook Pro, and we measured what most guides skip — how warm the Air gets and what it does to your battery.

Want the short version? Jump to the summary table. Want the same private AI when your MacBook isn't with you? PocketLLM runs small models on your iPhone fully on-device — join the waitlist.

Run a local LLM on a MacBook: quick answer

Install LM Studio or Ollama, download Llama 3.2 3B (~2 GB), and chat — about ten minutes start to finish. On an 8 GB MacBook stick to 3B models; on 16 GB you can run a 7B like Qwen 2.5 7B. The fanless MacBook Air handles this silently and stays merely warm in normal use, throttling only slightly on very long non-stop generations. Once the model is local, disconnect Wi-Fi and it keeps working — fully private, fully offline.

What you need before you start

  • An Apple Silicon MacBook. Any Air or Pro with Apple Silicon works; the only thing that matters is how much unified memory it has.
  • Enough free disk and RAM. A 3B model needs about 2 GB on disk and about 4 GB of free memory; a 7B needs about 4.5 GB on disk and about 8 GB of free memory, which in practice means a 16 GB machine.
  • One internet connection — once. You need the network only to download the runtime and model. After that you can stay offline forever.
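
The disk and memory figures above can be turned into a quick pre-flight check. This is a minimal sketch, not a feature of either runtime: the `fits` function and the `REQUIREMENTS_GB` table are hypothetical names, the numbers are the rough figures quoted in this list, and the free-disk and free-memory values are whatever you read off your own Mac.

```python
# Rough needs quoted in this guide for Q4 models, in GB (approximate figures).
REQUIREMENTS_GB = {
    "3B": {"disk": 2.0, "memory": 4.0},
    "7B": {"disk": 4.5, "memory": 8.0},
}


def fits(model_size: str, free_disk_gb: float, free_memory_gb: float) -> bool:
    """Return True if the model's rough needs fit in the free disk and memory."""
    need = REQUIREMENTS_GB[model_size]
    return free_disk_gb >= need["disk"] and free_memory_gb >= need["memory"]


if __name__ == "__main__":
    # Example: a Mac with 20 GB of free disk and 5 GB of free memory.
    print("3B fits:", fits("3B", free_disk_gb=20, free_memory_gb=5))
    print("7B fits:", fits("7B", free_disk_gb=20, free_memory_gb=5))
```

With those example inputs the 3B model passes and the 7B does not, which matches the advice to stay on 3B models when memory is tight.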

Not sure which MacBook you have or want? Check the memory math in our best Mac for local LLMs buyer's guide first.

The steps

1. Install a runtime

Download LM Studio (no terminal, friendliest) or Ollama (one command, developer-friendly). Both bundle the inference engine and a model downloader. We walk through LM Studio's interface in our LM Studio explainer; if you're weighing the options, see Ollama vs LM Studio vs PocketLLM. Installation is either a standard drag-to-Applications Mac app or a single install command.

2. Download a model that fits your memory

In LM Studio, search the model catalog; it shows which models your Mac has enough RAM for. On an 8 GB MacBook, pick Llama 3.2 3B. On 16 GB, you can also pick Qwen 2.5 7B for better quality. In Ollama, run a single pull command for the same model. Either way it is a one-time download of about 2 GB (3B) or about 4.5 GB (7B).
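
For the Ollama route, the pull step might look like the sketch below. The tags `llama3.2:3b` and `qwen2.5:7b` are our assumption about how these two models are named in the Ollama library, so check the library listing before relying on them; `pull_command` and `pull` are hypothetical helper names. By default the script only prints the command so nothing is downloaded by accident.

```python
import shutil
import subprocess

# Assumed Ollama library tags for the two models named in this step.
OLLAMA_TAGS = {
    "Llama 3.2 3B": "llama3.2:3b",
    "Qwen 2.5 7B": "qwen2.5:7b",
}


def pull_command(model_name: str) -> list[str]:
    """Build the `ollama pull` command for one of the models above."""
    return ["ollama", "pull", OLLAMA_TAGS[model_name]]


def pull(model_name: str) -> None:
    """Run the pull. This performs the one-time multi-gigabyte download."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama is not on PATH; install it first")
    subprocess.run(pull_command(model_name), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; call pull(...) to download.
    print(" ".join(pull_command("Llama 3.2 3B")))
```

Typing the printed command straight into Terminal does the same thing; the wrapper is only useful if you want to script the setup.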

3. Start chatting

Open a new chat and type. The first reply comes after a brief model load — we measured about 2 seconds cold for a 3B — and then generation flows at 30+ tok/s, faster than you read.
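
If you use Ollama, you can check the generation speed on your own machine. The sketch below assumes Ollama's local HTTP API on its default port, whose non-streaming `/api/generate` reply reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The helper names and the sample numbers are ours for illustration; the sample is not a measurement.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def tokens_per_second(reply: dict) -> float:
    """Tokens generated divided by generation time (reported in nanoseconds)."""
    return reply["eval_count"] / (reply["eval_duration"] / 1e9)


def generate(prompt: str, model: str = "llama3.2:3b") -> dict:
    """Send one non-streaming prompt to a locally running Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative reply: 300 tokens generated in 10 seconds.
    sample = {"eval_count": 300, "eval_duration": 10_000_000_000}
    print(tokens_per_second(sample))  # prints 30.0
    # With Ollama running: print(tokens_per_second(generate("Say hello.")))
```

The speed figure covers generation only; model load time is reported separately in the same reply, so a cold start does not drag the number down.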

4. Verify it's truly offline

Turn off Wi-Fi mid-conversation. The model keeps generating with no interruption. That's the proof that everything is on-device: your prompts and responses never touched a server, and they never will.

What to run, by MacBook memory

| MacBook RAM | Best model | Size (Q4) | Speed | Notes |
|---|---|---|---|---|
| 8 GB | Llama 3.2 3B | 2.0 GB | 30+ tok/s | Don't attempt 7B; it swaps |
| 16 GB | Qwen 2.5 7B | 4.5 GB | 12–18 tok/s | Comfortable, room for other apps |
| 24–32 GB | Mistral Nemo 12B | 7.5 GB | 7–10 tok/s | Pro territory; long-context work |
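
The table above can also be read as a simple lookup. This is a sketch under one assumption of ours: a memory size that falls between two rows gets the lower row's model. `TIERS` and `recommend` are hypothetical names, and the values are copied from the table.

```python
# Rows from the table above: (minimum RAM in GB, model, Q4 size in GB, speed).
TIERS = [
    (8, "Llama 3.2 3B", 2.0, "30+ tok/s"),
    (16, "Qwen 2.5 7B", 4.5, "12–18 tok/s"),
    (24, "Mistral Nemo 12B", 7.5, "7–10 tok/s"),
]


def recommend(ram_gb: int):
    """Return the row for the highest tier the machine reaches, or None below 8 GB."""
    best = None
    for tier in TIERS:
        if ram_gb >= tier[0]:
            best = tier
    return best


if __name__ == "__main__":
    for ram in (8, 16, 32):
        minimum, model, size_gb, speed = recommend(ram)
        print(f"{ram} GB: {model} ({size_gb} GB at Q4, {speed})")
```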

The fanless MacBook Air, thermally tested

The MacBook Air has no fan, which raises an obvious question: does it cook itself running an LLM? We pushed it to find out. During normal back-and-forth chat — a few hundred tokens at a time — the Air stayed merely warm and never throttled; a 3B model held 30+ tok/s indefinitely. The only way we got it to slow down was a deliberately abusive test: generating thousands of tokens non-stop for many minutes, at which point the chassis warmed and speed dipped slightly as the chip protected itself. For the way people actually use a chat assistant, the Air is silent, cool enough, and completely fine. The MacBook Pro, with active cooling, holds full speed even through those marathon generations.

What it does to your battery

The pleasant surprise is the battery. Because the chip is under heavy load only during the seconds it's actually generating tokens, and sits near idle while you read or type, local AI is gentle on a charge. In our testing, casually chatting with a 3B model across an afternoon drained the battery at a rate comparable to web browsing. Sustained, non-stop generation pulls more, but that's not how anyone uses a chat assistant. You can comfortably run a local model on an Air, unplugged, for hours of normal use.

Why this is private by design

There's no privacy policy to trust here, just physics. The weights live on your SSD, the math runs on your chip, and the only network traffic is the one-time download. Disconnect and nothing changes. That architectural privacy — no accounts, no telemetry, nothing to leak — is the same principle PocketLLM follows on the phone. If your MacBook can do this, your iPhone can do a scaled-down version of the same thing.

The quick answer

Install LM Studio or Ollama, download Llama 3.2 3B if you have 8 GB or Qwen 2.5 7B if you have 16 GB, and you're running private AI on your MacBook in about ten minutes. The fanless Air handles it silently and sips battery; the Pro holds full speed on long jobs. Once the model is local, you never need the internet again.

Want the same offline AI when your MacBook is at home? PocketLLM runs small local models on your iPhone, fully on-device with zero telemetry. Join the waitlist.

Frequently asked questions

How do I run a local LLM on a MacBook?

Install a runtime like LM Studio or Ollama, download a model that fits your memory, and start chatting. Budget about ten minutes on a MacBook Air or Pro; in our testing, downloading Llama 3.2 3B and getting a first reply took under five minutes on a fresh install. Once the model is local you can disconnect from the internet and keep using it.

Can a fanless MacBook Air run a local LLM without overheating?

Yes. In our testing a fanless MacBook Air ran a 3B model at 30+ tok/s and a 7B model at a usable pace while staying merely warm. Because the Air has no fan, a very long continuous generation can warm the chassis and throttle speed slightly, but the Air never got hot during normal back-and-forth chat. For typical use the Air handles local LLMs comfortably and silently.

How much battery does a local LLM use on a MacBook?

Less than you would expect. Runtimes like LM Studio and Ollama run inference on Apple Silicon's efficient integrated GPU, and the model puts the chip under load only during the seconds it is actually generating, not while you read or type. In our testing, casual on-device chatting through an afternoon used battery at a rate comparable to web browsing. Sustained, non-stop generation drains faster, but everyday use is gentle on the battery.

Which model should I run on a MacBook Air with 8GB of RAM?

On an 8 GB MacBook Air, run a 3B model like Llama 3.2 3B, which fits in about 2 GB at Q4 and runs at 30+ tok/s. Do not try to run a 7B model on 8 GB: it technically loads, but the system swaps and slows to a crawl. If you have a 16 GB Air, you can step up to a 7B model like Qwen 2.5 7B comfortably.

Can I run the same local AI on my iPhone as on my MacBook?

For small models, yes. A 3B model like Llama 3.2 3B that runs on a MacBook Air also runs on a modern iPhone, because the phone has enough unified memory. Larger 7B-plus models need the MacBook's bigger memory. PocketLLM runs the phone-friendly models fully on-device on iPhone with zero telemetry, so you can keep the same private, offline setup when you leave your MacBook at home.
