DeepSeek's reasoning models made headlines, and a lot of people want them without the cloud app attached. That's exactly the right instinct — and it's doable. You can't fit the full DeepSeek models on a phone or laptop, but the distilled variants run beautifully on-device, and they keep DeepSeek's strong math-and-code reasoning while never sending a word off your machine. This tutorial covers which distill and quant to pick for your device, the RAM each needs, the actual setup steps, and — critically — the difference between running DeepSeek locally (private) and using the DeepSeek cloud service (not private). Those two things are constantly confused, so we draw a hard line between them.
Want the short version? Jump to the summary table. Want a DeepSeek distill that runs fully on-device with no cloud? PocketLLM packages compatible models for iPhone — join the waitlist.
Don't try to run the full DeepSeek — run a distilled variant. On a phone or 8 GB Mac, use DeepSeek-R1-Distill-Qwen-1.5B at Q4 (under 1 GB, fits everywhere). On a 16 GB Mac, step up to a 7B/8B distill at Q4 (~4.5–5 GB) for stronger reasoning. Running these locally is fully private — prompts never leave your device. That is entirely separate from the DeepSeek cloud app, which sends prompts to DeepSeek's servers.
Local DeepSeek vs the DeepSeek cloud — read this first
These are two different things and the privacy gap between them is total. The DeepSeek app and website are a cloud service: your prompts travel to DeepSeek's servers, tied to a session, and are subject to that service's data handling. Running DeepSeek locally means downloading the open-weight distilled model and running it on your own iPhone or Mac, completely offline — no account, no server, nothing transmitted. This guide is exclusively about the second path. You get DeepSeek's reasoning ability without handing any text to a third party.
Which DeepSeek variant fits your device
DeepSeek-R1-Distill-Qwen-1.5B — the phone pick
This distill packs the reasoning training into a 1.5B-parameter model. At Q4 it's under 1 GB on disk and loads in about 2 GB of RAM, so it runs on a phone and on any Mac. It's particularly strong on math and step-by-step problems for its size. This is the variant we recommend for iPhone and 8 GB Macs.
DeepSeek 7B/8B distill — the Mac pick
The 7B and 8B distills, built on Qwen and Llama bases respectively, are noticeably stronger on multi-step reasoning. At Q4 they're roughly 4.5–5 GB and need about 8 GB of RAM. Because macOS and your other apps need memory too, that makes them 16 GB-Mac territory. They'll technically load on a high-memory iPhone, but they generate more slowly there than the 1.5B does. Choose these when you have the RAM and want better answers on hard problems.
What to avoid on consumer hardware
The full-size DeepSeek models are not laptop or phone models. They need far more memory than consumer machines have, and trying to load them just swaps to disk and crawls. If you've read our best open-source LLM roundup, the same rule applies here: match the model to your RAM rather than chasing the biggest name.
Steps to run a DeepSeek distill on-device
- Pick your variant by RAM. Phone or 8 GB Mac → 1.5B distill at Q4. 16 GB Mac → 7B/8B distill at Q4.
- Get the GGUF. The distills are published as GGUF files; pick the Q4 build that matches your variant. If GGUF tags are unfamiliar, our GGUF guide explains the format and quant levels.
- Choose a local runtime. On Mac, LM Studio and Ollama can both run a GGUF model offline. On iPhone, use an app built on llama.cpp.
- Verify it's offline. Confirm no network calls — the whole point is that prompts stay on-device. Turn on Airplane Mode and the model should still answer.
- Or skip the manual steps. An app that bundles compatible models picks a fitting quant and downloads in one tap, so you never touch a GGUF file.
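The first step above is a simple lookup, and it can be sketched in a few lines of Python. This is a minimal illustration, not part of any tool: the `Variant` class and `pick_variant` function are hypothetical names, and the sizes are the approximate figures from this guide's summary table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    quant: str
    file_gb: float  # approximate GGUF size, from this guide's table
    ram_gb: float   # approximate RAM to run, from this guide's table


# Hypothetical lookup table built from the guide's own numbers.
SMALL = Variant("DeepSeek-R1-Distill-Qwen-1.5B", "Q4", 0.9, 2.0)
LARGE = Variant("7B/8B distill", "Q4", 5.0, 8.0)


def pick_variant(device_ram_gb: float, is_phone: bool = False) -> Variant:
    """Return the distill this guide recommends for a device."""
    # Phones and Macs below 16 GB get the 1.5B distill;
    # only 16 GB-class Macs step up to a 7B/8B distill.
    if is_phone or device_ram_gb < 16:
        return SMALL
    return LARGE


if __name__ == "__main__":
    for ram, phone in [(8, False), (16, False), (8, True)]:
        v = pick_variant(ram, phone)
        print(f"{ram} GB, phone={phone}: {v.name} at {v.quant}")
```

The rule is deliberately conservative: it never recommends the larger distill on a phone, even a high-memory one, because this guide treats that combination as possible but slow.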
The summary table
| Variant | Quant | File size | RAM to run | Best device |
|---|---|---|---|---|
| R1-Distill-Qwen 1.5B | Q4 | ~0.9 GB | 2 GB | iPhone, 8 GB Mac |
| R1-Distill-Qwen 1.5B | Q8 | ~1.7 GB | 3 GB | Any Mac, quality bump |
| 7B distill | Q4 | ~4.5 GB | 8 GB | 16 GB Mac |
| 8B distill | Q4 | ~5.0 GB | 8 GB | 16 GB Mac |
| Full DeepSeek | — | very large | far beyond consumer | not on phone/laptop |
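The file sizes in the table follow from simple arithmetic: parameter count times bits per weight, divided by 8 bits per byte. The sketch below is a rough estimator under stated assumptions. The function name is hypothetical, and the bits-per-weight values (4.8 for a Q4 build, 8.5 for Q8) are assumed round numbers; the exact figure depends on the specific quant type of the file you download.

```python
def estimate_gguf_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF size in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


# Assumed effective bits per weight, including per-block scale data.
# These are illustrative values, not a format specification.
Q4_BITS = 4.8
Q8_BITS = 8.5

if __name__ == "__main__":
    for label, params in [("1.5B", 1.5), ("7B", 7.0), ("8B", 8.0)]:
        size = estimate_gguf_gb(params, Q4_BITS)
        print(f"{label} at Q4: roughly {size:.1f} GB")
    print(f"1.5B at Q8: roughly {estimate_gguf_gb(1.5, Q8_BITS):.1f} GB")
```

The estimates land in the same ballpark as the table rather than matching it exactly. Nominal sizes such as "1.5B" round the true parameter count, and real files also carry metadata and a few tensors stored at higher precision.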
What the local distills are good at
The distills inherit DeepSeek's reasoning focus, so they shine on math, logic, and step-by-step coding relative to their size; the 1.5B punches above its weight on structured problems. They're weaker on broad world knowledge than a general model of the same size, which is the usual trade-off when a small model is tuned for reasoning. For everyday chat plus solid reasoning on a phone, the 1.5B distill is a strong, private option.
Keeping it private end to end
The privacy win only holds if inference truly stays local. Run the model fully offline, confirm with Airplane Mode, and avoid any wrapper that quietly proxies prompts to a server. A purpose-built on-device app makes this the default — no account, no telemetry, no prompt ever leaving the device. That's the model PocketLLM follows: compatible models packaged for iPhone, inference entirely on-device, zero telemetry on your conversations.
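If you script against a local runtime on a Mac, one cheap guard against a quietly proxying wrapper is to refuse any endpoint that is not a loopback address. The sketch below assumes an Ollama server on its default port (11434) with its `/api/generate` endpoint and the `deepseek-r1:1.5b` model tag; adjust both for your own setup. The helper names are hypothetical. It only builds the request and does not send anything.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """True if the URL's host is localhost or a loopback IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as remote.
        return False


def build_local_request(
    prompt: str,
    url: str = "http://localhost:11434/api/generate",  # assumed Ollama default
    model: str = "deepseek-r1:1.5b",                   # assumed model tag
) -> urllib.request.Request:
    """Build a POST request, refusing any endpoint that is not loopback."""
    if not is_loopback_endpoint(url):
        raise ValueError(f"refusing to send prompt to non-local endpoint: {url}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_local_request("What is 17 * 23?")
    print(req.full_url, req.get_method())
```

A loopback URL only shows where your script sends the prompt. It does not prove the server behind it forwards nothing, so the Airplane Mode test remains the real check.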
Want DeepSeek-style reasoning on your iPhone without the cloud app? PocketLLM runs compatible distills fully on-device with zero telemetry. Join the waitlist.
Frequently asked questions
Can I run DeepSeek locally on an iPhone?
Yes, but use a small distilled variant. The full DeepSeek models are far too large for a phone, so you run a distill such as DeepSeek-R1-Distill-Qwen-1.5B, which at Q4 is under 1 GB and fits comfortably on an iPhone. A 7B distill at Q4 is about 4.5 GB; it runs on higher-memory iPhones, but more slowly. PocketLLM packages compatible small models as one-tap downloads and runs them fully on-device with zero telemetry.
Is running DeepSeek locally private?
Yes, provided the runtime you use is itself fully offline. When you run a DeepSeek distill locally, the model and all inference stay on your device: your prompts never leave the phone or Mac, no account is required, and nothing is sent to a server. This is completely separate from DeepSeek's cloud app and website, which send your prompts to remote servers. Running the open weights locally gives you the model's capability without the cloud service's data collection.
Which DeepSeek model should I run on a Mac?
On an 8 GB Mac, run DeepSeek-R1-Distill-Qwen-1.5B at Q4. It is small, fast, and strong on math and code. On a 16 GB Mac, step up to a 7B or 8B distill at Q4, around 4.5 to 5 GB, for noticeably better reasoning. Avoid the very large DeepSeek models on a laptop; they need far more memory than consumer machines have and will swap badly if they load at all.
How much RAM do I need to run DeepSeek locally?
It depends on the distill size and quant. A 1.5B distill at Q4 loads in about 2 GB of RAM, a 7B or 8B distill at Q4 needs about 8 GB, and larger variants need much more. The on-disk size of the GGUF is a floor, not the total: the runtime also needs memory for the context (the KV cache) and its own overhead, which is why the RAM figures here sit well above the file sizes. Check the file size before downloading and leave that headroom. For phones and 8 GB Macs, stick to the 1.5B distill at Q4.
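One reason the RAM figures in this guide exceed the file sizes is the KV cache, which grows with context length. The sketch below estimates it from the model's architecture. The layer count, KV-head count, and head dimension are placeholder values for illustration only, so read the real ones from the model's published config. The 0.5 GB runtime overhead is likewise an assumption, and both function names are hypothetical.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # 16-bit cache entries
) -> int:
    """Bytes needed to cache keys and values for every token in the context."""
    # Factor 2 = one key vector plus one value vector per layer and KV head.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def estimate_ram_gb(file_gb: float, kv_bytes: int, overhead_gb: float = 0.5) -> float:
    """Weights + KV cache + an assumed fixed runtime overhead, in GB."""
    return file_gb + kv_bytes / 1e9 + overhead_gb


if __name__ == "__main__":
    # Placeholder architecture values; check the model's config before relying on them.
    kv = kv_cache_bytes(n_layers=28, n_kv_heads=2, head_dim=128, context_tokens=4096)
    print(f"KV cache at 4096 tokens: {kv / 1e9:.2f} GB")
    print(f"Estimated total for a 0.9 GB file: {estimate_ram_gb(0.9, kv):.1f} GB")
```

Reasoning distills tend to produce long step-by-step outputs, so the context fills faster than in ordinary chat. Budgeting for a larger context than you expect to need is the safer choice.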
Is the local DeepSeek the same as the DeepSeek app?
No, and the difference matters for privacy. The DeepSeek app and website are a cloud service that sends your prompts to DeepSeek's servers. Running DeepSeek locally means downloading the open-weight distilled model and running it on your own device, fully offline, with nothing transmitted. The local route gives you the model's reasoning ability without sending any of your text to a third party.