When we say "local LLMs for coding," we mean models you can run on your own laptop, Mac, or gaming PC — no API calls, no cloud, no sending your proprietary codebase to a third party. This ranking covers the eight open-weights coding models currently worth running in 2026, scored on HumanEval, real-world tasks we ran through each model, quantized file size, and license. For the broader ranking that includes hosted proprietary models (Claude 3.5 Sonnet, GPT-4o, Gemini), see our 15 Best LLMs for Coding roundup.
Short version: Qwen 2.5 Coder 32B is the best local coding model if you have the hardware. DeepSeek Coder V2 Lite is the best one that fits on a 16 GB laptop. Qwen 2.5 Coder 1.5B is the best sub-2B option. Jump to the table.
How we scored
- HumanEval pass@1 (30%): The standard code-generation benchmark, taken from each model's official report.
- Real tasks (40%): 20 prompts — five refactoring, five bug-fix, five feature-addition, five code-explanation — each scored on whether the output compiles, whether it's correct, and whether it's idiomatic.
- File size (15%): At Q4 quantization. The whole point of local is "fits on your machine."
- License (10%): Apache / MIT score highest. Non-commercial licenses penalized.
- Ecosystem (5%): llama.cpp, Ollama, MLX, and Core ML support.
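Combining those weights is just a weighted sum of 0–100 sub-scores. A minimal sketch — the per-model sub-scores below are hypothetical, for illustration only, not our actual data:

```python
# Weighted scoring: each sub-score is normalized to 0-100, then combined
# with the weights listed above (they sum to 1.0).
WEIGHTS = {
    "humaneval": 0.30,
    "real_tasks": 0.40,
    "file_size": 0.15,
    "license": 0.10,
    "ecosystem": 0.05,
}

def overall_score(subscores: dict) -> float:
    """Weighted sum of 0-100 sub-scores, rounded to one decimal."""
    return round(sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS), 1)

# Hypothetical sub-scores for illustration:
example = {"humaneval": 89, "real_tasks": 95, "file_size": 70,
           "license": 100, "ecosystem": 100}
print(overall_score(example))  # 90.2
```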
The 8 best local coding LLMs in 2026
1. Qwen 2.5 Coder 32B — 94/100
The best open-weights coding model you can run yourself. HumanEval in the high 80s, Apache 2.0, 32B parameters fitting in ~20 GB at Q4. Needs a workstation (Mac Studio, multi-GPU PC, or a 48 GB MacBook Pro M3 Max), but for that investment you get coding help that's within touching distance of Claude 3.5 Sonnet. The right answer if your code is confidential and your budget allows workstation hardware.
2. DeepSeek Coder V2 Lite 16B — 91/100
The best local coding model that fits on a consumer laptop. Mixture-of-experts architecture: only ~2.4B parameters active per token, so it decodes at roughly the speed of a 3B dense model while holding mid-80s HumanEval. Requires ~10 GB of RAM at Q4. On a 16 GB MacBook Pro, this is the sweet spot. The license is custom DeepSeek terms — permissive for most uses but read before commercializing.
3. Qwen 2.5 Coder 14B — 88/100
The middle rung of the Qwen 2.5 Coder ladder. HumanEval in the low-to-mid 80s, 9 GB at Q4, Apache 2.0. Fits on a 16 GB laptop with room to spare. Slightly behind DeepSeek V2 Lite on benchmarks but with cleaner licensing and a simpler dense-transformer architecture.
4. Qwen 2.5 Coder 7B — 85/100
Runs on an 8 GB MacBook Air (tight fit) or comfortably on 16 GB. HumanEval in the high 70s, Apache 2.0, 4.5 GB at Q4. The best coding model that will actually fit on a cheap laptop. If you have an M2 Air and want to do real code work locally, this is your model.
5. Mistral Codestral 22B — 82/100
Mistral's dedicated coding model. Strong on Python and TypeScript, reasonable on other languages. The Mistral Non-Commercial license is restrictive — check it carefully if you're using it for work product. ~14 GB at Q4, runs on a 24 GB+ Mac.
6. StarCoder 2 15B — 80/100
A coding model developed jointly by Hugging Face, ServiceNow, and NVIDIA under the BigCode project. Trained on permissively licensed code, which matters if you're worried about "what did this model learn from" provenance questions. Slightly behind Qwen and DeepSeek on raw benchmarks; the best choice if training-data transparency is your priority.
7. Phi-3.5 Mini 3.8B — 75/100
Not a dedicated coding model, but punches absurdly far above its weight. HumanEval in the mid-60s from 3.8B parameters is remarkable for the size class. MIT license, 2.4 GB at Q4, small enough to run on a phone. The best coding assistant that will comfortably fit on an iPhone. PocketLLM ships this as a one-tap download — join the waitlist.
8. Qwen 2.5 Coder 1.5B — 70/100
The smallest dedicated coding model worth running. HumanEval in the low 60s at 1.5B parameters is genuinely impressive. Apache 2.0. 900 MB at Q4. Perfect as a fast local completion model for inline suggestions, or as a draft model for speculative decoding in front of a larger coder.
The summary table
| # | Model | Size (Q4) | Min RAM | License | HumanEval | Score |
|---|---|---|---|---|---|---|
| 1 | Qwen 2.5 Coder 32B | 20 GB | 32 GB | Apache 2.0 | ~89% | 94 |
| 2 | DeepSeek Coder V2 Lite 16B | 10 GB | 16 GB | DeepSeek | ~84% | 91 |
| 3 | Qwen 2.5 Coder 14B | 9 GB | 16 GB | Apache 2.0 | ~82% | 88 |
| 4 | Qwen 2.5 Coder 7B | 4.5 GB | 8 GB | Apache 2.0 | ~78% | 85 |
| 5 | Mistral Codestral 22B | 14 GB | 24 GB | Mistral NC | ~81% | 82 |
| 6 | StarCoder 2 15B | 9 GB | 16 GB | BigCode OpenRAIL | ~75% | 80 |
| 7 | Phi-3.5 Mini 3.8B | 2.4 GB | 4 GB | MIT | ~65% | 75 |
| 8 | Qwen 2.5 Coder 1.5B | 0.9 GB | 2 GB | Apache 2.0 | ~61% | 70 |
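The Size (Q4) column follows a simple rule of thumb: a Q4 GGUF weighs roughly 4.5–5 bits per parameter once quantization scales and the higher-precision layers are included. A quick sanity check — the 4.85 bits/param figure is an approximation, not an exact spec:

```python
# Back-of-envelope GGUF file size at Q4: params * effective bits per weight / 8.
# Q4_K_M-style quants average roughly 4.85 bits/param over the whole file
# (an assumption; exact figures vary by quant type and architecture).
BITS_PER_PARAM_Q4 = 4.85

def q4_size_gb(params_billions: float) -> float:
    bytes_total = params_billions * 1e9 * BITS_PER_PARAM_Q4 / 8
    return round(bytes_total / 1e9, 1)  # decimal GB

for p in (32, 14, 7, 1.5):
    print(f"{p}B -> ~{q4_size_gb(p)} GB")
```

The estimates land close to the table above (32B → ~19 GB, 1.5B → ~0.9 GB); add a few GB on top for the KV cache and runtime overhead when comparing against the Min RAM column.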
Which local coding LLM should you run?
- Mac Studio / workstation PC (32 GB+): Qwen 2.5 Coder 32B. No compromises.
- 16 GB MacBook Pro / gaming laptop: DeepSeek Coder V2 Lite 16B. The MoE architecture makes it feel much lighter than its parameter count suggests.
- 8 GB MacBook Air: Qwen 2.5 Coder 7B. Tight but usable. Close everything else when you're running it.
- iPhone or older Mac: Phi-3.5 Mini or Qwen 2.5 Coder 1.5B. You're not going to one-shot a Rails app, but you absolutely will get help understanding and modifying existing code. Both are bundled in PocketLLM.
- License-sensitive commercial deployment: Stick to the Qwen 2.5 Coder family — Apache 2.0 is the cleanest story in local coding right now.
How to actually run these
On Mac: install LM Studio or Ollama, pick the model from the catalog, click Download. See Ollama vs LM Studio vs PocketLLM. On Linux with a GPU: llama.cpp compiled from source, or Ollama. On iPhone: a native on-device app — see our iPhone app roundup.
Integrating these with your editor is a separate question and worth a post of its own. The short version: Continue.dev, Cody, and Cursor-style plugins all support Ollama or an OpenAI-compatible localhost endpoint, which covers every model on this list.
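For example, Ollama exposes an OpenAI-compatible endpoint on localhost:11434, so anything that speaks the chat-completions format can drive a local model. A minimal sketch using only the standard library — the model tag assumes you've pulled qwen2.5-coder; swap in whatever you're actually running:

```python
import json
from urllib import request

# Ollama's OpenAI-compatible chat-completions endpoint (default port 11434).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build (but don't send) an OpenAI-style chat request for a local model."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # low temperature suits code tasks
    }).encode("utf-8")
    return request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_chat_request("qwen2.5-coder:7b", "Explain this regex: ^\\d{4}-\\d{2}$")
# To actually send it (requires a running Ollama server):
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Point the same URL at LM Studio's local server (different port) or a llama.cpp `llama-server` instance and the request shape is unchanged — that interchangeability is why the editor plugins above can support every model on this list.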
The quick answer
The best local LLM for coding in 2026 is Qwen 2.5 Coder — at whatever size your hardware can run. 32B if you have a workstation, 14B on a gaming laptop, 7B on a cheap laptop, 1.5B on a phone. DeepSeek Coder V2 Lite is the one exception: if you have exactly 16 GB of RAM, DeepSeek's MoE architecture gives you meaningfully better quality than the 14B Qwen for roughly the same memory footprint. Everything else on this list is a specialist.