
11 Best Small Language Models in 2026 (SLM Roundup)

A "small language model" is a fuzzy term, but here's the useful definition: an SLM is a language model small enough to run on a phone, a laptop without a discrete GPU, or a cheap server. In 2026 that roughly means under 4 billion parameters. This post ranks the 11 best SLMs released or actively maintained in 2026 by the only metrics that matter for on-device use: how good is the output, how small is the file, and how fast does it run on real hardware.
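The "fits on a phone" constraint is really a file-size constraint, and you can estimate it from the parameter count alone. A back-of-the-envelope sketch, assuming roughly 4.8 bits per weight for a Q4_K_M-style quantization (an approximation; real GGUF files vary a little with architecture and metadata):

```python
def q4_size_gb(params_billions: float, bits_per_weight: float = 4.8) -> float:
    """Rough on-disk size of a Q4-quantized model.

    bits_per_weight ~4.8 approximates llama.cpp's Q4_K_M, which stores
    4-bit weights plus per-block scale factors. Actual files vary a bit.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

print(round(q4_size_gb(3.0), 1))  # ~1.8 GB for a 3B model
print(round(q4_size_gb(1.1), 1))  # ~0.7 GB for a 1.1B model
```

Those estimates land close to the real downloads in the table below (Llama 3.2 3B at 2.0 GB, TinyLlama at 0.7 GB); the extra overhead on larger models comes from unquantized embedding and output layers.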

Short version: Llama 3.2 3B and Phi-3.5 Mini are the current leaders. Gemma 2 2B and Qwen 2.5 1.5B Coder are the best specialists. SmolLM2 1.7B is the best pure research-lineage option. Jump to the table, or join the PocketLLM waitlist to get them on your phone.

Why SLMs matter in 2026

Two years ago, "small" meant "toy." A 2B model couldn't hold a coherent conversation. Today's 2B models outperform yesterday's 7B models on most benchmarks, because the training recipes got better and the data curation got much better. The practical consequence: private on-device AI stopped being a theoretical argument and became a product. Every model on this list is good enough for daily use. The interesting question is no longer "do SLMs work?" — it's "which SLM fits your specific constraints?"

How we ranked

  • Quality (35%): MMLU, HumanEval, and GSM8K from the model's primary source, weighted toward general-purpose tasks unless the model is a specialist.
  • Size on disk (25%): At Q4 quantization — what you actually download.
  • Speed on iPhone 15 Pro (15%): Tokens per second in llama.cpp or Core ML.
  • License (15%): Apache and MIT score highest. Restricted commercial licenses penalized.
  • Ecosystem (10%): How well-supported the model is in llama.cpp, MLX, Core ML, and Ollama.
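The composite score is just a weighted sum of those five sub-scores. A minimal sketch of the arithmetic — the sub-scores in the example are illustrative placeholders, not the actual numbers behind the ranking:

```python
# Rubric weights from the list above; they sum to 1.0.
WEIGHTS = {"quality": 0.35, "size": 0.25, "speed": 0.15,
           "license": 0.15, "ecosystem": 0.10}

def composite(scores: dict) -> float:
    """Weighted sum of 0-100 sub-scores using the rubric weights."""
    assert set(scores) == set(WEIGHTS), "must score every criterion"
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# Illustrative sub-scores only -- not the real per-model numbers.
example = {"quality": 95, "size": 90, "speed": 92,
           "license": 85, "ecosystem": 100}
print(round(composite(example), 1))  # → 92.3
```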

The 11 best small language models in 2026

1. Llama 3.2 3B — 94/100

The best all-around small model in 2026. Meta's 3.2 3B is the go-to answer for "run a language model on a phone." Trained with the same recipe that made 3.1 8B a success, then distilled aggressively. Runs at 30+ tok/s on an iPhone 15 Pro at Q4, fits in 2 GB, and holds up on MMLU against 7B models from 2024. Llama Community License allows commercial use.

2. Phi-3.5 Mini (3.8B) — 92/100

Microsoft's "textbook-trained" small model. Punches absurdly far above its weight on reasoning, math, and code (HumanEval in the mid-60s from 3.8B parameters is exceptional). MIT license, 2.4 GB at Q4. The knock is thinner general-knowledge coverage — Phi 3.5 knows less about pop culture than Llama does, because the training data was deliberately narrower. Use Phi for structured tasks, use Llama for open chat.

3. Gemma 2 2B — 90/100

Google's small model, and the strongest sub-3B option. Excellent safety tuning out of the box. Particularly good at multilingual tasks. Runs in 1.6 GB at Q4 and on every consumer-grade Apple Silicon Mac. Slightly weaker than Llama 3.2 3B on reasoning, notably stronger on languages other than English.

4. Qwen 2.5 Coder 1.5B — 88/100

The best coding model at 1.5B parameters — by a lot. HumanEval in the low 60s from a model that fits in 900 MB. Apache 2.0. Use it as a local completion engine, a lightweight code-explain model, or the "smart autocomplete" layer in front of a bigger coder like DeepSeek V2 Lite. See our best LLMs for coding roundup for the whole coding ladder.

5. SmolLM2 1.7B — 86/100

Hugging Face's in-house small model. Apache 2.0. Trained on a carefully filtered dataset and genuinely good for its size. The best pick when you want an open, community-friendly model with a clean provenance story — every training source is documented.

6. Qwen 2.5 3B — 83/100

The general-purpose 3B variant of Qwen 2.5. Apache 2.0, strong multilingual, competitive with Llama 3.2 3B on general tasks. Slightly behind Llama on English reasoning, slightly ahead on Chinese and Japanese. A solid alternative if the Llama Community License concerns you.

7. Llama 3.2 1B — 78/100

Meta's 1B sibling to our #1 pick. Useful for speculative decoding (running a small model as a draft for a bigger one) and for devices where 3B doesn't fit. Quality is noticeably weaker than the 3B for only a ~2× speedup, so unless RAM is the hard constraint, step up to the 3B.
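The draft-model trick is simple enough to sketch. In greedy speculative decoding, the small model cheaply proposes a few tokens, the big model verifies them (in one batched pass in real implementations, such as llama.cpp's speculative example), and the longest agreeing prefix is kept. The `draft_next`/`target_next` functions below are hypothetical stand-ins for the two models:

```python
def speculative_step(draft_next, target_next, context, k=4):
    """One round of greedy speculative decoding.

    draft_next / target_next each map a token sequence to its next
    token. The draft proposes k tokens; the target keeps the longest
    matching prefix plus its own token at the first mismatch.
    """
    # Draft model proposes k tokens autoregressively (cheap).
    proposal, ctx = [], list(context)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # Target model verifies the proposal.
    accepted, ctx = [], list(context)
    for tok in proposal:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # target overrides at first mismatch
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted

# Toy "models": the draft agrees with the target only at even positions.
target = lambda ctx: len(ctx) % 10
draft = lambda ctx: len(ctx) % 10 if len(ctx) % 2 == 0 else 9
print(speculative_step(draft, target, [0, 1], k=4))  # → [2, 3]
```

Every accepted token is one the big model would have produced anyway, so output quality is unchanged; the speedup comes entirely from how often the draft guesses right.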

8. Phi-3 Mini (3.8B, older) — 75/100

The original Phi-3 Mini from 2024. Largely superseded by Phi-3.5 Mini but included because it has a huge ecosystem of fine-tunes and the original training paper is still one of the clearest explanations of "textbook-data" SLM training.

9. StableLM 2 1.6B — 72/100

Stability AI's small model. Decent quality, non-commercial research license, multilingual. Runs at good speed. Held back from higher placement mainly by the restrictive license and the fact that SmolLM2 has largely eaten its niche.

10. TinyLlama 1.1B — 68/100

The OG tiny model. Apache 2.0, 700 MB at Q4, runs on absolutely anything. Quality has been beaten by SmolLM2 and Llama 3.2 1B, but TinyLlama remains the best choice when you need the smallest coherent model and the largest fine-tuning community for sub-2B models.

11. Danube 3 500M — 60/100

H2O.ai's 500M-parameter model. Technically a language model, practically closer to a smart completion engine. Worth mentioning because it fits on basically any device and is the smallest model in this ranking that produces coherent sentences most of the time. Suitable for edge deployment, embedded use, and extreme low-RAM scenarios.

The summary table

| # | Model | Params | Size (Q4) | License | Score |
|---|-------|--------|-----------|---------|-------|
| 1 | Llama 3.2 3B | 3B | 2.0 GB | Llama Community | 94 |
| 2 | Phi-3.5 Mini | 3.8B | 2.4 GB | MIT | 92 |
| 3 | Gemma 2 2B | 2B | 1.6 GB | Gemma Terms | 90 |
| 4 | Qwen 2.5 Coder 1.5B | 1.5B | 0.9 GB | Apache 2.0 | 88 |
| 5 | SmolLM2 1.7B | 1.7B | 1.1 GB | Apache 2.0 | 86 |
| 6 | Qwen 2.5 3B | 3B | 1.9 GB | Apache 2.0 | 83 |
| 7 | Llama 3.2 1B | 1B | 0.8 GB | Llama Community | 78 |
| 8 | Phi-3 Mini (2024) | 3.8B | 2.4 GB | MIT | 75 |
| 9 | StableLM 2 1.6B | 1.6B | 1.0 GB | Stability NC | 72 |
| 10 | TinyLlama 1.1B | 1.1B | 0.7 GB | Apache 2.0 | 68 |
| 11 | Danube 3 500M | 0.5B | 0.4 GB | Apache 2.0 | 60 |

Which SLM should you pick?

For general on-device chat: Llama 3.2 3B. The default answer. Best quality-per-megabyte in the sub-4B range.

For reasoning, math, and code: Phi-3.5 Mini. The training recipe makes a real difference on structured tasks.

For multilingual work: Gemma 2 2B or Qwen 2.5 3B.

For truly tiny footprints: Llama 3.2 1B when the 3B's ~2 GB won't fit; SmolLM2 or TinyLlama when the budget is closer to 1 GB or less.

For coding-specific tasks: Qwen 2.5 Coder 1.5B. It's the best sub-2B coding model by a surprising margin.

For a breakdown of the 3B-through-70B models, see our 15 Best Local LLM Models in 2026.

How to actually run an SLM

On iPhone, you want an app that handles Core ML conversion for you — see Best On-Device LLM Apps for iPhone. On Mac, LM Studio or Ollama handle model management in one step. On Linux with a GPU, llama.cpp compiled from source remains the best option.

On PocketLLM specifically, the top five SLMs in this ranking are bundled as one-tap downloads with pre-converted Core ML weights. Join the waitlist.

The quick answer

The best small language model in 2026 for general use is Llama 3.2 3B. For reasoning and code, Phi-3.5 Mini. For safety-sensitive multilingual work, Gemma 2 2B. Everything else on this list is a specialization of those three. SLMs are finally good enough that "put it on your phone" is a legitimate product decision, not a research curiosity — which is exactly why we built PocketLLM.

Every top SLM, on your iPhone.

PocketLLM ships the top-ranked small language models as one-tap downloads. Zero telemetry. No account. Join the waitlist.
