Searches like "Claude Code local LLM" and "Claude Code with Ollama" are rising fast, and the framing is usually wrong. Anthropic does not distribute Claude model weights for local self-hosting — official Claude access is via the Claude apps/API and cloud providers. What you can do is keep using the Claude Code CLI as your coding interface while pointing it at a local open-weights model (Qwen 3 Coder, DeepSeek Coder, etc.) served through an Anthropic-compatible endpoint. This post explains the actual mechanism, what each tool documents, and where it falls short.
Claude Code reads the ANTHROPIC_BASE_URL environment variable, so you can point it at a local server that speaks Anthropic's Messages API instead of Anthropic's cloud. Ollama documents Anthropic API compatibility (usable by tools like Claude Code), and LM Studio documents a Claude Code path via its Anthropic-compatible /v1/messages endpoint. The model answering is then a local open model — not Claude. Plain OpenAI-compatible endpoints need a translation proxy. Agentic quality depends heavily on the local model.
What "Claude Code with a local LLM" actually means
Claude Code is Anthropic's command-line coding agent. It talks to a backend that implements Anthropic's Messages API. By default that backend is Anthropic's cloud (running Claude). Claude Code respects an ANTHROPIC_BASE_URL override (plus ANTHROPIC_AUTH_TOKEN for a Bearer token, or ANTHROPIC_API_KEY), which lets you redirect it to any endpoint that implements enough of the Anthropic Messages API for Claude Code — including a server on your own machine. "Claude Code with a local LLM" therefore means: Claude Code as the client, an open model on your hardware as the engine. Model inference can then stay local/offline, but Claude Code itself may still make ancillary network calls (auth, updates, telemetry, error reporting, web-fetch safety checks) unless you disable or avoid those. You do not get Claude's model quality, because Claude's weights aren't running.
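Before launching Claude Code against a local endpoint, it is worth confirming the server actually speaks the Messages shape. A minimal smoke test, not from any tool's docs: the URL, token, and model name below are placeholders for whatever your server uses, and local servers may ignore the anthropic-version header that Anthropic's cloud requires.
# hedged smoke test; substitute your server's URL, token, and model
curl -s http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model": "qwen3-coder", "max_tokens": 64, "messages": [{"role": "user", "content": "Reply with OK"}]}'
An Anthropic-shaped reply carries a content array in the JSON; an OpenAI-shaped choices array means you are in Option C territory below.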
Option A — Ollama (documented Anthropic compatibility)
Ollama documents Anthropic Messages API compatibility for Claude Code (on recent Ollama releases). The base URL is the server root (http://localhost:11434), not a /v1 path. The documented manual form:
# documented mechanism; not tested in this article
ollama pull qwen3-coder
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
claude --model qwen3-coder
Ollama also documents a shortcut (ollama launch claude) that wires this up for you. Treat the above as documented, not tested here — exact flags and minimum versions change between releases, so Ollama's Anthropic-compatibility and Claude Code docs are the source of truth. The privacy/safety properties of running Ollama this way (including that :cloud models are not local inference) are covered in Is Ollama private and safe?.
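Exporting variables per shell gets tedious if you switch back and forth between local and cloud. Claude Code's settings files accept an env map, so a persistent version of the above looks roughly like this (path and key per Anthropic's settings docs; merge by hand if the file already exists, since this overwrites it):
# hedged sketch: persist the redirect in Claude Code's user settings
cat > ~/.claude/settings.json <<'EOF'
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11434",
    "ANTHROPIC_AUTH_TOKEN": "ollama"
  }
}
EOF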
Option B — LM Studio (documented Claude Code path)
LM Studio documents a Claude Code path via its Anthropic-compatible POST /v1/messages (added in LM Studio 0.4.1). The base URL is the server root (default http://localhost:1234); /v1/messages is the path the client calls. Documented setup:
# documented mechanism; not tested in this article
lms server start --port 1234
export ANTHROPIC_BASE_URL=http://localhost:1234
export ANTHROPIC_AUTH_TOKEN=lmstudio
claude --model openai/gpt-oss-20b
Functionally the same idea as Option A, with LM Studio's GUI for model management — see our Ollama vs LM Studio vs PocketLLM comparison.
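Because Claude Code is context-hungry (see Honest limitations below), it helps to load the model with an explicit context length before pointing Claude Code at it. A sketch using LM Studio's CLI; the flag name is an assumption to check against lms load --help on your version:
# hedged example: load with a large context window first
lms load openai/gpt-oss-20b --context-length 32768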
Option C — OpenAI-compatible servers + a translation proxy
Many local runtimes expose an OpenAI-compatible API only, not an Anthropic-compatible one. Claude Code's gateway docs require Anthropic Messages (or Bedrock/Vertex) format — plain OpenAI Chat Completions is not supported directly, so you need a translation layer (a proxy mapping Anthropic Messages ↔ OpenAI Chat Completions) in front of it. Community gateways (e.g. LiteLLM-style proxies) do this, but note: Anthropic does not endorse or audit third-party proxies, and routing your code/prompts through one adds supply-chain and data-handling risk. Pick an actively maintained one, run it locally, and understand what it logs. This is the most fragile of the three options.
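For concreteness, the general shape with a LiteLLM-style proxy in front of an OpenAI-compatible server on port 8000 looks roughly like this. Every name here (ports, model alias, config keys) is illustrative, and whether a given proxy version exposes /v1/messages natively is exactly what you must verify in its docs:
# hedged sketch: Anthropic-speaking proxy translating to an OpenAI-compatible backend
cat > proxy-config.yaml <<'EOF'
model_list:
  - model_name: local-coder
    litellm_params:
      model: openai/qwen2.5-coder
      api_base: http://localhost:8000/v1
      api_key: none
EOF
litellm --config proxy-config.yaml --port 4000
export ANTHROPIC_BASE_URL=http://localhost:4000
claude --model local-coder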
Which local model to use
The client is Claude Code; the brains are whatever open model you serve. For coding that means the Qwen Coder line (Qwen 2.5 Coder, or Qwen 3 Coder as in Option A) or DeepSeek Coder; see best local LLM for coding for the full ranking, and best Ollama models for Mac for what fits your RAM. Rule of thumb from those posts: 8 GB → Qwen 2.5 Coder 7B; 16 GB → DeepSeek Coder V2 Lite 16B; workstation → Qwen 2.5 Coder 32B.
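In Ollama terms those tiers map to pulls like the following; the tags are illustrative, so check the Ollama library for current names and quantizations:
# hedged examples: pick the tier that matches your RAM
ollama pull qwen2.5-coder:7b        # ~8 GB machines
ollama pull deepseek-coder-v2:16b   # ~16 GB machines
ollama pull qwen2.5-coder:32b       # 32 GB+ workstations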
Honest limitations
- It is not Claude. You're trading frontier model quality for privacy, offline use, and zero per-token cost. A 7B–32B local model is materially weaker than Claude on hard agentic coding tasks.
- Agentic features can degrade. Claude Code leans on strong tool-use/function-calling and long context. Local models vary widely here; multi-step edits and tool loops may misfire where Claude wouldn't.
- API surface drift. This relies on third-party Anthropic-API compatibility. Endpoints, flags, and supported features change — treat any command as documented, and verify against the tool's current docs before relying on it.
- Context limits. Claude Code is context-heavy (Ollama recommends a large context window; LM Studio suggests well over ~25k tokens). Local models often have smaller usable context than Claude; large repos may not fit. A context-window sketch follows this list.
- Partial API. These endpoints implement enough of the Anthropic Messages API for Claude Code, not all of it — prompt caching, token counting, PDFs/citations, and some tool controls may differ from Anthropic.
- Minimum versions. Anthropic compatibility is recent: LM Studio added /v1/messages in 0.4.1, and Ollama's support is on newer releases. Update both before relying on this.
- "Local" has an asterisk. Ollama :cloud models and LM Studio remote-device features are not on-device local inference, and Claude Code makes some network calls regardless.
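As flagged in the context-limits point above, the usual mitigation on the Ollama side is raising the default context window, which is small out of the box. A sketch, assuming the OLLAMA_CONTEXT_LENGTH variable documented for recent Ollama releases (older builds use a Modelfile PARAMETER num_ctx instead; verify against current docs):
# hedged sketch: raise Ollama's default context before starting the server
export OLLAMA_CONTEXT_LENGTH=32768
ollama serve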
The quick answer
Anthropic doesn't ship Claude weights to self-host, but you can run Claude Code against a local open model by pointing ANTHROPIC_BASE_URL at an Anthropic-compatible local server. Ollama and LM Studio both document this; OpenAI-only servers need a translation proxy. Model inference can stay local/offline (Claude Code itself still makes some ancillary network calls), and it's materially weaker than cloud Claude — use it for privacy-sensitive or offline coding with a strong local coder model, not as a free Claude replacement.
Last verified: 2026-05-16. Mechanism and env vars per Anthropic's Claude Code documentation (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, model config, LLM-gateway requirements, data-usage); endpoints per Ollama's Anthropic-compatibility / Claude Code docs and LM Studio's Claude Code docs. Commands are the documented approach, not tested in this article — endpoints, flags, and minimum versions change between releases; confirm in each tool's current docs.