
How to Set Up a Self-Hosted AI Chatbot Without a Server

Most "self-hosted chatbot" tutorials assume you'll stand up a server — install a runtime, open a port, manage a service. This one doesn't. The simplest way to self-host an AI chatbot is to skip the server entirely and run the model on the device you already carry. The model and the runtime live inside one app; you install it, download a model once, and chat — offline, with nothing leaving the device. This is a strictly procedural walkthrough of that no-server path, from install to a verified-offline chatbot. If you want to compare this against desktop and home-server approaches first, that's covered in our self-hosted AI guide; here we just get one running.

Want the short version? Jump to the setup summary. Following along on iPhone? PocketLLM is the no-server path — install, download, chat offline. Join the waitlist.

No-server setup: quick answer

To self-host a chatbot without a server, use a fully on-device app. Install it, download a 3B model (about 2 GB at Q4) once over Wi-Fi, then chat. There's no server to configure, no port to open, and no account. Verify it's truly local by turning off Wi-Fi and cellular and sending a message — if it answers, the model is running on the device and nothing is leaving. The whole setup takes a few minutes, most of it the one-time model download.

What you need before you start

  • A recent phone or laptop. On a phone you'll run a 1B–3B model; a 3B needs roughly 4 GB of RAM available. On a laptop with 8 GB+ you can run a 7B model later if you want.
  • Storage headroom. Budget about 2 GB free for a 3B model at Q4, or under 1 GB for a 1B model.
  • A Wi-Fi connection for the one download. You need the network once, to fetch the model. After that the chatbot runs offline.
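The storage figures above follow from simple arithmetic: parameter count times bits per weight. Here is a rough sketch in Python, assuming a Q4 format that averages about 4.5 bits per weight (4-bit values plus a small per-block scale; some Q4 variants run slightly larger) and round parameter counts. The function name is ours, for illustration only.

```python
def q4_size_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Approximate on-disk size in GB (10^9 bytes) of a quantized model.

    bits_per_weight=4.5 is an assumption: 4-bit weights plus per-block
    scale overhead. Real model files also carry metadata and may keep a
    few tensors at higher precision, so expect the real file to be a
    bit larger than this estimate.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


for name, params in [("1B", 1.0), ("3B", 3.0), ("7B", 7.0)]:
    print(f"{name}: ~{q4_size_gb(params):.1f} GB")
```

The 3B estimate lands a little under 2 GB and the 1B estimate under 1 GB, which is why the budgets above leave some headroom for file overhead.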

Step 1 — Install the on-device app

Install an app that bundles both the model runtime and a model catalog, so there's nothing separate to set up. On iPhone that's PocketLLM; on a Mac you can use a desktop runner, though that drifts toward the "local server" pattern this tutorial avoids — for a true no-server experience, the phone app is the cleanest. Installing is a normal app install: no terminal, no dependencies, no service to register. When it opens, you'll see a model picker rather than a chat screen, because you haven't downloaded a model yet.

Step 2 — Download a model

From the model picker, choose a 3B model. Llama 3.2 3B is the default recommendation because it balances quality and size: it fits in about 2 GB at Q4 and runs at a usable speed on recent hardware. Tap to download. This is the one step that needs the network: the model file comes down over Wi-Fi and is stored on the device. If your device is tight on storage or you want snappier responses, pick a 1B–2B model instead, at some cost in quality. For more on choosing, see our best on-device LLM apps roundup. Once the download finishes, the model stays on the device, so you won't need to fetch it again.
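How long the download takes depends almost entirely on your Wi-Fi speed. The sketch below estimates it for a 2 GB file, assuming the link sustains the stated rate for the whole transfer (real connections fluctuate, so treat the result as a best case). The function name and the sample speeds are illustrative.

```python
def download_minutes(size_gb: float, link_mbps: float) -> float:
    """Minutes to transfer size_gb (10^9 bytes) at a sustained link_mbps."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits
    return megabits / link_mbps / 60


# Estimate a 2 GB model download at a few sample Wi-Fi speeds.
for mbps in (25, 50, 100, 200):
    print(f"{mbps:>3} Mbps: ~{download_minutes(2.0, mbps):.1f} min")
```

At around 100 Mbps the download finishes in under three minutes; on a 25 Mbps link, expect closer to ten.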

Step 3 — Start your first chat

Open a new conversation and send a message — ask it to draft an email, summarize a block of text you paste in, or answer a reference question. The first response may take a moment as the model loads into memory; after that, generation is steady. On a 3B model on recent hardware you'll see responsive, conversational output. Everything you type is processed locally by the model you just downloaded. There's no round trip, no server doing the work — the chatbot is running entirely on your device.

Step 4 — Verify it runs offline

This is the step that proves the setup is genuinely self-hosted. Put the device in airplane mode, confirm that Wi-Fi and cellular are both off (airplane mode can leave Wi-Fi on if you re-enabled it earlier), and send another message. If the chatbot answers normally, you've confirmed the model is running on the device, because there is no connection for a request to leave on. This is the test that separates real on-device tools from ones quietly relying on a server. With a true on-device app like PocketLLM, the chat keeps working with no connection at all. For the step-by-step on running offline specifically, our how to run AI offline on iPhone guide goes deeper.
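On a phone, the airplane-mode toggle is all you need. If you are running the test on a laptop instead, you may want to confirm the machine really is offline before trusting the result. The following is a minimal sketch under stated assumptions: the probe address (a public resolver at 1.1.1.1, port 443) is an arbitrary choice, and the function name is ours.

```python
import socket


def is_offline(host: str = "1.1.1.1", port: int = 443,
               timeout: float = 3.0,
               connect=socket.create_connection) -> bool:
    """Return True if a TCP connection to host:port cannot be opened.

    host and port are placeholder choices; any address you expect to be
    reachable when online will do. `connect` is injectable so the check
    can be exercised without touching the network.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:  # covers timeouts, unreachable network, DNS failures
        return True
    conn.close()
    return False
```

If `is_offline()` returns True and the chatbot still answers, the reply cannot have come from a remote server. A single probe only shows that one address is unreachable, so it complements the airplane-mode test rather than replacing it.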

Step 5 — Tune for your device

If responses feel slow, switch to a smaller model — a 1B–2B model generates faster at some quality cost. If you have storage and RAM to spare and want higher quality, keep the 3B or, on a laptop, step up to a 7B model. You can keep multiple models downloaded and switch between them per conversation. Conversation history stays on the device; clear it whenever you want. None of this involves a server — it's all local settings on a local model.
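The sizing advice in this guide can be written down as a small lookup. This is a sketch rather than anything the app does internally. The 3B thresholds (about 4 GB of RAM, about 2 GB of storage) and the 8 GB RAM figure for 7B come from this article; the 1B RAM figure and the 7B storage figure are assumptions added to complete the table, and the sketch does not distinguish phones from laptops.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    min_ram_gb: float      # RAM the device should have
    min_storage_gb: float  # free storage for the Q4 model file


# Ordered largest first. The 1B RAM value and the 7B storage value
# are illustrative assumptions, not figures from the article.
TIERS = [
    Tier("7B", min_ram_gb=8.0, min_storage_gb=4.0),
    Tier("3B", min_ram_gb=4.0, min_storage_gb=2.0),
    Tier("1B", min_ram_gb=2.0, min_storage_gb=1.0),
]


def pick_model(ram_gb: float, free_storage_gb: float) -> Optional[str]:
    """Return the largest tier that fits both limits, or None."""
    for tier in TIERS:
        if ram_gb >= tier.min_ram_gb and free_storage_gb >= tier.min_storage_gb:
            return tier.name
    return None


print(pick_model(ram_gb=6, free_storage_gb=3))     # 3B
print(pick_model(ram_gb=16, free_storage_gb=1.5))  # 1B
```

The second call shows that storage can be the limiting factor even with plenty of RAM. If the suggested tier still feels slow in practice, step down one size as described above.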

Setup summary

Step | Action | Network needed? | Time
1 | Install the on-device app | App install only | ~1 min
2 | Download a 3B model (~2 GB) | Yes (one time) | ~2–4 min
3 | Start your first chat | No | Instant
4 | Verify offline (airplane mode) | No (by design) | ~1 min
5 | Tune model size for your device | No | As needed

Troubleshooting common issues

  • The download stalls: a 3B model is about 2 GB, so a flaky connection can interrupt it. Resume on stable Wi-Fi.
  • Responses are slow: you're likely running a model too large for the device. Drop to a 1B–2B model.
  • It won't answer offline: confirm the model finished downloading before you went offline, and that you picked a genuinely on-device app rather than a thin client to a cloud service.
  • You're out of storage: delete unused models or pick a smaller one. You don't need to keep several at once.

Where to go from here

You now have a self-hosted chatbot with no server, running offline on the device in your hand. If you later want bigger models or to serve them to other devices, that's the territory of desktop and home-server setups — and a different trade-off in maintenance and network exposure. To see how the no-server path compares to those, and to decide whether you need more, read the full self-hosted AI guide. If you'd rather compare the desktop tooling directly, see Ollama vs LM Studio vs PocketLLM.

Frequently asked questions

Can you self-host an AI chatbot without a server?

Yes. A fully on-device app runs the model right on your phone or laptop, so there is no server to set up, expose, or maintain. You install the app, download a model once, and chat — the model and runtime are bundled together. This is the simplest form of self-hosting because the only machine involved is the one in your hand.

How long does it take to set up a self-hosted AI chatbot this way?

On the on-device path it takes a few minutes, most of which is the one-time model download. Installing the app is a normal app install, and downloading a 3B model is a roughly 2 GB download over Wi-Fi. After that, launching and chatting is instant, and the model stays on the device so you never download it again.

Which model should I download for a self-hosted chatbot on a phone?

On a phone, start with a 3B model such as Llama 3.2 3B — it fits in about 2 GB at Q4 and runs at usable speed while giving good chat quality. If your device is tight on storage or you want faster responses, drop to a 1B to 2B model at some quality cost. Larger 7B models are better suited to a laptop with 8 GB or more of RAM.

How do I confirm my self-hosted chatbot is running offline?

Turn off Wi-Fi and cellular, then send a message. If the chatbot answers normally, the model is running on the device and nothing is leaving it. If it stops working, the tool was relying on a server and is not truly self-hosted. The offline test is the single clearest way to verify that your data stays local.

Do I need an account to set up a self-hosted AI chatbot?

On the on-device path, no. There is no server to log in to, so there is no account, email, or phone number required. PocketLLM, for example, has never required an account: you install it, download a model, and start chatting. The absence of an account is part of what makes the setup private and fast.

Your chatbot, no server required.

PocketLLM is the no-server path — install, download a model, chat offline. No account, zero telemetry. Join the waitlist.
