Run the Claude Desktop App on a Local Model with LM Studio

Claude Desktop on 3P (it was called Cowork on 3P, same thing) is the normal Claude Desktop app with one extra mode turned on. Instead of talking to Anthropic's servers, it talks to a model backend you choose. That backend can be a cloud provider, or it can be a server running on your own machine. LM Studio is exactly that kind of server, so this is the cheapest possible way to drive the Claude Desktop interface with a model you own.

No Claude account is involved at any point. You download the app, switch it into third-party mode, point it at LM Studio, and you are done. The full docs are worth a read if you want the enterprise rollout path: overview and installation. This post is the home-lab version of those steps.

Is a local model fast, and is it as good as paid Claude?

Worth answering before you bother, because if you have only ever used Claude or ChatGPT in the cloud, both answers will surprise you. A local model runs on your own computer, not in a data centre full of GPUs. Claude Desktop also works as a harness: one message from you is not one request. In the background it fires off five or ten calls to do the actual work, and every one of them has to read the whole conversation before it writes a single word. So your machine is doing a lot, on hardware that was never built for it.

Two things follow. It is slower than the cloud, sometimes a lot slower. And it is usually not as smart as the paid model. How big the gap is comes down to your machine and the model you pick.

You can claw back a lot of the speed. On Apple Silicon the MLX build of a model is much faster than the GGUF one (GGUF vs MLX on Apple Silicon), and a smaller model always answers quicker than a giant one. The bigger and smarter the model, the slower it runs: I put GLM-5.2 on a big machine and it still crawled, because the model itself is enormous (I ran GLM-5.2 locally on a Mac Studio). The intelligence gap is the harder one to close, and it is worth its own post (coming soon).

Coming straight from the cloud, the gap feels big the first time. It is slower and it is not as sharp. For drafting, private notes, and offline tinkering, trading that for something free and fully private is an easy call.

Before you start

Start from zero. If you already have these, skip ahead. Everything here is free, and once the model is downloaded you can run the whole thing offline.

Get them here: Claude Desktop, LM Studio, Node.js (coding only). A "Getting started with LM Studio" walkthrough is coming soon.

01 / Load your model and give it a Claude name

This is the whole trick, and it is the one thing the docs do not call out. When you load a model in LM Studio there is an API Identifier field. Whatever you type there is the name Claude Desktop has to ask for, and it has to look like a Claude model id. Something like claude-opus-4.8, claude-sonnet, or claude-haiku works; the giveaway is the claude- prefix. I loaded Gemma 4 26B and set its identifier to claude-opus-4.8.

Here is what I think is happening. Claude Desktop decides where to send a request from the model name. If the name looks like one of Anthropic's models, it routes down the path that hits your configured backend. Leave the real name on it, gemma or qwen, and the app does not recognise it as one of its models, so you get a flat "no models found" and nothing reaches LM Studio. The fix is cosmetic: give your local model a Claude-shaped name and the routing accepts it.

The model behind the name can be anything. The name just has to look like a Claude one: claude-opus-4.8, claude-sonnet, claude-haiku.

02 / Start the local server

With the model loaded, open the Developer tab in LM Studio (the terminal icon down the left side). Flip the server on so the status reads Running, and leave it there. The address next to Reachable at is the one you will paste into Claude Desktop in a moment.

03 / Turn on third-party mode

Third-party mode, or 3P, is the switch that tells Claude Desktop to send its requests to a backend you choose instead of Anthropic's servers. Turning it on is a two-part move: first reveal the hidden Developer menu, then open its configuration window. Launch Claude Desktop and stay on the login screen. Do not sign in. The path is the same on both platforms, you just reach the menu differently.

04 / Point it at LM Studio

The configuration window opens on the Connection section, the only one you need here. Set the provider dropdown at the top to Gateway, since LM Studio is a plain OpenAI-compatible server. Then work down the fields, the marked-up screen below shows where each one goes:

Gateway base URL: http://127.0.0.1:1234, the address you copied from LM Studio. Claude adds the /v1 path itself.
Gateway API key: required, but LM Studio does not check it. Type any value, lmstudio is fine.
Gateway auth scheme: bearer.
Credential kind: Static API key.
Model list: leave Model discovery on, or click Add model and enter claude-opus-4.8, the same name from step 1. The first entry is the default.

Hit Test connection to confirm it works. A green result means Claude reached LM Studio and ran a one-token completion through claude-opus-4.8. Then click Apply Changes in the bottom-right. The app relaunches, and the sign-in screen now offers the option to start in Claude Desktop on 3P. Pick that.

05 / You're running on a local model

Start in Claude Desktop on 3P and you land here. Two quick tells that it worked: the top-left has only Cowork and Code, no Chat tab, and the footer reads Gateway instead of a signed-in account. The model picker shows claude-opus-4.8, which is your local model wearing a Claude name.

Want to be sure the request is going local? Glance at LM Studio while you send a message; the server log lights up the moment Claude Desktop calls it. And because the only rule is the name, you can load several models at once as claude-opus-4.8, claude-sonnet-4.8, and claude-haiku-4.8, then switch between them in the picker for different jobs.

What to expect

A few honest notes before you judge the result:

It is genuinely offline. Once the model is loaded, you can pull the network. No account, no per-token bill, nothing leaves the machine.
The name is load-bearing. If you ever rename the model and drop the Anthropic-style name, it stops working and you get "no models found". That is the first thing to check.

One more thing: is it really local?

While Cowork is working, LM Studio shows the model generating in real time. That GEN row is claude-opus-4.8 doing the thinking, 15.64 GB resident, all on the machine.

And here is the fun part. I asked it what model it is. It answered, with total confidence, that it is Claude 4.8 Opus, trained by Anthropic.

It is not. That is my Gemma model talking. The Claude name only lives in the slug and the system prompt, so the model genuinely thinks it is Claude and has no idea it is Gemma under the hood. The badge says Claude, the weights say Gemma. Same local model, doing real work, just wearing a borrowed name.