
LM Studio is a free desktop app for Mac, Windows, and Linux that you download from lmstudio.ai. No account, no payment, no trial that runs out. It lets you download, run, and manage AI models directly on your own machine.
The models it runs are open-weight models. Plain version: a company trains a model and publishes the finished result, the weights, for anyone to download and run. Open weights means you can download and run it. Either way, the model lives on your computer, not on someone else's server.
It is the easiest way in. The interface is clean and visual, and while the first screen can look like a wall of settings, you really only touch two or three of them to start chatting. It also grows with you: good for a complete beginner, and still good once you are doing advanced work weeks later, so you never have to switch apps. For the bigger question of how local compares to the cloud, see Local AI vs cloud AI (coming soon).
Is local AI any good, and what machine do I need?
Honest answer first. A local model will not match a frontier cloud model, and it will be slower, more so on a modest machine. What you get instead is free, private, and offline. How good and how fast comes down to two things: your hardware, and the size of the model you pick. Bigger models are smarter and slower; smaller ones are quicker and lighter.

One thing worth knowing up front: on Apple Silicon, the MLX build of a model (MLX is Apple's runtime) is much faster than the GGUF build (more on why in GGUF vs MLX on Apple Silicon). And the very large models stay slow no matter the machine, as I found running GLM-5.2 locally. Start small; you can always size up.


01 / Install LM Studio
Download the build for your machine from lmstudio.ai/download. It runs on macOS, Windows, and Linux. Open it, no sign-up, and here is the app. A quick tour of the four things you will actually use:

02 / Find and download a model
Open the search tab and look for a model. If this is your first one, search gemma and pick a small build. Good first picks: Gemma 4 E2B or E4B on a 16 GB machine, or the newer Gemma 4 12B if you have the memory to spare. LM Studio flags whether a model will fit your RAM, which saves you downloading something your machine cannot load. Hit download and wait; models are a few gigabytes.
Quantization: the 4-bit version is the same model packed down smaller. It loads faster and uses less memory, with a small quality cost. A fine default.
MLX or GGUF: on Apple Silicon, prefer the MLX build for speed. On other hardware, GGUF is the safe pick. The GGUF vs MLX post goes deeper.


03 / Load the model
Everything you have downloaded shows up in your model list, ready to load. You open that list by clicking the model bar at the top of the window, it is only purple once a model is loaded. You can keep a lot here.


Select a model to load it into memory. You can leave every setting at its default. The one worth touching is Max Context Length: how much of the conversation the model can hold at once. A bigger context uses more memory, so start where it lands and raise it only if you need to. Then click Load.

One handy difference between formats: a GGUF model estimates the memory it will need up front, so you can see whether it fits before loading. MLX does not show this yet, so on MLX you are loading a little blind.

04 / Chat with it
Go to the chat tab and type. The first reply is slower while the model warms up, then it settles into a steady pace. Each reply shows its speed in tokens per second, the token count, and how long it took. That is your machine generating the answer, with nothing going to a server. Congratulations, you are running AI locally.

05 / Turn on the local server (optional)
This is the step that makes LM Studio more than a chat app. First, switch Developer mode on: open Settings (the gear, bottom-left), go to Developer, and toggle Developer mode on.

That adds the Developer tab to the left rail. Open it and start the server. LM Studio now exposes an OpenAI-compatible endpoint on your machine, by default at http://127.0.0.1:1234. Any app that can talk to that kind of endpoint can now use your local model, including coding tools and Claude Desktop.


So what do you do with the server running? You point other tools at it, the same way they would talk to a cloud API, and get AI running inside the apps you already use. The one I would start with: point Claude Code and Claude Cowork at it and run that whole interface on your free local model. Here is how to set that up, step by step.
What to expect
It is genuinely offline. Once the model is downloaded, you can pull the network. No account, no bill, nothing leaves the machine.
Speed tracks your machine. A small model on a recent laptop feels fine; a big model on a modest machine will crawl. Start small.
Quality tracks the model. A local model is a step below frontier cloud models. You are buying privacy and zero cost, not the best answer money can buy.


Read this next: Run Claude Cowork on a local model with LM Studio.

