OpenClaw with Local Models: Ollama, LM Studio & llama.cpp
Run OpenClaw completely offline using local AI models. Free, private, no data leaves your machine. Supports Ollama, LM Studio, and llama.cpp.
February 8, 2026
Note: OpenClaw was previously known as MoltBot and Clawdbot. All commands are interchangeable.
Don't want to pay for API access? Concerned about privacy? Run OpenClaw 100% locally.
Why Use Local Models?
- ✅ Free: no API keys, no monthly bills
- ✅ Private: data never leaves your machine
- ✅ Offline: works without internet
- ✅ No rate limits: run as many requests as you want
Trade-offs
- ⚠️ Requires capable hardware (GPU strongly recommended)
- ⚠️ Quality is lower than Claude or GPT-4 for complex reasoning
- ⚠️ Slower on CPU-only setups
Option 1: Ollama (Recommended)
The easiest way to run local models.
Install Ollama
macOS / Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows:
Download the installer from ollama.com/download
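After installing, it's worth a quick sanity check that the CLI landed on your PATH before pulling any models (version output will vary with your install):

```shell
# confirm the Ollama CLI is installed and print its version
if command -v ollama >/dev/null 2>&1; then
  ollama --version
else
  echo "ollama not found on PATH"
fi
```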
Download a Model
# Llama 3.1 8B β great balance of quality and speed
ollama pull llama3.1:8b
# More powerful (requires 16GB+ RAM)
ollama pull llama3.1:70b
# Great for coding tasks
ollama pull codellama:13b
# Lightweight, great for weaker hardware
ollama pull phi3:mini
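Once a pull finishes, you can confirm what's available locally (this talks to the Ollama background service, which the installer normally sets up):

```shell
# show downloaded models with their sizes
ollama list || echo "is the Ollama service running?"
```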
Quick Test
ollama run llama3.1:8b "Explain what OpenClaw is in one sentence."
Configure OpenClaw for Ollama
# ~/.openclaw/openclaw.yaml
agent:
  model: "ollama/llama3.1:8b"

providers:
  ollama:
    baseUrl: "http://127.0.0.1:11434"
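Before pointing OpenClaw at it, it's worth confirming the endpoint in `baseUrl` actually answers. A minimal smoke test against Ollama's REST API (the Ollama service must already be running; the model name assumes the 8B model pulled above):

```shell
# non-streaming one-shot generation via Ollama's REST API
curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Say hello.", "stream": false}'
```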
Start Everything
# Terminal 1 β start Ollama
ollama serve
# Terminal 2 β start OpenClaw
openclaw gateway
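If you script these two steps, `openclaw gateway` can start before Ollama is ready to accept requests. A small readiness probe is one way to avoid the race (a sketch; `/api/tags` is Ollama's model-list endpoint, so it answers as soon as the server is up):

```shell
# wait_for_ollama: poll the API until it answers or the retry budget runs out
wait_for_ollama() {
  for _ in 1 2 3 4 5 6 7 8 9 10; do
    curl -sf http://127.0.0.1:11434/api/tags >/dev/null && return 0
    sleep 1
  done
  return 1
}

# usage sketch:
#   ollama serve &
#   wait_for_ollama && openclaw gateway
```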
Option 2: LM Studio
A GUI app for running models locally. Great for experimentation.
Installation
- Download from lmstudio.ai
- Install and open the app
- Browse the built-in model catalog and download one
Start the Local Server
- Open LM Studio
- Navigate to the Local Server tab
- Select a model
- Click Start Server
The server runs at http://localhost:1234 by default.
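You can confirm the server is up by asking for its model list (LM Studio serves the OpenAI-style `/v1/models` route):

```shell
# list models loaded in LM Studio's local server
curl -s http://127.0.0.1:1234/v1/models
```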
Configure OpenClaw
agent:
  model: "openai/local-model"

providers:
  openai:
    baseUrl: "http://127.0.0.1:1234/v1"
    apiKey: "not-needed"
LM Studio emulates the OpenAI API format, so we use the openai provider.
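Because the endpoint speaks the OpenAI chat-completions format, any OpenAI-style client works against it. A direct curl sketch (how strictly the `model` field is matched depends on the LM Studio version):

```shell
# OpenAI-style chat completion against LM Studio's local server
curl -s http://127.0.0.1:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```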
Option 3: llama.cpp (Advanced)
Maximum performance and control. Best for power users.
Build from Source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Standard build (llama.cpp uses CMake; Metal is enabled by default on Apple Silicon)
cmake -B build
cmake --build build --config Release -j
# With CUDA support (NVIDIA GPU)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
Download a GGUF Model
# Example: download from Hugging Face
wget https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_K_M.gguf
Start the Server
./build/bin/llama-server -m llama-2-13b-chat.Q4_K_M.gguf --port 8080 --ctx-size 4096
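The llama.cpp server exposes a `/health` route alongside its OpenAI-style `/v1` endpoints, which makes a quick probe easy:

```shell
# check that the llama.cpp server is up and a model is loaded
curl -s http://127.0.0.1:8080/health
```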
Configure OpenClaw
agent:
  model: "openai/local"

providers:
  openai:
    baseUrl: "http://127.0.0.1:8080/v1"
    apiKey: "none"
Hardware Recommendations
| Model Size | RAM Needed | GPU VRAM | Speed |
|---|---|---|---|
| 7B / 8B | 8 GB | 6 GB | Fast |
| 13B | 16 GB | 10 GB | Medium |
| 30B | 32 GB | 20 GB | Slow on CPU |
| 70B | 64 GB | 40 GB | GPU required |
Apple Silicon (M1/M2/M3/M4): Excellent for local models. Metal acceleration is built in. An M2 Pro with 16 GB RAM handles 13B models comfortably.
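The memory columns roughly track the quantized model file plus working overhead. As a rule of thumb (an approximation, not a spec): a Q4 quantization stores about 0.6 bytes per parameter, plus 1–2 GB of overhead for context and buffers. A quick back-of-envelope check against the table:

```shell
# rough Q4 memory estimate: ~0.6 bytes/parameter + ~1.5 GB overhead
awk 'BEGIN {
  split("8 13 30 70", b, " ")
  for (i = 1; i <= 4; i++)
    printf "%2dB params -> ~%.1f GB\n", b[i], b[i] * 0.6 + 1.5
}'
```

The estimates (about 6, 9, 20, and 44 GB) line up with the VRAM column above.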
Choosing the Right Model
| Use Case | Recommended Model |
|---|---|
| General chat | llama3.1:8b |
| Coding | codellama:13b or deepseek-coder |
| Low-end hardware | phi3:mini |
| Best quality | llama3.1:70b (GPU required) |
Troubleshooting
Ollama not responding
# Check if the service is running
curl http://localhost:11434/api/tags
Out of memory errors
Try a smaller model (e.g., phi3:mini) or reduce context size.
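With Ollama, the context window can also be lowered per request through the `options.num_ctx` field of the REST API, which cuts memory use without switching models (a sketch; assumes the 8B model from earlier):

```shell
# run with a smaller context window to reduce memory pressure
curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Say hello.", "stream": false,
       "options": {"num_ctx": 2048}}'
```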
Slow responses on CPU
This is expected. For acceptable speeds, a GPU or Apple Silicon chip is strongly recommended.
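To see what throughput you are actually getting, Ollama's `--verbose` flag prints token-rate statistics after each response:

```shell
# print timing stats (including eval rate in tokens/s) after the response
ollama run llama3.1:8b --verbose "Reply with one word."
```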