OpenClaw with Local Models: Ollama, LM Studio & llama.cpp

Run OpenClaw completely offline using local AI models. Free, private, no data leaves your machine, with support for Ollama, LM Studio, and llama.cpp.

February 8, 2026

Note: OpenClaw was previously known as MoltBot and Clawdbot; the commands are interchangeable.

Don’t want to pay for API access? Concerned about privacy? Run OpenClaw 100% locally.

Why Use Local Models?

  • ✅ Free: no API keys, no monthly bills
  • ✅ Private: data never leaves your machine
  • ✅ Offline: works without internet
  • ✅ No rate limits: run as many requests as you want

Trade-offs

  • ⚠️ Requires capable hardware (GPU strongly recommended)
  • ⚠️ Quality is lower than Claude or GPT-4 for complex reasoning
  • ⚠️ Slower on CPU-only setups

Option 1: Ollama

The easiest way to run local models.

Install Ollama

macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows:
Download the installer from ollama.com/download

Download a Model

# Llama 3.1 8B: a great balance of quality and speed
ollama pull llama3.1:8b

# More powerful (requires 64GB+ RAM; see the hardware table below)
ollama pull llama3.1:70b

# Great for coding tasks
ollama pull codellama:13b

# Lightweight, great for weaker hardware
ollama pull phi3:mini

Quick Test

ollama run llama3.1:8b "Explain what OpenClaw is in one sentence."
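
Under the hood, OpenClaw talks to Ollama over its HTTP API, so you can exercise the same endpoint directly with curl. A quick sketch, assuming `ollama serve` is listening on the default port 11434:

```shell
# Query Ollama's REST API directly (the same endpoint OpenClaw will use)
OLLAMA_URL="http://127.0.0.1:11434"
PAYLOAD='{"model": "llama3.1:8b", "prompt": "Say hi in one word.", "stream": false}'

# -s silences the progress meter; the || branch keeps going if the server is down
curl -s "$OLLAMA_URL/api/generate" -d "$PAYLOAD" || echo "Ollama is not reachable at $OLLAMA_URL"
```

If this returns a JSON response, OpenClaw will be able to reach the model too.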

Configure OpenClaw for Ollama

# ~/.openclaw/openclaw.yaml
agent:
  model: "ollama/llama3.1:8b"

  providers:
    ollama:
      baseUrl: "http://127.0.0.1:11434"

Start Everything

# Terminal 1: start Ollama
ollama serve

# Terminal 2: start OpenClaw
openclaw gateway

Option 2: LM Studio

A GUI app for running models locally. Great for experimentation.

Installation

  1. Download from lmstudio.ai
  2. Install and open the app
  3. Browse the built-in model catalog and download one

Start the Local Server

  1. Open LM Studio
  2. Navigate to the Local Server tab
  3. Select a model
  4. Click Start Server

The server runs at http://localhost:1234 by default.
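
Before pointing OpenClaw at it, you can confirm the server is up: LM Studio exposes an OpenAI-style /v1/models endpoint. A quick check, assuming the default port:

```shell
# List the models LM Studio is serving via its OpenAI-compatible endpoint
LMSTUDIO_URL="http://localhost:1234"
curl -s "$LMSTUDIO_URL/v1/models" || echo "LM Studio server is not reachable at $LMSTUDIO_URL"
```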

Configure OpenClaw

agent:
  model: "openai/local-model"

  providers:
    openai:
      baseUrl: "http://127.0.0.1:1234/v1"
      apiKey: "not-needed"

LM Studio emulates the OpenAI API format, so we use the openai provider.
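Because the API is OpenAI-compatible, any standard chat-completions request works. A sketch; "local-model" is a placeholder name, and LM Studio generally serves whatever model is loaded regardless of the name sent:

```shell
# Send an OpenAI-format chat completion request to LM Studio
LMSTUDIO_URL="http://127.0.0.1:1234"
BODY='{"model": "local-model", "messages": [{"role": "user", "content": "Hello!"}]}'

curl -s "$LMSTUDIO_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$BODY" || echo "LM Studio server is not reachable at $LMSTUDIO_URL"
```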


Option 3: llama.cpp (Advanced)

Maximum performance and control. Best for power users.

Build from Source

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Standard CPU build
cmake -B build
cmake --build build --config Release -j

# With CUDA support (NVIDIA GPU)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# On Apple Silicon, Metal support is enabled by default

Download a GGUF Model

# Example: download from Hugging Face
wget https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_K_M.gguf
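
As an alternative to wget, the huggingface-cli tool (shipped with the huggingface_hub Python package) handles authentication and resumable downloads. A sketch using the same repo and file as above:

```shell
# Download a GGUF file with huggingface-cli instead of wget
REPO="TheBloke/Llama-2-13B-chat-GGUF"
FILE="llama-2-13b-chat.Q4_K_M.gguf"

if command -v huggingface-cli >/dev/null 2>&1; then
  huggingface-cli download "$REPO" "$FILE" --local-dir .
else
  echo "huggingface-cli not found; install with: pip install huggingface_hub"
fi
```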

Start the Server

# Current releases name the binary llama-server (older builds used "server");
# cmake builds place it under build/bin/
./llama-server -m llama-2-13b-chat.Q4_K_M.gguf --port 8080 --ctx-size 4096
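
The llama.cpp server exposes a /health endpoint you can poll before wiring up OpenClaw; it reports ready once the model has finished loading. Assuming the port from the command above:

```shell
# Check whether the llama.cpp server is up and the model is loaded
SERVER_URL="http://127.0.0.1:8080"
curl -s "$SERVER_URL/health" || echo "llama.cpp server is not reachable at $SERVER_URL"
```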

Configure OpenClaw

agent:
  model: "openai/local"

  providers:
    openai:
      baseUrl: "http://127.0.0.1:8080/v1"
      apiKey: "none"

Hardware Recommendations

Model Size   RAM Needed   GPU VRAM   Speed
7B / 8B      8 GB         6 GB       Fast
13B          16 GB        10 GB      Medium
30B          32 GB        20 GB      Slow on CPU
70B          64 GB        40 GB      GPU required

Apple Silicon (M1/M2/M3/M4): Excellent for local models. Metal acceleration is built in. An M2 Pro with 16 GB RAM handles 13B models comfortably.
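
As a back-of-the-envelope check on the table above: a 4-bit-quantized model's weights take roughly params × 4 bits each, plus headroom for the KV cache and runtime buffers. The 2 GB headroom figure below is a rough assumption, not a measured value:

```shell
# Rough memory estimate for a Q4-quantized 8B model (integer GB)
params_billions=8      # e.g. llama3.1:8b
bits_per_weight=4      # Q4 quantization (Ollama pulls 4-bit quants by default)
weights_gb=$(( params_billions * bits_per_weight / 8 ))  # 8 * 4 / 8 = 4 GB of weights
kv_headroom_gb=2       # assumed allowance for KV cache + buffers at a few K of context
total_gb=$(( weights_gb + kv_headroom_gb ))
echo "~${weights_gb} GB weights, budget ~${total_gb} GB total"
```

This lines up with the ~6 GB VRAM figure for 7B/8B models in the table.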


Choosing the Right Model

Use Case           Recommended Model
General chat       llama3.1:8b
Coding             codellama:13b or deepseek-coder
Low-end hardware   phi3:mini
Best quality       llama3.1:70b (GPU required)

Troubleshooting

Ollama not responding

# Check if the service is running
curl http://localhost:11434/api/tags

Out of memory errors
Try a smaller model (e.g., phi3:mini) or reduce context size.

Slow responses on CPU
This is expected. For acceptable speeds, a GPU or Apple Silicon chip is strongly recommended.