OpenClaw with Local Models: Ollama, LM Studio & llama.cpp

Run OpenClaw completely offline using local AI models. Free, private, no data leaves your machine, with support for Ollama, LM Studio, and llama.cpp.

February 8, 2026

Note: OpenClaw was previously known as MoltBot and Clawdbot; the commands are interchangeable.

Don’t want to pay for API access? Concerned about privacy? Run OpenClaw 100% locally.

Why Use Local Models?

  • ✅ Free: no API keys, no monthly bills
  • ✅ Private: data never leaves your machine
  • ✅ Offline: works without internet
  • ✅ No rate limits: run as many requests as you want

Trade-offs

  • ⚠️ Requires capable hardware (GPU strongly recommended)
  • ⚠️ Quality is lower than Claude or GPT-4 for complex reasoning
  • ⚠️ Slower on CPU-only setups

Option 1: Ollama

The easiest way to run local models.

Install Ollama

macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows:
Download the installer from ollama.com/download

Download a Model

# Llama 3.1 8B: a great balance of quality and speed
ollama pull llama3.1:8b

# More powerful (requires 64GB+ RAM; see the hardware table below)
ollama pull llama3.1:70b

# Great for coding tasks
ollama pull codellama:13b

# Lightweight, great for weaker hardware
ollama pull phi3:mini

Quick Test

ollama run llama3.1:8b "Explain what OpenClaw is in one sentence."
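
Under the hood, OpenClaw talks to Ollama over its HTTP API, so you can exercise the same endpoint directly with curl. A quick sketch, assuming `ollama serve` is listening on the default port 11434:

```shell
# Query Ollama's REST API directly (the same endpoint OpenClaw will use)
OLLAMA_URL="http://127.0.0.1:11434"
PAYLOAD='{"model": "llama3.1:8b", "prompt": "Say hi in one word.", "stream": false}'

# -s silences the progress meter; the || branch keeps going if the server is down
curl -s "$OLLAMA_URL/api/generate" -d "$PAYLOAD" || echo "Ollama is not reachable at $OLLAMA_URL"
```

If this returns a JSON response, OpenClaw will be able to reach the model too.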

Configure OpenClaw for Ollama

# ~/.openclaw/openclaw.yaml
agent:
  model: "ollama/llama3.1:8b"

  providers:
    ollama:
      baseUrl: "http://127.0.0.1:11434"

Start Everything

# Terminal 1: start Ollama
ollama serve

# Terminal 2: start OpenClaw
openclaw gateway

Option 2: LM Studio

A GUI app for running models locally. Great for experimentation.

Installation

  1. Download from lmstudio.ai
  2. Install and open the app
  3. Browse the built-in model catalog and download one

Start the Local Server

  1. Open LM Studio
  2. Navigate to the Local Server tab
  3. Select a model
  4. Click Start Server

The server runs at http://localhost:1234 by default.
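
Before pointing OpenClaw at it, you can confirm the server is up: LM Studio exposes an OpenAI-style /v1/models endpoint. A quick check, assuming the default port:

```shell
# List the models LM Studio is serving via its OpenAI-compatible endpoint
LMSTUDIO_URL="http://localhost:1234"
curl -s "$LMSTUDIO_URL/v1/models" || echo "LM Studio server is not reachable at $LMSTUDIO_URL"
```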

Configure OpenClaw

agent:
  model: "openai/local-model"

  providers:
    openai:
      baseUrl: "http://127.0.0.1:1234/v1"
      apiKey: "not-needed"

LM Studio emulates the OpenAI API format, so we use the openai provider.
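Because the API is OpenAI-compatible, any standard chat-completions request works. A sketch; "local-model" is a placeholder name, and LM Studio generally serves whatever model is loaded regardless of the name sent:

```shell
# Send an OpenAI-format chat completion request to LM Studio
LMSTUDIO_URL="http://127.0.0.1:1234"
BODY='{"model": "local-model", "messages": [{"role": "user", "content": "Hello!"}]}'

curl -s "$LMSTUDIO_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$BODY" || echo "LM Studio server is not reachable at $LMSTUDIO_URL"
```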


Option 3: llama.cpp (Advanced)

Maximum performance and control. Best for power users.

Build from Source

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Standard CPU build
cmake -B build
cmake --build build --config Release -j

# With CUDA support (NVIDIA GPU)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# On Apple Silicon, Metal support is enabled by default

Download a GGUF Model

# Example: download from Hugging Face
wget https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_K_M.gguf
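
As an alternative to wget, the huggingface-cli tool (shipped with the huggingface_hub Python package) handles authentication and resumable downloads. A sketch using the same repo and file as above:

```shell
# Download a GGUF file with huggingface-cli instead of wget
REPO="TheBloke/Llama-2-13B-chat-GGUF"
FILE="llama-2-13b-chat.Q4_K_M.gguf"

if command -v huggingface-cli >/dev/null 2>&1; then
  huggingface-cli download "$REPO" "$FILE" --local-dir .
else
  echo "huggingface-cli not found; install with: pip install huggingface_hub"
fi
```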

Start the Server

# Current releases name the binary llama-server (older builds used "server");
# cmake builds place it under build/bin/
./llama-server -m llama-2-13b-chat.Q4_K_M.gguf --port 8080 --ctx-size 4096
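
The llama.cpp server exposes a /health endpoint you can poll before wiring up OpenClaw; it reports ready once the model has finished loading. Assuming the port from the command above:

```shell
# Check whether the llama.cpp server is up and the model is loaded
SERVER_URL="http://127.0.0.1:8080"
curl -s "$SERVER_URL/health" || echo "llama.cpp server is not reachable at $SERVER_URL"
```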

Configure OpenClaw

agent:
  model: "openai/local"

  providers:
    openai:
      baseUrl: "http://127.0.0.1:8080/v1"
      apiKey: "none"

Hardware Recommendations

Model Size   RAM Needed   GPU VRAM   Speed
7B / 8B      8 GB         6 GB       Fast
13B          16 GB        10 GB      Medium
30B          32 GB        20 GB      Slow on CPU
70B          64 GB        40 GB      GPU required

Apple Silicon (M1/M2/M3/M4): Excellent for local models. Metal acceleration is built in. An M2 Pro with 16 GB RAM handles 13B models comfortably.
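
As a back-of-the-envelope check on the table above: a 4-bit-quantized model's weights take roughly params × 4 bits each, plus headroom for the KV cache and runtime buffers. The 2 GB headroom figure below is a rough assumption, not a measured value:

```shell
# Rough memory estimate for a Q4-quantized 8B model (integer GB)
params_billions=8      # e.g. llama3.1:8b
bits_per_weight=4      # Q4 quantization (Ollama pulls 4-bit quants by default)
weights_gb=$(( params_billions * bits_per_weight / 8 ))  # 8 * 4 / 8 = 4 GB of weights
kv_headroom_gb=2       # assumed allowance for KV cache + buffers at a few K of context
total_gb=$(( weights_gb + kv_headroom_gb ))
echo "~${weights_gb} GB weights, budget ~${total_gb} GB total"
```

This lines up with the ~6 GB VRAM figure for 7B/8B models in the table.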


Choosing the Right Model

Use Case           Recommended Model
General chat       llama3.1:8b
Coding             codellama:13b or deepseek-coder
Low-end hardware   phi3:mini
Best quality       llama3.1:70b (GPU required)

Troubleshooting

Ollama not responding

# Check if the service is running
curl http://localhost:11434/api/tags

Out of memory errors
Try a smaller model (e.g., phi3:mini) or reduce context size.

Slow responses on CPU
This is expected. For acceptable speeds, a GPU or Apple Silicon chip is strongly recommended.