OpenClaw is a personal, self-hosted AI assistant platform designed to run on your own hardware while connecting to the communication tools you already use. Instead of being just a chat interface, it functions as an agent system—capable of reasoning, executing tasks, and interacting with software and services across multiple steps.
A typical OpenClaw setup includes a local gateway (control plane) that manages sessions, tools, and communication channels, while the assistant itself can operate using either local large language models (LLMs) or cloud-based models.
Running OpenClaw locally
OpenClaw is designed to be flexible in where and how it runs. It can operate on:
The hardware requirements depend heavily on the model you choose, not the platform itself.
Cloud-assisted setup (lightweight)
If you connect OpenClaw to a cloud model (for example OpenAi’s Codex or Claude Opus), the local machine mainly acts as a gateway and tool executor. In this case:
- A modest system (≈8–16 GB RAM, modern CPU) is sufficient
- No dedicated GPU is required
- Low-power devices are often ideal for always-on usage
Fully local setup (heavyweight)
Running Agent capable LLM locally requires significantly more compute:
- 24–32 GB VRAM (or unified memory) is a practical starting point
- High-end GPUs or large unified memory systems are often needed for reliable tool use
- Larger models demand multi-GPU setups or advanced hardware
In short, the stronger the model, the more capable and reliable the assistant becomes—but the hardware cost rises quickly.
Why fully local operation is challenging
Running OpenClaw locally is not just about loading a model into memory. The system must continuously:
- Plan multi-step tasks
- Call tools (filesystem, browser, APIs, etc.)
- Recover from errors
- Maintain long-running context
This makes it fundamentally different from a simple chatbot.
In practice:
- Smaller models (≈4B–14B parameters) often fail at tool use or lose context
- Mid-range models (20B–35B) can work, but reliability is inconsistent
- Larger models handle agent workflows much more effectively
A realistic baseline for usable local performance today includes models in the 20B–35B range, such as:
- Gemma4 (26B – 31B)
- Qwen 3.5 (27B – 35B variants)
- GLM 4.7 Flash
Even at this level, performance varies, and larger models generally behave more reliably in long workflows.
Models you can run locally with OpenClaw
Choosing a model for local OpenClaw use depends on your hardware and how reliable you need the agent to be. Below is a practical grouping based on model size and typical real-world usability for agent workflows.
20B – 30B class (entry-level for local agents)
These models are the minimum practical tier for running OpenClaw locally with tool use. They can handle simple to moderate workflows but may struggle with long or complex tasks.
- Gemma4 26B
- Qwen3.5 27B
- Gemma4 31B
- Qwen3.5 35B
What to expect:
- Works on high-end consumer GPUs (≈24–32 GB VRAM) or large unified memory systems
- Usable for basic automation and shorter workflows
- Occasional failures in planning, memory, or tool execution
- Good starting point for experimentation
100B – 120B class (reliable high-end local)
This tier is where OpenClaw starts to feel consistently capable as an agent. These models are much better at reasoning, tool chaining, and maintaining context.
- GLM 4.5 Air 106B
- GPT OSS 120B
- Qwen3.5 122B
What to expect:
- Requires multi-GPU setups or very large unified memory (≈60–120 GB+)
- Strong improvement in reliability and task completion
- Handles longer context and multi-step workflows more consistently
190B+ class (ultra-large / near-cloud level)
These models approach cloud-level capability, especially for long-running agent tasks. They are significantly more stable in planning, recovery, and tool use.
- Step 3.5 Flash 196B
- MinMax-M2.5 230B
- Qwen3.5 397B
- GLM-5 744B
- Kimi K2.5 1T
What to expect:
- Requires enterprise-grade hardware (multi-GPU or massive unified memory systems like Apple Studio with M Ultra chip and 512 GB memory )
- Excellent performance in complex, long-running workflows
- Much better at avoiding tool errors and maintaining state
- High prompt processing token generation times on many consumer setups, especially non-GPU systems
Local vs cloud models
OpenClaw supports two main modes of operation:
1. Local model
- Runs entirely on your hardware
- Offers maximum privacy and control
- Requires significant compute resources
- May have slower responses and lower reliability (depending on model size)
2. Cloud model
- Uses external providers for inference
- Delivers better performance and larger context windows
- Requires internet access and often a subscription
- Minimal local hardware needed
Many users adopt a hybrid approach, using cloud models for demanding tasks and local models for experimentation or privacy-sensitive workflows.
What makes OpenClaw different
Unlike traditional chat apps, OpenClaw is an agentic system. It can:
- Execute commands and scripts
- Run tanks on intervals (cron)
- Communicate across platforms like messaging apps and voice interfaces
- Manage files and workflows
- Interact with browsers and external services
- Maintain context across long sessions
This continuous, tool-driven behavior is what makes it powerful—but also why it places much higher demands on both models and hardware.
