Taming OpenClaw (not easy)

  • Ken Munson
  • Mar 2
  • 4 min read

Updated: Mar 3

Hardening an Autonomous Agent Runtime: Installing and Governing OpenClaw on a VPS

From Installation to Governance: Turning an Agent Runtime into a Controlled System


Executive Summary

In this project, I deployed OpenClaw, an autonomous agent runtime, on a hardened VPS environment and layered execution governance controls around it.

The goal was not simply to “run an agent,” but to:

  • Understand how tool execution works at the OS level

  • Validate container namespace isolation

  • Separate reasoning from execution

  • Implement audit visibility for OS-level commands

  • Add operational supervision to the monitoring layer

The result was a sidecar-based governance architecture that provides deterministic visibility into tool execution — without modifying vendor runtime code.

This post documents:

  • The initial OpenClaw deployment

  • Container and network hardening

  • Tool execution verification

  • Billing and quota debugging

  • The implementation of a governance sidecar

  • Lessons learned about AI runtime architecture

Phase 1: Deploying OpenClaw in a Hardened VPS Environment

Infrastructure

  • Ubuntu 24.04 VPS

  • Docker-based deployment

  • UFW firewall restricted to home IP

  • SSH key-only authentication

  • Password login disabled

  • OpenClaw running in container

  • Docker restart policy: unless-stopped

  • Explicit volume mount: ./data:/data

This created clear isolation boundaries:

Internet → VPS → Docker → Container namespace → Agent runtime

Container-Level Isolation

One of the first validations was confirming where tool execution actually occurs.

When invoking:

uname -a

via OpenClaw’s exec tool:

  • The command executed inside the container namespace

  • Files written to /tmp remained inside container /tmp

  • No host-level filesystem modification occurred

This confirmed that Linux mount and PID namespaces were functioning as intended.
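One way to spot-check this kind of boundary is to look at the process's cgroup path, where container runtimes leave recognizable fragments. A minimal heuristic sketch (the marker list is my own assumption, not an OpenClaw feature):

```python
def looks_containerized(cgroup_text: str) -> bool:
    """Heuristic check on a /proc/<pid>/cgroup string (as seen from the host):
    container runtimes typically embed their IDs in the cgroup path."""
    markers = ("docker", "containerd", "kubepods", "libpod", "lxc")
    return any(marker in cgroup_text for marker in markers)

# For a containerized process the host-side path looks something like:
#   0::/docker/a198d8664793...
# whereas a host process sits under a systemd scope or slice instead.
```

This is only a smoke test; writing a file via the exec tool and confirming it never appears on the host filesystem (as above) is the stronger validation.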

Phase 2: Understanding Tool Execution Semantics

OpenClaw exposes OS-level commands through a structured tool:

tool: exec

The agent does not directly execute shell commands.

Instead:

  1. The model emits structured intent.

  2. The runtime invokes the tool.

  3. The result is injected back into the model context.
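The three steps above can be sketched in a few lines of Python. The event shapes here are illustrative, not OpenClaw's actual schema:

```python
import json
import subprocess

def run_tool(tool_call: dict) -> dict:
    """Runtime side: turn a structured 'exec' intent into a real process,
    then package the outcome as a structured result."""
    assert tool_call["name"] == "exec"
    proc = subprocess.run(
        tool_call["args"]["command"], shell=True,
        capture_output=True, text=True, timeout=30,
    )
    return {
        "type": "toolResult",
        "id": tool_call["id"],
        "exitCode": proc.returncode,
        "output": proc.stdout + proc.stderr,
    }

# 1. The model emits structured intent (it never touches the shell):
intent = {"type": "toolCall", "id": "t1", "name": "exec",
          "args": {"command": "echo hello"}}
# 2. The runtime invokes the tool:
result = run_tool(intent)
# 3. The result is serialized back into the model context:
context_message = json.dumps(result)
```

The important property is that the shell boundary lives entirely on the runtime side, which is exactly where governance controls can be attached.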

This separation reinforced a key architectural insight:

The model reasons. The runtime executes. The system orchestrates.

Phase 3: The Governance Problem

Once exec was verified to run inside the container, the next question emerged:

How do we know which commands were truly executed?

Specifically:

  • How do we distinguish simulated output from real execution?

  • How do we audit tool usage?

  • How do we supervise execution behavior?

Rather than modifying OpenClaw’s minified runtime code (which proved brittle), I implemented a detective control sidecar.

Phase 4: Implementing an Execution Governance Sidecar

Initial Attempt (Rejected)

  • Background watcher process inside container

  • Cron-based restart logic

  • Lock-file duplication prevention

This approach worked but introduced lifecycle fragility and race conditions.

Lesson learned:

Background processes + cron supervision inside containers are brittle.
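For contrast, the stale-lock-file race goes away with an advisory flock, which the kernel releases automatically when the holding process dies. A sketch (illustrative only; the final design dropped in-container supervision entirely):

```python
import fcntl
import os

def acquire_singleton(lock_path: str):
    """Return an open lock handle if we are the only instance, else None.
    The flock dies with the process, so there is no stale lock to clean up."""
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None  # another instance already holds the lock
    handle.write(str(os.getpid()))
    handle.flush()
    return handle  # keep this handle open for the life of the process
```

Even with the race fixed, the lifecycle problem remains: something still has to supervise the watcher, which is what pushed the design toward a sidecar with its own Docker restart policy.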

Final Architecture: Sidecar Pattern

The governance monitor was moved into its own container service:

services:
  openclaw:
    ...
  watcher:
    image: ghcr.io/hostinger/hvps-openclaw:latest
    command: ["/usr/bin/python3", "-u", "/data/watch_exec.py"]
    restart: unless-stopped
    volumes:
      - ./data:/data

What the Watcher Does

  • Tails OpenClaw session JSONL logs

  • Detects toolCall events where name = exec

  • Captures corresponding toolResult

  • Writes structured audit entries to:

/data/governance.jsonl

Each entry includes:

  • Timestamp

  • Correlation ID

  • Command string

  • Exit code

  • Duration

  • Working directory

  • Aggregated output
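The core of the watcher is a pairing loop over the session log. A condensed sketch, assuming a hypothetical toolCall/toolResult event shape (OpenClaw's real JSONL schema may name these fields differently):

```python
import json

def audit_entries(session_lines):
    """Pair toolCall/toolResult events for 'exec' and yield governance
    records shaped like the audit fields listed above."""
    pending = {}  # correlation ID -> toolCall awaiting its result
    for line in session_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial lines while tailing
        if event.get("type") == "toolCall" and event.get("name") == "exec":
            pending[event["id"]] = event
        elif event.get("type") == "toolResult" and event.get("id") in pending:
            call = pending.pop(event["id"])
            yield {
                "ts": event.get("ts"),
                "correlationId": event["id"],
                "command": call["args"]["command"],
                "exitCode": event.get("exitCode"),
                "durationMs": event.get("durationMs"),
                "cwd": call["args"].get("cwd"),
                "output": event.get("output", "")[:4096],  # cap aggregated output
            }

# The real sidecar appends each record as one line of /data/governance.jsonl.
```

Keying the pairing on the correlation ID is what makes each audit entry attributable to a specific tool call rather than to the session as a whole.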

The UI now clearly shows:

EXECUTED (tool: exec): Linux a198d8664793 ...

This creates explicit execution provenance.

Phase 5: Billing and Quota Debugging

During testing, OpenClaw began returning:

“API rate limit reached”

Log inspection revealed the true error:

“You exceeded your current quota”

Investigation showed:

  • Organization budget was configured

  • Project budget was configured

  • But prepaid credit balance was $0

  • Auto-recharge was disabled

Enabling auto-recharge and adding credit immediately restored functionality.

Key insight:

There are three separate limit layers:

  1. Rate limits (TPM/RPM)

  2. Budget caps (org/project)

  3. Prepaid credit balance

Understanding that distinction was critical.
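A small triage helper makes the distinction concrete. The message substrings below are the ones I observed; they are heuristics, not a stable API contract:

```python
def classify_limit_error(message: str) -> str:
    """Map a provider error message to one of the three limit layers."""
    msg = message.lower()
    if "exceeded your current quota" in msg:
        return "credit"      # layer 3: prepaid balance empty -> add credit
    if "rate limit" in msg:
        return "rate-limit"  # layer 1: TPM/RPM -> back off and retry
    if "budget" in msg:
        return "budget-cap"  # layer 2: org/project cap -> raise the cap
    return "unknown"
```

Note that the surfaced message can be misleading: here the UI said "rate limit" while the log showed a quota error, so any triage like this should run against the underlying log message, not the UI string.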

Final Runtime Architecture

Internet → VPS (UFW restricted) → Docker bridge network → OpenClaw container → Tool execution (exec) → Sidecar watcher container → Persistent audit log

Controls now include:

Control Type   Implementation
Preventive     UFW firewall, container namespace isolation
Detective      Sidecar execution audit logging
Corrective     Docker restart policy on watcher

What I Learned

1. AI Agents Are Just Systems

An “autonomous agent” is not magic.

It is:

  • A reasoning engine

  • A tool invocation layer

  • A runtime orchestrator

  • A containerized process

Understanding the infrastructure layer is more important than the prompt layer.

2. Separation of Concerns Is Critical

Do not modify vendor runtime code unless necessary.

Instead:

  • Add sidecars

  • Add observability

  • Add structured monitoring

This keeps the control plane separate from the reasoning plane.

3. Namespace Isolation Matters

Verifying that exec runs inside the container — not on the host — was a key security boundary validation.

Never assume isolation. Test it.

4. Governance Can Be Added Without Blocking Capability

The system still allows execution.

But now:

  • Every execution is attributable

  • Every execution is logged

  • The monitor is supervised

  • Crashes are auto-recovered

That is operational maturity.

Why This Matters

This project demonstrates competency in:

  • Containerized AI runtime deployment

  • Linux namespace verification

  • OS-level tool execution controls

  • Audit log design

  • Sidecar architecture patterns

  • Billing/quota debugging

  • Operational supervision

More importantly, it demonstrates understanding of:

  • Trust boundaries

  • Execution provenance

  • Separation of reasoning vs control plane

  • Detective vs preventive controls

  • Runtime governance for AI systems

Final Reflection

The most valuable realization from this project was not about OpenClaw itself.

It was architectural:

Autonomous agents are not just LLMs. They are systems with execution surfaces, trust boundaries, and governance requirements.

Once that mental model clicked, the project shifted from “experimenting with an agent” to “engineering a controlled runtime.”

That distinction matters.


Oh, and here is an interesting interview with the creator of OpenClaw, Peter Steinberger:


