Taming OpenClaw (not easy)

  • Ken Munson
  • Mar 2
  • 4 min read

Updated: Mar 3

Hardening an Autonomous Agent Runtime: Installing and Governing OpenClaw on a VPS

From Installation to Governance: Turning an Agent Runtime into a Controlled System


Executive Summary

In this project, I deployed OpenClaw, an autonomous agent runtime, on a hardened VPS environment and layered execution governance controls around it.

The goal was not simply to “run an agent,” but to:

  • Understand how tool execution works at the OS level

  • Validate container namespace isolation

  • Separate reasoning from execution

  • Implement audit visibility for OS-level commands

  • Add operational supervision to the monitoring layer

The result was a sidecar-based governance architecture that provides deterministic visibility into tool execution — without modifying vendor runtime code.

This post documents:

  • The initial OpenClaw deployment

  • Container and network hardening

  • Tool execution verification

  • Billing and quota debugging

  • The implementation of a governance sidecar

  • Lessons learned about AI runtime architecture

Phase 1: Deploying OpenClaw in a Hardened VPS Environment

Infrastructure

  • Ubuntu 24.04 VPS

  • Docker-based deployment

  • UFW firewall restricted to home IP

  • SSH key-only authentication

  • Password login disabled

  • OpenClaw running in container

  • Docker restart policy: unless-stopped

  • Explicit volume mount: ./data:/data

This created clear isolation boundaries:

Internet → VPS → Docker → Container namespace → Agent runtime

Container-Level Isolation

One of the first validations was confirming where tool execution actually occurs.

When invoking:

uname -a

via OpenClaw’s exec tool:

  • The command executed inside the container namespace

  • Files written to /tmp remained inside container /tmp

  • No host-level filesystem modification occurred

This confirmed that Linux mount and PID namespaces were functioning as intended.
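One way to spot-check this kind of boundary is to look at the process's cgroup path, where container runtimes leave recognizable fragments. A minimal heuristic sketch (the marker list is my own assumption, not an OpenClaw feature):

```python
def looks_containerized(cgroup_text: str) -> bool:
    """Heuristic check on a /proc/<pid>/cgroup string (as seen from the host):
    container runtimes typically embed their IDs in the cgroup path."""
    markers = ("docker", "containerd", "kubepods", "libpod", "lxc")
    return any(marker in cgroup_text for marker in markers)

# For a containerized process the host-side path looks something like:
#   0::/docker/a198d8664793...
# whereas a host process sits under a systemd scope or slice instead.
```

This is only a smoke test; writing a file via the exec tool and confirming it never appears on the host filesystem (as above) is the stronger validation.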

Phase 2: Understanding Tool Execution Semantics

OpenClaw exposes OS-level commands through a structured tool:

tool: exec

The agent does not directly execute shell commands.

Instead:

  1. The model emits structured intent.

  2. The runtime invokes the tool.

  3. The result is injected back into the model context.
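The three steps above can be sketched in a few lines of Python. The event shapes here are illustrative, not OpenClaw's actual schema:

```python
import json
import subprocess

def run_tool(tool_call: dict) -> dict:
    """Runtime side: turn a structured 'exec' intent into a real process,
    then package the outcome as a structured result."""
    assert tool_call["name"] == "exec"
    proc = subprocess.run(
        tool_call["args"]["command"], shell=True,
        capture_output=True, text=True, timeout=30,
    )
    return {
        "type": "toolResult",
        "id": tool_call["id"],
        "exitCode": proc.returncode,
        "output": proc.stdout + proc.stderr,
    }

# 1. The model emits structured intent (it never touches the shell):
intent = {"type": "toolCall", "id": "t1", "name": "exec",
          "args": {"command": "echo hello"}}
# 2. The runtime invokes the tool:
result = run_tool(intent)
# 3. The result is serialized back into the model context:
context_message = json.dumps(result)
```

The important property is that the shell boundary lives entirely on the runtime side, which is exactly where governance controls can be attached.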

This separation reinforced a key architectural insight:

The model reasons. The runtime executes. The system orchestrates.

Phase 3: The Governance Problem

Once exec was verified to run inside the container, the next question emerged:

How do we know which commands were truly executed?

Specifically:

  • How do we distinguish simulated output from real execution?

  • How do we audit tool usage?

  • How do we supervise execution behavior?

Rather than modifying OpenClaw’s minified runtime code (which proved brittle), I implemented a detective control sidecar.

Phase 4: Implementing an Execution Governance Sidecar

Initial Attempt (Rejected)

  • Background watcher process inside container

  • Cron-based restart logic

  • Lock-file duplication prevention

This approach worked but introduced lifecycle fragility and race conditions.

Lesson learned:

Background processes + cron supervision inside containers are brittle.
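For contrast, the stale-lock-file race goes away with an advisory flock, which the kernel releases automatically when the holding process dies. A sketch (illustrative only; the final design dropped in-container supervision entirely):

```python
import fcntl
import os

def acquire_singleton(lock_path: str):
    """Return an open lock handle if we are the only instance, else None.
    The flock dies with the process, so there is no stale lock to clean up."""
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None  # another instance already holds the lock
    handle.write(str(os.getpid()))
    handle.flush()
    return handle  # keep this handle open for the life of the process
```

Even with the race fixed, the lifecycle problem remains: something still has to supervise the watcher, which is what pushed the design toward a sidecar with its own Docker restart policy.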

Final Architecture: Sidecar Pattern

The governance monitor was moved into its own container service:

services:
  openclaw:
    ...
  watcher:
    image: ghcr.io/hostinger/hvps-openclaw:latest
    command: ["/usr/bin/python3", "-u", "/data/watch_exec.py"]
    restart: unless-stopped
    volumes:
      - ./data:/data

What the Watcher Does

  • Tails OpenClaw session JSONL logs

  • Detects toolCall events where name = exec

  • Captures corresponding toolResult

  • Writes structured audit entries to:

/data/governance.jsonl

Each entry includes:

  • Timestamp

  • Correlation ID

  • Command string

  • Exit code

  • Duration

  • Working directory

  • Aggregated output
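The core of the watcher is a pairing loop over the session log. A condensed sketch, assuming a hypothetical toolCall/toolResult event shape (OpenClaw's real JSONL schema may name these fields differently):

```python
import json

def audit_entries(session_lines):
    """Pair toolCall/toolResult events for 'exec' and yield governance
    records shaped like the audit fields listed above."""
    pending = {}  # correlation ID -> toolCall awaiting its result
    for line in session_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial lines while tailing
        if event.get("type") == "toolCall" and event.get("name") == "exec":
            pending[event["id"]] = event
        elif event.get("type") == "toolResult" and event.get("id") in pending:
            call = pending.pop(event["id"])
            yield {
                "ts": event.get("ts"),
                "correlationId": event["id"],
                "command": call["args"]["command"],
                "exitCode": event.get("exitCode"),
                "durationMs": event.get("durationMs"),
                "cwd": call["args"].get("cwd"),
                "output": event.get("output", "")[:4096],  # cap aggregated output
            }

# The real sidecar appends each record as one line of /data/governance.jsonl.
```

Keying the pairing on the correlation ID is what makes each audit entry attributable to a specific tool call rather than to the session as a whole.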

The UI now clearly shows:

EXECUTED (tool: exec): Linux a198d8664793 ...

This creates explicit execution provenance.

Phase 5: Billing and Quota Debugging

During testing, OpenClaw began returning:

“API rate limit reached”

Log inspection revealed the true error:

“You exceeded your current quota”

Investigation showed:

  • Organization budget was configured

  • Project budget was configured

  • But prepaid credit balance was $0

  • Auto-recharge was disabled

Enabling auto-recharge and adding credit immediately restored functionality.

Key insight:

There are three separate limit layers:

  1. Rate limits (TPM/RPM)

  2. Budget caps (org/project)

  3. Prepaid credit balance

Understanding that distinction was critical.
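A small triage helper makes the distinction concrete. The message substrings below are the ones I observed; they are heuristics, not a stable API contract:

```python
def classify_limit_error(message: str) -> str:
    """Map a provider error message to one of the three limit layers."""
    msg = message.lower()
    if "exceeded your current quota" in msg:
        return "credit"      # layer 3: prepaid balance empty -> add credit
    if "rate limit" in msg:
        return "rate-limit"  # layer 1: TPM/RPM -> back off and retry
    if "budget" in msg:
        return "budget-cap"  # layer 2: org/project cap -> raise the cap
    return "unknown"
```

Note that the surfaced message can be misleading: here the UI said "rate limit" while the log showed a quota error, so any triage like this should run against the underlying log message, not the UI string.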

Final Runtime Architecture

Internet → VPS (UFW restricted) → Docker bridge network → OpenClaw container → Tool execution (exec) → Sidecar watcher container → Persistent audit log

Controls now include:

Control Type   Implementation
Preventive     UFW firewall, container namespace isolation
Detective      Sidecar execution audit logging
Corrective     Docker restart policy on watcher

What I Learned

1. AI Agents Are Just Systems

An “autonomous agent” is not magic.

It is:

  • A reasoning engine

  • A tool invocation layer

  • A runtime orchestrator

  • A containerized process

Understanding the infrastructure layer is more important than the prompt layer.

2. Separation of Concerns Is Critical

Do not modify vendor runtime code unless necessary.

Instead:

  • Add sidecars

  • Add observability

  • Add structured monitoring

This keeps the control plane separate from the reasoning plane.

3. Namespace Isolation Matters

Verifying that exec runs inside the container — not on the host — was a key security boundary validation.

Never assume isolation. Test it.

4. Governance Can Be Added Without Blocking Capability

The system still allows execution.

But now:

  • Every execution is attributable

  • Every execution is logged

  • The monitor is supervised

  • Crashes are auto-recovered

That is operational maturity.

Why This Matters

This project demonstrates competency in:

  • Containerized AI runtime deployment

  • Linux namespace verification

  • OS-level tool execution controls

  • Audit log design

  • Sidecar architecture patterns

  • Billing/quota debugging

  • Operational supervision

More importantly, it demonstrates understanding of:

  • Trust boundaries

  • Execution provenance

  • Separation of reasoning vs control plane

  • Detective vs preventive controls

  • Runtime governance for AI systems

Final Reflection

The most valuable realization from this project was not about OpenClaw itself.

It was architectural:

Autonomous agents are not just LLMs. They are systems with execution surfaces, trust boundaries, and governance requirements.

Once that mental model clicked, the project shifted from “experimenting with an agent” to “engineering a controlled runtime.”

That distinction matters.


Oh, and here is an interesting interview with the creator of OpenClaw, Peter Steinberger:


