OpenClaw playbook

OpenClaw setup checklist for a stable first week

Most OpenClaw installs “work” on day one, then drift into resets, inconsistent tool usage, and runaway context. This checklist is for the implementation stage: moving from “installed” to “it runs the same way tomorrow.” By the end you should have one workflow, one approval rule, one memory structure, and one recovery loop.

TL;DR (what you are trying to achieve)

Your first-week goal is not “build something impressive.” It is: make the system predictable. A predictable OpenClaw is easy to extend. An unpredictable one becomes expensive to debug because you cannot tell whether failures come from the model, tools, memory, or infrastructure.

So you want a baseline that is cheap to run, hard to break, and easy to observe.

  • Keep the gateway private by default (loopback or Tailnet-only)
  • Fix dashboard auth first (most “it is broken” reports are auth)
  • Pick one channel and one workflow with a concrete output contract
  • Start with strict tool permissions, budgets, and one approval rule
  • Write memory like a system of record (reviewable files)
  • Automate only after the manual run is stable for 3 to 5 runs
  • Browse only when freshness or citations are required

Before you install anything else (4 decisions)

Most onboarding failures are “too many variables.” Decide these four things up front and write them down. This will save you days of “it changed again” debugging.

If you cannot answer these in one minute, you are not ready to add more tools or channels.

  • Primary channel: where the workflow starts (pick one)
  • First workflow: smallest production candidate (pick one)
  • Approval rule: what is never allowed to happen automatically (pick one)
  • Failure definition: what counts as a failed run (pick one)

Write this down (seriously)

Channel:
Workflow:
Approval rule:
Failure definition:

A 60-minute baseline plan (do this in order)

This is the fastest path to “stable enough to iterate.” Each step reduces uncertainty and gives you a known-good checkpoint.

Treat this like engineering: one layer at a time.

  • 0 to 10 minutes: start the gateway, open the dashboard, send one message in the Control UI
  • 10 to 20 minutes: confirm access is private (loopback/Tailnet) and understand auth
  • 20 to 35 minutes: choose one workflow and write an output contract (what “done” looks like)
  • 35 to 50 minutes: add guardrails (tool allowlist, budgets, one approval rule)
  • 50 to 60 minutes: set up memory baseline (files + memory flush), run once, write a short run recap note

Step 0: keep the gateway private (security posture)

The Control UI is an admin surface: chat, config, approvals. Treat it like an internal dashboard. Do not expose it publicly by accident.

The simplest baseline is a loopback bind, accessed remotely via an SSH tunnel or Tailscale Serve.

  • Recommended: `gateway.bind: "loopback"` (local-only)
  • Remote access (recommended): Tailscale Serve (Tailnet-only HTTPS)
  • Remote access (fallback): SSH tunnel to loopback port
  • Avoid: binding to `lan` on a VPS without strict auth and allowlists
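
For the SSH-tunnel fallback, a standard local port forward is enough. The port (8080) and host below are placeholders, not OpenClaw defaults; substitute your actual gateway port and server:

```shell
# Forward a local port to the gateway's loopback bind on the server.
# 8080 and user@your-server are placeholders; use your real port and host.
ssh -N -L 8080:127.0.0.1:8080 user@your-server
# Then open http://127.0.0.1:8080 in your local browser.
```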

Tailscale Serve (Tailnet-only) baseline

{
  "gateway": {
    "bind": "loopback",
    "tailscale": { "mode": "serve" }
  }
}

If you bind to LAN: configure auth explicitly (doc baseline)

{
  "gateway": {
    "bind": "lan",
    "auth": {
      "mode": "token",
      "token": "replace-me"
    }
  }
}

Step 1: dashboard auth and the “unauthorized / 1008” loop

A surprising amount of “setup is broken” is really dashboard auth. The dashboard talks to the gateway over a WebSocket and authenticates at the handshake.

If you see “unauthorized” or code `1008`, do not reinstall. Fix auth first.

  • Re-open with a clean tokenized link: `openclaw dashboard`
  • Token source: `gateway.auth.token` (or `OPENCLAW_GATEWAY_TOKEN`)
  • Retrieve a token: `openclaw config get gateway.auth.token`
  • Generate a token: `openclaw doctor --generate-gateway-token`
  • If the UI is stuck on a bad token: clear localStorage key `openclaw.control.settings.v1` and reconnect

Common “fix it” commands

openclaw dashboard
openclaw status --all
openclaw config get gateway.auth.token
openclaw doctor --generate-gateway-token

Step 2: pick one workflow (and define “done”)

Most setups fail because they start with an abstract goal like “be my assistant” or “manage my life.” OpenClaw works best when a run ends in a concrete artifact: a draft reply, a 1-page brief, a checklist, or a structured report.

Pick a workflow you can run daily, with a clear success/failure signal.

Rule of thumb

If a workflow can send messages, change production state, or touch billing, require explicit approval before the final action.

  • Good first workflows: research brief, inbox triage, daily briefing, weekly status report
  • Define output format up front (sections, length, tone)
  • Define escalation: what requires your approval
  • Define stop conditions: what ends the run
  • Define a failure: what output would make you say “this run failed”

Workflow contract template (copy/paste)

Workflow:
Trigger:
Inputs:
Tools allowed:
Output contract:
Requires approval when:
Stop condition:
Failure definition:

Example: inbox triage (bounded)

Trigger: new messages (batch of 20)
Tools: memory_get, memory_search
Output: Summary + Draft replies (approval) + Next actions
Stop: ask 1 clarifying question if intent is unclear

Step 3: lock down tools with a risk matrix (not vibes)

Tool freedom is the fastest way to create instability. It is also the easiest thing to control. Start strict, then open permissions intentionally when you can explain why a tool is needed.

In production, the most expensive failures come from tools that run too long, call too many endpoints, or keep retrying without a stop condition.

  • Low risk: search, fetch, read-only tools, summarization
  • Medium risk: writing files, editing drafts, tagging/labeling
  • High risk: browser automation, sending messages, deployments, billing actions
  • Baseline rule: high-risk tools stay off until the workflow is stable for 3 to 5 runs
  • Set budgets: max tool calls, max pages, max retries

Conceptual allowlist shape

{
  "tools": {
    "allow": ["web_search", "web_fetch", "memory_get", "memory_search"],
    "deny": ["browser", "exec"]
  }
}
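
The allowlist-plus-budgets idea can be sketched in plain Python. Every name here (`ToolPolicy`, `BudgetExceeded`) is illustrative, not OpenClaw API; it only shows the shape of the check that should run before every tool call:

```python
# Illustrative sketch: allowlist plus per-run budgets.
# None of these names come from OpenClaw; they only show the shape of the check.

class BudgetExceeded(Exception):
    pass

class ToolPolicy:
    def __init__(self, allow, deny, max_calls=20, max_retries=2):
        self.allow = set(allow)
        self.deny = set(deny)
        self.max_calls = max_calls
        self.max_retries = max_retries
        self.calls = 0

    def check(self, tool, retries=0):
        """Raise if the tool is not allowed or a budget is exhausted."""
        if tool in self.deny or tool not in self.allow:
            raise PermissionError(f"tool not allowed: {tool}")
        if self.calls >= self.max_calls:
            raise BudgetExceeded("max tool calls reached")
        if retries > self.max_retries:
            raise BudgetExceeded("max retries reached")
        self.calls += 1

policy = ToolPolicy(
    allow=["web_search", "web_fetch", "memory_get", "memory_search"],
    deny=["browser", "exec"],
    max_calls=3,
)
policy.check("web_search")   # passes and consumes one call from the budget
```

The point of the sketch: permissions fail closed (not on the allowlist means denied), and budgets are checked on every call, not once at startup.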

Model routing and cost control (don’t debug with bigger models)

Model choice matters, but workflow design matters more. The most common mistake is using an expensive model to compensate for vague scope and runaway tools.

A safe baseline is “brain vs muscles”: use a cheaper reliable model for classification and summaries, and reserve a stronger model for final synthesis or ambiguous decisions.

  • Cheap model: triage, tagging, structured extraction, summaries
  • Strong model: final brief, tricky reasoning, conflict resolution
  • If it fails: reduce scope and add guardrails before changing models
  • Track which step actually needed the expensive model

Example routing (conceptual)

Step 1: Triage (cheap model)
Step 2: Gather sources (tools)
Step 3: Synthesis (strong model)
Step 4: Final output (strong model if needed)
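
The brain-vs-muscles split above can be written down as a tiny routing table. The model names are placeholders, not real identifiers; the useful part is that unknown steps default to the cheap model and escalate only deliberately:

```python
# Illustrative router: cheap model for mechanical steps, strong model for synthesis.
# "cheap-model" / "strong-model" are placeholders, not real model identifiers.

CHEAP_STEPS = {"triage", "tagging", "extraction", "summary"}
STRONG_STEPS = {"synthesis", "final_brief", "conflict_resolution"}

def route_model(step: str) -> str:
    if step in CHEAP_STEPS:
        return "cheap-model"
    if step in STRONG_STEPS:
        return "strong-model"
    # Unknown steps default to the cheap model; escalate explicitly if it fails.
    return "cheap-model"
```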

Step 4: set up memory like files, not “magic recall”

Treat OpenClaw memory as an explicit workspace ledger. If something matters next week, write it down. If it does not, do not store it. This keeps memory small and prevents contradictions.

The system becomes dramatically more reliable when you can review memory, edit it, and spot drift early. For longer sessions, enable memory flush before compaction so durable notes are written before summarization.

  • Long-term memory: curated preferences and operating rules
  • Daily memory: append-only log of decisions and “what changed”
  • Avoid storing raw web dumps or transient chat fragments
  • If memory becomes noisy, prune it intentionally instead of adding more prompts
  • If a rule matters, write it once in a stable place (don’t duplicate)

Simple memory layout

workspace/
  MEMORY.md
  memory/
    2026-03-09.md

Example MEMORY.md (keep it small)

# Preferences
- Keep replies concise.
- Never send external messages without approval.

# Workflow rules
- Research briefs must include sources and dates.
- Inbox triage must separate urgent vs deferrable.

# Known constraints
- Do not browse unless freshness is required.

Enable memory flush before auto-compaction (verified keys)

{
  "agents": {
    "defaults": {
      "compaction": {
        "memoryFlush": {
          "enabled": true,
          "softThresholdTokens": 4000,
          "systemPrompt": "Session nearing compaction. Store durable memories now.",
          "prompt": "Write lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
        }
      }
    }
  }
}
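
The daily append-only log needs no machinery. A few lines of Python maintain the layout sketched above; the helper itself is illustrative, not part of OpenClaw:

```python
# Illustrative: append a dated, durable note to memory/YYYY-MM-DD.md.
# The layout matches the sketch above; the function is not an OpenClaw API.
from datetime import date
from pathlib import Path

def append_daily_note(workspace: str, note: str) -> Path:
    day_file = Path(workspace) / "memory" / f"{date.today():%Y-%m-%d}.md"
    day_file.parent.mkdir(parents=True, exist_ok=True)
    with day_file.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")
    return day_file
```

Append-only is deliberate: the daily file records what changed, while MEMORY.md stays small and hand-curated.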

Step 5: write one skill for your one workflow

Skills are the cleanest way to make tool usage consistent. Instead of re-explaining “how to use tools” in every prompt, you give the agent an operating procedure it can follow.

The onboarding mistake is installing a dozen skills. Start with one skill that encodes one workflow.

  • Scope: one workflow, one set of tools, one output contract
  • Include: “when to stop”, “when to ask”, and “what not to do”
  • Keep examples small and reusable (templates)
  • Update skills like code: small diffs, test after each change

Minimal SKILL.md skeleton

# Inbox triage (Skill)

## Goal
Triage new messages into: (1) respond now, (2) draft reply for approval, (3) defer with a next action.

## Tools
- Allowed: web_search, web_fetch, memory_get, memory_search
- Not allowed: exec, browser

## Output contract
- 3 sections: Summary, Draft replies (needs approval), Next actions

## Stop conditions
- If unsure about intent, ask one clarifying question and stop.
- Never send external messages. Draft only.

Step 6: approvals that actually work (template)

“Ask for approval” is not a guardrail unless the approval request is structured. The reviewer should see the exact action, the risk, and the rollback.

Use this template any time the workflow approaches a side-effect.

Approval request template

Proposed action:
Why now:
Risk:
Exact output or message:
Rollback plan:
Alternative options:
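
A structured approval request is easy to enforce in code: refuse to proceed unless every field is filled in. A minimal sketch, with field names taken from the template above (nothing here is OpenClaw API):

```python
# Illustrative: reject approval requests that are missing required fields.
REQUIRED_FIELDS = [
    "proposed_action", "why_now", "risk",
    "exact_output", "rollback_plan", "alternatives",
]

def render_approval(request: dict) -> str:
    missing = [f for f in REQUIRED_FIELDS if not request.get(f)]
    if missing:
        raise ValueError(f"incomplete approval request, missing: {missing}")
    return "\n".join(f"{f}: {request[f]}" for f in REQUIRED_FIELDS)
```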

Step 7: automation (heartbeat and cron) after the manual run is stable

Automation should be earned. First make the workflow stable manually. Then automate the report-only version. Only then automate small safe actions.

Use heartbeats for lightweight periodic check-ins and cron jobs for explicit schedules.

  • Disable heartbeats during onboarding if you want zero background activity
  • When enabling: start with `target: "none"` and a tiny prompt
  • Keep heartbeat instructions in one workspace file (for example `HEARTBEAT.md`) so it stays bounded and reviewable
  • For daily jobs: use cron in an isolated session and deliver a short report

Disable heartbeat during onboarding

{
  "agents": {
    "defaults": {
      "heartbeat": { "every": "0m" }
    }
  }
}

Safe first heartbeat (report-only)

{
  "agents": {
    "defaults": {
      "heartbeat": {
        "every": "30m",
        "target": "none",
        "lightContext": true
      }
    }
  }
}

Step 8: browsing without garbage output (bounded, cited, dated)

Browsing is valuable when freshness matters or when you must cite sources. It is also a reliability risk: large pages, redirects, bot blocks, and retries can blow up latency and context.

Your baseline should treat browsing like a budgeted resource.

  • Use search + fetch before browser automation
  • Limit pages per run (and stop if the first sources disagree)
  • Extract only what you need (no raw HTML dumps)
  • When you cite: include publication dates and link to primary sources

Research output contract (tiny but strict)

## Answer (5-8 bullets)

## Sources
- Source name, date, what it supports

## Uncertainties
- What you could not verify
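
Treating browsing as a budgeted resource boils down to a fetch loop with hard caps. In this sketch `fetch` is a stand-in callable that returns extracted text or raises on failure; it is not a real OpenClaw tool:

```python
# Illustrative: a bounded research loop with a page budget and a failure cap.
# `fetch` is a stand-in callable, not a real OpenClaw tool.

def gather_sources(urls, fetch, max_pages=5, max_failures=2):
    results, failures = [], 0
    for url in urls:
        if len(results) >= max_pages:
            break                      # page budget reached
        try:
            results.append((url, fetch(url)))
        except Exception:
            failures += 1
            if failures >= max_failures:
                break                  # stop retry-storming on repeated failures
    return results
```

Both caps matter: the page budget bounds context growth, and the failure cap turns "keep retrying forever" into "stop and report."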

Step 9: build a lightweight “observe and recover” loop

Your baseline should make it easy to answer: what did it do, why did it do it, and what should change next time? Even without fancy observability, you can enforce a discipline: every run ends with a short summary, and every failure ends with a minimal repro.

If the agent cannot explain its own actions in a short, structured way, you cannot debug it effectively.

  • End each run with: what changed, what was skipped, what needs approval
  • When it fails, capture: the trigger, the inputs, and the tool steps
  • Prefer small, repeatable runs over long “do everything” sessions
  • Use `/compact` when sessions feel stale or bloated
  • If a workflow regresses, roll back scope before you change models
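
The end-of-run discipline can be enforced mechanically: a run is not "done" until its recap covers the three required sections. A sketch, with section names taken from the bullets above:

```python
# Illustrative: a run recap must cover all three required sections.
RECAP_SECTIONS = ("what changed", "what was skipped", "needs approval")

def recap_is_complete(recap: str) -> bool:
    text = recap.lower()
    return all(section in text for section in RECAP_SECTIONS)
```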

Incident note template

Incident:
Trigger:
Expected behavior:
Actual behavior:
Tools used:
What changed before this run:
Recovery action:

Your first-week rollout plan (day-by-day)

This is a boring plan on purpose. Stability comes from repetition.

If you are adding multiple new surfaces at once (channels, browsing, automation), slow down and isolate variables.

  • Day 1: baseline config + one manual workflow run
  • Day 2: memory structure + one durable rule in MEMORY.md
  • Day 3: tool allowlist + budgets; verify it still works 3 times
  • Day 4: approval gate for a risky step (draft-only → approve-to-send)
  • Day 5: report-only heartbeat or one cron job (isolated session)
  • Day 6: prune memory; remove unused rules; tighten stop conditions
  • Day 7: write the runbook (what to do when it fails)

Common new-user mistakes (and the fix)

The best onboarding is just avoiding a handful of predictable traps. If you fix these early, you will save days of debugging.

Use this list as a weekly check-in until your system feels boring.

  • Unauthorized / reconnect loop → wrong token → `openclaw dashboard` and set `gateway.auth.token` in the UI settings
  • Slow/hanging runs → unbounded browsing/retries → cap pages, cap retries, stop after repeated failures
  • Memory feels wrong → memory is a transcript dump → keep MEMORY.md curated, daily logs separate
  • Agent keeps doing too much → no stop condition → add a clear “done” state and one “stop and ask” rule
  • Costs explode after automation → no budgets / too much parallelism → start report-only, add budgets before actions

How Clawdguy helps (when you want to stop babysitting infra)

Once your workflows are defined, infrastructure becomes the bottleneck: provisioning, security hardening, keeping the runtime stable, and giving your team a clean path to production.

Clawdguy gives you a managed OpenClaw runtime with dedicated infrastructure and the control layer needed for production guardrails.

  • Dedicated infrastructure with root access
  • Managed provisioning and lifecycle controls
  • Diagnostics, logs, updates, and reprovisioning
  • A faster path from “it works” to “it runs every day”