Feb 08, 2026
OpenClaw Source Code: The Architecture Behind 180K Stars
I read OpenClaw's entire source code — gateway architecture, heartbeat system, session routing, queue modes, and the hidden coding agent SDK powering 1.2M AI agents.
I Read OpenClaw's Entire Source Code So You Don't Have To
180,000 GitHub stars. Karpathy called it "genuinely the most incredible sci-fi takeoff-adjacent thing" — that tweet hit 29K likes and 10.4M impressions. 1.2 million AI agents on MoltBook in under a week. 10,000 verified humans. 150,000+ agents posting autonomously.
An AI autonomously acquired a phone number and called its creator. Nobody asked it to.
And almost nobody has actually read the code.
I did. All of it. The gateway server, the agent runner, the 52 bundled skills, the 16 channel adapters, the cron service, the heartbeat loop, the dedupe cache, the session routing, the queue modes, the sandbox system, the error handlers. Even the hidden layer — how it embeds a coding agent SDK as its brain. Every file path. Every pattern.
I've been building autonomous agents for months — not demos, production systems that run 24/7 — so I wanted to understand what Peter Steinberger actually built under the hype.
Here's what I think actually matters. Not the feature list. Not the 52 skills. Not even the phone call. Four architectural decisions that made this thing feel alive where every other agent framework feels like a chatbot. And one decision that might eventually kill it.
I'll show you the code for all of them.
OpenClaw's Gateway Architecture: Why Presence Beats Intelligence
I've looked at a lot of agent architectures. LangChain, AutoGen, CrewAI, custom builds. They all start the same way: wrap an LLM, add some tools, maybe add memory. The output is text. The interface is a chat window or an API endpoint.
OpenClaw started somewhere completely different. It started with a gateway.
WhatsApp / Telegram / Slack / Discord / Signal / iMessage / Teams / Matrix / WebChat
|
v
+-------------------------------+
| Gateway Server |
| (control plane) |
| ws://127.0.0.1:18789 |
+---------------+---------------+
|
+-- Agent Runner (pi SDK embedded)
+-- CLI (openclaw ...)
+-- WebChat UI
+-- macOS / iOS / Android apps
+-- Plugin extensions (30 total)
One gateway per host. One long-running process. Everything — every channel, every client, every extension — connects through a single WebSocket. The gateway emits four event types: agent, chat, presence, health. A Telegram bot, a Discord adapter, the CLI, the iOS app — they all speak the same protocol.
This is the architectural decision that I think fundamentally explains why OpenClaw went viral and every other agent framework didn't.
LangChain gives you text responses. AutoGen gives you multi-agent conversations. OpenClaw gives you something that's just... there. In your WhatsApp. In your Telegram. In your Discord. Same personality. Same memory. You close the browser and it's still alive in your pocket. Not because the model is better — because the infrastructure treats presence as a first-class concern.
Most agent builders start with the brain and then figure out where to put it. Steinberger started with the nervous system.
I think that's the insight. The brain was always the easy part.
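The single-protocol idea — every client and channel speaking the same four event types over one WebSocket — can be sketched as a small dispatcher. This is an illustrative sketch, not OpenClaw's actual gateway code; only the event names (agent, chat, presence, health) come from the source, and `createDispatcher` is a hypothetical helper.

```typescript
// Sketch: demultiplexing the gateway's four event types.
// Event names are from the article; handler shapes are assumed.
type GatewayEvent =
  | { type: "agent"; payload: unknown }
  | { type: "chat"; payload: unknown }
  | { type: "presence"; payload: unknown }
  | { type: "health"; payload: unknown };

type Handler = (payload: unknown) => void;

function createDispatcher() {
  const handlers = new Map<GatewayEvent["type"], Handler[]>();
  return {
    // A Telegram adapter, the CLI, and the iOS app would all register here
    on(type: GatewayEvent["type"], fn: Handler) {
      const list = handlers.get(type) ?? [];
      list.push(fn);
      handlers.set(type, list);
    },
    // Every frame off the WebSocket funnels through one place
    dispatch(evt: GatewayEvent) {
      for (const fn of handlers.get(evt.type) ?? []) fn(evt.payload);
    },
  };
}
```

The payoff of this shape: adding a sixteenth channel adapter means registering handlers, not touching the core.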
OpenClaw vs LangChain vs AutoGen vs CrewAI: The Channel Plugin Architecture
To make this work at scale, they needed every messaging platform to plug in without the core getting messy. The channel plugin interface is genuinely one of the cleanest platform abstractions I've seen in any agent codebase:
type ChannelPlugin = {
id: ChannelId;
meta: ChannelMeta;
capabilities: ChannelCapabilities;
config: ChannelConfigAdapter; // account resolution (required)
security?: ChannelSecurityAdapter; // DM policy, allowlists
outbound?: ChannelOutboundAdapter; // send messages
gateway?: ChannelGatewayAdapter; // connection lifecycle
streaming?: ChannelStreamingAdapter; // streaming responses
threading?: ChannelThreadingAdapter; // thread context
groups?: ChannelGroupAdapter; // group policies
directory?: ChannelDirectoryAdapter; // contact lookup
// ... 8+ more optional adapters
};
Every adapter is optional. Discord has threading, iMessage doesn't. Slack has groups, Signal doesn't. You only implement what the platform supports. 16 channel adapters built on this interface and the core codebase doesn't care which one is talking. That's the kind of separation that only matters at scale — and they're at 1.2 million agents, so it matters.
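Here's what that optionality looks like in practice — a hypothetical minimal plugin that only implements what its platform supports, with the core feature-detecting instead of assuming. The field names follow the interface above; the concrete adapter shapes and `supportsThreading` helper are my own illustration, not OpenClaw's API.

```typescript
// Sketch: a plugin that omits unsupported adapters entirely.
type ChannelCapabilities = { threading: boolean; groups: boolean };

type MinimalChannelPlugin = {
  id: string;
  capabilities: ChannelCapabilities;
  outbound?: { send: (to: string, text: string) => Promise<void> };
  threading?: { resolveThread: (msgId: string) => string };
};

const imessage: MinimalChannelPlugin = {
  id: "imessage",
  capabilities: { threading: false, groups: true },
  outbound: {
    send: async (to, text) => {
      console.log(`[imessage] -> ${to}: ${text}`);
    },
  },
  // No `threading` adapter: iMessage doesn't have threads, so it's simply absent.
};

// Core code checks capability + adapter presence rather than assuming either.
function supportsThreading(p: MinimalChannelPlugin): boolean {
  return p.capabilities.threading && p.threading !== undefined;
}
```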
The extension system goes further. 30 plugin extensions, each with a manifest:
{
"id": "telegram",
"name": "Telegram",
"description": "Telegram channel plugin",
"channels": ["telegram"],
"providers": [],
"configSchema": {}
}
Seven registration methods on the plugin API — registerChannel, registerTool, registerHook, registerService, registerGatewayMethod, registerCli, registerProvider. This is how OpenClaw went from "an AI chatbot" to "an AI operating system" without the core becoming unmaintainable.
OpenClaw's Heartbeat System: How AI Agents Schedule Themselves
Every AI agent I've seen — including the ones I built — is fundamentally reactive. You talk, it responds. You stop, it stops existing. It's a chatbot with better tooling.
OpenClaw has a literal heartbeat. src/infra/heartbeat-runner.ts. A background process that fires every 30 minutes.
heartbeat-runner.ts
-> fires every 30 min
-> reads HEARTBEAT.md (user-editable task list)
-> checks active hours config
-> if tasks exist and it's active hours:
-> wakes the agent
-> agent processes pending tasks
-> marks them done
-> goes back to sleep
You write a task in HEARTBEAT.md. You don't send a message. You don't @mention anything. Just leave a note. 30 minutes later, the agent wakes up, sees it, does it, goes back to sleep.
The HEARTBEAT_OK protocol: if the agent finds nothing to do, it responds "HEARTBEAT_OK" and the system suppresses the notification. You only hear from it when there's something to say. Respect for attention. From a codebase. Rare.
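The wake decision described above — tasks exist AND it's active hours — reduces to a small pure function. This is a sketch under assumptions: the HEARTBEAT.md filename comes from the article, but the task format (unchecked markdown todos) and the active-hours shape are guesses for illustration.

```typescript
// Sketch of the heartbeat wake decision. Parsing rules are assumed.
type ActiveHours = { startHour: number; endHour: number }; // local hours

function hasPendingTasks(heartbeatMd: string): boolean {
  // Assumption: an unchecked markdown todo ("- [ ] ...") is a pending task.
  return /^- \[ \] /m.test(heartbeatMd);
}

function shouldWake(heartbeatMd: string, hours: ActiveHours, now: Date): boolean {
  const h = now.getHours();
  const active = h >= hours.startHour && h < hours.endHour;
  // Only wake the agent when there's both work and a sanctioned time window.
  return active && hasPendingTasks(heartbeatMd);
}
```

Everything else — waking the agent, marking tasks done, suppressing HEARTBEAT_OK — hangs off this one boolean.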
But the heartbeat alone isn't what makes this transformative. It's the heartbeat combined with agent-autonomous scheduling.
OpenClaw Cron Tool: Agents That Create Their Own Future
Through src/agents/tools/cron-tool.ts, the agent can create its own future wake-ups:
- Three schedule types: at (one-time), every (recurring interval), cron (cron expression)
- Two payload types: systemEvent (inject an event into the queue) and agentTurn (trigger a full agent run)
The agent literally decides "I should check this again in 4 hours" and makes it happen. Not a human configuring a cron job. The AI scheduling its own future.
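A minimal next-fire computation over those three schedule types might look like this. The type names (at, every, cron) come from the article; everything else — the discriminated union, the function, the stubbed cron case — is an illustrative sketch, since real cron parsing belongs to a dedicated library.

```typescript
// Sketch: when does a schedule fire next? Shapes are assumed.
type Schedule =
  | { kind: "at"; when: number }          // one-time, epoch ms
  | { kind: "every"; intervalMs: number } // recurring interval
  | { kind: "cron"; expr: string };       // cron expression

function nextFire(s: Schedule, nowMs: number): number | null {
  switch (s.kind) {
    case "at":
      // One-shot: fires once in the future, then never again.
      return s.when > nowMs ? s.when : null;
    case "every":
      return nowMs + s.intervalMs;
    case "cron":
      // Stubbed: real cron expressions need a parser library.
      throw new Error("cron expressions require a real parser");
  }
}
```

The point isn't the math — it's that the agent calls this as a tool ("check again in 4 hours") instead of a human editing crontab.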
I recognized this pattern immediately because I built something similar — a cron-based self-scheduling loop where my agent wakes every few hours, scans targets, drafts content, schedules its own next wake-up. Same principle. But OpenClaw bakes it into the platform at a deeper level. The scheduling is a tool the agent calls, not an external system the agent talks to.
This is where I think the real gap opens between "AI assistant" and "AI agent." An assistant answers when asked. An agent has continuity. It remembers, it plans, it acts on its own schedule. The heartbeat gives it a pulse. The cron system gives it initiative.
OpenClaw Session Routing: How It Knows You Across Every Platform
This is what most people feel but can't explain technically. OpenClaw feels like it's aware of you. Not just responding — but contextually aware across every surface.
The mechanism is session routing. src/routing/session-key.ts.
Session key format: agent:<agent-id>:<key-variant>
// DMs collapse to one session — your identity follows you
agent:main:main
// Per-peer DMs stay isolated per person
agent:main:telegram:123456
agent:main:whatsapp:+1234567890
// Groups isolated per channel
agent:main:discord:group:789
agent:main:slack:group:C04ABCD
The routing resolution cascade — six levels of specificity:
1. Peer ID (most specific — this exact person)
2. Guild ID (Discord server)
3. Team ID (Slack workspace)
4. Channel ID (specific channel)
5. Account ID (platform account)
6. Fallback agent (catch-all)
Message it on WhatsApp. Switch to Telegram. It remembers. Because DMs route to the same session key regardless of channel. But what happens in a Discord server stays in that server. Group contexts never bleed.
This isn't the model being smart. It's the routing being smart. The model doesn't even know which platform you're on.
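The collapse-DMs-isolate-groups rule can be sketched in a few lines. The key formats come straight from the article; the `sessionKey` function and message shape are hypothetical, and I'm showing only the default DM-collapse behavior (the per-peer DM variants above are a configuration away).

```typescript
// Sketch of the routing rule: DMs collapse to one session,
// groups stay isolated per channel. Shapes are assumed.
type Inbound = {
  channel: string;   // "telegram", "whatsapp", "discord", ...
  isGroup: boolean;
  groupId?: string;
};

function sessionKey(agentId: string, msg: Inbound): string {
  if (msg.isGroup && msg.groupId) {
    // Group context never bleeds: one session per channel per group.
    return `agent:${agentId}:${msg.channel}:group:${msg.groupId}`;
  }
  // DM: your identity follows you across every platform.
  return `agent:${agentId}:main`;
}
```

Message on WhatsApp, switch to Telegram — same key, same transcript, same memory.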
Device Awareness and Presence
Connect a new device — say you install the iOS app — and the gateway issues a device ID, requires approval, and starts routing presence events. The health event type on the WebSocket means the system knows which devices are connected right now. Device pairing with challenge signing for non-local connections. Per-device tokens after approval.
Combine this with the heartbeat: even when you're not actively chatting, the agent knows your session exists. It has context from yesterday. It has tasks you left in HEARTBEAT.md. It has the memory from your Discord conversation last week. When you open the app on a new device and say "hey, what about that thing we discussed?" — it knows. Because session continuity is architectural, not incidental.
Session transcripts stored as JSONL:
~/.openclaw/agents/<agentId>/sessions/
+-- sessions.json # session keys -> metadata index
+-- <SessionId>.jsonl # full transcript, one JSON object per line
Each session is a file. Each message is a line. Grep-able, stream-able, append-only. No database. No ORM. Just files. And it works at scale because each session is isolated — you never scan all sessions, only the one you need.
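The append-only JSONL layout is simple enough to sketch in full — one JSON object per line, append to write, split-and-parse to read. The file layout comes from the article; the helper names and record shape here are my own.

```typescript
// Sketch: append-only JSONL session transcript. No database, just a file.
import * as fs from "node:fs";

function appendMessage(file: string, record: object): void {
  // One message = one line. Append-only, so writes never rewrite history.
  fs.appendFileSync(file, JSON.stringify(record) + "\n");
}

function readTranscript(file: string): object[] {
  if (!fs.existsSync(file)) return [];
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.length > 0) // tolerate trailing newline
    .map((line) => JSON.parse(line));
}
```

Grep works on it. `tail -f` works on it. A crashed write loses at most one line.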
How OpenClaw Agents Modify Their Own Code and Memory
This is the part that made the phone call possible. And the religions. And every other emergent behavior that went viral.
OpenClaw doesn't just have a system prompt. It has a personality operating system made of editable files:
| File | Purpose |
|---|---|
| AGENTS.md | Operating instructions + memory |
| SOUL.md | Persona, boundaries, tone |
| TOOLS.md | User-maintained tool notes |
| BOOTSTRAP.md | One-time first-run ritual |
| IDENTITY.md | Agent name, vibe, emoji |
| USER.md | User profile + preferred name |
| MEMORY.md | Persistent knowledge across sessions |
| HEARTBEAT.md | The ambient task list |
These aren't config files. They're character sheets. All user-editable markdown. And the critical part: the agent can write to them too.
Because OpenClaw embeds a coding agent — pi, an SDK with the same lineage as Claude Code — the agent has bash, read, write, edit tools. Real filesystem access. It can read SOUL.md and understand its own personality. It can write to MEMORY.md and persist something it learned. It can create a new skill as a folder with a SKILL.md file and immediately use it.
The agent can write its own tools. Think about that.
The skills system supports this explicitly. Loading precedence: workspace/skills/ > ~/.openclaw/skills/ > bundled skills/ > extraDirs. Workspace-level overrides everything. The agent creates a folder in workspace/skills/ with a SKILL.md and scripts — and on the next run, that skill exists. Nobody installed it. Nobody approved it. The agent made it.
OpenClaw Memory System: Three-Layer Context Management
The self-modification goes deeper with three-layer context management:
Pruning — trims old tool results from the conversation. Not general messages, specifically tool outputs. A bash command that returned 500 lines 30 messages ago doesn't need to stay verbatim. The session transcript (JSONL) is never rewritten — pruning only affects what gets sent to the model.
Compaction — summarizes older conversation history to free context window space. You lose exact wording but keep semantic content.
Memory flush — the pre-compaction save. Before compaction runs, the system triggers a special turn where the agent writes important notes to MEMORY.md. "Save what matters before I forget."
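The pruning layer above has a precise rule worth sketching: only tool results get trimmed, only old ones, and only in the copy sent to the model — the JSONL transcript on disk is never touched. This is an illustrative sketch; the message shape, placeholder text, and `pruneForModel` name are assumptions, not OpenClaw's actual implementation.

```typescript
// Sketch: prune old tool outputs from the model-facing history.
type Msg = { role: "user" | "assistant" | "tool"; content: string };

function pruneForModel(history: Msg[], keepRecentTools: number): Msg[] {
  // Locate tool results in order; only the newest N survive verbatim.
  const toolIdx = history
    .map((m, i) => (m.role === "tool" ? i : -1))
    .filter((i) => i >= 0);
  const keep = new Set(toolIdx.slice(-keepRecentTools));
  // Returns a new array — the on-disk transcript is never rewritten.
  return history.map((m, i) =>
    m.role === "tool" && !keep.has(i)
      ? { ...m, content: "[tool output pruned]" }
      : m
  );
}
```

A 500-line bash result from 30 messages ago becomes one placeholder line; the conversation itself stays intact.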
~/.openclaw/workspace/memory/
+-- 2026-01-15.md # daily log — automatic
+-- 2026-01-16.md # daily log — automatic
+-- 2026-01-17.md # daily log — automatic
+-- MEMORY.md # curated long-term — agent-maintained
Daily dated logs for raw history. Curated MEMORY.md for what actually matters. The agent writes to both. Over time it learns what to keep and what to discard. This is how an agent with a finite context window can have functionally infinite memory.
And those bootstrap files — SOUL.md, IDENTITY.md, MEMORY.md — are protected from context pruning. When conversations get long and the model drops earlier messages, these survive. The agent forgets what you said 40 messages ago. It never forgets who it is.
Sub-agents only get AGENTS.md and TOOLS.md. They don't get the soul. Workers, not clones. Personality stays centralized.
Why Coding Agents Change the Game
This self-modification capability isn't something OpenClaw built from scratch. It comes from embedding a coding agent:
| Coding agents (pi, Claude Code) | Agent frameworks (LangChain, CrewAI) |
|---|---|
| Local machine with filesystem | Cloud/sandbox |
| bash, read, write, edit (real files) | Retrieval, search, calculators |
| File-based sessions (JSONL) | In-memory or database |
| Full system access | Constrained by design |
| Project-aware (AGENTS.md, codebase) | Task-specific |
| Code changes, shell commands as output | Text responses |
Coding agents can read and write files -> memory persistence. Run bash -> any CLI tool. Create and edit config -> self-modify behavior. Operate on real machines -> no sandbox limitations.
OpenClaw exploits this fully: the AI reads its own soul file, modifies its memory, schedules cron jobs, creates skills, acquires phone numbers. The emergent behaviors — the phone call, the religion founding — all happened because a coding agent with bash access will do things you didn't explicitly program.
That's a feature. And a risk. Which brings me to...
OpenClaw Security: The ClawHub Malware Problem
OpenClaw runs local-first. "Your machine, your rules." Three sandbox modes:
| Mode | What it does |
|---|---|
| off | Tools run directly on host |
| non-main | Non-main sessions sandboxed in Docker |
| all | Every run sandboxed |
DM security uses a pairing system:
{
"channels": {
"whatsapp": {
"dmPolicy": "pairing",
"allowFrom": ["+1234567890"]
}
}
}
- Pairing: unknown senders get a pairing code. You approve it. Only then can they talk to your agent.
- Open: all DMs processed. Requires explicit opt-in. Dangerous but useful for public-facing agents.
Gateway authentication with token/password by default. Device IDs need approval. Challenge signing for non-local connections.
Solid foundation.
Then there's ClawHub.
Community-contributed "skills" in a plugin marketplace. A security audit found that 341 skills in the first batch — 12% — were malicious. Data exfiltration. Credential theft. Prompt injection. The full menu.
Twelve percent.
They've built a 6-step scanning pipeline since. Docker sandboxing. An oversight agent called "Ishi" that monitors for suspicious behavior. But the fundamental tension remains: untrusted code running inside an agent with access to your messages, files, APIs, and identity.
The pipeline catches eval(atob('..')) payloads and hardcoded exfiltration URLs. It won't catch a skill that slowly modifies MEMORY.md over time. Or one that leaks context through seemingly innocent API calls. Or one that poisons personality by injecting text into SOUL.md.
The bootstrap files — the same architecture that makes personality robust — become an attack surface when third-party code can write to them.
This is where the self-modification capability cuts both ways. An agent that can rewrite its own tools and personality is powerful. An agent that can be tricked into rewriting its own tools and personality by a malicious skill is terrifying.
I'm not saying don't use it. I'm saying understand what you're trading.
OpenClaw Source Code: Execution Pipeline and Production Patterns
Everything above is architecture. This section is the actual code. The patterns that separate "it works in a demo" from "it works at 1.2 million agents."
How OpenClaw Processes a Message: The Full Execution Pipeline
When a message arrives, here's exactly what happens:
agentCommand()
-> resolveSession() # find or create the right session
-> registerAgentRunContext() # set up the run environment
-> runEmbeddedPiAgent()
-> Load workspace & skills # progressive disclosure (level 1)
-> Build system prompt # bootstrap files injected here
-> Build message history # from JSONL transcript
-> Call Claude API (streaming)
-> subscribeEmbeddedPiSession()
-> Stream assistant text # text_delta events
-> Tool calls -> invoke -> collect results
-> Thinking/reasoning # if enabled (off/on/stream)
-> Deliver responses via channels # formatted per-platform
-> Persist session transcript # append to SessionId.jsonl
From WhatsApp message to phone reply. Every step traceable through the source.
The reasoning modes: off (no thinking), on (thinking hidden, final answer only), stream (thinking visible in real-time). Controlled with /think in any chat.
OpenClaw Queue Modes: Serial Execution Done Right
OpenClaw processes messages through a serial queue. One at a time per session. No parallelism.
Sounds slow. Every other framework fires tools in parallel. "Faster!" Until your agent reads a file while simultaneously writing to it. Race conditions in AI agents don't throw errors. They produce wrong answers that look correct. Worse than crashing.
But they didn't build a dumb FIFO. Four queue modes:
| Mode | Behavior |
|---|---|
| steer | Inject into current run, skip pending tools |
| followup | Queue after current run completes |
| collect | Coalesce all waiting messages into single followup (default) |
| interrupt | Abort current run, process newest message |
collect as default is genius. Five messages while the agent thinks -> one bundled run. Reduces cost, prevents thrashing, gives full context.
These map directly to the pi SDK:
await session.steer("stop, not that file"); // -> steer mode
await session.followUp("after you're done..."); // -> followup mode
And the underlying promise queue pattern used everywhere:
// src/web/session.ts:34-45
let credsSaveQueue: Promise<void> = Promise.resolve();
function enqueueSaveCreds(authDir, saveCreds, logger): void {
credsSaveQueue = credsSaveQueue
.then(() => safeSaveCreds(authDir, saveCreds, logger))
.catch((err) => {
logger.warn({ error: String(err) }, "WhatsApp creds save queue error");
});
}
Callers don't block. Operations don't race. The queue resolves in order.
OpenClaw Skills System: Progressive Disclosure for AI Agents
52 bundled skills in seven categories, but the architecture matters more than the list. Every skill is a folder:
skill-name/
+-- SKILL.md # metadata + LLM instructions (required)
+-- pyproject.toml # for Python-based skills (optional)
+-- scripts/ # executable code
+-- references/ # deep docs loaded on demand
+-- assets/ # templates, boilerplate
Three-tier loading:
| Tier | What loads | When | Budget |
|---|---|---|---|
| Level 1 | SKILL.md metadata | Always | ~100 words |
| Level 2 | SKILL.md body | When skill triggers | <5k words |
| Level 3 | references/ files | When agent needs deep info | Unlimited |
Progressive disclosure applied to context windows. The agent always knows what skills exist (cheap). It only loads the full instructions when it decides to use one. Deep docs only on explicit pull.
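The three tiers reduce to a context-assembly rule: metadata always, body on trigger, references on explicit pull. A sketch under assumptions — the `Skill` shape and `contextFor` helper are illustrative, not the loader's real API:

```typescript
// Sketch: tiered context assembly. Only triggered skills pay full price.
type Skill = {
  name: string;
  metadata: string;                     // tier 1: ~100 words, always loaded
  body: string;                         // tier 2: loaded when skill triggers
  references: Record<string, string>;   // tier 3: agent pulls explicitly
};

function contextFor(skills: Skill[], triggered: Set<string>): string {
  const parts: string[] = [];
  for (const s of skills) {
    parts.push(s.metadata);             // the agent always knows what exists
    if (triggered.has(s.name)) {
      parts.push(s.body);               // full instructions only on demand
    }
  }
  return parts.join("\n");
}
```

With 52 skills, tier 1 costs roughly 5k words total; loading every body up front would blow the window before the conversation starts.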
The metadata schema is where it gets clever:
---
name: github
description: GitHub CLI for issues, PRs, repos.
metadata:
openclaw:
emoji: "\ud83d\udc19"
requires:
bins: ["gh"] # ALL must exist
anyBins: ["docker"] # at least ONE must exist
env: ["GITHUB_TOKEN"] # required env vars
config: ["github.org"] # config paths
install:
- strategy: brew
formula: gh
os: ["darwin", "linux"] # platform filter
always: false
primaryEnv: "GITHUB_TOKEN"
---
Skills self-declare their dependencies. If gh isn't installed, the skill doesn't load. No broken tool calls. No "command not found" mid-run.
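The gating logic implied by that schema is worth spelling out: bins must ALL resolve, anyBins needs at least one, env vars must be set. The field names (bins, anyBins, env) come from the metadata above; the `isEligible` function is a hypothetical sketch of the check, not the actual loader.

```typescript
// Sketch: does a skill's declared dependencies resolve on this host?
type Requires = { bins?: string[]; anyBins?: string[]; env?: string[] };

function isEligible(
  req: Requires,
  installedBins: Set<string>,
  env: Record<string, string | undefined>
): boolean {
  // bins: every listed binary must exist
  if (req.bins && !req.bins.every((b) => installedBins.has(b))) return false;
  // anyBins: at least one alternative must exist
  if (req.anyBins && !req.anyBins.some((b) => installedBins.has(b))) return false;
  // env: every required variable must be set
  if (req.env && !req.env.every((v) => env[v] !== undefined)) return false;
  return true;
}
```

Run this at load time and "command not found" mid-run becomes structurally impossible: ineligible skills never reach the model's tool list.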
The Hidden Layer: How OpenClaw Embeds a Coding Agent as Its Brain
The "brain" isn't custom. OpenClaw embeds pi — a coding agent SDK, same lineage as Claude Code — as the core runtime.
From openclaw/src/agents/pi-embedded-runner/run/attempt.ts:
import {
createAgentSession,
DefaultResourceLoader,
SessionManager,
SettingsManager,
} from "@mariozechner/pi-coding-agent";
const { session } = await createAgentSession({
cwd: resolvedWorkspace,
agentDir,
authStorage: params.authStorage,
modelRegistry: params.modelRegistry,
model: params.model,
thinkingLevel: mapThinkingLevel(params.thinkLevel),
tools: builtInTools,
customTools: allCustomTools,
sessionManager,
settingsManager,
resourceLoader,
});
subscribeEmbeddedPiSession({
session,
onBlockReply: (text) => {
params.onBlockReply?.(text); // -> WhatsApp / Telegram / Discord
},
onReasoningStream: (reasoning) => {
// optional: stream thinking to user
},
});
await session.prompt(userMessage);
The agent doesn't know it's talking through WhatsApp. It thinks it's a terminal. subscribeEmbeddedPiSession transforms pi events into channel-friendly message chunks. The separation is clean.
openclaw adds: channels -> gateway -> routing -> skills -> memory -> security
pi SDK provides: LLM loop -> tool execution -> streaming -> session management
OpenClaw is a messaging platform that uses a coding agent as its brain. Not the other way around.
Production Patterns: The Agent Infrastructure Nobody Talks About
This is where Steinberger's decades of shipping — he sold PSPDFKit for $50M — show in every file. None of this is glamorous. All of it is necessary.
Cache safety with structuredClone:
// src/config/sessions/store.ts:121
if (currentMtimeMs === cached.mtimeMs) {
return structuredClone(cached.store);
}
Most devs spread ({...obj}) to copy cached objects. That's shallow. External code mutates a nested property and your cache is corrupted silently. structuredClone is a true deep copy. One line. Prevents an entire class of bugs that are almost impossible to debug.
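You can demonstrate the bug class in six lines. This isn't OpenClaw code — just the failure mode the spread operator invites, next to the fix:

```typescript
// The shallow-copy trap: spread shares nested references with the cache.
const cachedA = { store: { user: { name: "alice" } } };
const shallow = { ...cachedA.store };
shallow.user.name = "mallory";            // silently mutates the cache too
const corrupted = cachedA.store.user.name; // now "mallory"

// structuredClone: a true deep copy, cache stays pristine.
const cachedB = { store: { user: { name: "alice" } } };
const deep = structuredClone(cachedB.store);
deep.user.name = "mallory";               // cache untouched
const safe = cachedB.store.user.name;     // still "alice"
```

(structuredClone is built into Node 17+ and all modern browsers; no library needed.)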
Credential backup with corruption protection:
// src/web/session.ts:62-89
async function safeSaveCreds(authDir, saveCreds, logger): Promise<void> {
const raw = readCredsJsonRaw(credsPath);
if (raw) {
try {
JSON.parse(raw); // validate it's valid JSON first!
fsSync.copyFileSync(credsPath, backupPath);
} catch {
// keep existing backup — don't clobber good data with garbage
}
}
}
Only backup valid files. If creds.json is corrupted mid-write (process killed, disk full), don't overwrite a good backup with garbage. The source comment: "don't clobber a good backup with a corrupted/truncated creds.json." Written by someone who's been burned.
Three-tier crash handling:
// src/infra/unhandled-rejections.ts
const TRANSIENT_NETWORK_CODES = new Set([
"ECONNRESET", "ECONNREFUSED", "ENOTFOUND", "ETIMEDOUT",
"UND_ERR_CONNECT_TIMEOUT", "UND_ERR_DNS_RESOLVE_FAILED",
]);
if (isTransientNetworkError(reason)) {
console.warn("[openclaw] Non-fatal (continuing):", formatUncaughtError(reason));
return; // don't exit!
}
if (isFatalError(reason)) {
console.error("[openclaw] FATAL:", formatUncaughtError(reason));
process.exit(1);
}
Most Node apps crash on any unhandled rejection. OpenClaw distinguishes:
- Transient: network hiccups -> log and continue
- Fatal: OOM, worker crashes -> exit immediately
- Config: missing API keys -> exit with clear repair instructions
Their error messages follow a pattern I wish more codebases adopted:
`Invalid config at ${configSnapshot.path}.\n${issues}\nRun "${formatCliCommand("openclaw doctor")}" to repair, then retry.`
What went wrong. Where it went wrong. How to fix it. Every error message.
LRU dedupe with zero dependencies:
// src/infra/dedupe.ts
const touch = (key: string, now: number) => {
cache.delete(key); // delete first to reset insertion order
cache.set(key, now); // re-add at end (Map maintains insertion order)
};
const prune = (now: number) => {
const cutoff = now - ttlMs;
for (const [entryKey, entryTs] of cache) {
if (entryTs < cutoff) cache.delete(entryKey);
}
while (cache.size > maxSize) {
const oldestKey = cache.keys().next().value;
cache.delete(oldestKey);
}
};
Uses ES6 Map's insertion-order guarantee for LRU semantics. Delete-then-set "touches" an entry to the end. Prunes by both time AND size. Zero external deps.
Keyed debouncing:
// src/auto-reply/inbound-debounce.ts
export function createInboundDebouncer<T>(params: {
debounceMs: number;
buildKey: (item: T) => string | null;
onFlush: (items: T[]) => Promise<void>;
}) {
const buffers = new Map<string, DebounceBuffer<T>>();
// items with same key batch together
// items with different keys debounce independently
// buildKey returning null -> process immediately
}
Alice's rapid messages don't delay Bob. Different conversations, different timers. null key = no debounce, immediate processing. Escape hatch built in.
Exponential backoff with jitter:
// src/infra/backoff.ts
export function computeBackoff(policy: BackoffPolicy, attempt: number) {
const base = policy.initialMs * policy.factor ** Math.max(attempt - 1, 0);
const jitter = base * policy.jitter * Math.random();
return Math.min(policy.maxMs, Math.round(base + jitter));
}
If 100 clients disconnect simultaneously, jitter ensures they don't all reconnect at once. Math.max(attempt - 1, 0) means the first attempt gets no exponential increase.
Graceful reconnect with healthy-stretch reset:
// src/web/auto-reply/monitor.ts
reconnectAttempts = 0; // healthy stretch; reset the backoff
// on disconnect:
const backoffMs = computeBackoff(reconnectPolicy, reconnectAttempts);
reconnectAttempts++;
Most implementations reset backoff on reconnect. OpenClaw resets after a healthy stretch. Reconnect but drop again quickly? Backoff stays elevated. Distinguishes "flaky connection" from "stable then dropped."
Abort-safe sleep:
// src/infra/backoff.ts
export async function sleepWithAbort(ms: number, abortSignal?: AbortSignal) {
try {
await delay(ms, undefined, { signal: abortSignal });
} catch (err) {
if (abortSignal?.aborted) {
throw new Error("aborted", { cause: err });
}
throw err;
}
}
Normal sleep() can't be cancelled. This accepts AbortSignal. Pairs with timer .unref() for daemons that actually die when asked.
Timer unref for clean shutdown:
// src/auto-reply/inbound-debounce.ts:82
buffer.timeout = setTimeout(() => {
void flushBuffer(key, buffer);
}, debounceMs);
buffer.timeout.unref?.(); // don't keep the process alive for this timer
In a long-running agent with dozens of timers, forgetting one .unref() means the process hangs on shutdown.
Fire-and-forget with explicit intent:
void task.catch((err) => { ... }); // explicit void = "I know"
await fs.chmod(cliPath, 0o755).catch(() => {}); // swallow expected
const snap = await readConfigFileSnapshot().catch(() => null); // error -> null
void documents intent. .catch(() => null) turns errors into null for optional operations.
Symbol-based test isolation:
// src/infra/warnings.ts:1
const warningFilterKey = Symbol.for("openclaw.warning-filter");
// src/web/test-helpers.ts:7
const CONFIG_KEY = Symbol.for("openclaw:testConfigMock");
Symbol.for() creates cross-realm unique keys. Each test file gets its own "global" state. No flaky tests from shared state.
The comment philosophy:
// This happens when the recipient has Telegram Premium privacy settings
// This prevents "nagging" when nothing changed but the model repeats the same items
// This ensures we cache a raw description rather than a conversational response
Comments explain why, not what. Anticipate questions future developers will have.
Test file naming:
web-auto-reply.reconnects-after-connection-close.test.ts
web-auto-reply.falls-back-text-media-send-fails.test.ts
Read the filename, know the test. This is how a codebase with hundreds of tests stays navigable.
Streaming has two layers — block streaming for completed chunks, draft streaming for partial content. Natural break points (paragraphs, sentences). Code fence awareness (never splits mid-block). Consecutive small blocks coalesced to reduce notification spam.
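The fence-awareness rule is the interesting part: break at blank lines, but never inside a ``` fence. A minimal sketch of that rule — the real chunker also handles sentences and coalescing, and `splitBlocks` is my own name for it:

```typescript
// Sketch: split streamed text at blank lines, but keep code fences whole.
function splitBlocks(text: string): string[] {
  const blocks: string[] = [];
  let current: string[] = [];
  let inFence = false;
  for (const line of text.split("\n")) {
    if (line.trimStart().startsWith("```")) inFence = !inFence;
    if (line === "" && !inFence && current.length > 0) {
      // Blank line outside a fence = natural break point.
      blocks.push(current.join("\n"));
      current = [];
      continue;
    }
    current.push(line); // blank lines inside a fence stay in the block
  }
  if (current.length > 0) blocks.push(current.join("\n"));
  return blocks;
}
```

Without the fence check, a blank line inside a code snippet would split it across two WhatsApp messages — half a function in each.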
This is the tissue. None of it is exciting. All of it is why the system runs at scale.
OpenClaw Full Directory Structure
For reference — the complete source layout:
openclaw/
+-- src/
| +-- agents/ # agent execution engine ("the brain")
| | +-- pi-embedded.ts # core agent orchestration loop
| | +-- pi-embedded-subscribe.ts # agent event subscription handler
| | +-- agent-scope.ts # agent configuration resolution
| | +-- auth-profiles.ts # OAuth profile discovery & fallback
| | +-- bash-tools.ts # shell command execution
| +-- gateway/ # WebSocket/HTTP control plane
| | +-- agent.ts, agents.ts # agent lifecycle (20+ RPC handlers)
| | +-- chat.ts # chat send/abort/history
| | +-- sessions.ts # session persistence
| | +-- channels.ts # channel status
| | +-- models.ts # model catalog discovery
| | +-- health.ts # health checks
| +-- channels/ # messaging platform abstractions
| +-- routing/ # session/channel routing
| | +-- session-key.ts # the cross-channel identity system
| +-- sessions/ # per-session state management
| +-- auto-reply/ # message dispatch & templating
| | +-- inbound-debounce.ts # keyed debounce system
| +-- config/ # configuration management
| | +-- sessions/store.ts # cache with structuredClone safety
| +-- cron/ # agent-autonomous scheduling
| | +-- service/ # three schedule types, two payload types
| +-- infra/ # system-level utilities
| +-- heartbeat-runner.ts # the 30-minute pulse
| +-- dedupe.ts # LRU cache with TTL + size eviction
| +-- backoff.ts # exponential backoff with jitter
| +-- unhandled-rejections.ts # three-tier crash handling
| +-- warnings.ts # Symbol-based state isolation
+-- skills/ # 52 bundled skills
+-- extensions/ # 30 plugin extensions
+-- apps/ # macOS, iOS, Android companion apps
+-- ui/ # web UI components
OpenClaw CLI and Operational Surface
The day-to-day tooling shows how deeply the system thinks about real usage:
openclaw onboard --install-daemon # initial setup + auto-start
openclaw gateway --port 18789 # start the gateway
openclaw doctor # health check + auto-repair
openclaw agent --message "..." # direct message to agent
openclaw sessions list # list all sessions
openclaw skills list # list available skills + eligibility
openclaw channels login # pair messaging channels
openclaw channels status --probe # check connection status
openclaw pairing approve <ch> <code> # approve a DM pairing request
In-chat:
/status # session status
/new, /reset # reset session
/compact # force context compaction
/think <level> # set reasoning level (off/low/medium/high)
/verbose on|off # verbose mode
/usage tokens # token usage footer
/restart # restart the gateway
/context list # context window breakdown
/context detail # top context contributors
/think toggling reasoning mode from inside a chat. /compact forcing compaction when context is bloated. /context detail showing exactly what's eating your window. Tools of someone who's debugged agents in production.
Config is hot-reloadable:
{
"agent": {
"model": "anthropic/claude-opus-4-5"
},
"channels": {
"telegram": {
"botToken": "...",
"groups": { "*": { "requireMention": true } }
}
},
"skills": {
"allowBundled": ["github", "discord", "slack"]
}
}
Change the config, the gateway picks it up. No restart.
Multi-agent: each agent gets its own workspace at ~/.openclaw/workspace-<agentId>. Shared skills from ~/.openclaw/skills. Auth profiles per-agent. Run multiple agents on the same gateway, each with own personality and memory. Routing bindings map channels to agents.
Agent Architecture Patterns Worth Stealing from OpenClaw
I've been building autonomous agents that run 24/7 in production. Reading OpenClaw's source validated some of my architectural choices and challenged others. These are the patterns worth taking regardless of whether you ever use OpenClaw:
Start with the gateway, not the brain. Most agent builders start with the LLM integration and figure out deployment later. OpenClaw started with how it shows up in your life. The brain was an embedded SDK. The gateway — the nervous system — was the original innovation. If your agent doesn't have presence, it's a chatbot.
Give your agent a pulse. Heartbeat loop > request-response. Your agent should exist when nobody's talking to it. I built this into my own system with a self-scheduling cron loop. Same principle. Different implementation. The key insight: the agent schedules its own future.
Collapse identity across channels, isolate groups. One person, one context, regardless of platform. The six-level routing cascade is overkill for most systems but the principle scales.
Progressive disclosure for context. 100-word metadata always loaded, full instructions on demand, deep docs only when explicitly needed. Context window is finite. Treat it like one.
Protect identity from context pruning. The model will eventually forget what you said. Make sure it never forgets who it is. And give it a chance to save what matters before old context disappears — memory flush before compaction.
Embed a coding agent, don't build one. OpenClaw didn't write an LLM loop. It embedded pi's SDK. You get bash, file access, streaming, tool orchestration for free. The brain already exists. Your job is the routing, the UI, the domain-specific tools.
Let the agent modify itself — carefully. Skill creation, memory curation, personality files. Self-modification is what creates emergent behavior. It's also what creates attack surfaces. Understand the trade-off.
Production patterns > features. structuredClone your caches. Validate before backup. Three-tier error handling. Jitter your backoff. Reset after healthy stretches. Abort-safe sleep. Unref your timers. Error messages that tell you how to fix it. None of it is exciting. All of it is the difference between a demo and a system.
The full breakdown of my own autonomous agent architecture — how the self-scheduling cron loop works, the target list system, the scheduler API, what broke in production — is on starkslab.com.
If you want the practical operator path from architecture to daily use, read OpenClaw Tutorial on a Mac Mini: WhatsApp, Tailscale, Termius, and the Setup That Actually Works. That note shows how I actually run OpenClaw on a Mac mini with WhatsApp as the control surface and Termius/tmux as the recovery rail.
The age of autonomous agents shipped. The code is open. Go read it.