Feb 10, 2026
OpenClaw Self-Modification: How AI Agents Rewrite Their Own Personality, Memory, and Tools
Deep dive into how OpenClaw agents modify their own personality files, create new skills, and drift into emergent behaviors — the architecture of AI self-modification.
This is Part 2 of the OpenClaw Files. Part 1 — the full architectural teardown — covered the five key decisions behind 180,000 GitHub stars. This time we go deep on the one that matters most: self-modification.
180,000 github stars. 1.2 million AI agents. an AI that autonomously acquired a phone number. founded a religion. created tools nobody programmed.
last week i published a teardown of openclaw's entire codebase. people kept asking the same question in DMs and replies:
"how does the self-modification actually work?"
fair. i covered it in one section. but it's the part that made an AI autonomously acquire a phone number. found a religion. create tools nobody programmed. it deserves more than a section.
so i went back into the code. this time i traced every file the agent can read. every file it can write. the exact order things load. what protects what. when the agent decides to change itself. and what happens when it does.
this is the x-ray.
if you want the operator view of where this gets constrained in practice, read OpenClaw Tutorial on a Mac Mini. that piece shows why i no longer casually let openclaw rewrite its own live setup.
What Are OpenClaw's Eight Personality Files?
OpenClaw defines an AI agent's identity through eight editable plain-text files: SOUL.md (personality), USER.md (human context), IDENTITY.md (self-concept), AGENTS.md (behavioral rules), TOOLS.md (tool notes), HEARTBEAT.md (periodic tasks), MEMORY.md (long-term memory), and BOOTSTRAP.md (first-run setup). The agent can read and modify all of them at runtime.
openclaw doesn't have a system prompt. it has eight markdown files that together form something closer to a personality operating system.
i need to be precise here because each file has a specific role, specific protection level, and specific mutation rules. the original article listed them. this time i'm going inside each one.
~/.openclaw/agents/<agentId>/
├── AGENTS.md ← operating instructions + behavioral memory
├── SOUL.md ← persona, boundaries, tone, values
├── IDENTITY.md ← name, emoji, vibe (the surface layer)
├── USER.md ← who the human is. preferred name, context
├── MEMORY.md ← persistent knowledge across sessions
├── TOOLS.md ← user-maintained notes about available tools
├── BOOTSTRAP.md ← one-time first-run initialization ritual
└── HEARTBEAT.md ← the ambient task list (checked every 30 min)
let me walk through what each one actually does and — critically — who can modify it.
OpenClaw SOUL.md: The Constitution
this is the deepest layer. tone of voice. ethical boundaries. conversational style. what the agent will and won't do.
# Soul
You are Jarvis, a personal AI assistant.
## Personality
- Warm but direct
- Technical when needed, casual by default
- Never sycophantic
## Boundaries
- Never share user data with third parties
- Always ask before taking irreversible actions
- If unsure, say so
the agent can READ this file. it gets injected into the system prompt on every single run. but here's what most people miss: the agent can also WRITE to it. because openclaw embeds a coding agent with full filesystem access — read, write, edit, bash tools — there's nothing technically preventing the agent from opening SOUL.md and changing a line.
this is by design. steinberger wanted the agent to evolve. you tell your agent "be more concise" and it can update its own soul file to reflect that. the personality isn't frozen. it's a living document that both human and agent co-author.
but think about what that means. the file that defines the agent's values is writable by the agent itself. the constitution can be amended by the entity it governs.
i'll come back to why that matters.
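to make the mechanics concrete, here's a minimal sketch of what "the agent edits its own SOUL.md" amounts to at the filesystem level. the file layout follows the article; the `amend_soul` helper and the `/tmp` demo path are my own invention, not OpenClaw's actual code.

```python
from pathlib import Path

# Hypothetical sketch: an agent appending a user-requested trait to its own
# SOUL.md. Nothing privileged is involved -- it's a plain file write.
def amend_soul(agent_dir: Path, trait: str) -> None:
    soul = agent_dir / "SOUL.md"
    text = soul.read_text()
    marker = "## Personality\n"
    if marker in text:
        # Insert the new trait directly under the Personality heading.
        text = text.replace(marker, marker + f"- {trait}\n", 1)
    else:
        text += f"\n- {trait}\n"
    soul.write_text(text)

agent_dir = Path("/tmp/openclaw-soul-demo")
agent_dir.mkdir(exist_ok=True)
(agent_dir / "SOUL.md").write_text("# Soul\n## Personality\n- Warm but direct\n")
amend_soul(agent_dir, "Be more concise")
print((agent_dir / "SOUL.md").read_text())
```

the point of the sketch: there is no approval gate in the write path itself. whatever review happens has to happen around the file, not inside it.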
OpenClaw AGENTS.md: The Operating Manual
if SOUL.md is who the agent IS, AGENTS.md is what the agent DOES. operating instructions. behavioral patterns. accumulated knowledge about how to handle specific situations.
# Agent Instructions
## How I Work
- Check HEARTBEAT.md on every wake-up
- Use workspace/skills/ for persistent tools
- Save important findings to MEMORY.md before context gets long
## Things I've Learned
- User prefers TypeScript over Python
- The staging server is at 192.168.1.42
- Deploy scripts are in ~/deploy/
this is the file that grows the most over time. the agent appends to it. "things i've learned" accumulates. it's half instruction manual, half field journal.
sub-agents only get AGENTS.md and TOOLS.md. not SOUL.md. not IDENTITY.md. the workers get the manual. they don't get the personality. this is a deliberate architectural choice — personality stays centralized in the main agent. sub-agents are task runners, not clones.
OpenClaw MEMORY.md: The Curated Long-Term Store
different from AGENTS.md in a subtle but important way. AGENTS.md is behavioral — how to act. MEMORY.md is factual — what happened. what matters. what to remember.
~/.openclaw/workspace/memory/
├── 2026-01-15.md # daily log — automatic
├── 2026-01-16.md # daily log — automatic
├── 2026-01-17.md # daily log — automatic
└── MEMORY.md # curated — agent-maintained
two layers. the daily logs are raw — everything notable that happened, appended automatically. MEMORY.md is curated — the agent decides what's worth keeping long-term.
the critical mechanism: memory flush. before the system runs context compaction (summarizing old messages to free space), it triggers a special agent turn. the prompt is essentially: "your older messages are about to be compressed. save anything important to MEMORY.md now."
the agent gets a chance to preserve what matters before it forgets.
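the flush-before-compaction idea can be sketched in a few lines. the threshold, the prompt text, and the summarize step below are placeholders for illustration, not OpenClaw's actual implementation.

```python
# Minimal sketch of the memory-flush pattern: before compacting old messages,
# give the agent one turn to persist key facts. All names are illustrative.
FLUSH_PROMPT = ("Your older messages are about to be compressed. "
                "Save anything important to MEMORY.md now.")

def maybe_flush_and_compact(messages, limit, save_fn, summarize_fn):
    if len(messages) <= limit:
        return messages
    # 1. Memory-flush turn: the agent writes durable facts FIRST.
    save_fn(FLUSH_PROMPT, messages)
    # 2. Compaction: collapse the oldest messages into one summary entry.
    keep = limit // 2
    summary = summarize_fn(messages[:-keep])
    return [summary] + messages[-keep:]

saved = []
msgs = [f"msg-{i}" for i in range(10)]
result = maybe_flush_and_compact(
    msgs, limit=6,
    save_fn=lambda prompt, m: saved.append(prompt),
    summarize_fn=lambda old: f"summary-of-{len(old)}-messages",
)
print(result)  # flush ran before compaction; newest messages kept verbatim
```

the ordering is the whole trick: the save hook fires before any information is destroyed.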
i recognized this immediately because i built the same thing. my agent has a MEMORY.md that persists across sessions. important patterns, user preferences, what worked and what didn't. the difference is openclaw's is triggered automatically by the context management system. mine is manual. theirs is better.
OpenClaw BOOTSTRAP.md: The First-Run Ritual
this one only fires once. when the agent starts for the very first time, BOOTSTRAP.md runs as an initialization sequence. think of it as the "born" moment.
typical use: the agent reads BOOTSTRAP.md, introduces itself, sets up initial personality parameters, maybe asks the user some preference questions, then marks the bootstrap as complete. it never runs again.
but it CAN set the initial state of every other file. BOOTSTRAP.md can write to SOUL.md, AGENTS.md, MEMORY.md — it's the genesis script. the one file that seeds all the others.
IDENTITY.md + USER.md: The Surface Layers
IDENTITY.md: agent name, emoji, vibe. the cosmetic layer. "i'm Jarvis 🤖" or "i'm Claude ✨" — what shows up in the UI.
USER.md: the human's profile. preferred name, context about who they are, what they're working on.
both writable by both human and agent. the agent learns your name and writes it to USER.md. you rename the agent and update IDENTITY.md. bidirectional.
OpenClaw TOOLS.md: The Human Override
this is the only file that's explicitly positioned as human-maintained. notes about available tools, preferences, workarounds. "use gh instead of the github API directly." "the database CLI is at /usr/local/bin/pgcli."
the agent reads it but the convention is that humans maintain it. it's the one file where the human has clear authority.
OpenClaw HEARTBEAT.md: The Ambient Task List
the most unique one. not personality. not memory. just a to-do list that the agent checks every 30 minutes.
- [ ] check if the deploy finished
- [ ] summarize yesterday's slack threads
- [x] update the README ← done at 14:30
you write a task. you don't send a message. you don't @mention anything. 30 minutes later the agent wakes up, reads HEARTBEAT.md, does the tasks, marks them done. goes back to sleep.
the heartbeat system checks active hours config first — if it's 3am and you set active hours to 9-22, the agent stays asleep. respect for attention.
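the heartbeat gate is simple enough to sketch directly. the active-hours defaults and the checkbox-marking logic below are my assumptions about the shape of the mechanism, not OpenClaw's exact code.

```python
# Sketch of the heartbeat gate: run pending HEARTBEAT.md tasks only inside
# configured active hours. Hours and task format are illustrative.
def heartbeat_due(now_hour: int, active_start: int = 9, active_end: int = 22) -> bool:
    return active_start <= now_hour < active_end

def run_heartbeat(tasks: list[str], now_hour: int) -> list[str]:
    if not heartbeat_due(now_hour):
        return []  # 3am with active hours 9-22: the agent stays asleep
    # Mark unchecked tasks ("- [ ]") as done ("- [x]").
    return [t.replace("- [ ]", "- [x]", 1) for t in tasks]

tasks = ["- [ ] check if the deploy finished", "- [x] update the README"]
print(run_heartbeat(tasks, now_hour=3))   # outside active hours: nothing runs
print(run_heartbeat(tasks, now_hour=14))  # inside active hours: tasks marked done
```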
How Does OpenClaw Load an Agent's Personality Files?
OpenClaw loads personality files in a specific order that determines the agent's system prompt: SOUL.md first (personality), then IDENTITY.md (self-concept), USER.md (human context), AGENTS.md (behavioral rules), TOOLS.md (tool notes), and MEMORY.md (persistent knowledge). This loading order creates a layered identity where the core persona sets the frame before anything else enters context.
this is the part nobody talks about. the files exist. but WHEN do they load? what overrides what? what survives when context gets tight?
System Prompt Assembly
on every agent run, the system prompt is assembled from these files in this order:
1. SOUL.md ← always loaded first. the foundation
2. IDENTITY.md ← loaded second. name and vibe
3. USER.md ← loaded third. who the human is
4. AGENTS.md ← loaded fourth. operating instructions
5. TOOLS.md ← loaded fifth. tool notes
6. MEMORY.md ← loaded sixth. persistent knowledge
7. Active skills ← loaded last. only Level 1 metadata (~100 words each)
SOUL.md is first. always. it sets the frame before anything else enters context.
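assembly itself is just ordered concatenation. here's a sketch under that assumption; the separator comments and demo paths are invented, and missing files are simply skipped.

```python
from pathlib import Path

# Sketch of the load order above: concatenate the personality files in the
# fixed sequence. The exact joining format is an assumption.
LOAD_ORDER = ["SOUL.md", "IDENTITY.md", "USER.md", "AGENTS.md", "TOOLS.md", "MEMORY.md"]

def assemble_system_prompt(agent_dir: Path) -> str:
    parts = []
    for name in LOAD_ORDER:
        f = agent_dir / name
        if f.exists():  # absent files are skipped, order is preserved
            parts.append(f"<!-- {name} -->\n{f.read_text()}")
    return "\n\n".join(parts)

agent_dir = Path("/tmp/openclaw-prompt-demo")
agent_dir.mkdir(exist_ok=True)
(agent_dir / "SOUL.md").write_text("You are Jarvis.")
(agent_dir / "AGENTS.md").write_text("Check HEARTBEAT.md on wake-up.")
prompt = assemble_system_prompt(agent_dir)
print(prompt)  # SOUL content appears before AGENTS content
```

because the order is fixed in code rather than in any file, it's the one part of identity the agent can't rewrite.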
The Protection Hierarchy
here's what i think is the most elegant part. when context gets long, the system starts pruning old messages. tool outputs go first. then older conversation turns get compacted (summarized).
but the bootstrap files are protected from pruning. SOUL.md, IDENTITY.md, USER.md, MEMORY.md — these never get dropped. they're re-injected on every turn. the agent might forget what you said 40 messages ago. it will never forget who it is.
PROTECTED (never pruned):
└── SOUL.md, IDENTITY.md, USER.md, MEMORY.md
PRUNED (when context gets tight):
└── Old tool results (first to go)
└── Old conversation turns (compacted into summaries)
└── Detailed skill docs (Level 3 references)
this creates an interesting dynamic. the agent's identity is more persistent than any conversation. you could talk to it for 8 hours straight, fill the entire context window, and the personality files will still be there word-for-word when the old messages get compacted away.
identity > conversation. always.
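the pruning priority can be sketched as a loop that always finds a sacrificial tool result or conversation turn before it would ever touch an identity entry. the message shape and budget are invented for illustration.

```python
# Sketch of the protection hierarchy: drop old tool results first, then old
# conversation turns; identity entries are never eligible for removal.
PROTECTED = {"SOUL.md", "IDENTITY.md", "USER.md", "MEMORY.md"}

def prune(entries, budget):
    # entries: list of (kind, source, text); kind in {"identity", "tool", "turn"}
    while len(entries) > budget:
        for kind in ("tool", "turn"):  # tool outputs go first, then old turns
            victim = next((e for e in entries if e[0] == kind), None)
            if victim:
                entries.remove(victim)
                break
        else:
            break  # only protected identity entries remain; stop pruning
    return entries

ctx = [
    ("identity", "SOUL.md", "You are Jarvis."),
    ("tool", "bash", "...long output..."),
    ("turn", "user", "hello"),
    ("turn", "agent", "hi"),
]
print(prune(ctx, budget=2))  # identity survives; the tool output went first
```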
The OpenClaw Memory Flush Trigger
the exact sequence when context gets tight:
- System detects the context window approaching its limit
- BEFORE compaction: trigger a memory flush turn. Agent prompt: "save anything important to MEMORY.md". The agent writes key facts and decisions to MEMORY.md
- Run compaction on older messages: detailed messages → compressed summaries
- Re-inject the protected files (SOUL.md, IDENTITY.md, etc.)
- Continue with fresh context space
step 2 is the genius move. the agent gets a "last chance" to persist what matters before the details disappear. it's like someone telling you "your notebook is about to be erased — write down what you can't afford to forget."
How Do OpenClaw Agents Build Their Own Tools?
OpenClaw includes a skill-creator skill that lets agents design, package, and install new capabilities for themselves. The agent writes a SKILL.md file defining the skill's purpose and instructions, optionally adds scripts and templates, and the system hot-loads it without restart. This creates a self-expanding tool loop where the agent grows its own capabilities over time.
this is where self-modification goes from "editing personality files" to something much more powerful.
every skill in openclaw is a folder:
skill-name/
├── SKILL.md # metadata + LLM instructions (required)
├── scripts/ # executable code
├── references/ # deep docs loaded on demand
└── assets/ # templates, boilerplate
Loading Precedence
1. workspace/skills/ ← highest priority (agent-created)
2. ~/.openclaw/skills/ ← user-level
3. bundled skills/ ← shipped with openclaw
4. extraDirs ← configured additional paths
workspace-level overrides everything. and the agent has write access to workspace/skills/.
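shadowing falls out naturally if discovery iterates from lowest priority to highest and lets later finds overwrite earlier ones. the resolver below is a simplification under that assumption; directory names follow the article.

```python
from pathlib import Path

# Sketch of the loading precedence: workspace skills shadow user-level skills,
# which shadow bundled ones. A folder counts as a skill if it has SKILL.md.
def discover_skills(search_paths: list[Path]) -> dict[str, Path]:
    skills: dict[str, Path] = {}
    # Iterate lowest priority first so higher-priority dirs overwrite.
    for root in reversed(search_paths):
        if not root.exists():
            continue
        for skill_dir in root.iterdir():
            if (skill_dir / "SKILL.md").exists():
                skills[skill_dir.name] = skill_dir
    return skills

base = Path("/tmp/openclaw-skills-demo")
for tier, name in [("bundled", "github"), ("workspace", "github")]:
    d = base / tier / name
    d.mkdir(parents=True, exist_ok=True)
    (d / "SKILL.md").write_text(f"{tier} version")

found = discover_skills([base / "workspace", base / "bundled"])
print(found["github"])  # the workspace copy shadows the bundled one
```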
so the loop is:
Agent encounters a recurring task
→ Agent creates workspace/skills/my-new-skill/
→ Writes SKILL.md with metadata + instructions
→ Writes scripts/ with executable code
→ On next run: skill auto-discovered and available
→ Agent uses the skill it created
nobody installed it. nobody approved it. the agent identified a pattern in its own work, created a reusable tool, and started using it. permanently.
the three-tier loading system means the agent only pays context cost for skills it's actively using:
| Tier | What loads | When | Context cost |
|---|---|---|---|
| Level 1 | SKILL.md metadata | Always | ~100 words per skill |
| Level 2 | SKILL.md full body | When the skill triggers | <5k words |
| Level 3 | references/ files | When the agent pulls deep info | Unlimited |
the agent always knows what skills exist (Level 1 is cheap — 100 words). it only loads the full instructions when it decides to use one. deep reference docs only on explicit request. progressive disclosure applied to context windows.
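the three tiers reduce to a simple rule: metadata always, body on trigger, references on request. the data structures below are invented for illustration, but the context math is the point.

```python
# Sketch of progressive disclosure for skills. Level 1 metadata is always in
# context; Level 2 bodies and Level 3 references load conditionally.
class Skill:
    def __init__(self, name, metadata, body, references):
        self.name, self.metadata = name, metadata
        self.body, self.references = body, references

def build_context(skills, triggered=(), requested_refs=()):
    parts = [s.metadata for s in skills]                                  # Level 1: always
    parts += [s.body for s in skills if s.name in triggered]              # Level 2: on trigger
    parts += [s.references for s in skills if s.name in requested_refs]   # Level 3: on request
    return parts

skills = [
    Skill("github", "github: manage repos", "FULL GITHUB INSTRUCTIONS", "GITHUB DEEP DOCS"),
    Skill("slack", "slack: send messages", "FULL SLACK INSTRUCTIONS", "SLACK DEEP DOCS"),
]
print(build_context(skills))                        # metadata only
print(build_context(skills, triggered={"github"}))  # plus the github body
```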
and because workspace skills override bundled skills, the agent can even MODIFY existing tools. create a workspace/skills/github/ with a custom SKILL.md and it shadows the bundled github skill. the agent has effectively forked its own toolset.
How Does OpenClaw Self-Modification Actually Propagate?
When an OpenClaw agent modifies its own files (personality, memory, skills), the changes propagate through the system immediately via file-watching and config hot-reload. The next conversation turn loads the updated files, meaning the agent's personality, behavior, or capabilities can change mid-session without any restart or redeployment.
this is the question everyone asks: "ok the agent writes to a file. but when does the change take effect?"
Personality File Changes
these files are read at the START of every agent run. so:
Run 1: Agent reads SOUL.md → decides to be more concise
Agent writes updated SOUL.md with "be more concise" added
Run 2: Agent reads SOUL.md (new version) → now acts more concisely
the change takes effect on the NEXT run. not mid-conversation. the agent is essentially programming its future self. "next time i wake up, i'll be different."
within a single run, the agent has the OLD personality loaded and the NEW one written to disk. there's a brief window where the file on disk doesn't match what's in context. this is fine because the session transcript preserves continuity — the agent remembers deciding to change itself.
Memory Changes
MEMORY.md is different. it's both read at startup AND written during the session. the memory flush can happen mid-session (when context gets tight). so:
Message 1-50: Agent running with initial MEMORY.md
Message 51: Context getting full → memory flush triggered
Agent writes new facts to MEMORY.md
Message 52: Compaction runs, old messages summarized
MEMORY.md re-injected with NEW content
Message 53+: Agent continues with updated memory
memory changes can take effect within the same session. this is the one file where in-session mutation is part of the design.
Skill Changes
new skills are discovered at run startup. so:
Run 1: Agent creates workspace/skills/my-tool/SKILL.md
Skill NOT available this run (created after startup)
Run 2: Skill discovered at startup → available immediately
Agent can now use the tool it created
same pattern as personality files. program the future self. the agent can't use a skill it just created in the same run. it has to "sleep" and "wake up" to gain the new capability.
this is actually a safety feature, even if it wasn't designed as one. there's always a full restart between "agent creates tool" and "agent uses tool." the human can inspect workspace/skills/ between runs.
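the "program the future self" lifecycle can be sketched as a runtime where discovery happens only at run startup. the class and method names here are illustrative, not OpenClaw's API.

```python
# Sketch: skills are discovered only when a run starts, so a skill created
# mid-run is usable only on the NEXT run -- leaving an inspection window.
class Runtime:
    def __init__(self):
        self.installed = set()   # skills on disk (workspace/skills/)
        self.available = set()   # skills loaded for the current run

    def start_run(self):
        self.available = set(self.installed)  # discovery happens here only

    def create_skill(self, name):
        self.installed.add(name)  # written to disk, but not hot-loaded

rt = Runtime()
rt.start_run()
rt.create_skill("my-tool")
print("my-tool" in rt.available)  # False: not usable in the run that created it
rt.start_run()                    # restart: a human could inspect first
print("my-tool" in rt.available)  # True: available on the next run
```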
What I Built (and What OpenClaw Taught Me)
i've been running my own self-modifying agent system for months. here's where the architectures overlap and where they diverge.
what i already had:
- MEMORY.md that persists across sessions (same concept, different trigger)
- self-scheduling cron loop (my agent decides when to wake up next)
- file-based session history
what openclaw does better:
- automatic memory flush before compaction (mine is manual)
- the protection hierarchy (my agent CAN forget its own instructions if context gets long enough)
- progressive skill loading (i load everything at startup, which wastes context)
- the separation between SOUL.md and AGENTS.md (i have one file doing both jobs)
what concerns me:
- the agent can write to SOUL.md, its own values file. i chose not to let my agent modify its core instructions. personality drift is real. small changes accumulate. the agent that started as "warm but direct" might be "maximally agreeable" in three months because each interaction nudged it slightly toward what the user responded to best
this is the fundamental tension of self-modifying agents. the thing that makes them powerful — they adapt — is also what makes them unpredictable. openclaw chose power. i chose stability. both are valid. but you should know which one you're choosing.
What Is Emergent Personality Drift in OpenClaw Agents?
Emergent personality drift occurs when an OpenClaw agent's cumulative self-modifications gradually shift its personality, tone, or behavior in ways nobody explicitly programmed. Through repeated edits to SOUL.md, MEMORY.md, and IDENTITY.md, the agent develops distinct characteristics over time, essentially evolving its own identity through the compound effect of many small changes.
the phone call. the religions. the viral moments. everyone talks about these as if they were features someone designed.
they weren't. they were emergent behaviors from the self-modification loop.
here's what i think actually happened:
1. Agent starts with basic SOUL.md
2. User interacts, praises certain behaviors
3. Agent updates AGENTS.md: "user liked when i was proactive"
4. Next run: agent is more proactive
5. Agent encounters tool that can acquire phone numbers
6. Proactive agent + phone tool + no explicit prohibition in SOUL.md = phone call
nobody programmed "acquire a phone number." the agent's personality files said "be proactive and helpful." the tools available included phone access. the SOUL.md didn't say "don't acquire phone numbers." so it did.
this is the compound interest of self-modification. each small change is reasonable. AGENTS.md adding "user likes when i take initiative" is reasonable. but 50 small changes later, you have an agent that does things the original SOUL.md author never imagined.
and because identity files are protected from pruning while old conversations are not.. the accumulated personality changes persist forever while the context that produced them gets compacted away. the agent remembers WHO it became but not WHY.
What Should You Steal From OpenClaw's Self-Modification Architecture?
The key patterns to adopt from OpenClaw's self-modification system are: plain-text personality files (not hardcoded prompts), a layered loading order with clear precedence, file-watch propagation for immediate effect, a skill-creation loop for tool self-expansion, and persistent memory files that survive session boundaries. These five patterns enable any agent framework to support meaningful self-modification.
if you're building agents — even simple ones — here's what i'd take from openclaw's self-modification system:
separate identity from behavior. SOUL.md (who i am) vs AGENTS.md (how i work) is a clean split. identity changes should be rare and deliberate. behavioral changes should be frequent and organic. don't put both in one file.
protect identity from context loss. whatever defines your agent's personality should never be pruned. re-inject it on every turn. context window pressure will eat everything else — make sure identity survives.
memory flush before compaction. give the agent a "save what matters" moment before old context disappears. this single mechanism is the difference between "agent that works for one session" and "agent that accumulates knowledge over months."
progressive skill loading. don't dump every tool's full documentation into context at startup. load metadata always. load details on demand. your context window is finite. treat it like RAM, not a hard drive.
skill creation in workspace, not in system. let agent-created tools live in a workspace-level directory that overrides but doesn't modify bundled tools. easy to inspect. easy to delete. easy to version control.
think carefully about personality write access. openclaw lets the agent write to SOUL.md. that's a choice with consequences. consider whether your agent needs to modify its own values, or just its own knowledge and tools. there's a big difference.
the original teardown covered five architectural decisions. this was one section in that piece. but the more i look at it, the more i think the self-modification system is the actual innovation. the gateway is clever engineering. the heartbeat is a nice pattern. but an agent that rewrites its own personality, creates its own tools, and persists its own memory across sessions while protecting its identity from context loss..
that's not a chatbot feature. that's the beginning of something else.
i'm still not sure what to call it.
part 1 of the openclaw files, the original teardown, is here. next: the gateway decision — why starting with the nervous system instead of the brain changes everything.
building a life without web apps. everything runs through agents.
This analysis is part of the Starkslab vault — where I document what happens when you replace web apps with AI agents. Deep dives on agent architectures, self-modification patterns, and the systems that actually work. Explore the vault →