Deep dive/May 22, 2026/Support

Agent CLI Control Surfaces: What To Compare Before You Trust a Coding Agent

The useful agent CLI comparison is not a winner ranking. It is a control-surface audit: what the tool can see, edit, execute, delegate, extend, report, and recover from before an operator trusts it.

Every serious coding agent CLI now arrives with the same promise: it can help inside your repo.

That is not the comparison that matters.

The useful question is what the CLI is allowed to do before you trust it: what it can read, edit, execute, call, remember, delegate, extend, report, and recover from. A coding agent CLI is not just a chat box in a terminal. It is a local automation surface with tools, permissions, context files, provider credentials, MCP servers, skills, subagents, headless modes, logs, checkpoints, and review boundaries.

This page compares that control surface across Codex, Claude Code, Gemini CLI (including the Antigravity transition), OpenCode, and Qwen Code, with two sidebar baselines: Aider and Goose. It is not a ranking. It is not a benchmark. It is not a recommendation to adopt any one tool.

Short version: before choosing a coding agent CLI, compare the control surface, not the logo. Look for explicit planning modes, permission classes, MCP filters, context-file precedence, subagent boundaries, structured headless output, rollback paths, telemetry posture, and a review layer that fits your workflow.

Proof state: this draft is based on accepted Starkslab source-read artifacts for Codex, Claude Code, Gemini/Antigravity, OpenCode, Qwen Code, Aider, Goose, and the agent CLI comparison source matrix. Starkslab did not install, run, authenticate, benchmark, sandbox-test, or security-test these tools in this pass.

What this page covers:

  • what an agent CLI control surface is;
  • what source reads can and cannot prove;
  • how Codex, Claude Code, Gemini/Antigravity, OpenCode, and Qwen Code expose comparable surfaces;
  • why Aider and Goose belong as sidebars instead of pretending every CLI has the same job;
  • why MCP, plugins, skills, extensions, hooks, and subagents are authority expansion;
  • what operators should inspect before letting any CLI mutate a repo;
  • where this fits in the Starkslab AI Agent Tools, Build AI Agent, and OpenClaw stack.

If you want the broader workflow layer, read AI Coding Agent Workflow. If you want the harness layer above native CLIs, read The Coding Agent Harness Layer. If you want the operator-owned control plane around skills, MCP, config, sessions, and gates, read What Is a Coding-Agent Control Plane?.

What Is An Agent CLI Control Surface?

An agent CLI control surface is the set of local runtime controls that decide what an AI coding agent can see, change, execute, call, remember, delegate, and report back.

The terminal UI is only the visible part.

The real product layer is underneath:

model
-> agent runtime
-> tools, permissions, context, auth, MCP, skills, subagents
-> output, checkpoints, telemetry, review
-> operator decision

That middle layer is where trust is either earned or faked. A better model can still be dangerous if the CLI hides broad shell access behind friendly copy. A less glamorous tool can be useful if it makes planning, permissions, logs, and recovery explicit.

For operators, the first comparison is not "which tool writes the nicest code?" It is:

  • can it stay read-only while planning?
  • are file reads, writes, shell commands, network calls, external directories, and MCP tools separate permission classes?
  • where do project rules live?
  • can subagents inherit tools accidentally?
  • can headless runs emit structured output a supervisor can inspect?
  • what happens after a bad edit?

That is the Starkslab frame: compare controls before claims.

What This Comparison Can And Cannot Prove

This comparison can describe source-visible and docs-visible control surfaces.

It cannot prove runtime quality, sandbox correctness, permission enforcement, provider behavior, security, privacy, benchmark performance, install safety, package safety, migration smoothness, production readiness, or adoption.

The accepted source reads support a narrow set of public claims:

| Claim type | Safe from this pass | Not safe from this pass |
|---|---|---|
| Architecture | docs/source describe CLI, runtime, server, SDK, daemon, or package splits | the architecture is secure or reliable |
| Permissions | docs/source name modes, policies, allow/ask/deny rules, or sandbox options | the rules are correctly enforced under adversarial use |
| MCP and extensions | docs/source show MCP, plugins, skills, hooks, or extensions | integrations are safe or harmless |
| Headless output | docs/source expose JSON, JSONL, stream output, SDKs, or CI surfaces | unattended production use is safe |
| Recovery | docs/source mention checkpoints, sessions, export, telemetry, or source-control expectations | recovery covers every mutation path |
| Currentness | accepted artifacts name product-lineage caveats | all access, pricing, and migration behavior is validated |

That boundary is not weakness. It is the point. A source read is a good first filter because it shows what to inspect. It is not a substitute for a runtime audit.

The Fast Comparison Matrix

Use this matrix as a skimmer path. It does not score the tools.

| Surface | What to inspect | Source-visible examples | Starkslab operator lesson |
|---|---|---|---|
| Architecture | Is this only a terminal UI, or a runtime with protocol surfaces? | Codex exposes CLI, app-server, SDK, MCP server, and GitHub Action surfaces. Claude Code documents terminal, IDE, SDK, CI, Remote Control, and review surfaces. OpenCode documents TUI plus local server/client surfaces. Qwen Code documents CLI/core plus SDK, IDE, and experimental daemon/ACP paths. | Separate UI, runtime, provider, tool, and review layers. One command is not one safety boundary. |
| Planning and mutation | Can the tool stay read-only until the operator trusts the plan? | Codex separates sandbox and approval policy. Claude Code has plan, default, acceptEdits, auto, dontAsk, and bypassPermissions modes. Gemini CLI has plan-mode lineage. Qwen Code names plan, default, auto-edit, and yolo modes. OpenCode exposes Plan/Build agents and permission rules. | Prompts are not policy. Modes must be explicit and reviewable. |
| Permissions | Are read, write, shell, web, MCP, and external directory access separate? | Codex has sandbox modes and approval policies. Claude Code targets tools, paths, commands, domains, MCP tools, and agents. OpenCode has allow, ask, and deny. Qwen Code has approval modes, settings precedence, and sandbox toggles. Gemini CLI has confirmations, trusted folders, and checkpoint concepts. | Tool classes are risk classes. Treat write, shell, network, and external-account actions differently. |
| MCP and extensibility | Can extra tools be included, excluded, trusted, and audited? | Codex, Claude Code, Gemini CLI, OpenCode, and Qwen Code all expose MCP-related surfaces in accepted source artifacts. Codex and Claude Code also have skills/plugins. OpenCode has plugins. Gemini has extensions. Qwen has Skills and Subagents. | MCP, plugins, skills, extensions, hooks, and subagents are authority expansion, not feature decoration. |
| Context files | Where does the agent get project rules? | Codex and OpenCode use AGENTS.md surfaces. Claude Code uses CLAUDE.md and can import AGENTS.md. Gemini CLI uses GEMINI.md. Qwen Code uses settings, Skills, and agent files. | Instruction supply is part of the control surface. Global and project rules should be visible and scoped. |
| Headless output | Can a supervisor inspect what happened? | Codex has codex exec JSONL event output. Claude Code has claude -p, structured output, Agent SDK, sessions, and telemetry. Gemini CLI has headless JSON lineage. OpenCode has run, serve, SDK/API, export/import. Qwen Code has text, JSON, and stream-JSON surfaces. | Machine-readable output is what turns a CLI into factory infrastructure. |
| Recovery and audit | What happens after a bad edit? | Codex has sandbox, review, events, and session surfaces. Claude Code has sessions, checkpoints, and documented checkpoint limits. Gemini CLI has checkpointing and telemetry lineage. OpenCode has export/share and permission surfaces. Qwen Code has sandbox and approval surfaces. | Recovery is not vibes. You still need source control, diffs, validation, and review gates. |

Architecture: Is The CLI Just A Terminal UI, Or A Runtime?

A serious agent CLI is usually more than a terminal prompt.

Codex is the clearest public example in the accepted source reads. Its docs and repository surfaces expose a local CLI, Rust workspace, TUI, app-server, SDK, MCP server, non-interactive execution, skills/plugins, subagents, GitHub Action, and review surfaces. The source-read conclusion was not "Codex is best." It was narrower: Codex is strong comparison stock because it makes many control-surface layers visible.

Claude Code is also a broad platform surface, but the evidence boundary is different. The accepted source read used official docs, not an implementation repository. Those docs describe terminal, IDE, desktop/browser, Agent SDK, GitHub Actions, Code Review, Remote Control, skills, subagents, hooks, memory, sessions, checkpointing, and observability. That is enough to include Claude Code as a major comparison column. It is not implementation proof.

OpenCode exposes a different architectural shape: a TUI over local server/client surfaces, plus run, serve, web, SDK, export/import, GitHub automation, and provider routing. Qwen Code exposes CLI/core split, SDK/IDE surfaces, and experimental daemon/ACP-like paths. Gemini CLI is useful lineage for CLI/core split, tools, MCP, plan mode, extensions, folder trust, checkpointing, headless output, telemetry, and release-channel concepts, but the Google currentness story now needs Antigravity context.

The operator lesson is simple: do not inspect "the CLI" as one thing. Inspect the UI, runtime, local server, SDK, daemon, provider, credential, tool, review, and automation surfaces separately.

For the harness-level version of this problem, route to The Coding Agent Harness Layer.

Aider and Goose are worth naming, but they should not distort the main comparison.

Aider is the mature terminal pair-programming baseline. The accepted source read found strong public evidence for editable file context, repo maps, git-backed recovery, model/provider routing, scripting, lint/test loops, chat modes such as code/ask/architect/help, and source-control-centered review. That makes Aider useful when the reader asks, "what did terminal AI coding look like before every tool became a local control plane?"

The boundary matters. This pass did not find first-class source-visible MCP, subagent, sandbox, per-tool permission, or structured JSON automation surfaces comparable to Codex, OpenCode, Qwen Code, or Goose. So Aider belongs as a baseline sidebar, not as a full modern control-surface column.

Goose is the opposite kind of sidebar. It is broader than coding agent CLI work. The accepted source read found desktop, CLI, API/server, MCP extension, ACP, recipe, subagent, permission, allowlist, structured goose run, remote goosed, and optional macOS sandbox surfaces. That makes Goose strong evidence for the general local-agent control-plane shape.

But this page is still about coding agent CLI trust boundaries. Goose should support that frame by showing how far the surface can widen once desktop, server, MCP, ACP, recipes, and subagents enter the picture. It should not turn the page into a Goose tutorial, adoption recommendation, benchmark, or security audit.

Planning And Mutation: Can The Tool Stay Read-Only?

The first operational question is whether the tool can separate planning from mutation as a named mode.

That cannot depend on a polite prompt like "please do not edit yet." Prompts help. They are not a permission model.

Codex separates sandbox mode from approval policy. The accepted source read called out read-only, workspace-write, danger-full-access, on-request approval, never approval, granular approval policy, writable roots, network/domain controls, and auto-review as a reviewer swap at the sandbox boundary. None of that proves enforcement. It does show a mature vocabulary for planning, editing, and review boundaries.

Claude Code documents permission modes such as plan, default, acceptEdits, auto, dontAsk, and bypassPermissions. The useful public lesson is not that every mode is safe. The useful lesson is that mode names carry different operator postures. bypassPermissions belongs in explicit risk language, not normal workflow copy.

Gemini CLI contributes plan-mode lineage and checkpointing context, while Antigravity changes the Google currentness boundary. OpenCode brings Plan/Build agents and allow / ask / deny permission grammar. Qwen Code makes the contrast especially visible with plan, default, auto-edit, and yolo approval modes.

For Starkslab, this maps directly to lane scopes and target artifact paths. A lane:repo-dissection issue should not become a draft. A lane:draft issue should not mutate public source routes. A coding-agent CLI deserves the same discipline: read-only planning, named mutation authority, and a reviewable closeout.
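A minimal sketch of what "planning as a named mode" can mean in code, assuming a hypothetical runtime with three modes. The mode names (plan, edit, full), the Action classes, and is_permitted are illustrative assumptions, not the mode table of any tool compared here.

```python
from enum import Enum


class Action(Enum):
    READ = "read"
    WRITE = "write"
    SHELL = "shell"
    NETWORK = "network"


# Hypothetical mode table: each named mode lists the action classes it grants.
MODES: dict[str, frozenset[Action]] = {
    "plan": frozenset({Action.READ}),
    "edit": frozenset({Action.READ, Action.WRITE}),
    "full": frozenset(Action),
}


def is_permitted(mode: str, action: Action) -> bool:
    """Return True only if the named mode explicitly grants the action."""
    # Unknown modes grant nothing, so a typo fails closed instead of open.
    return action in MODES.get(mode, frozenset())


print(is_permitted("plan", Action.WRITE))  # a plan-mode write is refused
```

The design choice worth noticing is the lookup default: a mode the table does not know about grants nothing. That is the difference between a mode as policy and a mode as a label.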

Tools And Permissions: What Can The Agent Actually Do?

Tool permissions define blast radius.

The core risk classes are different:

  • reading project files;
  • writing project files;
  • running shell commands;
  • touching external directories;
  • accessing the network;
  • loading MCP tools;
  • using external-account integrations;
  • spawning subagents;
  • running hooks, plugins, skills, or extension code;
  • publishing, pushing, deploying, or mutating public surfaces.

Codex exposes file, shell, web, MCP, skill, plugin, subagent, SDK, app-server, and GitHub Action surfaces across its docs/source read. Claude Code uses tool names as permission and hook targets, including Bash, file/path rules, WebFetch domains, MCP tools, and Agent(AgentName) style subagent controls. OpenCode exposes command/path permission rules and external-directory posture. Qwen Code combines approval modes, sandbox options, built-in tools, MCP, Skills, and Subagents. Gemini CLI lineage includes built-in tools, shell, web fetch/search, MCP calls, confirmations, trusted folders, and checkpoints.

The public copy should stay source-visible. Do not write "this is safe." Write what the source/docs surface shows and what an operator still needs to test.

The practical inspection checklist is:

  • Can read-only work run without write tools?
  • Are shell commands gated separately from file edits?
  • Are external directories blocked, or gated behind an explicit ask?
  • Are .env and credential-like files protected?
  • Are destructive commands denied, asked, or hidden behind broad allow rules?
  • Can MCP tools be allowed or denied by server and tool name?
  • Can plugins, skills, hooks, and extensions be reviewed before loading?
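Several of the checklist questions above reduce to how allow, ask, and deny rules combine. The sketch below is one possible resolution order, assuming the most restrictive matching rule wins and that an unmatched request falls back to ask. The Rule shape, the action names, and the patterns are hypothetical; real tools may resolve conflicts differently, which is exactly what an operator should check.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Severity order for combining matches: the most restrictive decision wins.
SEVERITY = {"allow": 0, "ask": 1, "deny": 2}


@dataclass(frozen=True)
class Rule:
    action: str    # hypothetical class name, e.g. "read", "write", "shell"
    pattern: str   # glob matched against a path or command ("*" also spans "/")
    decision: str  # "allow", "ask", or "deny"


def decide(rules: list[Rule], action: str, target: str) -> str:
    """Return the most restrictive decision among the matching rules."""
    matches = [
        r.decision
        for r in rules
        if r.action == action and fnmatchcase(target, r.pattern)
    ]
    # No matching rule falls back to "ask" rather than a silent allow.
    return max(matches, key=SEVERITY.__getitem__, default="ask")


RULES = [
    Rule("read", "*", "allow"),
    Rule("read", "*.env", "deny"),      # credential-like files stay closed
    Rule("write", "src/*", "allow"),
    Rule("shell", "git status*", "allow"),
    Rule("shell", "rm *", "deny"),      # destructive command is denied outright
]

print(decide(RULES, "read", "config/.env"))
print(decide(RULES, "write", "docs/notes.md"))
```

Under these assumptions a broad allow cannot override a narrower deny, and anything the rules never mention reaches the operator as a question.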

For the broader safety-gate frame, route to What Is a Coding-Agent Control Plane?.

MCP, Plugins, Skills, Hooks, Extensions, And Subagents Are Authority Expansion

MCP support is not a harmless feature bullet.

An MCP server can add new tools, remote services, credentials, OAuth flows, external state, local commands, and failure modes. Plugins can bundle skills, app integrations, MCP servers, hooks, commands, and package install paths. Skills can change behavior through procedural instructions and optional scripts. Hooks can intercept lifecycle events. Subagents can add extra agent loops with their own context, tool grants, memory, and result handoff.

That is authority expansion.

Codex makes the authoring/distribution distinction explicit: skills are reusable workflow packages, while plugins can bundle skills, apps, and MCP servers. Claude Code documents skills, plugins, hooks, MCP, and subagents as configurable operator surfaces. The accepted Claude Code read also flagged a critical boundary: subagents inherit the main conversation's tools by default unless restricted. OpenCode shows MCP servers, plugins, custom agents/subagents, and GitHub automation surfaces. Qwen Code shows MCP, Skills, Subagents, task delegation, and SDK/daemon paths. Gemini CLI contributes MCP filters and extensions, while Antigravity official transition evidence introduces forward-looking Agent Skills, Hooks, Subagents, and Extensions language without Starkslab runtime validation.
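The inherit-by-default boundary is easier to reason about as set arithmetic. This is a minimal sketch under stated assumptions: subagent_tools and the tool names are hypothetical, and the function models the general shape of the risk, not Claude Code's or any other tool's actual resolution logic.

```python
from typing import Optional


def subagent_tools(
    parent_tools: set[str],
    granted: Optional[set[str]] = None,
) -> set[str]:
    """Resolve the tool set a subagent ends up with.

    granted=None models an inherit-by-default runtime: the subagent gets
    every parent tool. An explicit grant is intersected with the parent
    set, so a subagent can never hold a tool its parent lacks.
    """
    if granted is None:
        return set(parent_tools)
    return parent_tools & granted


PARENT = {"read", "write", "shell", "mcp:tracker.create_issue"}

inherited = subagent_tools(PARENT)             # everything, including shell
restricted = subagent_tools(PARENT, {"read"})  # only what was named

print(sorted(inherited))
print(sorted(restricted))
```

The point of the sketch is the default branch: when nobody writes a grant, the subagent holds the parent's full authority, MCP tools included.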

The inspection question is not "does it have MCP?"

The inspection question is:

  • Which tools does the MCP server expose?
  • Can tools be included or excluded by name?
  • Does it use stdio, HTTP, OAuth, headers, tokens, or environment variables?
  • Can a subagent inherit MCP tools accidentally?
  • Do plugins or skills run scripts?
  • Are hooks allowed to block or only observe?
  • Is there a visible trust flag, transport boundary, and rollback path?

Starkslab's rule is boring and useful: every external tool grant needs a reviewable reason.
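As a rough illustration of that rule, the sketch below filters the tools an MCP server advertises through a per-server policy that requires a trust flag and a written reason. McpServerPolicy, its fields, and the tool names are hypothetical and do not mirror any CLI's real configuration schema; treating an empty include set as "no allowlist" is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class McpServerPolicy:
    """Hypothetical per-server policy; not any CLI's real config schema."""
    trusted: bool = False
    include: set[str] = field(default_factory=set)  # empty means no allowlist
    exclude: set[str] = field(default_factory=set)
    reason: str = ""  # the reviewable reason for the grant


def exposed_tools(policy: McpServerPolicy, advertised: list[str]) -> list[str]:
    """Filter the tools a server advertises down to the ones the agent sees."""
    # An untrusted server, or a grant with no stated reason, exposes nothing.
    if not policy.trusted or not policy.reason.strip():
        return []
    allowed = [t for t in advertised if not policy.include or t in policy.include]
    # Exclusions are applied last, so they win over the allowlist.
    return [t for t in allowed if t not in policy.exclude]


ADVERTISED = ["search_issues", "create_issue", "delete_project"]
TRACKER = McpServerPolicy(
    trusted=True,
    include={"search_issues", "create_issue"},
    exclude={"create_issue"},
    reason="triage lane needs read access to the tracker",
)

print(exposed_tools(TRACKER, ADVERTISED))
```

A server with no recorded reason exposes nothing in this sketch, which turns "every grant needs a reviewable reason" from a habit into a precondition.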

Context Files: Where Does The Agent Learn Project Rules?

Project rules are part of the control surface.

An agent can only obey the operating doctrine it actually receives. That means the location, precedence, and scope of instruction files matter.

Codex reads AGENTS.md with global and project scopes, nested precedence, and override behavior. OpenCode also uses AGENTS.md and related rule surfaces. Claude Code uses CLAUDE.md, CLAUDE.local.md, .claude directories, settings, memory, skills, plugins, and an import pattern when a repo already uses AGENTS.md. Gemini CLI uses hierarchical GEMINI.md memory/context behavior. Qwen Code uses settings, .qwen/skills, .qwen/agents, .qwenignore, project/user/extension scopes, and provider configuration.

This is not trivia. It decides which instructions enter the task before the model starts editing.
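One way to make precedence visible is to list every instruction file that would apply to a working directory, in order. The sketch below assumes a simple root-to-leaf walk where files nearer the working directory take precedence; instruction_chain and that ordering are illustrative assumptions, not a description of how Codex, OpenCode, or any other tool actually resolves nested files.

```python
import tempfile
from pathlib import Path


def instruction_chain(root: Path, workdir: Path, name: str = "AGENTS.md") -> list[Path]:
    """List instruction files from the repo root down to the working directory.

    Later entries are closer to the working directory; in this sketch they
    are assumed to take precedence over earlier ones.
    """
    root, workdir = root.resolve(), workdir.resolve()
    relative = workdir.relative_to(root)  # raises ValueError if outside root
    chain, current = [], root
    for part in ("", *relative.parts):
        current = current / part
        candidate = current / name
        if candidate.is_file():
            chain.append(candidate)
    return chain


# Build a throwaway repo with a root rule file and one nested override.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    nested = repo / "services" / "api"
    nested.mkdir(parents=True)
    (repo / "AGENTS.md").write_text("Run the test suite before closing out.\n")
    (nested / "AGENTS.md").write_text("Never edit generated files.\n")
    chain = instruction_chain(repo, nested)
    order = [p.relative_to(repo.resolve()).as_posix() for p in chain]

print(order)
```

Printing the chain before a run is a cheap audit: it shows which rules the agent is expected to receive, and which directories contribute none.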

For Starkslab, the equivalent is the issue body plus workspace instructions: lane, action, brain profile, context pack, target artifact, validation, landing mode, review owner, constraints, and out-of-scope boundaries. That is why a weak ticket creates a weak worker.

Good public comparison language:

Context files are not documentation around the tool. They are instruction supply.

Bad public comparison language:

This tool supports project rules, so it is safe.

Rules can be ignored, overridden, overloaded, or made ambiguous. Permission gates, validation commands, and review ownership still matter.

For Starkslab's source-read posture, route to I Read OpenClaw's Source Code.

Headless Output: Can The CLI Become Part Of A Factory?

A coding agent CLI becomes factory-useful when another process can inspect what happened.

Human-readable transcripts are useful, but async workflows need more:

  • structured events;
  • final output files;
  • output schemas;
  • session IDs;
  • resumable runs;
  • tool-call stats;
  • file-modification stats;
  • error events;
  • validation output;
  • cost or usage telemetry where available.

Codex has the clearest accepted source-read evidence here: codex exec, JSONL event streams, final-message output, output schema support, session resume, app-server control, SDK control, and MCP-server mode. Claude Code brings claude -p, structured output, Agent SDK sessions, checkpointing, telemetry, and CI/review surfaces. Gemini CLI has headless JSON and telemetry lineage. OpenCode has run, serve, SDK/API, export/import, stats, and server surfaces. Qwen Code has text, JSON, stream-JSON, session resume, SDKs, persistent retry, and experimental daemon/ACP evidence.

That does not mean any of them is safe to run unattended in production.

It means they expose the right kind of surfaces for a supervisor to ask better questions:

  • What was the prompt?
  • What tools were called?
  • Which files changed?
  • Which commands ran?
  • What failed?
  • What did the agent ask for permission to do?
  • What was denied?
  • What validation ran?
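A supervisor can answer most of those questions by folding a structured event stream into a short report. The sketch below assumes a hypothetical JSONL schema with type, tool, path, command, and message fields; those names are illustrative only and are not the event format of codex exec, claude -p, or any other CLI named here.

```python
import json
from collections import Counter


def summarize(jsonl_text: str) -> dict:
    """Fold a JSONL event stream into the facts a supervisor reviews."""
    summary = {"tools": Counter(), "files": set(), "commands": [],
               "denied": [], "errors": [], "unparsed": 0}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            event = None
        if not isinstance(event, dict):
            summary["unparsed"] += 1  # surface corrupt lines instead of hiding them
            continue
        kind = event.get("type")
        if kind == "tool_call":
            summary["tools"][event.get("tool", "unknown")] += 1
            if "path" in event:
                summary["files"].add(event["path"])
            if "command" in event:
                summary["commands"].append(event["command"])
        elif kind == "permission_denied":
            summary["denied"].append(event.get("tool", "unknown"))
        elif kind == "error":
            summary["errors"].append(event.get("message", ""))
    return summary


# Hypothetical event stream, including one corrupt line.
SAMPLE = "\n".join([
    '{"type": "tool_call", "tool": "edit", "path": "src/app.py"}',
    '{"type": "tool_call", "tool": "shell", "command": "pytest -q"}',
    '{"type": "permission_denied", "tool": "shell"}',
    '{"type": "error", "message": "tests failed"}',
    "not json",
])
report = summarize(SAMPLE)
print(report["tools"], report["files"], report["denied"], report["unparsed"])
```

Counting unparsed lines rather than skipping them silently matters: a supervisor that drops what it cannot read will report a cleaner run than the one that happened.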

This is where Starkslab's own workflow translates the category. Symphony tickets are useful only when workers leave exact artifact paths, validation commands, blocker reasons, and review ownership. Agent CLI output should be held to the same standard if it becomes part of a production factory.

For the CLI-first tool-building angle, route to Build CLI Tools for AI Agents and Analytics if the route is verified at publish prep.

Recovery, Telemetry, And Trust: What Happens After A Bad Edit?

Recovery is part of trust.

An agent CLI that can mutate files needs a rollback story. That story cannot stop at "the agent is careful."

Source-visible recovery surfaces in the accepted artifacts include:

  • Codex sandboxing, approval policy, auto-review boundaries, JSONL event output, session surfaces, and source-control-oriented review;
  • Claude Code sessions, branching, transcript export, checkpoints, and a documented limitation that checkpointing does not cover Bash-made file changes;
  • Gemini CLI checkpointing, folder trust, telemetry, and transition-sensitive release/access posture;
  • OpenCode permissions, export/import, server/share caveats, and rules;
  • Qwen Code sandbox modes, approval modes, settings precedence, telemetry/config surfaces, and subagent/worktree caveats.

The safe lesson is not "these recovery systems make the tools safe."

The safe lesson is:

Recovery features are control surfaces. They reduce risk only when paired with source control, explicit permissions, validation, and review.

That is why Starkslab keeps public/source mutation behind gates. A clean draft is not a live note. A source-visible sandbox option is not a security audit. A checkpoint is not a Git history substitute. A code review product is not a replacement for tests, diff review, and publication discipline.
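The gap between "the tool recorded an edit" and "a file changed" can be checked independently of the tool. The sketch below hashes a directory before and after a run and reports any changed path missing from a recorded-edits set. The snapshot and unrecorded_changes helpers and the recorded set are hypothetical stand-ins for whatever edit log a given CLI exposes; in a real repo, source control does this job better.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's relative path to a SHA-256 digest of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def unrecorded_changes(before: dict[str, str], after: dict[str, str],
                       recorded: set[str]) -> set[str]:
    """Paths added, removed, or modified that the edit log never mentions."""
    changed = {
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    }
    return changed - recorded


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "app.py").write_text("print('v1')\n")
    (repo / "notes.txt").write_text("keep\n")
    before = snapshot(repo)
    (repo / "app.py").write_text("print('v2')\n")   # recorded by the edit tool
    (repo / "build.log").write_text("generated\n")  # written by a shell command
    after = snapshot(repo)
    missed = unrecorded_changes(before, after, recorded={"app.py"})

print(missed)
```

Any path in the result is a mutation the recovery surface would not know how to undo, which is the practical meaning of a checkpoint limit.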

Gemini And Antigravity: Currentness Is Part Of The Control Surface

Gemini CLI remains useful as source-visible lineage stock.

The accepted artifacts use it for CLI/core split, built-in tools, MCP, plan mode, extensions, folder trust, checkpointing, headless output, telemetry, and release-channel concepts.

The currentness caveat is the important part. Google's May 19, 2026 transition evidence changes how Starkslab should talk about Gemini CLI for consumer/free use. The accepted Antigravity brief says consumer/free and individual Google AI Pro/Ultra paths are migration-bound toward Antigravity CLI by June 18, 2026, while enterprise/license and API-key paths must be separated.

So the claim-safe wording is:

Gemini CLI is useful lineage and control-surface evidence. Antigravity CLI is the current Google terminal-agent transition surface, and consumer/free Gemini CLI guidance needs a date-stamped caveat after Google's May 19, 2026 transition notice.

Blocked wording:

  • "Gemini CLI is still the best free Google coding agent."
  • "Antigravity CLI has full feature parity."
  • "Antigravity CLI is production-ready."
  • "Enterprise users follow the same shutdown path as consumer/free users."

Currentness is a trust boundary. A stale access path can make an otherwise good technical note misleading.

Source Basis For This Pass

This page uses official docs, repositories, and accepted Starkslab source-read artifacts as comparison evidence. The most important public sources behind the claims are:

Those links are source basis, not endorsement. This page still does not claim runtime validation, security validation, install safety, benchmark quality, or production readiness for any tool.

What Not To Conclude From Source Reads

Do not conclude that any compared tool is the winner.

Do not conclude that any sandbox, permission mode, checkpoint system, MCP filter, plugin manager, subagent system, SDK, GitHub Action, or headless mode is safe because it exists.

Do not conclude that Codex, Claude Code, Gemini/Antigravity, OpenCode, Qwen Code, Aider, or Goose is production-ready for your repo from this comparison.

Do not conclude that Starkslab has adopted these tools for unattended work because the source reads are positive.

The narrow conclusion is stronger:

Modern coding-agent CLIs are becoming local control planes. The tools worth taking seriously expose permissions, context, extensibility, automation output, recovery, and review boundaries clearly enough for operators to inspect.

That is the comparison worth publishing.

Operator Checklist For Comparing Agent CLIs

Use this before trusting any coding agent CLI inside a real repo.

  1. Does the CLI separate planning from mutation?
  2. Are file reads, file writes, shell commands, web access, external directories, and MCP tools separate permission classes?
  3. Can destructive commands be denied or forced through an ask gate?
  4. Can MCP tools be allowed or denied by server and tool name?
  5. Do plugins, skills, hooks, or extensions install or execute code?
  6. Where do project rules live: AGENTS.md, CLAUDE.md, GEMINI.md, skills, settings, plugins, memory, or something else?
  7. What takes precedence when global, user, project, local, and nested rules conflict?
  8. Can subagents inherit tools, MCP servers, memory, or write permissions by default?
  9. Is headless output structured enough for a supervisor to inspect?
  10. Are sessions resumable, exportable, and identifiable?
  11. Does checkpointing cover all file mutations, or only tool-made edits?
  12. Are telemetry, prompt logging, and usage reporting documented and configurable?
  13. Are auth, provider, subscription, and API-key paths separated?
  14. Are currentness and product-lineage caveats visible?
  15. Can you validate the result with tests, diff checks, and source-control review?

If a tool cannot answer these questions clearly, it may still be interesting. It is not ready to be trusted blindly.
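The checklist is more useful when each answer carries its evidence and its proof state. A minimal sketch, assuming a hypothetical Finding record; the field names and the sample entries are illustrative placeholders, not results from any real audit.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    question: str
    answer: str = ""      # what the docs or source actually say
    evidence: str = ""    # where the answer was found (doc section, file path)
    tested: bool = False  # True only after a runtime check, not a source read


def open_items(findings: list[Finding]) -> list[str]:
    """Questions that still lack an answer backed by evidence."""
    return [f.question for f in findings if not (f.answer and f.evidence)]


def source_read_only(findings: list[Finding]) -> list[str]:
    """Questions answered from docs or source but never verified at runtime."""
    return [
        f.question
        for f in findings
        if f.answer and f.evidence and not f.tested
    ]


# Placeholder entries for illustration only.
AUDIT = [
    Finding("Does the CLI separate planning from mutation?",
            answer="named plan mode", evidence="permissions docs"),
    Finding("Can destructive commands be denied?",
            answer="deny rules exist", evidence="settings docs", tested=True),
    Finding("Does checkpointing cover all file mutations?"),
]

print(open_items(AUDIT))
print(source_read_only(AUDIT))
```

Keeping tested separate from answer preserves the boundary this page depends on: a source-visible control is a finding, not a verification.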

Where This Fits In The Starkslab Stack

Agent CLIs are runtime surfaces. Starkslab's workflow needs them to become reviewable artifacts.

That is the bridge between the AI Agent Tools cluster and Starkslab's first-party OpenClaw/Symphony work.

At the tool layer, Codex, Claude Code, Gemini/Antigravity, OpenCode, and Qwen Code show the modern surface area: permissions, tools, MCP, skills, subagents, context files, headless output, sessions, checkpoints, telemetry, providers, and remote/control APIs. Aider and Goose add useful sidebars: one for mature terminal pair-programming controls, the other for a broader local-agent runtime with MCP, ACP, recipes, subagents, and server surfaces.

At the operator layer, Starkslab adds:

  • lane scopes;
  • target artifact paths;
  • source/public mutation boundaries;
  • validation commands;
  • conflict-marker scans;
  • diff checks;
  • Zed review by default;
  • Cosmo escalation only for real boundaries;
  • live publication only through validated paths.

The important translation is this:

A coding agent CLI is not operator-grade because it can edit files. It becomes operator-grade only when its controls connect to artifacts, validation, and review.

That is why this page belongs in AI Agent Tools as a comparison/support note, routes into Build AI Agent for product architecture lessons, and uses OpenClaw/Symphony as first-party proof that control surfaces need contracts.

If you want the workflow layer, read AI Coding Agent Workflow.

If you want the harness layer above native CLIs, read The Coding Agent Harness Layer.

If you want the broader operator control plane around skills, MCP, config, sessions, and review gates, read What Is a Coding-Agent Control Plane?.

If you want Starkslab's first-party protocol and harness context, read OpenClaw, Codex, Claude Code, and ACP.

If you want the source-read posture behind this style of comparison, read I Read OpenClaw's Source Code.

Publish-prep should verify every route before linking. If a target is not live or canonical, leave it as plain text or remove the link.
