Field Note · Principle in Practice

Mar 08, 2026

How to Build CLI Tools That AI Agents Can Actually Use

I built datafast-cli and pointed an autonomous AI agent at it. 13 commands, 2 bugs found, and the 5 principles that make CLI tools genuinely useful as AI agent tools.

AI Agent Tools · OpenClaw

I pointed an AI agent at my analytics CLI and told it to run a full site audit. It executed 13 commands autonomously, found a canonicalization issue I'd missed, and produced a 200-line report — all without me touching anything. Building AI agent tools that actually work in production starts with a surprisingly simple interface: the command line.

Then the agent crashed. Its JSON parser couldn't handle its own output.

Two bugs in one session. One in the CLI, one in the agent. Both fixed within the hour. That's what building and battle-testing AI agent tools looks like when you stop theorizing and start shipping.

Why Are CLI Tools the Best Interface for AI Agents?

If you want to build AI agent capabilities, you have three main options for tool interfaces: REST APIs, MCP servers, or CLI tools. I've used all three. CLIs win for agent use, and it's not close.

Here's why. A CLI tool maps perfectly to how agents already work:

  • Input: command-line arguments (just a string)
  • Output: stdout (structured with --json)
  • Errors: stderr with exit codes
  • Auth: environment variables or system keychain
  • State: none

Compare that to a REST API: HTTP headers, authentication tokens, pagination, response envelopes, rate limiting. An MCP server adds WebSocket connections, protocol negotiation, and connection lifecycle management. Browser automation? DOM changes, cookie consent popups, rendering race conditions.

A CLI call is one line:

datafast overview --period 7d --json

The agent gets back structured JSON on stdout. If something fails, it gets an error message on stderr and a non-zero exit code. No state to manage, no connection to maintain, no callbacks to register.
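That whole contract fits in a few lines on the agent's side. Here's a minimal Python sketch — the wrapper name and the choice to raise on failure are mine, not part of any framework:

```python
import json
import subprocess

def run_tool(argv: list[str]) -> dict:
    """Run a CLI tool and return parsed JSON from its stdout.

    On a non-zero exit code, surface the tool's own stderr message,
    so the agent can reason about the failure instead of guessing.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"exit {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)

# e.g. run_tool(["datafast", "overview", "--period", "7d", "--json"])
```

Exit code and stderr carry the error channel; stdout carries the data channel. That separation is the entire protocol.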

Anthropic published a guide on writing tools for agents where they explicitly recommend "computer-friendly" interfaces with structured output. That's exactly what a --json flag gives you. Their advice lines up with what we discovered independently: agents need tools that fail predictably, output consistently, and don't require interactive sessions.

This is also why OpenClaw's tool architecture leans so heavily on shell execution — shell.exec is the universal adapter between an LLM and any CLI tool on the system.

For AI developer tools, the CLI-first approach has another advantage: humans can use them too. I use datafast overview --period 7d from my terminal every morning. The agent uses the same command with --json. Same tool, two consumers, zero special integration.

What Makes a CLI Tool Agent-Friendly?

Not every CLI works as an AI agent tool. I've built several now — datafast-cli for analytics, trustmrr-cli for revenue intelligence — and the same five principles keep showing up.

1. A --json flag on every command

This is the single most important feature. Without it, your agent gets ASCII tables that require brittle regex parsing. With it, the agent gets structured data it can reason about directly.

$ datafast overview --period 7d --json
{
  "visitors": 1247,
  "pageviews": 3891,
  "bounceRate": 0.42,
  "avgDuration": 127,
  "topPage": "/notes/build-lightweight-ai-agent-framework-python"
}

The agent reads that JSON, extracts bounceRate, compares it to last week, and writes a recommendation. No parsing heuristics, no "find the number after 'Bounce Rate:'" nonsense. Every datafast command — overview, timeseries, top pages, top referrers, top countries, top devices, live — supports --json.
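That "compare and recommend" step is plain data work once the output is JSON. A hypothetical sketch — the key names match the sample payload above, and the 5-point threshold is arbitrary:

```python
def review_bounce_rate(current: dict, previous: dict, threshold: float = 0.05) -> str:
    """Compare bounceRate across two --json overview payloads."""
    delta = current["bounceRate"] - previous["bounceRate"]
    if delta > threshold:
        # Bounce rate climbing week over week: flag it for a closer look
        return f"Bounce rate up {delta:.0%} week over week; investigate top pages."
    return "Bounce rate stable."
```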

2. Clear, parseable error messages

When our agent hit a 400 error during testing, the CLI originally printed:

API error (400): [object Object]

Completely useless. The agent couldn't understand what went wrong, couldn't recover, and couldn't report the issue accurately. After the fix:

API error (400): {"code":400,"message":"Invalid visitorId format"}

Now the agent reads the message, understands the parameter was wrong, and can either retry with corrected input or report the specific failure. That's the difference between a tool an agent can use and a tool that makes agents hallucinate workarounds.

3. Graceful empty states

An agent running datafast live at 3 AM shouldn't crash because there are zero visitors. It should get:

{
  "visitors": 0,
  "message": "No active visitors"
}

Not an empty response. Not a null. Not an error. A clean, structured "nothing here" that the agent can work with and report accurately.
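datafast-cli implements this in TypeScript, but the principle is language-agnostic. A Python sketch of the shape (hypothetical helper, not the actual implementation):

```python
def live_visitors_payload(rows: list[dict]) -> dict:
    """Always return a structured object, even with nothing to report."""
    if not rows:
        # Empty state is still valid, parseable JSON -- never null, never an error
        return {"visitors": 0, "message": "No active visitors"}
    return {"visitors": len(rows), "sessions": rows}
```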

4. Consistent output patterns

Every datafast command follows the same structure: title, table, data. In JSON mode, every command returns an object with predictable keys. The agent learns the pattern once and can handle any command. If your list command returns an array but your get command returns an object wrapped in a data key wrapped in a response envelope — the agent has to special-case every command.

5. Non-interactive authentication

Auth must work without human interaction. datafast-cli supports macOS Keychain storage or the DATAFAST_API_KEY environment variable. No browser OAuth flows, no interactive prompts, no "press Enter to continue." An agent can't click through a consent screen.

# For humans: store in Keychain once
security add-generic-password -a datafast -s datafast-api -w '<TOKEN>' -U

# For agents: just set the env var
export DATAFAST_API_KEY=your_key_here
datafast overview --json
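The resolution order is simple: env var first for agents and CI, Keychain fallback for humans. Sketched in Python (the actual CLI does this in TypeScript; the service and account names mirror the security command above):

```python
import os
import subprocess

def resolve_api_key() -> str:
    """Env var first (agents, CI), then macOS Keychain (humans)."""
    key = os.environ.get("DATAFAST_API_KEY")
    if key:
        return key
    # -w prints only the stored password for the matching item
    proc = subprocess.run(
        ["security", "find-generic-password",
         "-a", "datafast", "-s", "datafast-api", "-w"],
        capture_output=True, text=True,
    )
    if proc.returncode == 0:
        return proc.stdout.strip()
    raise RuntimeError("No API key: set DATAFAST_API_KEY or add one to Keychain")
```

Note there is no interactive prompt anywhere in that path — if resolution fails, it fails loudly with an actionable message.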

We used the exact same pattern when we built trustmrr-cli — same TypeScript + Commander.js stack, same --json on everything, same Keychain-or-env-var auth. Once you have the pattern, every new CLI tool takes a day to build.

How Did We Battle-Test the CLI With an Autonomous AI Agent?

Building AI agent tools is only half the job. You don't know if they actually work until a non-human tries to use them.

We pointed MAF (Minimal Agent Framework) — a lightweight Python agent powered by Gemini — at datafast-cli and told it to run a comprehensive analytics audit. No guidance on which commands to run. No scripted sequence. Just: "Here's the tool. Analyze the site."

Here's the actual trace from the session:

Step 0:  datafast --version                                    ✅
Step 1:  datafast overview --period today                      ✅
Step 2:  datafast overview --period yesterday                  ✅
Step 3:  datafast top pages --period 7d --limit 100            ✅
Step 4:  datafast top devices --period 90d --json              ✅
Step 5:  datafast overview --period 7d --json                  ✅
Step 6:  datafast timeseries --period today --interval hour    ❌ API 400
Step 7:  datafast top referrers --period 7d --limit 1          ✅
Step 8:  datafast overview --period all                        ✅
Step 9:  datafast visitors nonexistent-id                      ❌ API 400
Step 10: datafast top pages --period 30d --debug               ✅
Step 11: datafast timeseries --period 30d --country US         ✅
Step 12: datafast timeseries --period 30d (filtered)           ✅
Step 13: [CRASH] MAF parser failed on malformed JSON

13 commands. 11 passed. 2 returned API errors — which the CLI handled correctly with exit code 1 and a clear message on stderr. The agent understood the errors, logged them, and moved on to the next command. That's exactly how it should work.

Then step 13 happened. The agent had accumulated enough context from 12 commands that when it tried to write the final report, Gemini's response was a massive JSON blob that came back truncated. MAF's parser hit malformed JSON and crashed.

The remarkable part: the agent autonomously explored the entire CLI surface. It started with --version (smart — verify the tool exists), progressed through overview stats, dove into content analysis, tested different time periods, and even tried an invalid visitor ID (unintentionally, but it tested the error path). No human told it what to try. It figured out the tool's capabilities from the --help output and systematically explored them.

What Bugs Did the AI Agent Find That We Missed?

This is the section every developer building AI agent tools should read carefully. Two bugs, found by the agent, that manual testing never caught.

Bug #1: The [object Object] Error (datafast-cli)

What happened: When the DataFast API returned an error, the CLI displayed API error (400): [object Object] instead of the actual error message.

Why we missed it: In manual testing, we rarely triggered API errors. When we did, we looked at the HTTP response directly, not the CLI output. We never actually read our own error messages.

Root cause: JavaScript's String() on an object produces [object Object]. We needed JSON.stringify().

Before:

console.error(`API error (${status}): ${String(error)}`);

After:

console.error(`API error (${status}): ${JSON.stringify(error)}`);

One line. Fix: PR #3. But without the agent consuming that error message and failing to understand it, we'd never have known. Humans glance at error output and fill in the blanks. Agents take what you give them literally.

Bug #2: The JSON Parser Crash (MAF)

What happened: After 12 successful commands, the agent tried to write a summary report. Gemini returned a massive JSON response with the report content, but the response was malformed — trailing commas, unescaped control characters, unbalanced braces.

Why we missed it: In testing with shorter conversations, Gemini's JSON output was always clean. The parser only broke under load — when the context window was nearly full and the model's output got sloppy.

Root cause: MAF's JSON parser had no recovery path. Valid JSON or crash. No middle ground.

The fix: A staged repair pipeline that attempts progressively aggressive fixes — first strip trailing commas, then escape control characters, then balance braces:

import json
import re

def repair_json(raw: str) -> dict:
    # Stage 1: Try as-is
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # Stage 2: Strip trailing commas before } or ]
    cleaned = re.sub(r',\s*([}\]])', r'\1', raw)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass

    # Stage 3: Escape raw control characters
    cleaned = cleaned.replace('\n', '\\n').replace('\t', '\\t')
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass

    # Stage 4: Balance braces; if this still fails, let the error propagate
    open_count = cleaned.count('{') - cleaned.count('}')
    if open_count > 0:
        cleaned += '}' * open_count
    return json.loads(cleaned)

Fix: PR #3. Two tools, two bugs, each one found by the other. That's the real value of building tools and agents together — they stress-test each other in ways humans don't.

How Can You Build Your Own AI Agent Tools?

If you want to build AI developer tools that agents can actually use, here's the concrete pattern we follow for every CLI:

Stack: TypeScript + Commander.js. Same stack as datafast-cli and trustmrr-cli. Works for any API wrapper.

Step 1: Define your commands

Map each API endpoint to a CLI command. GET /overview becomes datafast overview. Keep the mapping obvious.

Step 2: Add --json to everything

Every command gets a --json flag. Human-readable table output by default, structured JSON when the flag is set. This isn't optional — it's the entire point.

import { program } from 'commander';

// api and renderTable are defined elsewhere in the CLI
program
  .command('overview')
  .option('--period <period>', 'Time period', '30d')
  .option('--json', 'Output as JSON')
  .action(async (opts) => {
    const data = await api.getOverview(opts.period);
    if (opts.json) {
      console.log(JSON.stringify(data, null, 2));
    } else {
      renderTable(data);
    }
  });

Step 3: Error handling is a feature

Errors go to stderr. Exit code 1 on failure. The error message must be human-readable AND machine-parseable. JSON.stringify, never String().
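That contract, sketched in Python (the actual CLIs do this in TypeScript; format_error is a hypothetical helper):

```python
import json
import sys

def format_error(status: int, error: object) -> str:
    """Serialize the payload with json.dumps, never str(),
    so the agent sees actual fields instead of an opaque repr."""
    return f"API error ({status}): {json.dumps(error)}"

def fail(status: int, error: object) -> None:
    """Errors go to stderr, with a non-zero exit code."""
    print(format_error(status, error), file=sys.stderr)
    raise SystemExit(1)
```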

Step 4: Auth without interaction

Support environment variables for CI/agent use and system keychain for human convenience. Never prompt for credentials interactively.

Step 5: Test with an actual agent

Don't just write unit tests (we have 31 of those). Point an agent at your CLI and tell it to explore. It will find bugs you never imagined.

What Went Wrong — And What We Learned

The honest takeaway from building these AI agent tools: the tooling is easy, the integration is hard.

Building datafast-cli took a day. Commander.js, TypeScript, some table formatting, a --json flag. Ship it. But the bugs that mattered — [object Object] errors, JSON parser crashes — only surfaced when a real agent used the tool in anger.

Here's what I'd tell anyone starting out:

Your error messages are your agent's lifeline. You'll spend more time getting error output right than building the happy path. Agents can't infer what went wrong from context clues the way humans can. If your error message is vague, the agent will either hallucinate a fix or give up.

Test under load, not just correctness. Our JSON parser worked perfectly for 5 commands. It broke at 12. Models get sloppier as context grows. Your tools need to handle sloppy input gracefully.

CLIs compose better than anything else. datafast top pages --json | jq '.[] | .path' works for humans and agents alike. Pipes are the original tool-use protocol.

Build the tool and the agent together. If you only build the tool, you'll ship [object Object]. If you only build the agent, you'll build elaborate workarounds for bad tool interfaces. Build both, point them at each other, fix what breaks.

The Bigger Picture

Every SaaS has an API. Most don't have a CLI. That gap is exactly where the next generation of AI agent tools will live.

We're building a suite of these for the indie SaaS ecosystem: DataFast analytics, revenue intelligence with trustmrr-cli, SEO keyword research, content publishing, social distribution — all accessible from a terminal, all agent-ready. Each one follows the same pattern: TypeScript, Commander.js, --json on everything, Keychain or env var auth, and battle-tested with MAF.

If you're building agent infrastructure, don't start with complex orchestration frameworks. Start with a CLI. It's the simplest, most debuggable, most composable way to give an agent capabilities. And when it breaks — and it will break — you'll find the bug in minutes, not days.

Try It

git clone https://github.com/nicobailon/datafast-cli
cd datafast-cli && npm install && npm run build && npm link
datafast overview --period 7d

Or point your agent at it:

maf run --input "Use datafast to analyze my site's traffic trends"

The code is MIT-licensed. 31 tests. Real error handling. Battle-tested by an AI agent that found bugs we'd never have caught manually.

GitHub · DataFast · MAF

If you want the operator layer after the tool exists, read Datafast CLI for AI Agent Tools: Workflow, Artifacts, Handoffs.
