Field Note · Principle in Practice

Mar 08, 2026

How to Build AI Agent Tools: A Revenue Data CLI from Scratch

I built trustmrr-cli — a TypeScript CLI giving AI agents access to verified revenue data for 4,900+ startups. Here's the architecture, the API workarounds, and why agent-native CLI tools are the missing layer.



When I say I build AI agent tools, I don't mean chatbots or prompt chains. I mean CLIs — narrow, composable, piped-into-scripts command-line tools that an agent can call with --json and actually parse. The kind of tool that turns "go research the SaaS market" from a vague directive into a concrete, automatable workflow.

This is the story of trustmrr-cli, Drop #004 in the Starkslab flywheel. It gives my AI agent direct access to verified revenue data — MRR, growth rates, churn, customer counts — for 4,900+ startups across 82 countries. Not scraped. Not estimated. Real numbers pulled from Stripe, LemonSqueezy, and Polar integrations that update hourly.

Here's how I built it, what broke along the way, and why this pattern of building developer tools for agents is the thing most AI builders are sleeping on.

Why Do AI Agents Need Their Own CLI Tools?

Most people building agents focus on the brain — better prompts, better models, fancier chains. But an agent is only as useful as the data it can touch. Give GPT-5 a blank terminal and ask it to "analyze the SaaS competitive landscape." It'll hallucinate a beautiful report full of made-up numbers.

Give it trustmrr trending 3 --json and it gets this:

[
  {
    "name": "ScreenshotOne",
    "mrr": 1850,
    "revenue30d": 2104,
    "momGrowth": 13.7,
    "customers": 342,
    "churnRate": 2.1
  },
  {
    "name": "Plausible",
    "mrr": 84200,
    "revenue30d": 91450,
    "momGrowth": 8.9,
    "customers": 12840,
    "churnRate": 1.4
  },
  {
    "name": "Dub",
    "mrr": 31600,
    "revenue30d": 34200,
    "momGrowth": 11.2,
    "customers": 2890,
    "churnRate": 1.8
  }
]

Real numbers. Structured output. No hallucination possible. This is what I mean by AI agent tools — not another framework, not another wrapper around an LLM. A scalpel-sharp CLI that does one thing and does it well, with a --json flag so agents can consume it programmatically.

This follows the same pattern we used when building the datafast-cli. That tool handles analytics. This one handles revenue intelligence. Together, an agent can correlate "our traffic spiked" with "competitors in this space are growing 12% MoM" — and surface insights no dashboard would ever show you.

What Does the TrustMRR API Actually Expose?

The data source is TrustMRR — same ecosystem as DataFa.st (same founder). The API lives at https://trustmrr.com/api/v1, uses Bearer token auth, and is rate-limited to 20 requests per minute.

Here's what each startup record contains:

  • MRR — Monthly Recurring Revenue, current
  • Revenue (30d) — Total revenue over the last 30 days
  • Total revenue — Lifetime
  • MoM growth % — Month-over-month revenue growth
  • Customer count — Active paying customers
  • Active subscriptions — Current active subs
  • Churn rate — Monthly churn percentage

This data updates hourly from the startup's connected payment processor (Stripe, LemonSqueezy, or Polar). It's not self-reported. It's not estimated. It's pulled directly from the source of truth.

The API docs list several query parameters — sort, category, country, search — but here's the first lesson: most of them are silently ignored server-side. The API accepts the parameters without error, but returns the same default-sorted results regardless. Only onSale and pagination (page, per_page) actually work.

This is incredibly common with early-stage indie SaaS APIs. The docs describe the intended behavior, not the implemented behavior. If you're building AI developer tools that consume third-party APIs, always verify what the API actually does, not what the docs say it does.
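A quick empirical check is cheap: fetch the same page with and without the parameter and diff the result order. Here's a hypothetical standalone helper (not part of trustmrr-cli) that captures the idea:

```typescript
// Sketch: detect silently-ignored query parameters by comparing two responses.
// `paramIsHonored` is a hypothetical helper for illustration.
interface Startup {
  name: string;
  mrr: number;
}

function paramIsHonored(defaultResults: Startup[], paramResults: Startup[]): boolean {
  // If the server honored the parameter, the ordering (or content) should differ
  const a = defaultResults.map(s => s.name).join(",");
  const b = paramResults.map(s => s.name).join(",");
  return a !== b;
}

// Example: ?sort=growth returns the same order as the default fetch
const byDefault = [{ name: "Plausible", mrr: 84200 }, { name: "Dub", mrr: 31600 }];
const bySortGrowth = [{ name: "Plausible", mrr: 84200 }, { name: "Dub", mrr: 31600 }];
console.log(paramIsHonored(byDefault, bySortGrowth)); // false → sort is silently ignored
```

Run a probe like this once per documented parameter before you write any code that depends on it.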

How Did We Build Client-Side Filtering to Work Around API Limitations?

Since server-side filtering was broken, we built it into the client. The approach:

  1. Fetch the top 200 startups — 4 pages at 50 per page
  2. Sort locally — by MRR, growth, customers, revenue, or churn
  3. Filter locally — by category, country, or search term
  4. Return the requested slice — top 10, trending 5, etc.

async fetchStartups(options: FetchOptions = {}): Promise<Startup[]> {
  const allStartups: Startup[] = [];
  const pagesToFetch = 4; // 4 pages × 50 per page = top 200 startups

  for (let page = 1; page <= pagesToFetch; page++) {
    const response = await this.fetchPage(page, 50);
    allStartups.push(...response.data);
  }

  // Server-side filtering is silently ignored, so apply every filter locally
  let filtered = allStartups;

  if (options.category) {
    filtered = filtered.filter(s =>
      s.category?.toLowerCase() === options.category!.toLowerCase()
    );
  }

  if (options.country) {
    filtered = filtered.filter(s =>
      s.country?.toLowerCase() === options.country!.toLowerCase()
    );
  }

  if (options.search) {
    const term = options.search.toLowerCase();
    filtered = filtered.filter(s =>
      s.name.toLowerCase().includes(term) ||
      s.description?.toLowerCase().includes(term)
    );
  }

  return this.sortStartups(filtered, options.sort || 'mrr');
}
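The sortStartups helper called above isn't shown in the excerpt. A minimal sketch of what it might look like — field names assumed from the record schema described earlier, and the null-MRR handling is my guess, not necessarily the CLI's:

```typescript
// Hedged sketch of a sortStartups-style helper; the real implementation
// in trustmrr-cli may differ.
interface Startup {
  name: string;
  mrr: number | null;
  momGrowth: number;
  customers: number;
  revenue30d: number;
  churnRate: number;
}

type SortKey = "mrr" | "growth" | "customers" | "revenue" | "churn";

function sortStartups(startups: Startup[], sort: SortKey): Startup[] {
  // Map each sort key to a numeric accessor
  const key: Record<SortKey, (s: Startup) => number> = {
    mrr: s => s.mrr ?? 0, // treat null MRR as 0 so those records sort last
    growth: s => s.momGrowth,
    customers: s => s.customers,
    revenue: s => s.revenue30d,
    churn: s => s.churnRate,
  };
  // Descending sort; copy first so the caller's array is untouched
  return [...startups].sort((a, b) => key[sort](b) - key[sort](a));
}
```

Descending order matches how top and trending present results.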

Is this elegant? No. Does it work reliably? Yes. And it means the CLI delivers accurate results today, without waiting for the API to catch up. When those server-side params start working, we flip a flag and skip the local processing.

How to Handle API Rate Limits in AI Agent Tools

Rate limiting is the silent killer of agent-driven tools. An agent doesn't know it's being rate-limited — it just sees a failure and retries, potentially hammering the API harder. Our client handles this properly:

// sleep(ms) is a small promise-based delay helper defined elsewhere in the client
private async fetchWithRetry(url: string, options: RequestInit, retries = 2): Promise<Response> {
  const response = await fetch(url, options);

  if (response.status === 429 && retries > 0) {
    const resetHeader = response.headers.get('X-RateLimit-Reset');
    const waitSeconds = resetHeader ? parseInt(resetHeader, 10) : 30;
    // Log to stderr so diagnostics never pollute the JSON on stdout
    console.error(`Rate limited, waiting ${waitSeconds}s...`);
    await sleep(waitSeconds * 1000);
    return this.fetchWithRetry(url, options, retries - 1);
  }

  return response;
}

Key decisions:

  • Read X-RateLimit-Reset header — wait exactly as long as the API asks, not an arbitrary backoff
  • Max 2 retries — if we're still rate-limited after 2 waits, something is structurally wrong; fail loud
  • Log to stderr — the agent sees "Rate limited, waiting 12s..." in stderr, not in the JSON output. Clean separation of data and diagnostics.

At 20 requests/minute, a full 4-page fetch takes 4 requests — well within limits for normal usage. The retry logic is insurance for when an agent runs multiple commands in rapid succession during a morning briefing.

What Are the 9 CLI Commands and How Do Agents Use Them?

The CLI exposes 9 commands, built with Commander.js:

Command          | What It Does                                | Agent Use Case
top [n]          | Top N startups by MRR                       | "Show me the biggest players"
trending [n]     | Top N by MoM growth %                       | Morning briefing: who's growing fastest
search <term>    | Full-text search across names/descriptions  | "Find startups doing X"
startup <name>   | Detailed view of one startup                | Deep dive on a specific competitor
compare <a> <b>  | Side-by-side comparison                     | "How does X stack up against Y?"
category <name>  | Filter by category                          | "What's happening in analytics?"
categories       | List all categories                         | Discovery: "What categories exist?"
countries        | List all countries                          | "Where are the fastest-growing startups?"
acquisitions     | List acquired startups                      | "Who got acquired and for how much?"

Every single command accepts --json. This is non-negotiable for AI agent tools. Without structured output, the agent has to parse human-formatted tables — fragile, slow, and error-prone.
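The dual-output pattern can be sketched in a few lines — this is a simplified illustration of the idea, not trustmrr-cli's actual formatter code:

```typescript
// Every command renders through one of two paths: machine JSON or human text.
interface Startup {
  name: string;
  mrr: number;
  momGrowth: number;
}

function render(startups: Startup[], json: boolean): string {
  if (json) {
    // Agents parse this directly — stable keys, no decoration
    return JSON.stringify(startups, null, 2);
  }
  // Humans get aligned, readable rows instead
  return startups
    .map(s => `${s.name.padEnd(20)} $${s.mrr.toLocaleString()} MRR  ${s.momGrowth}% MoM`)
    .join("\n");
}
```

One data pipeline, two serializers at the very end — that keeps the JSON and human views from drifting apart.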

The morning briefing is the battle-test that validates the whole design. Every day, an agent built with MAF runs:

trustmrr trending 3 --json

This shows which startups in our competitive space are growing fastest. The agent can then correlate this with our own analytics from datafast-cli, surface anomalies ("competitor X grew 15% this month while we grew 3%"), and flag items for human review.
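That correlation step can be very small. A hedged sketch — flagAnomalies is a hypothetical helper, and the threshold is arbitrary; the real briefing logic lives in the agent's own tooling:

```typescript
// Flag competitors whose MoM growth outpaces ours by at least `margin` points.
interface Growth {
  name: string;
  momGrowth: number;
}

function flagAnomalies(competitors: Growth[], ourGrowth: number, margin = 5): string[] {
  return competitors
    .filter(c => c.momGrowth - ourGrowth >= margin)
    .map(c => `${c.name} grew ${c.momGrowth}% this month while we grew ${ourGrowth}%`);
}
```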

What Tech Stack Powers the CLI and Why?

  • TypeScript — type safety for API responses. When a field is missing or the schema changes, we know at compile time, not when the agent gets undefined in production.
  • Commander.js — battle-tested CLI framework. Handles argument parsing, help generation, subcommands. Not reinventing wheels.
  • chalk — colored terminal output for human-readable mode. Agents use --json and never see the colors, but humans running it interactively get a clean, readable experience.
  • Native fetch (Node 18+) — no axios, no node-fetch. One less dependency. The runtime provides what we need.

Authentication follows a priority chain: check TRUSTMRR_API_KEY environment variable first, fall back to macOS Keychain. This means agents running in CI/CD or automated pipelines use env vars, while a human running it on their Mac gets seamless keychain integration. No config files to manage.
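The chain itself is only a few lines. A sketch, assuming a keychain service name of "trustmrr" (the real service name and error handling may differ):

```typescript
import { execFileSync } from "node:child_process";

// Auth priority chain: environment variable first, macOS Keychain second.
// The service name "trustmrr" is an assumption for illustration.
function resolveApiKey(env: Record<string, string | undefined> = process.env): string | null {
  if (env.TRUSTMRR_API_KEY) return env.TRUSTMRR_API_KEY;
  try {
    // `security` is macOS's keychain CLI; -w prints only the stored secret
    return execFileSync(
      "security",
      ["find-generic-password", "-s", "trustmrr", "-w"],
      { encoding: "utf8" }
    ).trim();
  } catch {
    return null; // no keychain entry, or not running on macOS
  }
}
```

CI and agent pipelines hit the env-var branch and never touch the keychain; an interactive Mac session falls through to it automatically.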

The competitive landscape for TrustMRR integrations is thin: there's an MCP server (via lobehub) and an Apify scraper. No CLI exists — or didn't, until now. That's a gap worth filling. MCP servers require a running agent framework. Scrapers are fragile. A CLI is universal: it works in a shell script, a cron job, a pipe, or an agent's tool call.

What Went Wrong and What Did We Learn Building This?

The silent parameter problem. I spent an hour debugging why ?sort=growth returned the same results as ?sort=mrr. Wrote tests, checked encoding, triple-verified the parameter names against the docs. The API just... ignores them. The docs describe the aspiration, not the reality. Lesson: always verify API behavior empirically, especially with indie SaaS products where the API might be a side project within the side project.

Test architecture saved us. 23 tests passing across client, formatters, and CLI routing — all using mocked fetch for speed. When we pivoted to client-side filtering, the test suite caught three edge cases in the sorting logic (null MRR values, ties in growth percentage, startups with zero customers). Without tests, the agent would have gotten subtly wrong rankings and nobody would have noticed until the data didn't match reality.

$ npx vitest run
 ✓ client.test.ts (8 tests) 12ms
 ✓ formatters.test.ts (7 tests) 4ms
 ✓ cli.test.ts (8 tests) 18ms

 Test Files  3 passed (3)
      Tests  23 passed (23)
   Start at  14:32:11
   Duration  284ms

The --json flag wasn't an afterthought — it was the design constraint. Every formatter function exists in two modes: human-pretty and machine-parseable. This doubles the surface area of the code, but it's the entire point. If you're building AI developer tools and you don't have --json on every command, you're building for humans and hoping agents can squint hard enough. They can't.

How Does This Fit Into the Agent Tool Ecosystem?

This is Drop #004 in a deliberate pattern. Each drop adds a new data source to the agent's toolkit:

  • datafast-cli — analytics (traffic, pages, referrers)
  • trustmrr-cli — revenue intelligence (MRR, growth, churn)
  • Next — social signals, SEO rankings, more

The agent doesn't need a monolithic "business intelligence platform." It needs small, sharp tools it can compose. trustmrr trending 5 --json | jq '.[] | .name' piped into datafast top --period 7d is more powerful than any dashboard, because the agent can build the query dynamically based on context.

This connects to the broader infrastructure we explored in the OpenClaw source code teardown. OpenClaw provides the agent runtime — the heartbeat loop, the tool execution, the session management. But the runtime is only as useful as the tools plugged into it. Each CLI drop extends what the agent can actually do in the world.

If you're building AI agent tools, here's the playbook: pick a data source your agent needs, wrap it in a CLI with --json output, add rate limit handling and auth, write tests, and ship it. Don't build a platform. Build a Unix tool. Your agent will compose it into whatever workflow it needs.

The repo is open source. The pattern is replicable. Go build the next one.
