Field Note: Principle in Practice

Mar 12, 2026

Claude Agent SDK Workspace: An Open-Source Multi-Workspace Agent Dev Environment (FastAPI + Vite)

A deep, command-level teardown of claudeagentsdk (#005): an open-source agent workspace built around the Anthropic Agent SDK, with a FastAPI backend, a Vite/React frontend, and an optional Vercel Sandbox runner for async, reproducible runs.


claudeagentsdk (#005) is now public: https://github.com/fedewedreamlabsio/claudeagentsdk.

This note explains what it is, why it exists, and how it actually works (backend + frontend + runner), including the incident history we accumulated while making it safe to ship as an open-source agent workspace.


What it is (and the exact problem it solves)

Most “agent UIs” are thin chat shells.

They typically fail the moment you ask for the things that make agents useful for real work:

  • a persistent working directory,
  • repeatable tool execution,
  • per-project configuration,
  • and a clean boundary between “my files” and “your files.”

When you put an agent behind a web app, you immediately run into hard questions that chat wrappers dodge:

  • Where do the files live?
  • What is the current working directory (cwd) for tool calls?
  • Who owns a workspace?
  • How do you avoid leaking API keys across tenants?
  • How do you stream tool events and persist transcripts?
  • How do you run jobs asynchronously without leaving an SSH session open?

Claude Agent Workspace (this repo) is our answer.

It is an open-source, multi-workspace agent dev environment built around the Anthropic Agent SDK (anthropic agent sdk) and Claude Code’s tool model. It combines:

  • a FastAPI backend that owns identity, workspaces, storage, and audit logs,
  • a React + Vite frontend that renders the workspace UI,
  • and an optional “runner” service that executes agent runs in a sandbox and syncs workspace snapshots back to the backend.

In Starkslab terms: it is an implementation artifact that turns the idea of a “project-aware agent” into deployable infrastructure.

I’m going to use the phrase claude agent sdk in this note to refer to the system as shipped here (not as a claim of affiliation).


Why it exists (beyond “another web UI”)

We built this because we wanted a specific workflow:

  1. A user creates a workspace.
  2. The workspace is a real directory on disk.
  3. The agent operates inside that directory.
  4. The user can upload files, browse, and download artifacts.
  5. The agent can run tool-enabled sessions where edits are visible and diffable.
  6. The whole thing can be hosted: authentication, access control, and BYOK.

The missing piece in many agent stacks is the “project membrane”: a boundary that makes it obvious what the agent can touch.

This repo makes that boundary explicit:

  • A Workspace is a row in a database and a directory path (/data/workspaces/<workspace_uuid> by default).
  • Sessions, messages, tool events, schedules, and async runs are all scoped to that workspace.

If you already live in a CLI-first world, the motivation is even simpler:

  • Claude Code is powerful, but it’s local.
  • Teams want “Claude Code, but shared, with a UI, and without a single global API key.”

That’s the core of this claude agent sdk drop: making an “agent workspace” a first-class deployable.


Architecture overview (backend / frontend / runner)

At a high level, the architecture is three cooperating systems with clear contracts.

1) Backend (FastAPI): the control plane + storage

Location: backend/src/app.py (FastAPI app) and backend/src/claudeagentsdk/* (domain modules).

The backend owns:

  • identity (via Clerk verification),
  • workspace lifecycle (create/list/delete),
  • workspace file storage (uploaded files + agent-created files),
  • workspace-scoped API keys (encrypted at rest),
  • WebSocket streaming for “live” sessions,
  • runs (async job records + event logs),
  • snapshots (zip in/out for runner mode),
  • schedules (cron-like jobs tied to a workspace).

The backend is intentionally more than a proxy to the model. It is the part that makes auditing and boundary enforcement possible.

Key backend concepts

  • Workspace: has id, owner_id, and a directory_path on disk.
  • WorkspaceSession: a chat-like session with stored Message rows and ToolEvent rows.
  • WorkspaceAPIKey: per-workspace provider key, encrypted using a backend FERNET_KEY.
  • AgentRun + RunEvent: the async execution model used by the runner.

The code uses SQLModel models (backend/src/claudeagentsdk/models.py) so it works on SQLite locally and PostgreSQL in production.

Two execution paths

The backend supports two “ways to run an agent”:

  1. Live WebSocket sessions

    • Endpoint: /ws/{workspace_id} (WebSocket upgrade)
    • The backend hosts the Claude Code runtime (via claude_agent_sdk + CLI).
    • Tool events are streamed and persisted.
  2. Async runs

    • REST endpoints create an AgentRun record.
    • A runner process is notified (or polled) and executes the run in a sandbox.
    • The runner posts events and final status back.

This dual-path design is not accidental:

  • Live mode is best for interactive, human-in-the-loop work.
  • Runner mode is best for long tasks, scheduled tasks, and reproducible execution.

2) Frontend (React + Vite): the operator console

Location: frontend/.

The frontend is a pragmatic UI around the backend contracts:

  • It uses Clerk for auth (@clerk/clerk-react).
  • It calls backend REST endpoints via Axios.
  • It opens a WebSocket per workspace via buildWebSocketUrl(workspaceId, token, sessionId).

It exposes a set of operator primitives that matter for agent development:

  • workspace list + selection,
  • session list + session detail (messages + tool events),
  • file upload/download/content preview,
  • workspace settings (model, thinking budget, system prompt mode),
  • API key management (BYOK),
  • schedules and async run status.

From a product standpoint, the frontend is not “the app.”

The backend and runner are where the durable constraints live. The UI is how you touch them.

3) Runner (Vercel Sandbox): reproducible async execution

Location: runner/.

The runner is an optional component that makes “agent runs” operate like jobs:

  • A run is created in the backend.
  • The backend dispatches the run to the runner (HTTP) if configured.
  • The runner creates (or reuses) a Vercel sandbox and hydrates it with the workspace snapshot.
  • The runner runs the agent using the Node @anthropic-ai/claude-agent-sdk package.
  • The runner streams text/tool events back to the backend and finally uploads an updated workspace snapshot.

Important properties of this runner model:

  • Sandbox reuse: a recent sandbox_id can be reused for a window (~3h) to avoid cold-start installs.
  • Snapshot-in / snapshot-out: the “workspace” is treated as a zip artifact. This is the simplest durable contract between a control plane and an execution plane.
  • Backend callback preflight: the runner explicitly checks it can call back to /api/runs/<id>/events before detaching, because a run that cannot report progress is indistinguishable from a hung run.
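
A hedged sketch of that callback preflight, using only the stdlib. The endpoint path comes from this note; the request body, auth header, and timeout are assumptions about the wire format, not the runner's exact implementation.

```python
import urllib.request
import urllib.error

def preflight_events_endpoint(backend_url: str, run_id: str,
                              runner_token: str, timeout: float = 5.0) -> bool:
    """Before detaching, verify the backend events endpoint is reachable.
    A run that cannot report progress looks identical to a hung run."""
    url = f"{backend_url.rstrip('/')}/api/runs/{run_id}/events"
    req = urllib.request.Request(
        url,
        method="POST",
        data=b'{"type": "preflight"}',  # assumed payload shape
        headers={"Authorization": f"Bearer {runner_token}",
                 "Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False
```

Failing fast here converts a whole class of "silent stuck runs" into an immediate, attributable error.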

This is the part of the system that pushes it from “interactive chat” to “agent infrastructure.”

And yes: this is still claude agent sdk in the sense that it is built around the exact tool semantics of Claude Code.


Deep dive: how the backend drives Claude Code sessions

The core of live mode is in backend/src/app.py under:

  • @app.websocket("/ws/{workspace_id}")

Authentication model

  • The frontend gets a Clerk token.
  • The WebSocket sends it either:
    • as ?token=... query param, or
    • as Authorization: Bearer ... header.
  • The backend calls verify_clerk_token(token) and maps that to a User row.

This matters operationally because WebSockets are where most “pretty agent demos” die:

  • browser clients can’t easily attach custom headers in all environments,
  • proxies strip headers,
  • preview domains change,
  • and CORS/origin behavior becomes a release blocker.

So the backend supports both approaches.

Workspace boundary enforcement

Immediately after auth, the backend:

  • loads the Workspace by UUID,
  • checks workspace.owner_id == user.id,
  • resolves workspace BYOK env or refuses the connection (if BYOK_REQUIRED=true).

That boundary check is the central safety contract of the whole system.

BYOK environment injection

The backend stores only encrypted API keys; at runtime it resolves a workspace’s provider env:

  • provider = anthropic → sets ANTHROPIC_API_KEY=...
  • provider = openrouter → sets an OpenRouter-compatible auth token and base URL

Then it optionally sets:

  • ANTHROPIC_MODEL from workspace.model
  • MAX_THINKING_TOKENS from workspace.thinking_budget

The point: the execution environment is fully workspace-scoped.

HOME isolation (why it exists)

In live mode, the backend sets HOME to a workspace-scoped directory (via get_workspace_cli_home(workspace)).

This matters because Claude Code stores per-directory state in user home.

If you do not isolate HOME per workspace, you get accidental cross-workspace coupling:

  • transcripts bleed,
  • tool preferences leak,
  • and “continue conversation” becomes unpredictable.
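
The isolation itself is one path computation. The ".cli-home" directory name below is an assumption for illustration; the repo's get_workspace_cli_home may lay things out differently:

```python
from pathlib import Path

def get_workspace_cli_home(workspace_root: str, workspace_id: str) -> Path:
    """One HOME per workspace, so Claude Code's per-directory state
    (transcripts, tool preferences) never crosses workspace boundaries."""
    home = Path(workspace_root) / workspace_id / ".cli-home"
    home.mkdir(parents=True, exist_ok=True)
    return home
```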

Tool allowlisting and MCP

Claude Code enforces an explicit tool allowlist.

The backend builds a list of built-in tool names (Read/Edit/Write/Bash/etc.) and then optionally expands it with MCP tool names.

How MCP is handled in live mode:

  • The workspace directory may contain .mcp.json.
  • The backend loads it, normalizes format (mcpServers wrapper supported), and sanitizes server names.
  • Secrets inside .mcp.json can reference user secrets like ${secret:foo}.
  • The backend resolves those placeholders using encrypted secrets.
  • For HTTP-based MCP servers, the backend can pre-fetch tool names (tools/list) to populate the allowlist.

Why pre-fetching matters: if the tool name is not on the allowlist, Claude Code will not call it, regardless of what the model “wants.”

This is one of the most non-obvious integration constraints when building an agent workspace UI.
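
The normalization and placeholder-resolution steps can be sketched like this. It follows the behavior described above (mcpServers unwrapping, ${secret:name} substitution); the exact function names and the decision to leave unknown placeholders untouched are assumptions:

```python
import json
import re

SECRET_REF = re.compile(r"\$\{secret:([A-Za-z0-9_-]+)\}")

def load_mcp_config(raw: str, secrets: dict[str, str]) -> dict:
    """Normalize a .mcp.json payload: unwrap the optional 'mcpServers'
    wrapper and resolve ${secret:name} placeholders from the user's
    (already decrypted) secret store."""
    data = json.loads(raw)
    servers = data.get("mcpServers", data)

    def resolve(value):
        if isinstance(value, str):
            # leave unresolved placeholders as-is rather than failing
            return SECRET_REF.sub(lambda m: secrets.get(m.group(1), m.group(0)), value)
        if isinstance(value, dict):
            return {k: resolve(v) for k, v in value.items()}
        if isinstance(value, list):
            return [resolve(v) for v in value]
        return value

    return resolve(servers)
```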


Deep dive: the runner contract (snapshot-in / events / snapshot-out)

Runner mode exists because “keeping Claude Code alive on the backend” is not always desirable.

You often want:

  • async execution,
  • time-bounded sandboxes,
  • aggressive cleanup,
  • and a clear artifact boundary.

The runner is implemented as a single HTTP handler (runner/api/run.js).

The runner’s trust model

The runner has two tokens:

  • RUNNER_TOKEN: authenticates the runner to the backend’s internal endpoints.
  • RUNNER_INVOKE_TOKEN: authenticates the backend (or dispatcher) to the runner.

This split matters because:

  • you can allow only your backend to invoke runs,
  • while allowing the runner to access internal endpoints without exposing them publicly.

The run lifecycle

In simplified form:

  1. Runner receives { runId }.
  2. Runner fetches run metadata from backend: /api/runs/{runId}/internal.
  3. Runner creates or reuses a Vercel sandbox.
  4. Runner downloads a workspace snapshot zip:
    • /api/workspaces/{workspaceId}/snapshot/internal
  5. Runner writes zip into sandbox and extracts it.
  6. Runner fetches env vars for the run: /api/runs/{runId}/env/internal.
  7. Runner runs the agent with query() from @anthropic-ai/claude-agent-sdk.
  8. Runner posts events (text, tool events, errors) back to /api/runs/{runId}/events.
  9. Runner uploads a new snapshot zip back to the backend.
  10. Runner sets run status: completed/error.

The runner uses detached sandbox commands so runs can outlive the HTTP request.
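
The snapshot-in / snapshot-out contract (steps 4, 5, and 9 above) is essentially a zip round-trip. The runner does this in Node; a minimal Python equivalent, for illustration only:

```python
import io
import zipfile
from pathlib import Path

def snapshot_workspace(workspace_dir: str) -> bytes:
    """Zip the whole workspace directory into an in-memory snapshot."""
    buf = io.BytesIO()
    root = Path(workspace_dir)
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in root.rglob("*"):
            if path.is_file():
                zf.write(path, path.relative_to(root))
    return buf.getvalue()

def restore_workspace(snapshot: bytes, target_dir: str) -> None:
    """Extract a snapshot into a fresh (sandbox) directory."""
    with zipfile.ZipFile(io.BytesIO(snapshot)) as zf:
        zf.extractall(target_dir)
```

The appeal of this contract is that both sides only need to agree on "a zip of the directory," nothing richer.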

Practical guardrails implemented in the runner

Some of the safeguards in the runner are there because we learned the hard way that agent tooling will try to escape its boundaries unless you build explicit constraints.

Examples you can see in run-agent.mjs (generated inside the sandbox):

  • Schedule endpoint workspace mismatch guard

    • It inspects Bash tool input for calls to schedule endpoints.
    • If it finds a workspace UUID mismatch, it blocks the tool call.
  • OpenRouter + MCP compatibility

    • When OpenRouter is detected, MCP servers are disabled to avoid tool schema incompatibilities.
  • Auto-updater disabling

    • Sets DISABLE_AUTOUPDATER=true.
    • Prevents the CLI from mutating itself during runs.
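
The schedule-mismatch guard can be sketched as a check over the Bash tool's input. The URL shape (/api/workspaces/<uuid>/schedules) is an assumption inferred from the description above, not the runner's exact pattern:

```python
import re

# matches workspace UUIDs inside schedule-endpoint URLs (assumed shape)
UUID_RE = re.compile(
    r"/api/workspaces/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-"
    r"[0-9a-f]{4}-[0-9a-f]{12})/schedules",
    re.IGNORECASE,
)

def bash_call_allowed(command: str, current_workspace_id: str) -> bool:
    """Block Bash tool calls that target another workspace's schedules."""
    for target in UUID_RE.findall(command):
        if target.lower() != current_workspace_id.lower():
            return False
    return True
```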

These are the kinds of “boring constraints” that turn “agent demo” into “agent system.”

This is the third time I’m calling it claude agent sdk because it’s the recurring theme: the integration is mostly about tool execution contracts and boundaries, not about prompt phrasing.


Local development quickstart (backend + frontend + runner)

Below is a pragmatic local setup that mirrors how the repo is actually structured.

Prerequisites

  • Python 3.12+
  • Poetry
  • Node.js 18+
  • Claude Code CLI (for live mode): @anthropic-ai/claude-code

Clone:

git clone https://github.com/fedewedreamlabsio/claudeagentsdk
cd claudeagentsdk

Backend (FastAPI)

Install deps:

cd backend
poetry install

Create a local backend/.env (do not commit it). Minimum viable settings:

# backend/.env
DATABASE_URL=sqlite:///./app.db
JWT_SECRET_KEY=<generate-a-random-secret>
FERNET_KEY=<generate-a-base64-32-byte-key>
# dotenv files do not expand $(...); paste the output of `which claude`
CLAUDE_CLI_PATH=<absolute-path-to-claude-cli>
WORKSPACE_ROOT=<absolute-path-to-a-workspaces-dir>
FRONTEND_URL=http://localhost:5173
BYOK_REQUIRED=true

# Optional: Clerk (recommended if you want real auth)
CLERK_SECRET_KEY=<your-clerk-secret>
CLERK_TEST_MODE=true

Generate secrets (examples):

# JWT secret
python -c "import secrets; print(secrets.token_urlsafe(32))"

# Fernet key (base64 urlsafe, 32 bytes)
python -c "import os,base64; print(base64.urlsafe_b64encode(os.urandom(32)).decode())"

Run the API:

poetry run uvicorn src.app:app --host 0.0.0.0 --port 8000 --reload

Sanity check (the backend exposes a health endpoint):

curl http://localhost:8000/health

Frontend (Vite)

cd ../frontend
npm install
npm run dev

Set frontend env vars (for local development):

# frontend/.env.local
VITE_API_URL=http://localhost:8000
VITE_WS_URL=ws://localhost:8000

Open the UI at http://localhost:5173.

Runner (optional, for async runs)

The runner is designed to run in Vercel’s sandbox environment. Locally, you can still install dependencies and run tests, but you won’t get an identical sandbox.

cd ../runner
npm install
npm test

If you deploy the runner (see deployment section), you’ll connect it to the backend using:

  • BACKEND_URL (origin)
  • RUNNER_TOKEN (runner → backend internal auth)
  • RUNNER_INVOKE_TOKEN (backend → runner auth)

Deployment overview (Railway backend + Vercel frontend) — placeholders only

This section is intentionally template-like.

If you are deploying this, treat it as a checklist and fill in your own values.

Backend on Railway (FastAPI)

Storage

  • Add a Railway volume mounted at something like /data/workspaces.
  • Set WORKSPACE_ROOT=/data/workspaces.

Database

  • Use Railway Postgres.
  • Let Railway inject DATABASE_URL.

Environment variables (examples; fill your values)

JWT_SECRET_KEY=<random>
FERNET_KEY=<base64-32-byte-key>
WORKSPACE_ROOT=/data/workspaces
CLAUDE_CLI_PATH=/app/node_modules/.bin/claude
FRONTEND_URL=<https://your-frontend.vercel.app>
BYOK_REQUIRED=true

# Optional
EXTRA_CORS_ORIGINS=<comma-separated-origins>
CLERK_SECRET_KEY=<clerk-secret>
CLERK_TEST_MODE=false

# Runner wiring (optional)
RUNNER_TOKEN=<random>
RUNNER_INVOKE_TOKEN=<random>
RUNNER_DISPATCH_URL=<https://your-runner.vercel.app/api/run>

Build / install

  • This repo is set up for Nixpacks on Railway, which can install the Claude Code CLI during the build.

Health check

  • GET /health should return {"status":"healthy"...}.

Frontend on Vercel (Vite)

Environment variables

VITE_API_URL=<https://your-backend.railway.app>
VITE_WS_URL=<wss://your-backend.railway.app>

Deploy as a standard Vite SPA.

Runner on Vercel (optional)

Deploy runner/ as a serverless function project and configure:

BACKEND_URL=<https://your-backend.railway.app>
RUNNER_TOKEN=<same-as-backend>
RUNNER_INVOKE_TOKEN=<same-as-backend>
SANDBOX_SNAPSHOT_ID=<optional-snapshot-id>
RUNNER_HEARTBEAT_MS=30000

Security + BYOK (what we do, what we do not do)

The security model is explicit: this is BYOK.

  • Users provide provider keys per workspace.
  • The backend never ships a shared “server key” by default.
  • The backend encrypts stored keys using FERNET_KEY.

What is encrypted

  • Workspace API keys (WorkspaceAPIKey.encrypted_key).
  • User secrets (for MCP placeholder resolution).

Encryption is symmetric and managed by the backend process. That means your FERNET_KEY is production-critical; rotate it like you would rotate any database credential.

What is scoped

  • API keys are workspace-scoped.
  • Files are workspace directory-scoped.
  • WebSocket sessions are workspace ownership-scoped.
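
Directory scoping only holds if user-supplied paths cannot escape the workspace root. A minimal traversal guard (illustrative, not necessarily how the repo implements it):

```python
from pathlib import Path

def resolve_in_workspace(workspace_dir: str, relative_path: str) -> Path:
    """Resolve a user-supplied path and refuse anything that escapes the
    workspace directory (e.g. '../../etc/passwd')."""
    root = Path(workspace_dir).resolve()
    candidate = (root / relative_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes workspace: {relative_path}")
    return candidate
```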

What is deliberately not in scope (yet)

This repo is an infrastructure baseline, not a compliance product.

If you need:

  • fine-grained RBAC within a workspace,
  • org-level policies,
  • audit exports,
  • enterprise SSO,

…you build those on top of these primitives.

Practical advice

  • Treat .env as untracked local state.
  • Never paste real keys in docs.
  • Do not log key prefixes.
  • Use separate runner tokens for each environment.

If you are evaluating this as an ai developer tools component, the key question is: does the boundary model match your threat model? For many internal team deployments, “workspace-scoped BYOK + directory isolation + encrypted DB fields” is already a major upgrade over “everyone shares a single key in a Slack bot.”

This is the fourth appearance of claude agent sdk in this note because BYOK is one of the main reasons the system exists.


What broke / incidents (concrete failures and what we changed)

We keep incident history because agent infrastructure fails in ways that are easy to repeat.

Here are three real categories of breakage we hit while turning this into an open-source drop.

Incident 1: “secrets in docs” scrub (and the accidental key leak pattern)

Failure mode:

  • Early deployment docs contained example ANTHROPIC_API_KEY=sk-ant-... strings.
  • Even when fake, these strings cause two problems:
    1. automated secret scanners flag the repo (noise),
    2. readers cargo-cult copy/paste patterns that normalize unsafe behavior.

A related failure: the development backend/run.py printed the first N characters of ANTHROPIC_API_KEY (“starts with ...”). That’s still a leak: log aggregation is where secrets go to die.

What we changed:

  • Replaced examples with <placeholders>.
  • Pushed the policy into SECURITY.md.
  • Stopped treating “key prefixes are fine” as acceptable.

Takeaway:

  • Open-sourcing agent infrastructure is not just licensing.
  • You must scrub the documentation layer because it is the most copied part of the project.

Incident 2: Auth + protected previews + E2E flakiness (WebSockets are the sharp edge)

Failure mode:

  • Clerk-based auth works in normal environments, but preview deployments add complexity:
    • dynamic Vercel preview domains,
    • mixed http/ws vs https/wss,
    • inconsistent header support in browsers,
    • and origin checks that aren’t obvious.

Symptoms:

  • WebSocket connections failing with policy violations.
  • CORS failures in the REST layer.
  • E2E tests that pass locally but fail in preview because the origin changed.

What we changed:

  • Added wildcard-capable origin configuration and normalization.
  • Allowed token delivery via query param (not only Authorization header).
  • Centralized websocket URL construction in the frontend.

Takeaway:

  • If you ship an agent workspace, your auth story must include WebSockets and previews, not just “login works.”

Incident 3: CLI + sandbox constraints (tooling behaves differently when it’s not your laptop)

Failure mode:

  • Claude Code is designed for local use.
  • Moving it into a hosted backend and/or a sandboxed runner introduces constraints that the CLI does not hide.

Concrete issues we hit:

  • The sandbox does not behave like a full VM; egress can be restricted.
  • Cold starts require bootstrapping dependencies (npm install, installing @anthropic-ai/claude-code).
  • “Continue conversation” can break when transcript state is stale or the container was restarted.
  • Tool allowlisting means MCP tools silently disappear unless pre-fetched.

What we changed:

  • Added sandbox reuse + optional snapshot boot.
  • Added a backend callback preflight so “silent stuck runs” become visible.
  • Disabled auto-updaters and made HOME explicit.
  • Added guardrails around cross-workspace scheduling endpoints.

Takeaway:

  • The hardest part of agent infrastructure is not the model call.
  • It’s the environment contract: filesystem, network, tool permissions, and reproducibility.

This is the fifth explicit mention of claude agent sdk because the recurring lesson is always the same: tool execution is where reality shows up.


How this fits the Starkslab flywheel

Starkslab’s loop is not “ship prompts.” It is “ship artifacts that make execution cheaper.”

This repo fits the AI Agent Tools cluster as a concrete bridge between:

  • agent orchestration (control plane),
  • agent execution (CLI + sandbox),
  • and operator UX (workspace UI).

Where this project lands in the flywheel:

  1. Inbox → scoped workspace

    • An input becomes a workspace with files and a boundary.
  2. Workspace → tool execution

    • Claude Code runs with a real cwd and persisted artifacts.
  3. Execution → deployable interface

    • FastAPI + Vite turns a local tool into a shared system.
  4. System → incident history

    • Every break becomes a guardrail (tokens, snapshots, allowlists).

This is the sixth mention of claude agent sdk because Starkslab’s interest is not “yet another agent demo.” It is a repeatable agent workspace primitive we can reuse.


Extensions worth building (if you adopt this)

If you deploy this internally and it sticks, the next improvements are not speculative. They are the predictable second-order requirements.

Multi-user workspaces

Right now, a workspace is owned by one user.

The natural evolution:

  • workspace membership table,
  • roles (owner/editor/viewer),
  • and per-role tool permissions.

Storage backends

The runner contract is snapshot-based.

That is intentionally simple, but for larger teams you will likely want:

  • object storage for snapshots (S3-like),
  • deduplication,
  • and incremental sync.

Cost and rate controls

The DB already tracks session cost (total_cost_usd) and run cost.

Next:

  • per-workspace budgets,
  • enforced rate limits,
  • and cost attribution.

Policy around MCP

MCP is a power tool.

The missing layer is policy:

  • allowlisted MCP server domains,
  • secret injection rules,
  • and per-workspace tool allowlists.

Closing

Open-sourcing this was not about “showing code.”

It was about publishing a reference implementation of an agent workspace that:

  • enforces boundaries,
  • supports BYOK,
  • provides both live and async execution paths,
  • and documents the incident history that made it shippable.

If you want to build on it, start by reading the repo’s top-level docs and then go straight to:

  • backend/src/app.py (WebSocket live mode),
  • backend/src/claudeagentsdk/routers/workspaces.py (workspace boundary + snapshots),
  • runner/api/run.js (sandbox run loop),
  • frontend/src/api/client.js (contracts + ws URL builder).

That is the actual spine of this claude agent sdk drop.
