Open Computer Use MCP: The Computer Runtime Boundary
Open Computer Use MCP is interesting because it puts a real computer behind the Model Context Protocol instead of pretending that a chat window is enough.
That does not automatically make it safe, production-ready, or the right runtime for Starkslab. It does make it useful source material. The repo exposes a clear shape: a Streamable HTTP MCP server, per-session Docker workspaces, bash/file-edit/read tools, live browser streaming, file output preview, mounted skills, and autonomous sub-agent paths through Claude Code, Codex, or OpenCode.
The operator question is not "can an agent use a computer?" The useful question is narrower:
When an LLM gets browser, terminal, code, files, skills, and sub-agents through MCP, where is the boundary?
Proof state: source-read-only.
What Starkslab read: the Wide-Moat/open-computer-use repo, README, docs/FEATURES.md, docs/MCP.md, docs/COMPARISON.md, docs/SKILLS.md, docs/multi-cli.md, .env.example, docker-compose.yml, Dockerfile, computer-use-server/app.py, computer-use-server/mcp_tools.py, computer-use-server/docker_manager.py, computer-use-server/skill_manager.py, computer-use-server/cli_runtime.py, and LICENSE.
What Starkslab did not run: clone, install, Docker build, docker compose up, hosted MCP endpoint, Open WebUI, Claude Desktop, n8n, LiteLLM, browser streaming, MCP initialize/tool calls, bash execution, file editing, sub-agent dispatch, security testing, benchmark reproduction, or production validation.
What this page can prove: source-visible architecture and operator boundaries.
Blocked claims: this page cannot prove the project is safe, secure, production-ready, isolated, reliable, correctly priced, broadly adopted, benchmark-validated, account-safe, or recommended for unattended workflows.
What this page covers: what the repo exposes, how the server and sandbox boundary work, what the MCP tools expose, why the shared browser matters, where isolation helps, where it does not prove safety, what the license changes for operators, and how Starkslab would inspect this kind of runtime before trusting it.
If you want the adjacent browser-control note, read Browser Harness. If you want the broader computer-use operator-surface note, read UI-TARS Desktop. If you want the runtime substrate comparison, read AI Agent Sandbox Substrates. If you want the control-plane frame around MCP, skills, config, and gates, read What Is a Coding-Agent Control Plane?.
What Is The Runtime?
The project is a self-hosted MCP server that gives an LLM a containerized Ubuntu workspace with terminal commands, file operations, live browser access, document skills, and sub-agent delegation.
The README describes the core product as an MCP server that gives any LLM "its own computer." Source-read-only, the concrete version is this: a FastAPI server exposes a Streamable HTTP MCP endpoint and manages Docker containers for chat sessions. Inside each sandbox, the assistant gets tool access, a working filesystem, mounted uploads and outputs, a browser path, and a skill system.
That makes the page role narrow. This is not an install tutorial. It is not a recommendation. It is not a security audit. It is a support note for the "open computer use mcp" and "computer use mcp" search queries inside Starkslab's AI Agent Tools cluster.
The search value is obvious: developers are now looking for agent tools that do more than return text. They want computer-use runtimes, browser automation, terminal execution, document creation, and remote workspaces. Open Computer Use gives Starkslab a current source-visible example of that category.
The Build AI Agent lesson is also clear:
```text
LLM client
-> MCP transport
-> computer-use server
-> Docker workspace
-> browser + terminal + files + skills + sub-agent
-> operator review
```
That is the runtime boundary. The model is not the product. The boundary is the set of tools, containers, credentials, files, browser state, sub-agent paths, and evidence that surround the model.
Where Does The Runtime Put The Boundary?
The runtime puts the main boundary at the server-managed Docker workspace.
The README and feature docs describe a stack where the Computer Use Server manages per-chat containers, serves output files, streams a Chromium browser, and exposes MCP tools. docker-compose.yml shows a computer-use-server service that mounts the Docker socket, a /tmp/computer-use-data host data path, a persistent data volume, and a skills cache. The server listens on port 8081 by default.
The repo's docker_manager.py makes the per-chat container idea concrete. Container names are derived from chat IDs, resource defaults include CONTAINER_MEM_LIMIT=2g and CONTAINER_CPU_LIMIT=1.0, uploads are mounted read-only, outputs are mounted read-write, and public/user skills are mounted into the sandbox. The code also writes a rendered /home/assistant/README.md into each container so the model can recover its environment from inside the workspace.
That is a practical architecture. It is also a reminder that "containerized" is not a magic safety word.
The orchestrator has strong authority because it talks to Docker. The sandbox image includes a non-root assistant user, but that user has passwordless sudo. Network access can be toggled through ENABLE_NETWORK, but the source default is enabled. The server can run commands, create files, edit files, expose output URLs, proxy a browser, and inject sub-agent credentials according to the selected runtime.
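To make the per-chat container idea concrete, here is a minimal Python sketch that turns a chat ID and environment settings into keyword arguments shaped like docker-py's containers.run(). The 2g and 1.0 defaults, the read-only uploads and read-write outputs, and the ENABLE_NETWORK toggle come from the source read. The name prefix, host layout, mount targets, and the "none" network fallback are hypothetical, and the sketch never talks to Docker.

```python
import re

# Defaults named in docker_manager.py per the source read.
DEFAULT_MEM_LIMIT = "2g"
DEFAULT_CPU_LIMIT = 1.0
# Hypothetical: the repo's real prefix and host layout may differ.
NAME_PREFIX = "cu-chat-"
HOST_DATA_ROOT = "/tmp/computer-use-data"


def safe_id(chat_id: str) -> str:
    """Keep only letters, digits, underscore, and hyphen (hypothetical scheme).

    Dropping "." and "/" also keeps a chat ID from escaping the host data root.
    """
    return re.sub(r"[^a-zA-Z0-9_-]", "-", chat_id)


def container_spec(chat_id: str, env: dict) -> dict:
    """Build kwargs shaped like docker-py's containers.run(); nothing is started."""
    sid = safe_id(chat_id)
    cpus = float(env.get("CONTAINER_CPU_LIMIT", DEFAULT_CPU_LIMIT))
    # The source default is network enabled; only an explicit opt-out disables it.
    network_on = env.get("ENABLE_NETWORK", "true").strip().lower() in ("1", "true", "yes")
    base = f"{HOST_DATA_ROOT}/{sid}"
    spec = {
        "name": NAME_PREFIX + sid,
        "mem_limit": env.get("CONTAINER_MEM_LIMIT", DEFAULT_MEM_LIMIT),
        "nano_cpus": int(cpus * 1_000_000_000),
        "volumes": {
            # Uploads read-only, outputs read-write, as the source describes.
            f"{base}/uploads": {"bind": "/mnt/user-data/uploads", "mode": "ro"},
            f"{base}/outputs": {"bind": "/mnt/user-data/outputs", "mode": "rw"},
        },
    }
    if not network_on:
        spec["network_mode"] = "none"  # hypothetical mapping of the opt-out
    return spec
```

The point of writing it down is review: every key in that dictionary is a line an operator should be able to find in the real docker_manager.py and confirm or correct.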
An operator should read the boundary like this:
| Layer | Source-visible purpose | Operator question |
|---|---|---|
| MCP endpoint | Accept tool calls over Streamable HTTP | Is auth required and is X-Chat-Id enforced? |
| Computer Use Server | Manage containers, files, browser, and tools | Who can reach the server and Docker socket? |
| Docker workspace | Execute code and hold session files | What gets mounted, persisted, cleaned, and networked? |
| Skills | Add procedures and scripts | Which skill folders are trusted and read-only? |
| Sub-agent | Delegate to coding CLIs | Which auth env vars cross into the sandbox? |
| Browser | Shared Chromium session | What page state or credentials can the agent infer? |
That table is the real value of this source read.
What Does The MCP Tool Surface Actually Expose?
The MCP tool surface is intentionally small.
docs/MCP.md names five tools: bash_tool, view, create_file, str_replace, and sub_agent. mcp_tools.py matches that shape. It registers a FastMCP server named computer-use-mcp, validates chat ID mode, creates or reuses the matching Docker container, and then executes each tool inside that container.
The small tool list is a design choice. Instead of exposing many tiny APIs for every file and process operation, Open Computer Use pushes a lot of power through bash_tool. That is flexible, but it also makes shell scope a primary review surface.
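One cheap way to hold a deployment to that small surface is to compare a live tools/list result against the five documented names. This Python sketch assumes only the standard MCP tools/list method and the result.tools[].name shape from the MCP specification; it was not run against this server.

```python
# Tool names documented in docs/MCP.md, per the source read.
DOCUMENTED_TOOLS = {"bash_tool", "view", "create_file", "str_replace", "sub_agent"}


def tools_list_request(request_id: int) -> dict:
    """JSON-RPC 2.0 body for the standard MCP tools/list method."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}


def diff_tool_surface(response: dict) -> dict:
    """Compare a tools/list response against the documented five-tool surface."""
    if "error" in response:
        raise RuntimeError(f"tools/list failed: {response['error']}")
    seen = {tool["name"] for tool in response["result"]["tools"]}
    return {
        "missing": sorted(DOCUMENTED_TOOLS - seen),
        "unexpected": sorted(seen - DOCUMENTED_TOOLS),
    }
```

An "unexpected" entry is not automatically bad, but it means the shell is no longer the only primary review surface.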
The source-visible command contract has useful details:
```shell
curl -sD - -X POST "http://localhost:8081/mcp" \
-H "Authorization: Bearer $MCP_API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "X-Chat-Id: my-session" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}'
```
That command comes from the repo's MCP docs. It shows the useful boundary pieces: bearer auth if configured, JSON-RPC over HTTP, X-Chat-Id as session key, and the MCP initialize handshake.
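The same handshake can be expressed in Python with only the standard library, which makes each boundary piece visible as a named header. This is a sketch, not a tested client: it builds the request without sending it, and the response parser assumes each SSE event carries its JSON on a single data: line.

```python
import json
import urllib.request
from typing import Optional

MCP_URL = "http://localhost:8081/mcp"  # default port from docker-compose.yml


def build_initialize_request(chat_id: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the same initialize call as the curl example; nothing is sent here."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "test", "version": "1.0"},
        },
    }
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "X-Chat-Id": chat_id,  # session key for container selection
    }
    if api_key:  # bearer auth only matters when MCP_API_KEY is configured
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        MCP_URL, data=json.dumps(body).encode(), headers=headers, method="POST"
    )


def parse_mcp_body(content_type: str, text: str) -> list:
    """Decode a plain JSON reply, or the single-line data: fields of an SSE stream."""
    if content_type.startswith("text/event-stream"):
        return [
            json.loads(line[len("data:"):].strip())
            for line in text.splitlines()
            if line.startswith("data:")
        ]
    return [json.loads(text)]
```

Sending it would mean passing the request to urllib.request.urlopen against a running server, which this evidence pass did not do.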
The chat ID behavior matters. docs/MCP.md documents three modes. With SINGLE_USER_MODE unset, requests without X-Chat-Id fall back to a shared default container and receive a warning. With SINGLE_USER_MODE=true, all sessions share one default container. With SINGLE_USER_MODE=false, X-Chat-Id is required and missing headers are rejected.
For production-like use, that one setting is not minor configuration. It decides whether independent sessions get isolated containers or quietly share a workspace.
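The three documented modes reduce to a small decision function. This Python sketch mirrors the behavior docs/MCP.md describes; the "default" container name is hypothetical, and treating any value other than "true" or "false" as unset is an assumption of the sketch, not confirmed server behavior.

```python
from typing import Optional, Tuple

DEFAULT_CONTAINER = "default"  # hypothetical name for the shared fallback container


class MissingChatId(Exception):
    """Raised when strict mode requires X-Chat-Id and none was sent."""


def resolve_session(single_user_mode: Optional[str], chat_id: Optional[str]) -> Tuple[str, Optional[str]]:
    """Map the documented SINGLE_USER_MODE behaviors to (session key, warning)."""
    mode = None if single_user_mode is None else single_user_mode.strip().lower()
    if mode == "true":
        # Every session shares one default container, header or not.
        return DEFAULT_CONTAINER, None
    if mode == "false":
        # Strict mode: the header is mandatory.
        if not chat_id:
            raise MissingChatId("X-Chat-Id header is required")
        return chat_id, None
    # Unset (or unrecognized, by assumption): fall back, but say so.
    if not chat_id:
        return DEFAULT_CONTAINER, "no X-Chat-Id; using shared default container"
    return chat_id, None
```

Only the "false" branch turns a forgotten header into a visible error instead of a quietly shared workspace.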
Why Does The Shared Browser Matter?
The browser path is the most interesting part of this project because it is collaborative rather than screenshot-only.
docs/FEATURES.md describes one Chromium instance inside the sandbox container. The AI uses Playwright/CDP. The user watches through CDP streaming and can interact in the same browser session. The README contrasts this with screenshot-based browser control.
That shape gives the operator something useful: the human can see what the model sees, intervene, type, click, and inspect the same browser state.
It also creates a trust boundary that needs plain language.
The docs say the user can enter sensitive values directly into the browser and the AI does not see raw credentials, only resulting page state. That is useful as a design goal. It is not the same as a security proof. A model that can observe page state after login may still infer private information, trigger account actions, or operate inside a credential-bearing session.
This is where the adjacent Starkslab pages matter. Browser Harness focuses on CDP and helper-code authority. UI-TARS Desktop focuses on local computer/browser operator surfaces. This runtime sits between those pages: it is not just a browser harness, and it is not just a visual operator. It is a whole computer-use workspace with browser, terminal, files, and skills bound together.
The operator rule stays the same:
A shared browser is collaboration infrastructure. It is not proof that account workflows are safe.
Where Do Skills And Sub-Agents Change The Boundary?
Skills and sub-agents are where the project stops being "a terminal in a container" and becomes a computer-use runtime.
The README lists public skills for documents, spreadsheets, presentations, PDFs, browser automation, vision, frontend design, webapp testing, test-driven development, skill creation, GitLab exploration, and sub-agent delegation. docs/SKILLS.md shows these as mounted skill folders with SKILL.md instructions, scripts, templates, and examples. skill_manager.py shows default public skills and read-only mount behavior.
That is useful because the agent gets procedure, not only raw tools. A document task can route into a docx skill. A browser task can route into Playwright guidance. A design task can route into a frontend skill.
It is also authority expansion.
Skills can carry scripts. They can define workflows. They can normalize what the assistant thinks is allowed or expected. If custom or per-user skills are mounted through the settings wrapper, the operator has to treat that skill supply chain as part of the runtime.
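Treating skills as a supply chain starts with an inventory. This Python sketch walks a skills directory, treats any folder with a SKILL.md as a skill (the layout docs/SKILLS.md describes), and flags folders that carry script files. Which suffixes count as executable content is an assumption; a real review would also read the SKILL.md text itself.

```python
from pathlib import Path

SCRIPT_SUFFIXES = {".py", ".sh", ".js", ".ts"}  # assumption: what counts as scripts


def inventory_skills(root: Path) -> list:
    """List skill folders (those containing SKILL.md) and the script files they carry."""
    report = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        folder = skill_md.parent
        scripts = sorted(
            p.relative_to(folder).as_posix()
            for p in folder.rglob("*")
            if p.is_file() and p.suffix in SCRIPT_SUFFIXES
        )
        # A skill that ships scripts expands authority, so it gets a review flag.
        report.append({"skill": folder.name, "scripts": scripts, "needs_review": bool(scripts)})
    return report
```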
Sub-agents widen the boundary again. The README says the sub-agent runtime supports Claude Code by default and can switch to Codex or OpenCode through SUBAGENT_CLI. docs/multi-cli.md describes the three runtime modes and the auth isolation goal: only the selected CLI's relevant auth variables should pass into the sandbox. cli_runtime.py and docker_manager.py make that runtime selector visible in code.
That is the Starkslab link to Agent CLI Control Surfaces: once the runtime can launch another coding agent inside the workspace, the review question becomes recursive. Which model runs? Which credentials are present? Is the result structured? Is each run bounded by SUB_AGENT_TIMEOUT? Can the operator reconstruct what changed?
That is the useful part: the repo makes those questions concrete.
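Two of those questions, which credentials cross and whether a run is bounded, can be sketched in a few lines of Python. The per-CLI variable mapping is an assumption for illustration (ANTHROPIC_API_KEY and OPENAI_API_KEY are the common keys for Claude Code and Codex; the OpenCode entry is hypothetical), and the repo's real allowlist in cli_runtime.py and docker_manager.py may differ. The timeout parameter stands in for SUB_AGENT_TIMEOUT, whose exact units this page did not verify.

```python
import subprocess
from typing import Optional

# Assumed mapping for illustration only; not the repo's real allowlist.
CLI_AUTH_VARS = {
    "claude": {"ANTHROPIC_API_KEY"},
    "codex": {"OPENAI_API_KEY"},
    "opencode": {"OPENCODE_API_KEY"},  # hypothetical variable name
}


def subagent_env(host_env: dict) -> dict:
    """Pass only the selected CLI's auth variables toward the sandbox."""
    cli = host_env.get("SUBAGENT_CLI", "claude").lower()  # Claude Code is the documented default
    if cli not in CLI_AUTH_VARS:
        raise ValueError(f"unknown SUBAGENT_CLI: {cli}")
    allowed = CLI_AUTH_VARS[cli]
    return {k: v for k, v in host_env.items() if k in allowed}


def run_bounded(cmd: list, env: Optional[dict], timeout_s: float) -> dict:
    """Run a sub-agent command with a hard wall-clock limit and a structured result."""
    try:
        done = subprocess.run(cmd, env=env, capture_output=True, text=True, timeout=timeout_s)
        return {"status": "exited", "code": done.returncode, "stdout": done.stdout}
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing keeps running.
        return {"status": "timeout", "code": None, "stdout": ""}
```

Returning a dictionary instead of raw output is the design choice that matters: the operator gets a status to reconstruct, not just text to scroll.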
Which Claims Stay Blocked?
The source supports architecture. It does not prove runtime behavior.
This is the blocked-claims table Starkslab would keep in front of any operator evaluating the project:
| Source supports | Still blocked |
|---|---|
| The repo exposes a Streamable HTTP MCP endpoint and five main tools. | The endpoint is secure in a public deployment. |
| The docs describe per-chat Docker workspaces and X-Chat-Id isolation modes. | Docker isolation prevents all cross-session, credential, or network leaks. |
| The image includes browser, terminal, languages, document tooling, and skills. | The 11 GB image is efficient, hardened, or suitable for every deployment. |
| The browser is designed around shared CDP streaming. | Logged-in browser workflows are safe for real accounts. |
| Skills are mounted and injected into prompts. | Skill content is always correct, current, or safe to run. |
| Sub-agent runtime can route to Claude, Codex, or OpenCode. | All three CLIs behave equivalently or safely under the same task. |
| The license permits internal self-hosting under stated terms. | Managed-service or commercial usage is automatically allowed without reading the license. |
The important license note is not legal advice. The LICENSE file is Business Source License 1.1. It allows production use under an additional grant, explicitly permits internal self-hosting, restricts offering the licensed work as a managed or hosted service without a commercial arrangement, and lists a change date of 2029-04-04 to Apache 2.0.
For an operator, that changes the adoption question. "Open source" is not enough. If the plan is internal self-hosting, the source says that path is explicitly permitted. If the plan is to offer a managed service using the substantial features, the source says to handle licensing before shipping.
What Went Wrong In This Evidence Pass?
The evidence pass stayed intentionally narrow.
Starkslab did not run the stack. The local shell used for this article could not resolve GitHub over normal network commands, so the source read used GitHub's repository surfaces and connector-backed file reads rather than a local clone. That means there is no command output from docker compose, no built image digest, no MCP response body from a live server, no browser stream, no skill execution, no sub-agent session, and no hosted endpoint check.
That limitation protects the page from overclaiming.
A runtime validation pass would need a different artifact:
- Pin a commit SHA or release.
- Build the workspace image.
- Start computer-use-server and Open WebUI.
- Set MCP_API_KEY and SINGLE_USER_MODE=false.
- Run MCP initialize and tools/list.
- Execute one harmless bash_tool command.
- Open the browser stream against a disposable page.
- Confirm cleanup and volume behavior.
- Record logs, commands, outputs, screenshots, and failure states.
Until that exists, the public claim stays source-read-only.
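The last step of that list, recording evidence, is the one that usually gets skipped. Here is a minimal Python sketch of a step recorder that keeps the command, exit code, output, and failure state for each validation step and writes them to JSON. It is a hypothetical harness, not something the repo ships; the actual steps (git rev-parse HEAD, docker compose build, and so on) would be supplied by the operator.

```python
import datetime
import json
import subprocess


def record_step(name: str, cmd: list, timeout_s: float = 120) -> dict:
    """Run one validation step and keep the command, exit code, and output as evidence."""
    started = datetime.datetime.now(datetime.timezone.utc).isoformat()
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        code, out, err = done.returncode, done.stdout, done.stderr
    except FileNotFoundError as exc:  # a missing binary is a failure state too
        code, out, err = None, "", str(exc)
    except subprocess.TimeoutExpired:
        code, out, err = None, "", f"timeout after {timeout_s}s"
    return {"step": name, "cmd": cmd, "started": started,
            "exit_code": code, "stdout": out, "stderr": err, "ok": code == 0}


def write_evidence(steps: list, path: str) -> None:
    """Persist the whole run as one JSON artifact, failures included."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"steps": steps, "all_ok": all(s["ok"] for s in steps)}, fh, indent=2)
```

Keeping failed steps in the same file as passing ones is deliberate: a validation record that only shows successes is another form of overclaiming.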
What Would Starkslab Steal From The Runtime?
The useful output is not "use Open Computer Use."
The useful output is the runtime pattern.
Starkslab would steal:
- treating MCP as a transport into a real workspace, not just a tool catalog;
- making X-Chat-Id and session isolation explicit;
- using a small set of powerful tools instead of a giant brittle schema;
- putting file outputs behind stable URLs instead of burying artifacts in chat text;
- giving the human a live browser view of the same state the agent is operating on;
- mounting skills as reviewable procedure packages;
- keeping sub-agent runtime choice explicit through SUBAGENT_CLI;
- separating browser-facing PUBLIC_BASE_URL from internal Docker DNS;
- warning loudly when the MCP API key is empty;
- documenting license boundaries where managed-service use changes the decision.
Those ideas support Starkslab's own stack because they are about control surfaces. OpenClaw, Symphony, Codex, Claude Code, Browser Harness, UI-TARS Desktop, and Open Computer Use all point at the same doctrine: agent capability only becomes useful when it is bounded, visible, reviewable, and routed into artifacts.
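The PUBLIC_BASE_URL item is easy to get wrong, so here is a small Python sketch of the split: one URL for the human's browser, one for traffic inside the Docker network. The /files/ route shape is hypothetical, and so is the internal host name (it assumes the compose service name computer-use-server resolves on the internal network); the repo's real file-serving path may differ.

```python
from urllib.parse import quote

INTERNAL_BASE = "http://computer-use-server:8081"  # hypothetical Docker DNS name


def output_urls(env: dict, chat_id: str, filename: str) -> dict:
    """Return separate browser-facing and container-network URLs for one output file."""
    public = env.get("PUBLIC_BASE_URL", "").rstrip("/")
    if not public:
        # Failing loudly beats handing the browser an internal host it cannot reach.
        raise ValueError("PUBLIC_BASE_URL must be set for browser-facing links")
    # Hypothetical route shape; encode the chat ID so it stays one path segment.
    path = f"/files/{quote(chat_id, safe='')}/{quote(filename)}"
    return {"browser": public + path, "internal": INTERNAL_BASE + path}
```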
How Should Operators Inspect The Runtime?
Before trusting the project with real work, inspect the runtime boundary instead of the demo.
Use this checklist:
- Is MCP_API_KEY set for every non-local deployment?
- Is SINGLE_USER_MODE=false used when multiple users or sessions exist?
- Does every client pass a unique X-Chat-Id?
- What host paths and Docker volumes are mounted into the server and sandbox?
- Is the Docker socket exposure acceptable for this deployment?
- Is network access enabled or intentionally disabled?
- Which skills are mounted, and can custom skills enter the system?
- Which CLI does SUBAGENT_CLI select?
- Which auth env vars reach the sandbox for that CLI?
- Can the operator view browser, terminal, files, and sub-agent state without guessing?
- Are outputs downloadable and attributable to the right session?
- What cleanup removes stale containers, volumes, and data?
- What is the license position for the intended use?
- What evidence would be enough to move from source interest to operational use?
That is how this page should help a reader. It does not say "install this." It gives the reader the control questions that matter before an LLM gets a computer.
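The configuration-shaped questions in that checklist can be turned into findings mechanically. This Python sketch reads an environment mapping (for example, a parsed .env file) and reports the risky defaults discussed above. The truthy parsing of ENABLE_NETWORK and the fallback wording for SUBAGENT_CLI are assumptions; the mount, socket, skill, and license questions still need a human.

```python
def audit_config(env: dict, local_only: bool = False) -> list:
    """Turn the config-shaped checklist items into findings (sketch, not exhaustive)."""
    findings = []
    if not env.get("MCP_API_KEY") and not local_only:
        findings.append("MCP_API_KEY is empty on a non-local deployment")
    if env.get("SINGLE_USER_MODE", "").strip().lower() != "false":
        findings.append("SINGLE_USER_MODE is not 'false'; sessions may share a container")
    # Assumption: an unset ENABLE_NETWORK means enabled, matching the source default.
    if env.get("ENABLE_NETWORK", "true").strip().lower() in ("1", "true", "yes"):
        findings.append("sandbox network access is enabled")
    if "SUBAGENT_CLI" not in env:
        findings.append("SUBAGENT_CLI unset; sub-agent runtime falls back to the default")
    return findings
```

An empty list only means these four settings look deliberate. It says nothing about whether the deployment is safe.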
Where This Fits In The Starkslab Stack
This article belongs in the AI Agent Tools cluster as a named tool teardown for computer-use runtime intent.
It supports the Build AI Agent cluster because builders need to understand the runtime substrate behind browser, terminal, code, files, and sub-agent workflows. A serious agent is not only a loop around model calls. It needs a computer boundary.
It supports the OpenClaw cluster by contrast. OpenClaw and Symphony are operator-control systems around sessions, memory, work queues, review gates, and delivery paths. This project is a computer-use runtime surface. The useful overlap is not branding. It is the same operator rule: make the authority boundary explicit before asking an agent to act.
The next clicks should stay clean:
- for browser-specific control surfaces, read Browser Harness;
- for visual computer-use surfaces, read UI-TARS Desktop;
- for execution substrates, read AI Agent Sandbox Substrates;
- for skills, MCP, config, and gates, read What Is a Coding-Agent Control Plane?;
- for broader agent CLI comparison, read Agent CLI Control Surfaces.
This is useful Starkslab material because it makes a hidden category visible. The article-worthy lesson is not that every LLM should get a computer. The lesson is that once an LLM gets one, the operator has to inspect the transport, workspace, browser, files, skills, sub-agents, credentials, and evidence trail as one runtime boundary.