Multi-agent coding tools fail when every agent has the same broad permission set and a vague job title. Claude Octopus is interesting because it does something more concrete: it turns specialized Claude Code instances into separate MCP tools, each with its own model, allowed tools, prompt, budget, and timeline.

That makes this seed worth saving, but only under a stricter frame. The article should not promise autonomous teams. It should give a Claude Code or Codex-adjacent user a first rollout they can actually run: one read-only reviewer, one small diff, one timeline check, one report, then a decision about whether the second agent is justified.

Why this seed is worth saving

The README gives the useful premise. Sometimes one Claude Code instance is too broad: a strict reviewer should read, grep, and report; a test writer should focus on edge cases; a quick helper can use a cheaper model and a tiny budget. Claude Octopus lets those roles appear as named MCP tools. The value is not 'many agents.' The value is that each role can have a different boundary.

Start with one read-only reviewer

The README documents npx claude-octopus init, MCP client setup, and examples for multiple role-specific agents. The first adoption path should not be a coordinator. Run npx claude-octopus init --template solo-agent, or add one manual .mcp.json entry for a reviewer. Keep the role boring and observable before it becomes part of a larger workflow.

{
  "mcpServers": {
    "octopus-reviewer": {
      "command": "npx",
      "args": ["claude-octopus@latest"],
      "env": {
        "CLAUDE_TOOL_NAME": "code_reviewer",
        "CLAUDE_SERVER_NAME": "octopus-reviewer",
        "CLAUDE_DESCRIPTION": "Read-only reviewer for small diffs",
        "CLAUDE_ALLOWED_TOOLS": "Read,Grep,Glob",
        "CLAUDE_PERMISSION_MODE": "default",
        "CLAUDE_MAX_BUDGET_USD": "0.25"
      }
    }
  }
}

Then call the MCP tool from the client against a small diff. The prompt should be narrow: review the last commit, cite exact files, separate correctness bugs from style preferences, and do not edit files. The first success criterion is not brilliance; it is that the reviewer stays read-only and leaves evidence.

Use tool restrictions as the contract

The source code matters here. The query tool schema says invocation-level tools intersects with the server-level restriction, and disallowedTools unions with the server blacklist. In plain terms, a host can make an agent stricter for one call but should not be able to widen a role beyond its configured boundary. That is the safety argument for role-scoped tools.

Verify that contract directly. Ask the reviewer to inspect files, then ask for an edit or shell-oriented action it should not have. A safe first run should return findings, file paths, and reasoning, not file mutations. If the MCP client can widen a reviewer into a writer during the same call, the configuration is not ready to be trusted.

Do not start with bypassPermissions

The README examples sometimes use CLAUDE_PERMISSION_MODE=bypassPermissions because it makes demos flow. The config source shows why that deserves caution: permission mode defaults to default, and the dangerous skip-permissions flag is set only when bypass mode is explicit. For a real project, start with default permissions, then decide case by case whether any agent deserves acceptEdits or bypass behavior.

The rule is simple: no coordinator, no extra directories, and no bypassPermissions in the first rollout. If a reviewer cannot create useful findings under default permissions and read-only tools, the problem is role design, not missing automation power.

Timeline is the audit surface

Every agent call can write a lightweight timeline entry with run ID, agent name, session ID, cost, turns, error state, prompt excerpt, and working directory. Full transcripts stay in Claude Code's own project storage. This is a good split. The timeline becomes a table of contents for multi-agent work, and _transcript or _report can pull deeper evidence when needed.

After the first run, inspect ~/.claude-octopus/timelines/timeline.jsonl. Confirm the agent name is code_reviewer, the working directory is the repo you intended, the cost is plausible, errors are visible, and the prompt excerpt matches the request. If a multi-agent run cannot be reconstructed later, it is not ready for serious code review.

Use reports before adding more agents

Claude Octopus can generate HTML reports through <name>_report or npx claude-octopus report --out index.html. A team should use that report after the first reviewer run.

npx claude-octopus report --out octopus-reviewer-report.html
npx claude-octopus dashboard

Open the report and check the agent sequence, cost, turns, session IDs, prompt excerpts, errors, and transcript visibility. Only add a second agent after the first agent's report is boring. The report is not decoration; it is how you notice that a test writer ran in the wrong directory or a quick helper exceeded its budget.

Copy-paste smoke test

A first run should take one branch and one intentional issue, not a full repo migration.

git checkout -b octopus-reviewer-smoke
npx claude-octopus init --template solo-agent

After the MCP client reloads, call the reviewer tool with a narrow prompt:

Review the last commit for correctness bugs only. Cite exact files and lines where possible. Do not edit files, do not run commands, and separate confirmed bugs from risks.

Then collect evidence:

tail -n 5 ~/.claude-octopus/timelines/timeline.jsonl
npx claude-octopus report --out octopus-reviewer-report.html

The run passes only if the answer cites files, the timeline row names the reviewer agent, the cost and turn count are visible, the report opens, and git status --short shows no unexpected edits.

Factory mode is useful but still needs review

Factory mode exposes create_claude_code_mcp, a wizard that returns a .mcp.json entry from a plain-language description. The factory source redacts Anthropic API keys and Claude Code OAuth tokens in the rendered output, which is a good default. Still, generated config should be reviewed like code.

Before pasting generated config into a real MCP client, check eight fields: model, allowed tools, disallowed tools, cwd, additional directories, max turns, max budget, and permission mode. Then check credential handling separately. Redaction in rendered output is helpful, but inherited API keys and OAuth tokens are still credentials.

Coordinator mode is the last step

The README shows a coordinator agent that receives other Claude Octopus servers through CLAUDE_MCP_SERVERS. That pattern is powerful because one agent can call researcher, architect, editor, or verifier tools and drive a pipeline. It is also where risk compounds. A coordinator should only be introduced after each inner agent has a narrow role, cost ceiling, max turns, and timeline visibility. Otherwise the system hides complexity behind one impressive prompt.

Why MCP boundaries beat prompt-only roles

The original insight is not that Claude can pretend to be a reviewer, tester, or architect. A single Claude Code session can already do that with prompts. The README and config source show why Claude Octopus is different: the role boundary moves out of the prompt and into the MCP tool surface through a different tool name, description, environment, allowed tools, model, budget, and timeline record.

That is why this pattern is more interesting than a shell alias or a prompt library. A shell alias can start a command, but it does not give the MCP client a stable reviewer tool with its own transcript path. A prompt library can describe a role, but it cannot enforce that the reviewer only has Read,Grep,Glob. Built-in subagent patterns can be useful, but Claude Octopus is worth considering when the workflow needs externally visible, per-role MCP tools that another client can call and audit.

First-hour verification matrix

Treat the first hour as a test, not a migration. Use a tiny branch with one intentional issue, then collect pass/fail evidence for the reviewer role.

1. MCP surface: the client should show only the expected reviewer tools, not a generic unrestricted agent. 2. Read-only boundary: ask the reviewer to fix the bug directly. The acceptable result is a refusal, a finding, or instructions; an unexpected file mutation is a failure. 3. Working directory: ask it to cite the repo path it inspected. A mismatch means the agent is reviewing the wrong project. 4. Evidence quality: require file paths and reasons for each finding. Generic review prose fails the test. 5. Timeline row: confirm run_id, agent, session_id, cost_usd, turns, cwd, and error state appear in the timeline file. 6. Transcript/report path: generate the report and make sure the run can be reconstructed without relying on chat memory. 7. Budget behavior: repeat the test with a low CLAUDE_MAX_BUDGET_USD; the agent should stop cheaply rather than run open-ended. 8. Rollback: remove the MCP entry and confirm the reviewer tool disappears from the client.

Only after all eight checks pass should the team add a test writer or quick helper.

A practical rollout sequence

Use this order. First, run npx claude-octopus init --template solo-agent or create one reviewer manually. Second, verify the MCP client sees exactly the expected tool names, such as code_reviewer, code_reviewer_timeline, code_reviewer_transcript, and code_reviewer_report. Third, run one review against a small diff. Fourth, inspect ~/.claude-octopus/timelines/timeline.jsonl. Fifth, generate a report. Sixth, add a test writer with its own tool name and model. Seventh, add a budget with CLAUDE_MAX_BUDGET_USD. Only then consider a coordinator.

The acceptance test is concrete: after the first run you should have a review answer, a timeline row, a transcript path, a report, and no unexpected write capability. If any one is missing, fix the role before adding more agents.

Where not to use it

Do not use Claude Octopus when a single Claude Code session would be clearer. Do not run multiple agents with broad tools just to imitate a team. Do not combine bypassPermissions, unlimited budget, high effort, and coordinator mode on a live repo. Do not give agents extra directories unless the need is explicit. Do not store per-agent API keys in shared config without treating them as credentials. Do not ignore timeline write failures if auditability is part of the reason you adopted the tool.

The Codex angle

Codex users should care because this is a clean pattern for externalizing specialist roles. A Codex workflow can ask a Claude Code reviewer tool for a second lineage, a test-writer tool for edge cases, or a report tool for transcript evidence. The same discipline applies: name the role, narrow the tool set, cap cost, keep the transcript path visible, and treat the generated report as review evidence rather than agent theater.

Save Claude Octopus as a role-scoped Claude Code agent article. It is useful when one workflow needs separate reviewer, tester, helper, or coordinator roles with explicit tools, budgets, and timeline records; it is risky when used as broad autonomous multi-agent machinery before single-agent boundaries are proven.

Practical takeaway

Start with one read-only reviewer, default permissions, a small diff, and timeline verification. Add reports, then a second specialist, then budget ceilings. Keep coordinator mode, bypassPermissions, extra directories, and per-agent credentials out of the first rollout. If the timeline and report cannot explain what each agent did, the setup is not ready.