Workflow · June 1, 2026 · 10 min read

Run Claude Agent Tools in Vercel Sandbox

The salvageable idea is the split-plane design: Anthropic keeps the agent loop, while Vercel spawns one reviewed sandbox runner per session.


EDITOR'S NOTE: Use Vercel Sandbox for Claude Managed Agents only when the control plane, runner tools, network policy, and credential brokering are reviewed as one system.

This article rewrites an earlier Vercel Sandbox piece instead of dropping it. The premise was valuable, but the earlier draft treated the topic like a short setup guide and used a stale API shape. The real story is more specific: Claude Managed Agents can keep the agent loop with Anthropic while a Vercel app owns the tool execution boundary.

That boundary is the subject of this article. A Vercel Function receives work from Anthropic, acknowledges it, creates a fresh sandbox from a snapshot, starts a runner, and lets that runner handle tool calls inside an isolated microVM. Secrets are kept out of the VM. Network policy decides where the runner can connect. The goal is to teach that architecture, not to repeat the word "secure."

The useful idea is a split control plane

The Vercel changelog says Claude Managed Agents handles the model, harness, tools, and session state. Self-hosting changes the execution layer: when Claude calls a custom tool, your side runs it and posts the result back. The Vercel guide names the split clearly. The control plane is a Vercel Function that receives session.status_run_started webhooks. The compute plane is a Vercel Sandbox VM that attaches to the session event stream, executes tool calls such as run_shell and read_file, and exits when the session ends.

That is the right frame for Codex and Claude Code readers. This is not just another sandbox product note. It is a pattern for letting a managed agent touch private infrastructure without handing the managed loop a broad shell on your server.

Start from the reference app, then cut it down

The reference implementation gives a concrete adoption path:

pnpm create next-app --example https://github.com/vercel-labs/cma-vercel-sandbox my-cma-sandbox
cd my-cma-sandbox
vercel link
vercel env pull .env.local
pnpm tsx scripts/create-environment.ts
pnpm tsx scripts/create-agent.ts
pnpm tsx scripts/build-snapshot.ts
pnpm tsx scripts/test-e2e.ts

That sequence matters. create-environment.ts creates a self-hosted Anthropic environment. create-agent.ts creates an agent with custom tools. build-snapshot.ts prepares the runner image. test-e2e.ts proves the path from session creation to sandbox-spawned tool result. Do not start by copying the webhook into production. Start by making this chain pass once with a throwaway agent and a harmless command.
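If you want the chain to fail loudly in CI rather than by eye, a small driver can run the four scripts in order and stop at the first non-zero exit. This is a sketch, not part of the reference app: `runSetupChain` and `RunScript` are hypothetical names, and it assumes each script reports failure through its exit code.

```typescript
import { spawnSync } from "node:child_process";

// The four reference-app scripts, in the order the article gives.
export const SETUP_STEPS = [
  "scripts/create-environment.ts",
  "scripts/create-agent.ts",
  "scripts/build-snapshot.ts",
  "scripts/test-e2e.ts",
];

// Runs one script and returns its exit code. Injectable so the chain can be tested without pnpm.
export type RunScript = (script: string) => number;

// Default runner: `pnpm tsx <script>`, streaming output to this terminal.
const runWithPnpm: RunScript = (script) => {
  const result = spawnSync("pnpm", ["tsx", script], { stdio: "inherit" });
  // status is null if the child was killed by a signal or never started; treat that as failure.
  return result.status ?? 1;
};

// Hypothetical driver: stops at the first failing step, on the assumption that
// later steps depend on IDs produced by earlier ones.
export function runSetupChain(
  run: RunScript = runWithPnpm,
): { completed: string[]; failedAt?: string } {
  const completed: string[] = [];
  for (const step of SETUP_STEPS) {
    if (run(step) !== 0) return { completed, failedAt: step };
    completed.push(step);
  }
  return { completed };
}
```

In a CI job you would call `runSetupChain()` and exit non-zero whenever `failedAt` is set.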

Make the first implementation reviewable

Before the first real deploy, reduce the demo into a small review unit. Put these files in one PR: app/api/webhook/route.ts, sandbox/runner.ts, scripts/create-environment.ts, scripts/create-agent.ts, scripts/build-snapshot.ts, scripts/test-e2e.ts, and the exact env var list. Then run a local audit pass:

rg -n "run_shell|read_file|execSync|networkPolicy|transform|heartbeat|work.stop" app sandbox scripts
pnpm tsx scripts/create-environment.ts
pnpm tsx scripts/create-agent.ts
pnpm tsx scripts/build-snapshot.ts
pnpm tsx scripts/test-e2e.ts

A reviewer should be able to answer five questions from that PR alone. Which webhook event starts work? Which code acknowledges the work item? Which snapshot ID boots the runner? Which domains and paths receive brokered credentials? Which tool calls can Claude make? If any answer requires dashboard memory or tribal knowledge, the rollout is not ready.

The webhook is where ownership starts

The demo's app/api/webhook/route.ts is small enough to review. It unwraps the Anthropic webhook with a signing secret, ignores events other than session.status_run_started, polls the environment work queue, acknowledges one work item, and hands sandbox creation to waitUntil so the HTTP request can return quickly.
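That control-plane flow can be modeled as a function over injected dependencies, which makes its ordering (verify, filter, poll, acknowledge, then defer the spawn) testable without Anthropic or Vercel. Every name here (`handleWebhook`, `ControlPlaneDeps`, `WorkItem`) is hypothetical; the real SDK calls for signature verification and the work queue would sit behind `verify`, `pollWork`, and `ack`.

```typescript
// Hypothetical minimal shapes; the real webhook and work-queue payloads carry more fields.
export interface WebhookEvent { type: string }
export interface WorkItem { id: string }

export interface ControlPlaneDeps {
  verify(rawBody: string, headers: Record<string, string>): WebhookEvent; // throws on a bad signature
  pollWork(): Promise<WorkItem | null>;
  ack(workId: string): Promise<void>;
  spawnRunner(work: WorkItem): Promise<void>; // creates the sandbox and starts the runner
  waitUntil(task: Promise<unknown>): void; // lets the task outlive the HTTP response
}

export async function handleWebhook(
  rawBody: string,
  headers: Record<string, string>,
  deps: ControlPlaneDeps,
): Promise<{ status: number; body: string }> {
  let event: WebhookEvent;
  try {
    event = deps.verify(rawBody, headers);
  } catch {
    return { status: 401, body: "invalid signature" };
  }
  // Only run starts create work; every other event is answered and ignored.
  if (event.type !== "session.status_run_started") return { status: 200, body: "ignored" };

  const work = await deps.pollWork();
  if (!work) return { status: 200, body: "no work" };

  // Claim the item before spawning; the sketch assumes ack is what keeps the
  // queue from handing the same item to another consumer.
  await deps.ack(work.id);
  deps.waitUntil(deps.spawnRunner(work));
  return { status: 200, body: `accepted ${work.id}` };
}
```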

The important part is the spawn logic. It creates a sandbox from SANDBOX_SNAPSHOT_ID, uses runtime: "node24", sets a timeout, applies networkPolicy, and runs npx tsx runner.ts detached, with only ENVIRONMENT_ID, WORK_ID, and SESSION_ID in the VM environment. That is the operational surface a reviewer should own. If this file grows without review, the whole design loses its clean boundary.
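The environment handed to the detached runner is the easiest part of this file to regress. A whitelist copy keeps it to the three IDs the demo passes and rejects anything else, including `ANTHROPIC_ENVIRONMENT_KEY`. `buildRunnerEnv` and `assertRunnerEnv` are hypothetical helpers, not functions from the reference app; their output would feed the env option of whatever call starts `npx tsx runner.ts`.

```typescript
// The only variables the demo passes into the VM.
const RUNNER_ENV_KEYS = ["ENVIRONMENT_ID", "WORK_ID", "SESSION_ID"] as const;
type RunnerEnvKey = (typeof RUNNER_ENV_KEYS)[number];

// Secrets that must stay in the control plane (extend with your own customer-API keys).
const CONTROL_PLANE_ONLY = new Set(["ANTHROPIC_ENVIRONMENT_KEY"]);

// Copies exactly the three IDs; extra properties on the input are dropped, never forwarded.
export function buildRunnerEnv(ids: Record<RunnerEnvKey, string>): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of RUNNER_ENV_KEYS) {
    if (!ids[key]) throw new Error(`missing ${key}`);
    env[key] = ids[key];
  }
  return env;
}

// Guard for hand-built env objects: fails on a known secret or on any unexpected key.
export function assertRunnerEnv(env: Record<string, string>): void {
  for (const key of Object.keys(env)) {
    if (CONTROL_PLANE_ONLY.has(key)) throw new Error(`${key} must not reach the VM`);
    if (!(RUNNER_ENV_KEYS as readonly string[]).includes(key)) {
      throw new Error(`unexpected runner env var: ${key}`);
    }
  }
}
```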

Credential brokering is the main security feature

The article should not say, vaguely, that the sandbox keeps secrets safe. It should name the mechanism. In the reference route, ANTHROPIC_ENVIRONMENT_KEY stays in the control plane. The sandbox runner uses new Anthropic({ authToken: "_brokered_" }). The firewall injects Authorization: Bearer <key> only when outbound traffic matches paths for the current session or the current environment work item.

That path scoping is the difference between a useful design and a decorative sandbox. If compromised code inside the VM prints process.env, the key is not there. If it calls an unrelated Anthropic work endpoint, the network policy should not attach auth. When teams add customer APIs, they need the same rule: allow only the domains, paths, methods, and headers that a session should use.
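The firewall performs the injection, but the matching rule is worth modeling as a pure function so reviewers can unit-test it against tricky URLs. This sketch assumes hypothetical path prefixes for the session and the work item (`/v1/sessions/{id}/` and `/v1/environments/{id}/work/{id}/`); the real Managed Agents routes and Vercel's matcher syntax may differ, so treat it as a test oracle for your policy, not as the policy itself.

```typescript
export interface Scope { environmentId: string; workId: string; sessionId: string }

// Hypothetical route prefixes for one session and one work item. The trailing
// slash stops "sess_1" from also matching "sess_10".
function allowedPrefixes(s: Scope): string[] {
  return [
    `/v1/sessions/${s.sessionId}/`,
    `/v1/environments/${s.environmentId}/work/${s.workId}/`,
  ];
}

// Returns true only when the brokered Authorization header should be attached.
export function shouldInjectAuth(rawUrl: string, scope: Scope): boolean {
  const url = new URL(rawUrl); // WHATWG URL parsing resolves "." and ".." path segments
  if (url.protocol !== "https:" || url.hostname !== "api.anthropic.com" || url.port !== "") {
    return false;
  }
  const path = url.pathname.endsWith("/") ? url.pathname : `${url.pathname}/`;
  return allowedPrefixes(scope).some((prefix) => path.startsWith(prefix));
}
```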

Network policy is not optional documentation

Vercel's SDK reference documents updateNetworkPolicy with deny-all, domain allowlists, subnet controls, and header injection for credential brokering. The guide also points out a practical pattern: dependency installation may need broader egress, but the policy should be tightened before agent-generated code or private data is processed.

For a Claude agent runner, this should become a review checklist. Which domains are allowed? Are private subnets open? Are transforms scoped by path? Are write endpoints using different credentials from read endpoints? Does your Vercel plan support the required transform behavior? If the answers are not written down, the team does not yet have a deployment pattern. It has a demo.
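One way to make that checklist enforceable is to express the intended policy as data and lint it in CI. The `PolicyDraft` shape below is a hypothetical review model, not the Vercel SDK's network-policy type; you would map it to the real `updateNetworkPolicy` input separately. It flags allow-all egress, private or unbounded subnets, transforms for domains outside the allowlist or without a path or method scope, and credentials shared between reads and writes.

```typescript
// Hypothetical review model; map it to the real SDK policy input separately.
export interface Transform {
  domain: string;
  pathPrefix?: string;
  methods?: string[];
  credentialId: string; // which brokered secret this transform injects
}
export interface PolicyDraft {
  mode: "deny-all" | "allowlist" | "allow-all";
  allowedDomains: string[];
  allowedSubnets: string[]; // IPv4 CIDRs
  transforms: Transform[];
}

const WRITE_METHODS = new Set(["POST", "PUT", "PATCH", "DELETE"]);

// RFC 1918 ranges, link-local 169.254/16 (home of common cloud metadata endpoints), or a /0 route.
function isPrivateOrUnbounded(cidr: string): boolean {
  const [ip, bits] = cidr.split("/");
  if (Number(bits) === 0) return true;
  const [a, b] = ip.split(".").map(Number);
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168) || (a === 169 && b === 254);
}

export function reviewPolicy(p: PolicyDraft): string[] {
  const findings: string[] = [];
  if (p.mode === "allow-all") findings.push("egress is allow-all");
  for (const cidr of p.allowedSubnets) {
    if (isPrivateOrUnbounded(cidr)) findings.push(`subnet ${cidr} is private or unbounded`);
  }
  const readCreds = new Set<string>();
  const writeCreds = new Set<string>();
  for (const t of p.transforms) {
    if (!p.allowedDomains.includes(t.domain)) findings.push(`transform targets ${t.domain}, which is not allowlisted`);
    if (!t.pathPrefix) findings.push(`transform for ${t.domain} is not scoped by path`);
    if (!t.methods?.length) findings.push(`transform for ${t.domain} is not scoped by method`);
    // A transform without a method scope is treated as applying to both reads and writes.
    const methods = t.methods?.length ? t.methods : ["GET", "POST"];
    for (const m of methods) {
      (WRITE_METHODS.has(m.toUpperCase()) ? writeCreds : readCreds).add(t.credentialId);
    }
  }
  for (const id of writeCreds) {
    if (readCreds.has(id)) findings.push(`credential ${id} is used for both reads and writes`);
  }
  return findings;
}
```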

The runner is intentionally underpowered

The demo runner is a teaching implementation. It supports run_shell with execSync and a 30-second timeout, and read_file with readFile. It keeps a handled set, lists existing session events first, streams new events, sends user.custom_tool_result, heartbeats every 30 seconds, and calls work.stop in finally. Those details are good. They prevent lost tool calls, duplicate handling, and abandoned work leases.
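The loop described above (replay existing events, then stream, deduplicate by ID, heartbeat, always stop) can be sketched against an injected session interface so its ordering is testable. The `SessionIO` interface, the `agent.custom_tool_use` event type string, and `runSession` are assumptions for illustration; check the demo's sandbox/runner.ts and the Anthropic SDK for the real event names and calls.

```typescript
// Assumed event type for a custom tool call; verify the real name in the session event schema.
const TOOL_USE_TYPE = "agent.custom_tool_use";

export interface SessionEvent { id: string; type: string; name?: string; input?: unknown }

export interface SessionIO {
  listEvents(): Promise<SessionEvent[]>; // events emitted before the runner attached
  streamEvents(): AsyncIterable<SessionEvent>; // assumed to end when the session ends
  sendToolResult(toolUseId: string, output: string): Promise<void>;
  heartbeat(): Promise<void>;
  stopWork(): Promise<void>;
}

export async function runSession(
  io: SessionIO,
  runTool: (event: SessionEvent) => Promise<string>,
  heartbeatMs = 30_000,
): Promise<number> {
  const handled = new Set<string>();
  let handledCount = 0;
  // Keep the work lease alive; a failed beat is ignored and the next tick tries again.
  const timer = setInterval(() => { io.heartbeat().catch(() => {}); }, heartbeatMs);

  const handle = async (event: SessionEvent) => {
    if (event.type !== TOOL_USE_TYPE || handled.has(event.id)) return; // skip non-tools and replays
    handled.add(event.id);
    await io.sendToolResult(event.id, await runTool(event));
    handledCount++;
  };

  try {
    // Replay first so a call emitted while the VM was booting is not lost.
    for (const event of await io.listEvents()) await handle(event);
    for await (const event of io.streamEvents()) await handle(event);
  } finally {
    clearInterval(timer);
    await io.stopWork(); // release the work item even if a tool throws
  }
  return handledCount;
}
```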

But run_shell is also the dangerous part. A production runner should restrict commands, working directories, output size, file paths, and network reach. It should return exit status and stderr clearly. It should avoid letting the agent install arbitrary packages after the network policy has been tightened. The reference app shows the shape; it is not a permission model by itself.
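Two of those restrictions, path confinement and bounded command execution, are small enough to sketch with Node's standard library. This assumes a hypothetical workspace root of /vercel/sandbox/app and a hypothetical command table; it is a starting point for a permission model, not a complete one.

```typescript
import { spawnSync } from "node:child_process";
import * as path from "node:path";

// Hypothetical workspace root and limits; tune them to your runner.
const WORKSPACE = "/vercel/sandbox/app";
const MAX_OUTPUT_BYTES = 64 * 1024;
const TIMEOUT_MS = 30_000;

// Named commands with fixed argv: the agent picks a name, never a raw shell string.
const COMMANDS = new Map<string, string[]>([
  ["run_tests", ["npm", "test"]],
  ["list_changed_files", ["git", "diff", "--name-only"]],
]);

// Resolves a requested path and rejects anything outside the root.
export function resolveInside(root: string, requested: string): string {
  const full = path.resolve(root, requested);
  const rel = path.relative(root, full);
  if (rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
    throw new Error(`path escapes workspace: ${requested}`);
  }
  return full;
}

export interface CommandResult { exitCode: number | null; stdout: string; stderr: string; error?: string }

export function runNamedCommand(name: string): CommandResult {
  const argv = COMMANDS.get(name);
  if (!argv) throw new Error(`command not allowed: ${name}`);
  const [program, ...args] = argv;
  // No shell, fixed cwd, hard timeout, capped output.
  const result = spawnSync(program, args, {
    cwd: WORKSPACE,
    timeout: TIMEOUT_MS,
    maxBuffer: MAX_OUTPUT_BYTES,
    encoding: "utf8",
  });
  return {
    exitCode: result.status, // null when the child was killed or never started
    stdout: result.stdout ?? "",
    stderr: result.stderr ?? "",
    error: result.error?.message, // set on timeout, output overflow, or spawn failure
  };
}
```

A read_file handler would pass the requested path through `resolveInside(WORKSPACE, …)` before touching the filesystem, and report `exitCode` and `stderr` back to Claude instead of throwing them away.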

Replace broad shell with named tools

The fastest hardening step is to replace the demo's broad run_shell habit with named tools. For example, a docs agent may need run_tests, read_file, and list_changed_files; it does not need arbitrary curl, package installation, or access to every path. The tool schema in scripts/create-agent.ts and the implementation in sandbox/runner.ts must move together.

Use a small mapping table in the PR description:

Claude tool        Runner function         Allowed cwd          Network
run_tests          runTests()              /vercel/sandbox/app  none
read_file          readProjectFile()       /vercel/sandbox/app  none
call_customer_api  callCustomerApi()       n/a                  api.example.com only

That table is not decoration. It lets reviewers catch mismatches before the agent starts calling tools. It also gives Codex a clean artifact to compare against the actual code.
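The same table can live in code, with a check that fails when the agent's declared tools and the runner's handlers drift apart. `RUNNER_TOOLS`, `DeclaredTool`, and `diffTools` are hypothetical; in practice the declared list would come from the tool definitions in scripts/create-agent.ts.

```typescript
// Hypothetical registry mirroring the mapping table above.
type Network = "none" | { domains: string[] };
interface ToolBinding { handler: string; cwd: string | null; network: Network }

export const RUNNER_TOOLS: Record<string, ToolBinding> = {
  run_tests: { handler: "runTests", cwd: "/vercel/sandbox/app", network: "none" },
  read_file: { handler: "readProjectFile", cwd: "/vercel/sandbox/app", network: "none" },
  call_customer_api: { handler: "callCustomerApi", cwd: null, network: { domains: ["api.example.com"] } },
};

// Minimal view of a custom tool as declared for the agent (assumed shape: only the name matters here).
export interface DeclaredTool { name: string }

// Reports tools Claude can call that the runner cannot serve, and handlers no tool points to.
export function diffTools(declared: DeclaredTool[], registry: Record<string, ToolBinding>) {
  const declaredNames = new Set(declared.map((t) => t.name));
  const implemented = new Set(Object.keys(registry));
  return {
    missingInRunner: [...declaredNames].filter((n) => !implemented.has(n)).sort(),
    undeclaredInAgent: [...implemented].filter((n) => !declaredNames.has(n)).sort(),
  };
}
```

Failing the build when either list is non-empty keeps the schema and the runner moving together.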

Snapshots are latency control, not persistent workspace

scripts/build-snapshot.ts creates a sandbox, writes package.json and runner.ts into /vercel/sandbox, installs the Anthropic SDK and tsx, takes a snapshot, prints SANDBOX_SNAPSHOT_ID, and stops the sandbox. The SDK reference is clear that taking a snapshot shuts down the sandbox, and later sandboxes can start from that snapshot.

That makes snapshots useful for startup latency, because sessions no longer need to reinstall the runner dependencies. It does not make the session workspace durable product state. If the agent creates artifacts that matter, export them before the sandbox ends. Rebuild the snapshot when runner.ts, the SDK version, or tool dependencies change. Treat the snapshot ID like a deploy artifact.
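Treating the snapshot ID as a deploy artifact is easier if it is recorded next to a fingerprint of the inputs that should force a rebuild. This is a hypothetical sketch: `fingerprint`, `needsRebuild`, and the record shape are not part of the reference app, and the inputs are the three named above (runner.ts, the SDK version, and tool dependencies).

```typescript
import { createHash } from "node:crypto";

// Inputs that should trigger a snapshot rebuild when any of them changes.
export interface SnapshotInputs {
  runnerSource: string; // contents of runner.ts
  sdkVersion: string;
  toolDependencies: Record<string, string>; // package name -> version
}

// What you would commit alongside the deploy (hypothetical record shape).
export interface SnapshotRecord { snapshotId: string; fingerprint: string }

export function fingerprint(inputs: SnapshotInputs): string {
  // Sort dependency names so the hash does not depend on object key order.
  const deps = Object.keys(inputs.toolDependencies)
    .sort()
    .map((name) => `${name}@${inputs.toolDependencies[name]}`);
  return createHash("sha256")
    .update(JSON.stringify({ runner: inputs.runnerSource, sdk: inputs.sdkVersion, deps }))
    .digest("hex");
}

export function needsRebuild(current: SnapshotInputs, recorded: SnapshotRecord | undefined): boolean {
  return !recorded || recorded.fingerprint !== fingerprint(current);
}
```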

The failure modes are operational

The highest-risk bugs are not syntax mistakes. They are workflow mistakes. A deployed webhook and a local poll.ts loop can compete for the same work items. A runner that skips the initial event listing can miss a tool call emitted while the VM was booting. A runner that does not deduplicate can process the same call twice. A missing heartbeat can let a lease expire. A broad network policy can turn one session into a path to other data.

Those are the checks Codex should be asked to review. Ask it to trace the session ID from webhook to Sandbox.create to runner env. Ask it to inspect every network-policy matcher. Ask it to prove the runner handles restart, replay, timeout, and work.stop behavior. That is where a coding agent helps the architecture instead of merely writing boilerplate.

Where this belongs in a team stack

Use this pattern when the agent needs cloud-local access: private APIs, internal services, per-customer tools, or data that should not pass through a generic external shell runner. It is a better fit for hosted agent products than for a single developer already running Claude Code locally. It also fits teams that want Vercel Functions to remain the audited control plane while one sandbox handles one agent session.

Do not use it to hide broad tools behind a nicer diagram. If the agent can run arbitrary shell commands against sensitive systems, the sandbox boundary only solves part of the problem. The tool contract, network policy, credentials, logs, and human approval rules still need owners.

A publication-grade rollout path

A rollout I would accept has five gates. First, run the reference app locally and make test-e2e.ts pass with a harmless command. Second, replace the default run_shell and read_file behavior with the smallest tool set your agent needs. Third, build a snapshot and record exactly which runner version produced it. Fourth, deploy the webhook with signature verification and one active control plane. Fifth, verify a production session where the VM cannot read ANTHROPIC_ENVIRONMENT_KEY and cannot call an unlisted domain.
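The fifth gate can be scripted as a probe that runs inside the sandbox at the start of a test session. The sketch assumes that a request to an unlisted domain is rejected as a network error or times out, rather than being answered with an HTTP response; confirm how your policy actually denies traffic before relying on it. `probeSandbox` and the default probe URL are hypothetical.

```typescript
export interface ProbeResult { keyAbsent: boolean; unlistedDomainBlocked: boolean }

// Hypothetical boot-time probe; run it inside the sandbox, never in the control plane.
export async function probeSandbox(
  env: Record<string, string | undefined>,
  fetchFn: (url: string, init: { signal: AbortSignal }) => Promise<unknown>,
  unlistedUrl = "https://example.org/",
  timeoutMs = 5_000,
): Promise<ProbeResult> {
  const keyAbsent = env.ANTHROPIC_ENVIRONMENT_KEY === undefined;
  let unlistedDomainBlocked = false;
  try {
    await fetchFn(unlistedUrl, { signal: AbortSignal.timeout(timeoutMs) });
  } catch {
    // Assumption: a refused connection or timeout means the policy blocked the
    // request; any response at all is treated as open egress.
    unlistedDomainBlocked = true;
  }
  return { keyAbsent, unlistedDomainBlocked };
}
```

Inside the VM you would call `probeSandbox(process.env, fetch)` and fail the session if either field is false.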

That is what is worth rescuing from the original article. The value is not that Vercel Sandbox exists. The value is that Claude Managed Agent tool execution can be made reviewable as a split-plane system.

Read this as an architecture guide. Vercel Sandbox is valuable for Claude Managed Agents when one reviewed control plane spawns one scoped sandbox runner per session; it is risky when teams copy a broad run_shell demo without network, credential, and replay review.

Practical takeaway

Start with the Vercel reference app, run create-environment, create-agent, build-snapshot, and test-e2e, then review app/api/webhook/route.ts and sandbox/runner.ts before deployment. Keep credentials in the control plane, broker them through path-scoped network policy, restrict the runner tools, rebuild snapshots deliberately, and never run a local poll loop and deployed webhook against the same work queue.

SOURCES

[1] Primary source (vercel.com)

[2] Vercel guide (vercel.com)

[3] CMA Vercel Sandbox demo README (github.com)

[4] CMA webhook route (github.com)

[5] CMA sandbox runner (github.com)

[6] CMA snapshot builder (github.com)

[7] CMA agent setup script (github.com)

[8] Vercel Sandbox SDK reference (vercel.com)

[9] Vercel Sandbox docs (vercel.com)

[10] Vercel Sandbox pricing and limits (vercel.com)

[11] CMA end-to-end test (github.com)

vercel sandbox · claude managed agents · anthropic · codex · agent security
