The old Vercel Sandbox article deserved a rewrite, not a drop. Its premise was valuable, but the draft treated the topic like a short setup guide and even used a stale API shape. The real story is more specific: Claude Managed Agents can keep the agent loop with Anthropic while a Vercel app owns the tool execution boundary.
That boundary is the article. A Vercel Function receives Anthropic work, acknowledges it, creates a fresh sandbox from a snapshot, starts a runner, and lets that runner handle tool calls inside an isolated microVM. Secrets are not supposed to sit in the VM. Network policy decides which destinations the runner can reach. The magazine version should teach that architecture, not repeat the word "secure".
The useful idea is a split control plane
The Vercel changelog says Claude Managed Agents handles the model, harness, tools, and session state. Self-hosting changes the execution layer: when Claude calls a custom tool, your side runs it and posts the result back. The Vercel guide names the split clearly. The control plane is a Vercel Function that receives session.status_run_started webhooks. The compute plane is a Vercel Sandbox VM that attaches to the session event stream, executes tool calls such as run_shell and read_file, and exits when the session ends.
That is the right frame for Codex and Claude Code readers. This is not just another sandbox product note. It is a pattern for letting a managed agent touch private infrastructure without handing the managed loop a broad shell on your server.
Start from the reference app, then cut it down
The reference implementation gives a concrete adoption path:
pnpm create next-app --example https://github.com/vercel-labs/cma-vercel-sandbox my-cma-sandbox
cd my-cma-sandbox
vercel link
vercel env pull .env.local
pnpm tsx scripts/create-environment.ts
pnpm tsx scripts/create-agent.ts
pnpm tsx scripts/build-snapshot.ts
pnpm tsx scripts/test-e2e.ts
That sequence matters. create-environment.ts creates a self-hosted Anthropic environment. create-agent.ts creates an agent with custom tools. build-snapshot.ts prepares the runner image. test-e2e.ts proves the path from session creation to sandbox-spawned tool result. Do not start by copying the webhook into production. Start by making this chain pass once with a throwaway agent and a harmless command.
Make the first implementation reviewable
Before the first real deploy, reduce the demo into a small review unit. Put these files in one PR: app/api/webhook/route.ts, sandbox/runner.ts, scripts/create-environment.ts, scripts/create-agent.ts, scripts/build-snapshot.ts, scripts/test-e2e.ts, and the exact env var list. Then run a local audit pass:
rg -n "run_shell|read_file|execSync|networkPolicy|transform|heartbeat|work\.stop" app sandbox scripts
pnpm tsx scripts/create-environment.ts
pnpm tsx scripts/create-agent.ts
pnpm tsx scripts/build-snapshot.ts
pnpm tsx scripts/test-e2e.ts
A reviewer should be able to answer five questions from that PR alone. Which webhook event starts work? Which code acknowledges the work item? Which snapshot ID boots the runner? Which domains and paths receive brokered credentials? Which tool calls can Claude make? If any answer requires dashboard memory or tribal knowledge, the rollout is not ready.
The webhook is where ownership starts
The demo app/api/webhook/route.ts is small enough to review. It unwraps the Anthropic webhook with a signing secret, ignores events that are not session.status_run_started, polls the environment work queue, acknowledges one work item, and hands sandbox creation to waitUntil so the HTTP request can return quickly.
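The ordering in that route can be sketched with every side effect injected, so the sequence is reviewable without depending on any SDK. This is a minimal sketch, not the reference app's code: `WebhookDeps`, `verify`, `nextWorkItem`, `ack`, `spawn`, and `defer` are hypothetical names, and `defer` only stands in for the role waitUntil plays.

```typescript
// Hypothetical shapes: these are NOT the Anthropic or Vercel SDK types.
interface WebhookEvent {
  type: string;
  sessionId: string;
}

interface WorkItem {
  id: string;
  sessionId: string;
}

interface WebhookDeps {
  // Verifies the signature and parses the body; returns null when invalid.
  verify(rawBody: string, signature: string): WebhookEvent | null;
  // Polls the environment work queue for one item, or null if it is empty.
  nextWorkItem(): Promise<WorkItem | null>;
  // Acknowledges the item so no other consumer picks it up.
  ack(workId: string): Promise<void>;
  // Creates the sandbox and starts the runner (the slow part).
  spawn(item: WorkItem): Promise<void>;
  // Stand-in for waitUntil: keeps background work alive after the response.
  defer(work: Promise<void>): void;
}

async function handleWebhook(
  rawBody: string,
  signature: string,
  deps: WebhookDeps,
): Promise<{ status: number; body: string }> {
  const event = deps.verify(rawBody, signature);
  if (event === null) return { status: 401, body: "invalid signature" };

  // Every event type other than the run-started status is ignored.
  if (event.type !== "session.status_run_started") {
    return { status: 200, body: "ignored" };
  }

  const item = await deps.nextWorkItem();
  if (item === null) return { status: 200, body: "no work" };

  // Acknowledge before spawning, then return without awaiting the sandbox.
  await deps.ack(item.id);
  deps.defer(deps.spawn(item));
  return { status: 200, body: "spawned" };
}
```

Injecting the dependencies is a design choice for review and testing: the handler can be exercised with fakes, and the only place real credentials appear is the object passed in by the route.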
The important part is spawn. It creates a sandbox from SANDBOX_SNAPSHOT_ID, uses runtime: "node24", sets a timeout, applies networkPolicy, and runs npx tsx runner.ts detached with only ENVIRONMENT_ID, WORK_ID, and SESSION_ID in the VM environment. That is the operational surface a reviewer should own. If this file grows without review, the whole design loses its clean boundary.
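One way to keep that VM environment from growing by accident is to build it from an explicit allowlist rather than spreading the function's own environment. The sketch below assumes a hypothetical helper named `buildRunnerEnv`; only the three variable names come from the reference route.

```typescript
// The only identifiers that should reach the VM, per the reference route.
const RUNNER_ENV_KEYS = ["ENVIRONMENT_ID", "WORK_ID", "SESSION_ID"] as const;

type RunnerEnvKey = (typeof RUNNER_ENV_KEYS)[number];
type RunnerEnv = Record<RunnerEnvKey, string>;

// Hypothetical helper: copies allowlisted keys only, so control-plane
// secrets present in `source` can never be forwarded into the sandbox.
function buildRunnerEnv(source: Record<string, string | undefined>): RunnerEnv {
  const env = {} as RunnerEnv;
  for (const key of RUNNER_ENV_KEYS) {
    const value = source[key];
    if (value === undefined || value === "") {
      throw new Error(`missing required runner variable: ${key}`);
    }
    env[key] = value;
  }
  return env;
}
```

A reviewer can then check one constant instead of auditing every call site that passes environment variables to the sandbox.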
Credential brokering is the main security feature
The article should not say, vaguely, that the sandbox keeps secrets safe. It should name the mechanism. In the reference route, ANTHROPIC_ENVIRONMENT_KEY stays in the control plane. The sandbox runner uses new Anthropic({ authToken: "_brokered_" }). The firewall injects Authorization: Bearer <key> only when outbound traffic matches paths for the current session or the current environment work item.
That path scoping is the difference between a useful design and a decorative sandbox. If compromised code inside the VM prints process.env, the key is not there. If it calls an unrelated Anthropic work endpoint, the network policy should not attach auth. When teams add customer APIs, they need the same rule: allow only the domains, paths, methods, and headers that a session should use.
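The matching rule behind path-scoped brokering can be illustrated in a few lines. This is a sketch of the decision only, under stated assumptions: `BrokerRule` and `shouldInjectAuth` are hypothetical names, the hostname and paths in the usage note are placeholders, and none of this is the Vercel network policy schema.

```typescript
// Hypothetical rule shape, for illustration only.
interface BrokerRule {
  host: string; // exact hostname, no wildcards
  pathPrefixes: string[]; // paths the current session is allowed to call
}

// Returns true only when the request targets the exact host over HTTPS
// and its normalized path sits at or under one of the allowed prefixes.
function shouldInjectAuth(url: string, rule: BrokerRule): boolean {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false; // unparseable URLs never receive credentials
  }
  if (parsed.protocol !== "https:") return false;
  if (parsed.hostname !== rule.host) return false;

  // URL parsing has already resolved "." and ".." segments in pathname.
  return rule.pathPrefixes.some((prefix) => {
    const withSlash = prefix.endsWith("/") ? prefix : prefix + "/";
    return parsed.pathname === prefix || parsed.pathname.startsWith(withSlash);
  });
}
```

With a rule such as `{ host: "api.example.test", pathPrefixes: ["/sessions/s1"] }`, a request to `/sessions/s1/events` matches, while `/sessions/s2/events`, `/sessions/s10/events`, and `/sessions/s1/../s2/events` do not. Comparing against the prefix plus a trailing slash is what keeps `s10` from matching `s1`.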
Network policy is not optional documentation
Vercel's SDK reference documents updateNetworkPolicy with deny-all, domain allowlists, subnet controls, and header injection for credential brokering. The guide also points out a practical pattern: dependency installation may need broader egress, but the policy should be tightened before agent-generated code or private data is processed.
For a Claude agent runner, this should become a review checklist. Which domains are allowed? Are private subnets open? Are transforms scoped by path? Are write endpoints using different credentials from read endpoints? Does the plan support the required transform behavior? If the answer is not written down, the team does not yet have a deployment pattern. It has a demo.
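Part of that checklist can be turned into a small lint that runs in CI against whatever policy object the team writes down. The sketch below assumes a hypothetical `PolicyDraft` shape invented for this example; it is not the SDK's networkPolicy type, and the checks cover only the questions that are mechanically answerable.

```typescript
// Hypothetical, simplified description of a policy under review.
interface PolicyDraft {
  allowedDomains: string[];
  allowPrivateSubnets: boolean;
  transforms: { domain: string; pathPrefix?: string; header: string }[];
}

// Returns human-readable findings; an empty array means no issue was found
// by these particular checks, not that the policy is safe.
function lintPolicy(policy: PolicyDraft): string[] {
  const findings: string[] = [];

  for (const domain of policy.allowedDomains) {
    if (domain.includes("*")) findings.push(`wildcard domain: ${domain}`);
  }
  if (policy.allowPrivateSubnets) {
    findings.push("private subnets are reachable");
  }
  for (const transform of policy.transforms) {
    if (!transform.pathPrefix || transform.pathPrefix === "/") {
      findings.push(`transform for ${transform.domain} is not path-scoped`);
    }
    if (!policy.allowedDomains.includes(transform.domain)) {
      findings.push(`transform targets unlisted domain: ${transform.domain}`);
    }
  }
  return findings;
}
```

Questions about credential separation between read and write endpoints, or about plan support, still need a human answer in the PR.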
The runner is intentionally underpowered
The demo runner is a teaching implementation. It supports run_shell with execSync and a 30-second timeout, and read_file with readFile. It keeps a handled set, lists existing session events first, streams new events, sends user.custom_tool_result, heartbeats every 30 seconds, and calls work.stop in finally. Those details are good. They prevent lost tool calls, duplicate handling, and abandoned work leases.
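The list-then-stream pattern with a handled set can be sketched independently of the SDK. In this minimal version, `ToolCallEvent`, `processEvents`, and the injected `listExisting`, `stream`, and `handle` are hypothetical stand-ins for the session event calls the demo runner makes.

```typescript
// Hypothetical event shape; real session events carry more fields.
interface ToolCallEvent {
  id: string;
  name: string;
}

// Handles the backlog first, then live events, and never handles an id twice.
// Returns the number of distinct events handled.
async function processEvents(
  listExisting: () => Promise<ToolCallEvent[]>,
  stream: AsyncIterable<ToolCallEvent>,
  handle: (event: ToolCallEvent) => Promise<void>,
): Promise<number> {
  const handled = new Set<string>();

  const handleOnce = async (event: ToolCallEvent): Promise<void> => {
    if (handled.has(event.id)) return;
    // Mark before handling so a replayed event is skipped even if the
    // first attempt is still in flight or has failed.
    handled.add(event.id);
    await handle(event);
  };

  // Backlog: tool calls emitted while the VM was still booting.
  for (const event of await listExisting()) {
    await handleOnce(event);
  }
  // Live stream: may replay events already seen in the backlog.
  for await (const event of stream) {
    await handleOnce(event);
  }
  return handled.size;
}
```

Marking an id as handled before running the handler favors at-most-once execution, which suits shell commands with side effects; a runner that prefers retries would mark after success instead.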
But run_shell is also the dangerous part. A production runner should restrict commands, working directories, output size, file paths, and network reach. It should return exit status and stderr clearly. It should avoid letting the agent install arbitrary packages after the network policy has been tightened. The reference app shows the shape; it is not a permission model by itself.
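Two of those restrictions, file paths and output size, are small enough to sketch. The helpers below are illustrative and hypothetical (`resolveInsideRoot`, `clipOutput`, `formatResult`); they use only the standard `node:path` module, and the path check is lexical, so it does not defend against symlinks inside the workspace.

```typescript
import { posix as path } from "node:path";

// Resolves `requested` against `root` and rejects anything that lands
// outside it. Lexical check only: symlinks are NOT resolved here.
function resolveInsideRoot(root: string, requested: string): string {
  const base = path.resolve(root);
  const resolved = path.resolve(base, requested);
  if (resolved !== base && !resolved.startsWith(base + "/")) {
    throw new Error(`path escapes workspace: ${requested}`);
  }
  return resolved;
}

// Caps output so a single tool result cannot flood the session.
function clipOutput(text: string, maxChars: number): { text: string; truncated: boolean } {
  if (text.length <= maxChars) return { text, truncated: false };
  return { text: text.slice(0, maxChars), truncated: true };
}

interface ToolResult {
  exitCode: number;
  stdout: string;
  stderr: string;
  truncated: boolean;
}

// Keeps exit status and stderr as separate fields so the agent can tell
// a failed command from an empty one.
function formatResult(exitCode: number, stdout: string, stderr: string, maxChars: number): ToolResult {
  const out = clipOutput(stdout, maxChars);
  const err = clipOutput(stderr, maxChars);
  return {
    exitCode,
    stdout: out.text,
    stderr: err.text,
    truncated: out.truncated || err.truncated,
  };
}
```

Comparing against the root plus a trailing slash matters: without it, a sibling directory whose name merely starts with the root's name would pass the check.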
Replace broad shell with named tools
The fastest hardening step is to replace the demo's broad run_shell habit with named tools. For example, a docs agent may need run_tests, read_file, and list_changed_files; it does not need arbitrary curl, package installation, or access to every path. The tool schema in scripts/create-agent.ts and the implementation in sandbox/runner.ts must move together.
Use a small mapping table in the PR description:
Claude tool        | Runner function    | Allowed cwd          | Network
run_tests          | runTests()         | /vercel/sandbox/app  | none
read_file          | readProjectFile()  | /vercel/sandbox/app  | none
call_customer_api  | callCustomerApi()  | n/a                  | api.example.com only
That table is not decoration. It lets reviewers catch mismatches before the agent starts calling tools. It also gives Codex a clean artifact to compare against the actual code.
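The same mapping can live in code as a registry, which makes the schema-versus-runner comparison mechanical. This is a sketch under assumptions: `ToolSpec`, `lookupTool`, and `findToolMismatches` are hypothetical names, and the handlers are placeholders rather than real implementations.

```typescript
type ToolInput = Record<string, unknown>;

// Hypothetical registry entry mirroring the columns of the mapping table.
interface ToolSpec {
  handler: (input: ToolInput) => string; // placeholder; real handlers do work
  cwd: string | null;
  allowedHosts: string[];
}

const toolRegistry = new Map<string, ToolSpec>([
  ["run_tests", { handler: () => "tests ran", cwd: "/vercel/sandbox/app", allowedHosts: [] }],
  ["read_file", { handler: () => "file contents", cwd: "/vercel/sandbox/app", allowedHosts: [] }],
  ["call_customer_api", { handler: () => "api response", cwd: null, allowedHosts: ["api.example.com"] }],
]);

// Unknown tool names are rejected instead of falling through to a shell.
function lookupTool(registry: ReadonlyMap<string, ToolSpec>, name: string): ToolSpec {
  const spec = registry.get(name);
  if (spec === undefined) throw new Error(`tool not allowed: ${name}`);
  return spec;
}

// Compares the tool names declared to the agent with the names the runner
// implements, and reports drift in either direction.
function findToolMismatches(
  declared: readonly string[],
  registry: ReadonlyMap<string, ToolSpec>,
): { missingInRunner: string[]; missingInSchema: string[] } {
  const declaredSet = new Set(declared);
  return {
    missingInRunner: declared.filter((name) => !registry.has(name)),
    missingInSchema: Array.from(registry.keys()).filter((name) => !declaredSet.has(name)),
  };
}
```

Running `findToolMismatches` in a test, with the declared names taken from the agent's tool schema, would fail the build whenever the two files drift apart.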
Snapshots are latency control, not persistent workspace
scripts/build-snapshot.ts creates a sandbox, writes package.json and runner.ts into /vercel/sandbox, installs the Anthropic SDK and tsx, takes a snapshot, prints SANDBOX_SNAPSHOT_ID, and stops the sandbox. The SDK reference is clear that taking a snapshot shuts down the sandbox, and later sandboxes can start from that snapshot.
That makes snapshots useful for startup latency because every session does not need to reinstall the runner dependencies. It does not mean the session workspace is durable product state. If the agent creates artifacts that matter, export them before the sandbox ends. Rebuild the snapshot when runner.ts, the SDK version, or tool dependencies change. Treat the snapshot ID like a deploy artifact.
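Treating the snapshot ID like a deploy artifact implies recording what went into it. A minimal sketch, assuming a hypothetical `SnapshotInputs` record and a fingerprint stored next to the snapshot ID; the reference scripts do not necessarily do this, and the version strings in any usage are placeholders.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record of everything baked into a snapshot.
interface SnapshotInputs {
  runnerSource: string; // contents of runner.ts
  dependencies: Record<string, string>; // package name -> pinned version
}

// Produces a stable hex fingerprint: identical inputs give identical values
// regardless of the order in which dependencies are listed.
function snapshotFingerprint(inputs: SnapshotInputs): string {
  const deps = Object.entries(inputs.dependencies).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0,
  );
  return createHash("sha256")
    .update(JSON.stringify({ runner: inputs.runnerSource, deps }))
    .digest("hex");
}

// True when the recorded fingerprint is missing or no longer matches.
function needsRebuild(recorded: string | undefined, inputs: SnapshotInputs): boolean {
  return recorded !== snapshotFingerprint(inputs);
}
```

Storing the fingerprint alongside SANDBOX_SNAPSHOT_ID gives the deploy pipeline a cheap way to refuse a snapshot built from an older runner.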
The failure modes are operational
The highest-risk bugs are not syntax mistakes. They are workflow mistakes. A deployed webhook and a local poll.ts loop can compete for the same work items. A runner that skips the initial event listing can miss a tool call emitted while the VM was booting. A runner that does not deduplicate can process the same call twice. A missing heartbeat can let a lease expire. A broad network policy can turn one session into a path to other data.
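The lease-related failures in that list come down to two guarantees: a heartbeat keeps firing while work is in progress, and the stop call runs no matter how the work ends. A minimal sketch with injected callbacks follows; `withLease`, `heartbeat`, and `stop` are hypothetical names standing in for whatever the runner calls on the work item.

```typescript
// Runs `task` while sending periodic heartbeats, and always calls `stop`
// afterwards, whether the task resolves or throws.
async function withLease<T>(
  task: () => Promise<T>,
  heartbeat: () => Promise<void>,
  stop: () => Promise<void>,
  intervalMs: number,
): Promise<T> {
  const timer = setInterval(() => {
    // A failed beat is swallowed here; the next tick tries again.
    heartbeat().catch(() => undefined);
  }, intervalMs);

  try {
    return await task();
  } finally {
    // Stop beating first so no heartbeat fires after the lease is released.
    clearInterval(timer);
    await stop();
  }
}
```

The demo runner's cadence would correspond to an `intervalMs` of 30000. Swallowing heartbeat errors is a deliberate simplification; a production runner would likely count consecutive failures and abort the task once the lease is presumed lost.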
Those are the checks Codex should be asked to review. Ask it to trace the session ID from webhook to Sandbox.create to runner env. Ask it to inspect every network-policy matcher. Ask it to prove the runner handles restart, replay, timeout, and work.stop behavior. That is where a coding agent helps the architecture instead of merely writing boilerplate.
Where this belongs in a team stack
Use this pattern when the agent needs cloud-local access: private APIs, internal services, per-customer tools, or data that should not pass through a generic external shell runner. It is a better fit for hosted agent products than for a single developer already running Claude Code locally. It also fits teams that want Vercel Functions to remain the audited control plane while one sandbox handles one agent session.
Do not use it to hide broad tools behind a nicer diagram. If the agent can run arbitrary shell commands against sensitive systems, the sandbox boundary only solves part of the problem. The tool contract, network policy, credentials, logs, and human approval rules still need owners.
A publication-grade rollout path
A rollout I would accept has five gates. First, run the reference app locally and make test-e2e.ts pass with a harmless command. Second, replace the default run_shell and read_file behavior with the smallest tool set your agent needs. Third, build a snapshot and record exactly which runner version produced it. Fourth, deploy the webhook with signature verification and one active control plane. Fifth, verify a production session where the VM cannot read ANTHROPIC_ENVIRONMENT_KEY and cannot call an unlisted domain.
That is the rescued article. The value is not that Vercel Sandbox exists. The value is that Claude Managed Agent tool execution can be made reviewable as a split-plane system.
Read this as an architecture guide. Vercel Sandbox is valuable for Claude Managed Agents when one reviewed control plane spawns one scoped sandbox runner per session; it is risky when teams copy a broad run_shell demo without network, credential, and replay review.
Practical takeaway
Start with the Vercel reference app, run create-environment, create-agent, build-snapshot, and test-e2e, then review app/api/webhook/route.ts and sandbox/runner.ts before deployment. Keep credentials in the control plane, broker them through path-scoped network policy, restrict the runner tools, rebuild snapshots deliberately, and never run a local poll loop and deployed webhook against the same work queue.