The original claude-replay article should not be kept as written. It sold the project as an interactive code-review breakthrough. The repo points to a better article: agent work needs review artifacts.
Claude Code and Codex sessions leave a dense trail of prompts, assistant text, tool calls, patch attempts, command output, and sometimes reasoning blocks. Raw JSONL is precise but painful to read. A screen recording is easier to watch but hard to search or quote. claude-replay sits between those two: it converts local session logs into a self-contained HTML replay that a reviewer can step through.
That makes it a real tool for vibe-coding teams, but only if the team treats the replay as sensitive output. The same HTML that makes a session easy to share can also expose commands, file paths, secrets, internal prompts, and customer details. The article worth publishing is about that boundary.
The artifact is the product
claude-replay supports Claude Code, Cursor, Codex CLI, Gemini CLI, and OpenCode session logs. The README's strongest claim is not that the tool reviews code by itself. It is that those local transcripts can become one HTML file with no external runtime dependency.
That matters because agent work is hard to audit after the fact. A teammate does not need every raw event in a JSONL file. They need the user request, the assistant's decision path, the tool calls, the files touched, and enough timing to understand the sequence. A replay gives them that without asking them to install the original agent runtime.
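One way to sanity-check the single-file claim on your own output is to look for tags that load a resource over the network. The sketch below is not part of claude-replay: the function name find_external_loads and the tag list are hypothetical, and a match only means the file deserves a closer look, not that the replay is broken.

```shell
# Hypothetical helper (not part of claude-replay): list tags in a replay file
# that load a resource from an http(s) URL. Prints each match and returns 1 if
# any are found; returns 0 when none are found.
find_external_loads() {
  local html="$1"
  # -E: extended regex, -o: print only the matching part, -i: ignore case
  if grep -Eoi '<(script|link|img|iframe)[^>]*(src|href)="https?://[^"]*"' "$html"; then
    return 1
  fi
  return 0
}
```

Run it as find_external_loads replay.html. Ordinary links inside the transcript (plain a tags) are ignored on purpose, since they do not load anything when the file is opened.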
What the Codex adapter adds
Codex support is more than a README badge. In src/formats/codex.mjs, exec_command is mapped to a Bash-style tool call, and apply_patch is parsed into Write- or Edit-style operations. Codex terminal wrappers such as Chunk ID, wall time, exit code, and token count lines, along with Output headers, are stripped before display. Encrypted reasoning blocks are skipped.
That is the difference between a transcript dump and a review artifact. If a Codex session created a file, changed a line, or ran a failing command, the replay can show the event in a shape a Claude Code reviewer already understands. It is not perfect provenance, but it is far easier to inspect than mixed console metadata.
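To make the cleanup step concrete, here is a rough shell approximation of that kind of filtering. It is only an illustration: the real logic lives in src/formats/codex.mjs, and the exact wrapper strings matched below (Chunk ID:, Wall time:, Process exited with code, Original token count:, Output:) are my assumptions about how those lines look, not patterns copied from the adapter.

```shell
# Illustrative only: drop Codex-style terminal wrapper lines from command
# output read on stdin. The literal prefixes are assumed, not taken from
# src/formats/codex.mjs.
strip_codex_wrappers() {
  sed -E \
    -e '/^Chunk ID: /d' \
    -e '/^Wall time: /d' \
    -e '/^Process exited with code [0-9]+$/d' \
    -e '/^Original token count: /d' \
    -e '/^Output:$/d'
}
```

Piping a captured command result through strip_codex_wrappers leaves only the lines a reviewer actually needs to read, which is the effect the adapter aims for in the rendered replay.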
Start with one local replay
Do not begin by publishing a replay. Start with a local file and prove that the transcript shape works for your agent.
npm install -g claude-replay
claude-replay ~/.codex/sessions/2026/03/12/rollout-<id>.jsonl -o replay.html
For Claude Code, pass a full ~/.claude/projects/<project>/<session>.jsonl path or a session ID. For Codex, pass the rollout JSONL path under ~/.codex/sessions/<year>/<month>/<day>/. The resolver can search known session folders when you pass an ID, but I would start with an explicit path so the first run is unambiguous.
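If you want an explicit Codex path without browsing the date folders by hand, a small helper can print one. This is a sketch under assumptions: latest_rollout is a hypothetical name, and it orders files by path, which matches date order only as far as the zero-padded year/month/day directories go.

```shell
# Hypothetical helper: print the last rollout file under a Codex sessions
# root, ordered by path. Zero-padded year/month/day directories sort
# chronologically; ordering within a single day depends on the file names.
latest_rollout() {
  local root="${1:-$HOME/.codex/sessions}"
  find "$root" -type f -name 'rollout-*.jsonl' | sort | tail -n 1
}
```

Usage would look like claude-replay "$(latest_rollout)" -o replay.html. Print the path first and confirm it is the session you mean before generating anything.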
Open the HTML locally. Check whether the user request, tool calls, command outputs, and file edits tell the story you expected. If the replay is missing the part a teammate would need, fix the input selection before touching sharing settings.
Trim before you redact
The best privacy move is to remove unnecessary turns before redaction. claude-replay has turn and time filters, excluded turns, thinking/tool-call toggles, bookmarks, labels, and extract/regenerate support. Use those controls to make the smallest artifact that still explains the work.
claude-replay session.jsonl --turns 5-15 --speed 2.0 -o focused.html
claude-replay session.jsonl --exclude-turns 1,2 --no-thinking -o review.html
claude-replay extract review.html -o review.jsonl
claude-replay review.jsonl --theme dracula -o review-v2.html
For public demos, start with --no-thinking and often --no-tool-calls. For teammate review, keep tool calls when they explain the patch, but remove command output that contains env values, absolute private paths, or third-party data.
Redaction is a guardrail, not a privacy proof
src/secrets.mjs redacts a useful set of known shapes: private keys, AWS access key IDs, Anthropic keys, sk- keys, bearer tokens, JWTs, connection strings, generic key-value secrets, env vars, and long hex tokens. The tests cover those cases and nested object redaction.
That is good engineering hygiene, not a reason to trust raw logs. A transcript can leak a customer name, a private repository path, an internal task, a pasted stack trace, or a secret in an unusual format. Use automatic redaction, then add explicit rules for project-specific strings:
claude-replay session.jsonl \
--redact "customer@example.com" \
--redact "internal-project=PROJECT" \
-o redacted.html
Avoid --no-auto-redact unless you are generating a throwaway local artifact from synthetic data. Before anything leaves your machine, search the generated HTML for company names, emails, tokens, hostnames, and project paths.
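The manual search is easy to script so it runs the same way every time. The function below is a minimal sketch, not a claude-replay feature: scan_replay is a hypothetical name and the strings you pass are your own. It assumes the strings would appear as plain text in the file; if your replay embeds transcript data in an encoded form, run the same check against the JSONL produced by claude-replay extract instead.

```shell
# Hypothetical pre-share check: report every listed string that still appears
# in a file. Returns 1 if anything was found, 0 if the file looks clean.
scan_replay() {
  local file="$1"; shift
  local found=0
  local pattern
  for pattern in "$@"; do
    # -F: fixed string, -i: ignore case, -q: quiet; "--" ends option parsing
    if grep -Fiq -- "$pattern" "$file"; then
      echo "FOUND: $pattern"
      found=1
    fi
  done
  return "$found"
}
```

For example, scan_replay redacted.html "@example.com" "/Users/" "internal.example.net" prints one FOUND line per hit and exits non-zero, so it can gate a share step in a script. A clean result is not proof of safety; it only covers the strings you thought to list.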
The editor has a sensible local boundary
Running claude-replay with no arguments launches a web editor. The README says it runs on 127.0.0.1, keeps edits in memory, and does not modify the original JSONL files. The server code backs that up: sessions are stored in a Map, autosave goes under ~/.claude-replay/autosave, file browsing is restricted to paths under the user's home directory, request bodies are capped at 10 MB, and cross-origin API calls are rejected unless the origin check is turned off.
That is the right default for private agent logs. Keep it that way. Do not bind the editor to a public interface for convenience. Do not pass --no-origin-check on a shared machine. If you use Docker, mount session directories read-only, as the README example does.
Use it for reviews, postmortems, and docs
The cleanest adoption path is internal review. When a Codex or Claude Code session fixes a bug, generate a replay and ask a teammate to inspect three points: the original user request, the commands or patches that changed behavior, and the final verification step. The file activity sidebar is valuable when tool-call inputs include file paths, because it lets the reviewer jump from touched files back to the relevant turns.
For postmortems, add bookmarks to the turns that matter: first bad assumption, first failing command, patch attempt, passing test, final answer. For docs, embed only the trimmed replay that explains the workflow. A replay should support the written explanation, not replace it.
Know when raw logs are better
Do not use claude-replay as the only audit record. Raw JSONL is still the better source for exact ingestion, search, diffing parser behavior, or compliance workflows. The replay is a presentation layer built from parsed turns. Format adapters make judgment calls, such as mapping Codex apply_patch into Write or Edit and filtering internal reasoning records.
That is fine for human review. It is not fine when the question is whether a specific raw event existed. Keep the original session file in the investigation folder, then attach the replay as the readable companion.
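Keeping the pair together can be scripted. The sketch below is my own convention, not something claude-replay provides: archive_session_pair and the SHA256SUMS file name are hypothetical, and the checksums are only there so a later reader can confirm neither file changed after it was filed.

```shell
# Hypothetical helper: copy the raw session and its replay into an
# investigation folder and record SHA-256 checksums for both files.
archive_session_pair() {
  local raw="$1" replay="$2" dest="$3"
  mkdir -p "$dest" || return 1
  cp -- "$raw" "$replay" "$dest"/ || return 1
  (
    cd "$dest" || exit 1
    # sha256sum ships with GNU coreutils; fall back to shasum on macOS.
    if command -v sha256sum >/dev/null 2>&1; then
      sha256sum "$(basename "$raw")" "$(basename "$replay")" > SHA256SUMS
    else
      shasum -a 256 "$(basename "$raw")" "$(basename "$replay")" > SHA256SUMS
    fi
  )
}
```

A call such as archive_session_pair session.jsonl replay.html cases/incident-01 leaves both files and the checksum list in one folder. Running sha256sum -c SHA256SUMS inside that folder later verifies them.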
A rollout I would accept
Start with one non-sensitive agent session from a repo you control. Generate a replay with an explicit path. Trim it to the turns that matter. Add manual redactions. Open the HTML locally and search it for secrets and private identifiers. Ask one teammate to review whether the replay explains the work without the raw transcript.
If that passes, add a small AGENTS.md or CLAUDE.md rule:
When sharing an agent session, generate a local claude-replay artifact first.
Use --turns or --exclude-turns to remove unrelated work.
Use --no-thinking for external demos unless the reasoning trace is explicitly approved.
Add --redact rules for customer names, project names, hostnames, emails, and tokens.
Keep the raw JSONL privately; share only the reviewed HTML artifact.
That rule is narrow enough to follow and strict enough to prevent the worst mistake: treating a raw agent transcript as harmless because it renders nicely.
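A rule is easier to follow when the command is already written down. The wrapper below is a sketch of how a team might encode it, using only the flags discussed in this article. The function name make_review_replay, the one-rule-per-line redaction file, and the CLAUDE_REPLAY_BIN override are all my assumptions, not part of the tool.

```shell
# Hypothetical wrapper: build a trimmed, no-thinking replay with one --redact
# flag per non-empty line of a rules file.
# Arguments: <session.jsonl> <turn-range> <output.html> <rules-file>
# CLAUDE_REPLAY_BIN is an assumed override, handy for dry runs with "echo".
make_review_replay() {
  local session="$1" turns="$2" out="$3" rules="$4" rule
  # Reuse the positional parameters as the argument list for the CLI.
  set -- "$session" --turns "$turns" --no-thinking
  while IFS= read -r rule || [ -n "$rule" ]; do
    if [ -n "$rule" ]; then
      set -- "$@" --redact "$rule"
    fi
  done < "$rules"
  "${CLAUDE_REPLAY_BIN:-claude-replay}" "$@" -o "$out"
}
```

With a rules file containing customer@example.com and internal-project=PROJECT on separate lines, make_review_replay session.jsonl 5-15 review.html redactions.txt runs the same command every reviewer would otherwise type by hand. Setting CLAUDE_REPLAY_BIN=echo prints the assembled arguments instead of generating a file.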
Skip it for live observability
The README includes --serve --watch, and that can be handy on a remote box or inside a container when you want to watch a session as it changes. I would not use it as production observability. It watches transcript files and regenerates a replay; it does not give you metrics, alerts, trace correlation, or policy enforcement.
Use live watch for demos and debugging an agent run. Use real logs, traces, and access controls for production operations. The replay should make human review easier after a session, not become the system that governs the session.
Treat claude-replay as a review tool, not a magic code reviewer. It is strong when a team needs a readable, redacted session artifact; it is risky when raw agent logs are published without trimming and manual inspection.
Practical takeaway
Adopt claude-replay with a two-file rule: keep the original JSONL private, and share only a generated HTML replay that has been trimmed, auto-redacted, manually searched, and reviewed. For the first week, use it on one internal Codex or Claude Code bug-fix session. Run claude-replay <session.jsonl> --turns N-M --no-thinking --redact "private-string" -o review.html and inspect the output locally. Then record which kinds of transcript data must never appear in shared artifacts.