Tools · June 1, 2026 · 8 min read

Review Agent Logs with claude-replay

claude-replay is worth saving as a session-review tool: it turns local Claude Code, Codex, Cursor, Gemini, and OpenCode logs into redacted HTML before a teammate or reader sees them.


EDITOR'S NOTE: Treat replay HTML as a sensitive review artifact. Redact first, limit the turns, and publish only the session details a teammate or reader needs.

The original claude-replay article should not be kept as written. It sold the project as an interactive code-review breakthrough. The repo points to a better article: agent work needs review artifacts.

Claude Code and Codex sessions leave a dense trail of prompts, assistant text, tool calls, patch attempts, command output, and sometimes reasoning blocks. Raw JSONL is precise but painful to read. A screen recording is easier to watch but hard to search or quote. claude-replay sits between those two: it converts local session logs into a self-contained HTML replay that a reviewer can step through.

That makes it a real tool for vibe coding teams, but only if the team treats the replay as sensitive output. The same HTML that makes a session easy to share can also expose commands, file paths, secrets, internal prompts, and customer details. The article worth publishing is about that boundary.

The artifact is the product

claude-replay supports Claude Code, Cursor, Codex CLI, Gemini CLI, and OpenCode session logs. The README's strongest claim is not that the tool reviews code by itself. It is that those local transcripts can become one HTML file with no external runtime dependency.

That matters because agent work is hard to audit after the fact. A teammate does not need every raw event in a JSONL file. They need the user request, the assistant's decision path, the tool calls, the files touched, and enough timing to understand the sequence. A replay gives them that without asking them to install the original agent runtime.

What the Codex adapter adds

Codex support is more than a badge in the README. In src/formats/codex.mjs, exec_command is mapped to a Bash-style tool call, and apply_patch is parsed into Write or Edit-style operations. Codex terminal wrappers such as Chunk ID, wall time, exit code lines, token count lines, and Output headers are stripped before display. Encrypted reasoning blocks are skipped.

That is the difference between a transcript dump and a review artifact. If a Codex session created a file, changed a line, or ran a failing command, the replay can show the event in a shape a Claude Code reviewer already understands. It is not perfect provenance, but it is far easier to inspect than mixed console metadata.

Start with one local replay

Do not begin by publishing a replay. Start with a local file and prove that the transcript shape works for your agent.

npm install -g claude-replay
claude-replay ~/.codex/sessions/2026/03/12/rollout-<id>.jsonl -o replay.html

For Claude Code, pass a full ~/.claude/projects/<project>/<session>.jsonl path or a session ID. For Codex, pass the rollout JSONL path under ~/.codex/sessions/<year>/<month>/<day>/. The resolver can search known session folders when you pass an ID, but I would start with an explicit path so the first run is unambiguous.
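If you do not remember the exact rollout file name, a small helper can print the newest log under the sessions folder so you can still pass an explicit path. This is a sketch, not part of claude-replay: latest_rollout is a hypothetical name, and it assumes the ~/.codex/sessions/<year>/<month>/<day>/rollout-<id>.jsonl layout described above and file names without newlines.

```shell
# latest_rollout [SESSIONS_ROOT]
# Hypothetical helper: prints the most recently modified rollout-*.jsonl
# under SESSIONS_ROOT (default: ~/.codex/sessions).
# The body runs in a subshell, so its variables do not leak to the caller.
latest_rollout() (
  root="${1:-$HOME/.codex/sessions}"
  # ls -t sorts by modification time, newest first. With a very large
  # number of files, find may call ls in several batches, and the order
  # is then only sorted within each batch.
  newest=$(find "$root" -type f -name 'rollout-*.jsonl' -exec ls -t {} + 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no rollout logs found under $root" >&2
    exit 1
  fi
  printf '%s\n' "$newest"
)
```

With the helper loaded in your shell, the first run becomes claude-replay "$(latest_rollout)" -o replay.html. Print the path once on its own before trusting it, so you know which session you are about to render.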

Open the HTML locally. Check whether the user request, tool calls, command outputs, and file edits tell the story you expected. If the replay is missing the part a teammate would need, fix the input selection before touching sharing settings.

Trim before you redact

The best privacy move is to remove unnecessary turns before redaction. claude-replay has turn and time filters, excluded turns, thinking/tool-call toggles, bookmarks, labels, and extract/regenerate support. Use those controls to make the smallest artifact that still explains the work.

claude-replay session.jsonl --turns 5-15 --speed 2.0 -o focused.html
claude-replay session.jsonl --exclude-turns 1,2 --no-thinking -o review.html
claude-replay extract review.html -o review.jsonl
claude-replay review.jsonl --theme dracula -o review-v2.html

For public demos, start with --no-thinking and often --no-tool-calls. For teammate review, keep tool calls when they explain the patch, but remove command output that contains env values, absolute private paths, or third-party data.

Redaction is a guardrail, not a privacy proof

src/secrets.mjs redacts a useful set of known shapes: private keys, AWS access key IDs, Anthropic keys, sk- keys, bearer tokens, JWTs, connection strings, generic key-value secrets, env vars, and long hex tokens. The tests cover those cases and nested object redaction.

That is good engineering hygiene, not a reason to trust raw logs. A transcript can leak a customer name, a private repository path, an internal task, a pasted stack trace, or a secret in an unusual format. Use automatic redaction, then add explicit rules for project-specific strings:

claude-replay session.jsonl \
  --redact "customer@example.com" \
  --redact "internal-project=PROJECT" \
  -o redacted.html

Avoid --no-auto-redact unless you are generating a throwaway local artifact from synthetic data. Before anything leaves your machine, search the generated HTML for company names, emails, tokens, hostnames, and project paths.
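The manual search can be made repeatable with a short script. The sketch below uses only grep, sort, and uniq; scan_replay and the private-strings list are hypothetical names, not something claude-replay ships. It assumes the strings would appear as plain text in the file you scan. If the replay stores its data in an encoded form, run the same scan on the JSONL produced by claude-replay extract instead.

```shell
# scan_replay FILE PATTERN_FILE
# Hypothetical helper: reports every fixed string from PATTERN_FILE
# (one per line, matched case-insensitively) that still occurs in FILE.
# Exit status: 0 = nothing found, 1 = at least one hit, 2 = bad input.
scan_replay() (
  html="$1"
  patterns="$2"
  if [ ! -r "$html" ] || [ ! -r "$patterns" ]; then
    echo "usage: scan_replay FILE PATTERN_FILE" >&2
    exit 2
  fi
  # -F treats each pattern as a literal string, -o prints only the matched
  # text, so a replay with very long lines still gives readable output.
  hits=$(grep -o -i -F -f "$patterns" "$html" | sort | uniq -c)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    echo "FOUND: listed strings remain in $html" >&2
    exit 1
  fi
  echo "clean: no listed strings found in $html"
)
```

A typical call would be scan_replay redacted.html private-strings.txt, where the list holds company names, emails, hostnames, and project paths. A clean result only means those exact strings are absent; it does not replace reading the replay.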

The editor has a sensible local boundary

Running claude-replay with no arguments launches a web editor. The README says it runs on 127.0.0.1, keeps edits in memory, and does not modify the original JSONL files. The server code backs that up: sessions are stored in a Map, autosave goes under ~/.claude-replay/autosave, file browsing is restricted to paths under the user's home directory, request bodies are capped at 10 MB, and cross-origin API calls are rejected unless the origin check is disabled with --no-origin-check.

That is the right default for private agent logs. Keep it that way. Do not bind the editor to a public interface for convenience. Do not pass --no-origin-check on a shared machine. If you use Docker, mount session directories read-only like the README example.

Use it for reviews, postmortems, and docs

The cleanest adoption path is internal review. When a Codex or Claude Code session fixes a bug, generate a replay and ask a teammate to inspect three points: the original user request, the commands or patches that changed behavior, and the final verification step. The file activity sidebar is valuable when tool-call inputs include file paths, because it lets the reviewer jump from touched files back to the relevant turns.

For postmortems, add bookmarks to the turns that matter: first bad assumption, first failing command, patch attempt, passing test, final answer. For docs, embed only the trimmed replay that explains the workflow. A replay should support the written explanation, not replace it.

Know when raw logs are better

Do not use claude-replay as the only audit record. Raw JSONL is still the better source for exact ingestion, search, diffing parser behavior, or compliance workflows. The replay is a presentation layer built from parsed turns. Format adapters make judgment calls, such as mapping Codex apply_patch into Write or Edit and filtering internal reasoning records.

That is fine for human review. It is not fine when the question is whether a specific raw event existed. Keep the original session file in the investigation folder, then attach the replay as the readable companion.
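One way to keep the raw file and its readable companion together is to copy both into the investigation folder and record a checksum for each. This is my own suggestion rather than a claude-replay feature: archive_session and the SHA256SUMS file name are hypothetical, and the sketch assumes either sha256sum or shasum is installed and that the two files have different names.

```shell
# archive_session RAW_JSONL REPLAY_HTML DEST_DIR
# Hypothetical helper: copies the raw session log and its replay into
# DEST_DIR and writes their SHA-256 digests to DEST_DIR/SHA256SUMS.
archive_session() (
  raw="$1"
  replay="$2"
  dest="$3"
  if [ ! -f "$raw" ] || [ ! -f "$replay" ] || [ -z "$dest" ]; then
    echo "usage: archive_session RAW_JSONL REPLAY_HTML DEST_DIR" >&2
    exit 2
  fi
  mkdir -p "$dest" || exit 1
  # -p preserves modification times, which helps when ordering evidence.
  cp -p "$raw" "$replay" "$dest"/ || exit 1
  cd "$dest" || exit 1
  # Linux usually ships sha256sum; macOS usually ships shasum instead.
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$(basename "$raw")" "$(basename "$replay")" > SHA256SUMS
  else
    shasum -a 256 "$(basename "$raw")" "$(basename "$replay")" > SHA256SUMS
  fi
)
```

Later, running sha256sum -c SHA256SUMS (or shasum -a 256 -c SHA256SUMS) inside the folder shows whether either file changed after it was archived.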

A rollout I would accept

Start with one non-sensitive agent session from a repo you control. Generate a replay with an explicit path. Trim it to the turns that matter. Add manual redactions. Open the HTML locally and search it for secrets and private identifiers. Ask one teammate to review whether the replay explains the work without the raw transcript.

If that passes, add a small AGENTS.md or CLAUDE.md rule:

When sharing an agent session, generate a local claude-replay artifact first.
Use --turns or --exclude-turns to remove unrelated work.
Use --no-thinking for external demos unless the reasoning trace is explicitly approved.
Add --redact rules for customer names, project names, hostnames, emails, and tokens.
Keep the raw JSONL privately; share only the reviewed HTML artifact.

That rule is narrow enough to follow and strict enough to prevent the worst mistake: treating a raw agent transcript as harmless because it renders nicely.
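A written rule is easier to follow when the command is assembled for you. The sketch below builds the claude-replay invocation from a turn range and a list of redaction strings, using only the flags shown earlier (--turns, --no-thinking, --redact, -o). share_replay, DRY_RUN, and the list file are hypothetical names introduced for illustration.

```shell
# share_replay SESSION TURNS REDACT_FILE OUT
# Hypothetical wrapper: runs claude-replay with --no-thinking, the given
# turn range, and one --redact flag per non-empty line of REDACT_FILE.
# Set DRY_RUN=1 to print the arguments, one per line, instead of running.
share_replay() (
  if [ "$#" -ne 4 ] || [ ! -r "$3" ]; then
    echo "usage: share_replay SESSION TURNS REDACT_FILE OUT" >&2
    exit 2
  fi
  session="$1"
  turns="$2"
  redact_file="$3"
  out="$4"
  # Reuse the positional parameters as the argument list being built.
  set -- claude-replay "$session" --turns "$turns" --no-thinking
  while IFS= read -r entry || [ -n "$entry" ]; do
    if [ -n "$entry" ]; then
      set -- "$@" --redact "$entry"
    fi
  done < "$redact_file"
  set -- "$@" -o "$out"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
)
```

Run it once with DRY_RUN=1 to review the exact arguments, for example DRY_RUN=1 share_replay session.jsonl 5-15 redact.txt review.html. Automatic redaction stays on because the wrapper never passes --no-auto-redact.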

Skip it for live observability

The README includes --serve --watch, and that can be handy on a remote box or inside a container when you want to watch a session as it changes. I would not use it as production observability. It watches transcript files and regenerates a replay; it does not give you metrics, alerts, trace correlation, or policy enforcement.

Use live watch for demos and debugging an agent run. Use real logs, traces, and access controls for production operations. The replay should make human review easier after a session, not become the system that governs the session.

Save claude-replay as a review tool, not as a magic code reviewer. It is strong when a team needs a readable, redacted session artifact; it is risky when raw agent logs are published without trimming and manual inspection.

Practical takeaway

Adopt claude-replay with a two-file rule: keep the original JSONL private, and share only a generated HTML replay that has been trimmed, auto-redacted, manually searched, and reviewed. For the first week, use it on one internal Codex or Claude Code bug-fix session:

claude-replay <session.jsonl> --turns N-M --no-thinking --redact "private-string" -o review.html

Inspect the output locally, and record which kinds of transcript data must never appear in shared artifacts.

