Workflow · June 1, 2026 · 9 min read

Agent Flow Makes Agent Traces Reviewable

The useful story is not a pretty node graph. It is a review surface for Claude Code hooks, Codex rollout JSONL, tool calls, token counts, transcripts, and file attention.


EDITOR'S NOTE: The rescue angle is observability, not visual novelty. Agent Flow is useful when it turns Claude Code and Codex execution into traces a reviewer can inspect.

The old article should not be published as a visual-tool puff piece. The HN launch post says Agent Flow makes Claude Code behavior visible, but the repository is more interesting than that.

Agent Flow is useful because it treats an agent run as a trace. It reads Claude Code events through hooks, tails Claude Code transcripts, parses Codex rollout JSONL, and turns the run into a timeline, graph, transcript, token view, and file-attention surface. For Codex and Claude Code users, that is not decoration. It is a way to review how a patch happened before deciding whether to trust it.

The Review Problem Comes Before the Graph

Most agent tooling shows the final answer and a scrollback. That is not enough for serious engineering work. A reviewer wants to know which tools ran, which files were read repeatedly, where the agent branched, whether a subagent returned useful evidence, and whether the final patch came from source inspection or from a lucky guess.

Agent Flow's graph is only valuable if it answers those questions. The article should therefore start with the review problem: agent execution is often observable only after the fact, and even then the signal is buried inside terminal logs or JSONL files.

Two Runtimes, Two Event Sources

The README now claims support for both Claude Code and Codex. The implementation backs that up with different collection paths.

For Claude Code, the extension configures hooks for lifecycle and tool-use events: SessionStart, PreToolUse, PostToolUse, PostToolUseFailure, SubagentStart, SubagentStop, Notification, Stop, and SessionEnd. The runtime then combines a hook server with a session watcher and filters duplicate subagent lifecycle events.
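One plausible reason for the duplicate filtering is that a subagent can surface through both collectors, the hook server and the session watcher. A minimal sketch of such a filter, using a hypothetical normalized event shape (`kind`, `sessionId`, `subagentId`, `source`) rather than Agent Flow's real types: it keeps the first `SubagentStart` and `SubagentStop` per subagent and passes every other event through unchanged.

```typescript
// Hook event names listed in the article; the event shape below is hypothetical.
type HookEventName =
  | "SessionStart"
  | "PreToolUse"
  | "PostToolUse"
  | "PostToolUseFailure"
  | "SubagentStart"
  | "SubagentStop"
  | "Notification"
  | "Stop"
  | "SessionEnd";

interface AgentEvent {
  kind: HookEventName;
  sessionId: string;
  subagentId?: string; // assumed to be set on subagent lifecycle events
  source: "hook" | "transcript"; // which collector reported the event
}

const SUBAGENT_LIFECYCLE = new Set<HookEventName>(["SubagentStart", "SubagentStop"]);

// Keep the first SubagentStart/SubagentStop per (session, subagent) pair,
// whichever collector reported it; every other event passes through in order.
function dedupeSubagentLifecycle(events: AgentEvent[]): AgentEvent[] {
  const seen = new Set<string>();
  const out: AgentEvent[] = [];
  for (const ev of events) {
    if (SUBAGENT_LIFECYCLE.has(ev.kind) && ev.subagentId !== undefined) {
      const key = `${ev.sessionId}:${ev.subagentId}:${ev.kind}`;
      if (seen.has(key)) continue; // later copy from the other collector
      seen.add(key);
    }
    out.push(ev);
  }
  return out;
}
```

Keying on the event kind as well as the subagent id matters: a start and a stop for the same subagent are both real events, and only repeats of the same one are noise.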

For Codex, the parser reads rollout JSONL from ~/.codex/sessions/**/rollout-*.jsonl and respects CODEX_HOME. That matters because Codex already emits a structured execution record. Agent Flow's job is not to invent observability. It is to render that record without losing the semantics.
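The path rule is small enough to sketch. This version assumes that a non-empty CODEX_HOME replaces `~/.codex` as the base directory and that paths use `/` separators; `codexSessionsDir` and `isRolloutFile` are hypothetical names, not Agent Flow's functions:

```typescript
// Resolve the directory Codex rollouts live under. `env` and `homeDir` are
// passed in (e.g. process.env and os.homedir() in Node) to keep this pure.
function codexSessionsDir(env: Record<string, string | undefined>, homeDir: string): string {
  const codexHome = env["CODEX_HOME"]?.trim();
  const base = codexHome ? codexHome : `${homeDir}/.codex`;
  return `${base.replace(/\/+$/, "")}/sessions`;
}

// True for paths matching sessions/**/rollout-*.jsonl (checked on the file name).
function isRolloutFile(relativePath: string): boolean {
  const name = relativePath.split("/").pop() ?? "";
  return name.startsWith("rollout-") && name.endsWith(".jsonl");
}
```

In Node, `codexSessionsDir(process.env, os.homedir())` gives the directory to walk recursively, keeping files for which `isRolloutFile` returns true.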

The Codex Parser Is the Real Evidence

The most telling source file is codex-rollout-parser.ts. Its opening comment names the five record types it handles: session_meta, turn_context, response_item, event_msg, and compacted.
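To check what the parser has to work with, it helps to tally record types in a raw rollout file before opening the viewer. A sketch, assuming each JSONL line is an object with a top-level `type` naming one of the five record types and a `payload` object (the layout Codex rollout files appear to use; check your own files); `parseRolloutLine` and `tallyRecordTypes` are hypothetical helpers, not functions from codex-rollout-parser.ts:

```typescript
type RolloutRecordType = "session_meta" | "turn_context" | "response_item" | "event_msg" | "compacted";

const RECORD_TYPES = new Set<string>(["session_meta", "turn_context", "response_item", "event_msg", "compacted"]);

interface RolloutRecord {
  type: RolloutRecordType;
  payload: Record<string, unknown>;
}

// Parse one JSONL line; null means malformed or an unrecognized record type.
function parseRolloutLine(line: string): RolloutRecord | null {
  let value: unknown;
  try {
    value = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const rec = value as { type?: unknown; payload?: unknown };
  if (typeof rec.type !== "string" || !RECORD_TYPES.has(rec.type)) return null;
  const payload: Record<string, unknown> =
    typeof rec.payload === "object" && rec.payload !== null ? (rec.payload as Record<string, unknown>) : {};
  return { type: rec.type as RolloutRecordType, payload };
}

// Count records per type, plus non-blank lines the parser could not classify.
function tallyRecordTypes(jsonl: string): { counts: Record<string, number>; skipped: number } {
  const counts: Record<string, number> = {};
  let skipped = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const rec = parseRolloutLine(line);
    if (rec === null) {
      skipped++;
      continue;
    }
    counts[rec.type] = (counts[rec.type] ?? 0) + 1;
  }
  return { counts, skipped };
}
```

A non-zero `skipped` count is worth knowing about before trusting the graph, because those lines never reach the timeline.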

The dedup rules are also important. Messages come from response_item.message, not mirrored event_msg copies. Reasoning comes from event_msg.agent_reasoning because the response-item reasoning payload is not display-friendly. Tool results come from function-call output records, while parallel exec lifecycle messages are skipped. Token counts use the authoritative event_msg.token_count when available.
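Those rules amount to a small routing table over parsed records. The sketch below is illustrative rather than Agent Flow's parser: the payload subtype names (`message`, `function_call`, `function_call_output`, `agent_reasoning`, `token_count`, and the skipped `agent_message`, `user_message`, and `exec_command_*` events) are assumptions about the rollout format, and `DisplayEvent` is a hypothetical viewer type.

```typescript
// Assumed record shape: a top-level record type plus a payload whose own
// `type` field names the subtype (e.g. "message", "agent_reasoning").
interface Rec {
  type: "session_meta" | "turn_context" | "response_item" | "event_msg" | "compacted";
  payload: { type?: string; [key: string]: unknown };
}

// Hypothetical display events a trace viewer might render.
interface DisplayEvent {
  kind: "message" | "reasoning" | "tool_call" | "tool_result" | "tokens" | "compaction";
  payload: Rec["payload"];
}

function toDisplayEvents(records: Rec[]): DisplayEvent[] {
  const out: DisplayEvent[] = [];
  for (const r of records) {
    const sub = r.payload.type;
    if (r.type === "response_item") {
      // Messages and tool traffic: response_item is the source of truth.
      if (sub === "message") out.push({ kind: "message", payload: r.payload });
      else if (sub === "function_call") out.push({ kind: "tool_call", payload: r.payload });
      else if (sub === "function_call_output") out.push({ kind: "tool_result", payload: r.payload });
      // response_item reasoning is skipped as not display-friendly.
    } else if (r.type === "event_msg") {
      // Reasoning and token counts: event_msg is the source of truth.
      if (sub === "agent_reasoning") out.push({ kind: "reasoning", payload: r.payload });
      else if (sub === "token_count") out.push({ kind: "tokens", payload: r.payload });
      // Mirrored messages (agent_message, user_message) and exec lifecycle
      // events (exec_command_begin/end) fall through and are dropped.
    } else if (r.type === "compacted") {
      out.push({ kind: "compaction", payload: r.payload });
    }
    // session_meta and turn_context are ignored in this sketch.
  }
  return out;
}
```

The "when available" part of the token rule is not modeled here; a fuller version would fall back to an estimate when a session has no `token_count` events.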

Getting those source-of-truth rules right is the difference between a visualization toy and a usable trace viewer. If the parser gets them wrong, the graph misleads the reviewer.

Use It to Ask Better Review Questions

A reviewer should not open Agent Flow just to watch nodes move. The practical workflow is a checklist:

Which files did the agent read before editing?
Which file or tool call consumed the most time?
Did subagents return evidence or just summaries?
Did the agent retry the same failing command?
Did token use spike before the final decision?
Does the transcript show a source-backed reason for the patch?

Those questions turn the visual surface into a review aid. If the graph shows a patch was written before the relevant file was read, the reviewer has a concrete reason to distrust the result.
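Two of the checklist questions, edits without a prior read and retried failing commands, can be checked mechanically once a trace is flattened into ordered tool events. A sketch, assuming a hypothetical `ToolEvent` shape with tool names modeled on Claude Code's Read, Edit, Write, and Bash tools; Agent Flow's own event format may differ:

```typescript
// Hypothetical flattened tool event; not Agent Flow's internal shape.
interface ToolEvent {
  tool: string;
  path?: string; // target file for file tools
  command?: string; // shell command for Bash
  ok: boolean; // false when the tool call failed
}

const EDIT_TOOLS = new Set<string>(["Edit", "MultiEdit", "Write"]);

// Files edited with no Read of the same path earlier in the trace.
function editsBeforeRead(events: ToolEvent[]): string[] {
  const read = new Set<string>();
  const flagged = new Set<string>();
  for (const ev of events) {
    if (ev.path === undefined) continue;
    if (ev.tool === "Read") read.add(ev.path);
    else if (EDIT_TOOLS.has(ev.tool) && !read.has(ev.path)) flagged.add(ev.path);
  }
  return Array.from(flagged);
}

// Commands that failed at least `minFailures` times: the retry-loop signal.
function repeatedFailures(events: ToolEvent[], minFailures = 2): string[] {
  const failures = new Map<string, number>();
  for (const ev of events) {
    if (ev.command !== undefined && !ev.ok) {
      failures.set(ev.command, (failures.get(ev.command) ?? 0) + 1);
    }
  }
  return Array.from(failures.entries())
    .filter(([, count]) => count >= minFailures)
    .map(([command]) => command);
}
```

A `Write` that creates a brand-new file will also be flagged, so treat hits as prompts for a closer look, not verdicts.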

Hooks Are Powerful, So Treat Setup as a Change

Agent Flow can configure Claude Code hooks automatically. That is convenient, but it is still a settings change. The hook configuration writes into ~/.claude/settings.json, preserves existing hooks, and filters out prior Agent Flow entries before adding new ones. The runtime uses discovery files so the settings do not need to hardcode a live port.
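The merge behavior described here (keep the user's hooks, drop stale Agent Flow entries, append fresh ones) is what makes re-running setup safe. A sketch of that pattern over the shape of the `hooks` block in Claude Code settings, where each event name maps to matcher groups of `command` hooks. The `AGENT_FLOW_MARKER` substring used to recognize earlier entries is hypothetical; the real extension may identify its entries differently.

```typescript
// Shape of the `hooks` block in Claude Code settings (event name -> matcher groups).
interface CommandHook {
  type: "command";
  command: string;
  timeout?: number;
}
interface MatcherGroup {
  matcher?: string;
  hooks: CommandHook[];
}
type HooksConfig = Record<string, MatcherGroup[]>;
interface Settings {
  hooks?: HooksConfig;
  [key: string]: unknown;
}

// Hypothetical marker used to recognize entries written by an earlier setup run.
const AGENT_FLOW_MARKER = "agent-flow-hook";

const isAgentFlowHook = (h: CommandHook): boolean => h.command.includes(AGENT_FLOW_MARKER);

// Returns new settings: unrelated keys and user hooks are preserved, stale
// Agent Flow hooks are removed, and one fresh group is appended per event.
function mergeAgentFlowHooks(settings: Settings, events: string[], command: string): Settings {
  const existing: HooksConfig = settings.hooks ?? {};
  const hooks: HooksConfig = {};
  for (const event of Object.keys(existing)) {
    const kept = existing[event]
      .map((g) => ({ ...g, hooks: g.hooks.filter((h) => !isAgentFlowHook(h)) }))
      .filter((g) => g.hooks.length > 0);
    if (kept.length > 0) hooks[event] = kept;
  }
  for (const event of events) {
    hooks[event] = [...(hooks[event] ?? []), { hooks: [{ type: "command", command }] }];
  }
  return { ...settings, hooks };
}
```

Running the merge a second time produces identical settings, so repeated setup runs do not pile up duplicate Agent Flow hooks.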

This setup logic is sensible engineering. It still deserves an adoption step: inspect the settings diff, know how to remove the hook, and run one session before turning it on for shared work. Hooks sit in the tool lifecycle. A broken hook can disrupt event collection, slow a run, or create duplicate events.
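For the settings-diff step, a plain text diff of `~/.claude/settings.json` works, but a hook-level view is easier to review. A sketch that lists hook commands added or removed per event between two parsed copies of the file (reading and parsing the files is left out; `hookLines`, `diffHooks`, and the flattened line format are arbitrary choices for illustration):

```typescript
// Minimal view of Claude Code's `hooks` settings block.
interface HookSettings {
  hooks?: Record<string, { matcher?: string; hooks: { type: string; command: string }[] }[]>;
}

// Flatten every hook into an "Event [matcher] command" line for comparison.
function hookLines(settings: HookSettings): Set<string> {
  const lines = new Set<string>();
  const hooks: NonNullable<HookSettings["hooks"]> = settings.hooks ?? {};
  for (const event of Object.keys(hooks)) {
    for (const group of hooks[event]) {
      for (const h of group.hooks) {
        lines.add(`${event} [${group.matcher ?? "(none)"}] ${h.command}`);
      }
    }
  }
  return lines;
}

// Hook lines added and removed between two parsed copies of settings.json.
function diffHooks(before: HookSettings, after: HookSettings): { added: string[]; removed: string[] } {
  const a = hookLines(before);
  const b = hookLines(after);
  return {
    added: Array.from(b).filter((line) => !a.has(line)).sort(),
    removed: Array.from(a).filter((line) => !b.has(line)).sort(),
  };
}
```

After a setup that preserves existing hooks, `removed` should be empty and `added` should contain only Agent Flow's entries; anything else deserves a closer look before shared use.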

Codex Replay Is the Safer First Use

For a Codex-heavy team, the lowest-risk starting point is replay. Point Agent Flow at an existing JSONL event log or let it watch ~/.codex/sessions. This avoids changing Claude Code hook settings and lets the team compare the visual trace against the raw rollout file.

Start with a completed session that produced a real patch. Check whether the graph lines up with what actually happened: user request, tool calls, outputs, reasoning, compaction, token count, and final message. If the viewer hides an important failure or merges two separate tool calls, do not use it as a review surface yet.
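Part of that comparison can be automated: each tool call in the raw rollout should have exactly one output, which gives a count to check the graph against. A sketch, assuming `function_call` and `function_call_output` payloads inside `response_item` records share a `call_id` field (verify this against your own rollout files); `auditToolCalls` is a hypothetical helper:

```typescript
// Assumed payload fields; verify against your own rollout files.
interface RawRecord {
  type?: string;
  payload?: { type?: string; call_id?: unknown } | null;
}

interface CallAudit {
  calls: number; // function_call records
  outputs: number; // function_call_output records
  missingOutput: string[]; // call_ids with no output record
  orphanOutputs: string[]; // output call_ids with no matching call
}

function auditToolCalls(jsonl: string): CallAudit {
  const callIds: string[] = [];
  const outputIds = new Set<string>();
  let outputs = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // malformed lines are skipped in this sketch
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const rec = parsed as RawRecord;
    const p = rec.payload;
    if (rec.type !== "response_item" || p == null || typeof p.call_id !== "string") continue;
    const id: string = p.call_id;
    if (p.type === "function_call") {
      callIds.push(id);
    } else if (p.type === "function_call_output") {
      outputs++;
      outputIds.add(id);
    }
  }
  const callSet = new Set(callIds);
  return {
    calls: callIds.length,
    outputs,
    missingOutput: callIds.filter((id) => !outputIds.has(id)),
    orphanOutputs: Array.from(outputIds).filter((id) => !callSet.has(id)),
  };
}
```

If the graph shows fewer tool nodes than `calls`, or renders a call listed in `missingOutput` as a success, that is the kind of mismatch to resolve before relying on the viewer.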

Telemetry Must Be Part of the Recommendation

The README includes the right privacy caveat: telemetry is opt-out for the published `npx agent-flow-app` binary and disabled when running from source with `pnpm run dev` or through the VS Code extension. The implementation serializes aggregate fields such as session duration, event count, OS, arch, source, model IDs, runtimes, and error class. The project states that prompts, file paths, tool calls, user info, and environment variables are not sent.
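A promise like "prompts and file paths are not sent" is easiest to keep with an allowlist: serialize named aggregate fields and nothing else. A sketch of that pattern; the field names mirror the README's list, but `TelemetryPayload` and `toTelemetry` are hypothetical, not Agent Flow's telemetry code:

```typescript
// Hypothetical payload type; field names mirror the README's list of aggregates.
interface TelemetryPayload {
  sessionDurationMs: number;
  eventCount: number;
  os: string;
  arch: string;
  source: string;
  modelIds: string[];
  runtimes: string[];
  errorClass?: string;
}

// The allowlist is typed against the payload, so a misspelled key fails to compile.
const ALLOWED_KEYS: ReadonlyArray<keyof TelemetryPayload> = [
  "sessionDurationMs",
  "eventCount",
  "os",
  "arch",
  "source",
  "modelIds",
  "runtimes",
  "errorClass",
];

// Copy only allowlisted keys; anything else in `summary` (prompts, paths,
// tool arguments, env vars) is dropped even if a caller passes it in.
function toTelemetry(summary: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of ALLOWED_KEYS) {
    if (Object.prototype.hasOwnProperty.call(summary, key)) out[key] = summary[key];
  }
  return out;
}
```

Typing `ALLOWED_KEYS` against the payload interface means a misspelled key in the allowlist fails to compile rather than silently dropping a field.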

The telemetry default is not a reason to reject the tool. It is a reason to write the adoption command carefully:

```shell
AGENT_FLOW_TELEMETRY=false npx agent-flow-app
# or
DO_NOT_TRACK=1 npx agent-flow-app
```

For teams reviewing private code, this disclosure belongs in the first-run checklist.
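The default matters as much as the opt-out variables. A sketch of the decision the README describes; the `Build` labels and `telemetryEnabled` are hypothetical, and only the documented `AGENT_FLOW_TELEMETRY=false` and `DO_NOT_TRACK=1` spellings are honored:

```typescript
type Env = Record<string, string | undefined>;
type Build = "npx" | "dev" | "vscode";

// Defaults follow the README: on for the published npx binary, off for
// `pnpm run dev` and the VS Code extension. Only the documented opt-out
// spellings are honored here; the real binary may accept others.
function telemetryEnabled(env: Env, build: Build): boolean {
  if (build !== "npx") return false;
  if (env["DO_NOT_TRACK"] === "1") return false;
  if (env["AGENT_FLOW_TELEMETRY"]?.toLowerCase() === "false") return false;
  return true;
}
```

For the npx binary, setting neither variable leaves telemetry on.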

Where It Does Not Help

Agent Flow does not prove a patch is correct. It does not replace tests, source review, or a reproducible command log. It can also create false confidence if a reviewer treats a clean graph as evidence that the agent understood the code.

Skip it for small one-shot edits where terminal scrollback is enough. Use it when the session has multiple agents, repeated tool calls, unclear timing, token pressure, or a final result that needs forensic review. The higher the cost of a wrong patch, the more useful the trace becomes.

A Practical Rollout

I would introduce it in three steps:

1. Replay one completed Codex session from JSONL and compare the graph to the raw rollout file.
2. Run `AGENT_FLOW_TELEMETRY=false npx agent-flow-app` during one live local session.
3. If Claude Code hooks are needed, inspect `~/.claude/settings.json` before and after running setup.

After that, decide what role the tool plays in review. A good rule: use Agent Flow for postmortems and high-risk multi-step runs, and attach trace observations to the handoff only when they explain a review decision.

Verdict

Agent Flow is publishable because it gives Codex and Claude Code users a concrete observability pattern. The source code has enough substance: a Codex rollout parser with source-of-truth decisions, Claude hook configuration, duplicate-event handling, multi-runtime selection, JSONL replay, and an explicit telemetry model.

The article should not claim that visualization makes agents reliable. It should say the opposite: reliable agent workflows need traces that reviewers can inspect. Agent Flow is interesting because it makes those traces easier to read.

Save Agent Flow as an agent-observability article: use it to review how Claude Code and Codex sessions reached a result, not as proof that the result is correct.

Practical takeaway

Start with Codex JSONL replay, compare Agent Flow's graph against the raw rollout file, run the npx app with telemetry disabled for private work, and only configure Claude Code hooks after inspecting the settings diff. Use the trace to answer review questions about files read, tools called, token pressure, subagent evidence, and repeated failures.

SOURCES

[1] Primary source (github.com)

[2] VS Code extension manifest (github.com)

[3] Codex rollout parser (github.com)

[4] Claude hook configuration (github.com)

[5] Claude runtime wiring (github.com)

[6] Telemetry implementation (github.com)

[7] HN launch post (hacker-news.firebaseio.com)

