The old article should not be published as a visual-tool puff piece. The HN launch post says Agent Flow makes Claude Code behavior visible, but the repository is more interesting than that.
Agent Flow is useful because it treats an agent run as a trace. It reads Claude Code events through hooks, tails Claude transcripts, parses Codex rollout JSONL, and turns the run into a timeline, graph, transcript, token view, and file-attention surface. For Codex and Claude Code users, that is not decoration. It is a way to review how a patch happened before deciding whether to trust it.
The Review Problem Comes Before the Graph
Most agent tooling shows the final answer and a scrollback. That is not enough for serious engineering work. A reviewer wants to know which tools ran, which files were read repeatedly, where the agent branched, whether a subagent returned useful evidence, and whether the final patch came from source inspection or from a lucky guess.
Agent Flow's graph is only valuable if it answers those questions. The article should therefore start with the review problem: agent execution is often observable only after the fact, and even then the signal is buried inside terminal logs or JSONL files.
Two Runtimes, Two Event Sources
The README now claims support for both Claude Code and Codex. The implementation backs that up with different collection paths.
For Claude Code, the extension configures hooks for lifecycle and tool-use events: SessionStart, PreToolUse, PostToolUse, PostToolUseFailure, SubagentStart, SubagentStop, Notification, Stop, and SessionEnd. The runtime then combines a hook server with a session watcher and filters duplicate subagent lifecycle events.
For Codex, the parser reads rollout JSONL from ~/.codex/sessions/**/rollout-*.jsonl and respects CODEX_HOME. That matters because Codex already emits a structured execution record. Agent Flow's job is not to invent observability. It is to render that record without losing the semantics.
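A minimal sketch of that discovery step, for readers who want to find the same files by hand. It assumes that CODEX_HOME, when set, replaces ~/.codex as the base directory and that sessions live directly under it. The function names are illustrative and are not taken from the Agent Flow source.

```python
import os
from pathlib import Path


def codex_sessions_dir(env=None):
    """Return the sessions directory, preferring CODEX_HOME over ~/.codex.

    Assumption: CODEX_HOME points at the directory that contains sessions/.
    """
    env = os.environ if env is None else env
    home = env.get("CODEX_HOME")
    base = Path(home).expanduser() if home else Path.home() / ".codex"
    return base / "sessions"


def find_rollouts(sessions_dir):
    """List rollout-*.jsonl files at any depth, sorted by path."""
    return sorted(Path(sessions_dir).rglob("rollout-*.jsonl"))
```

Calling `find_rollouts(codex_sessions_dir())` gives a list of candidate files to open next to the viewer.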
The Codex Parser Is the Real Evidence
The strongest source file is codex-rollout-parser.ts. Its opening comment names the five record types it handles: session_meta, turn_context, response_item, event_msg, and compacted.
The dedup rules are also important. Messages come from response_item.message, not mirrored event_msg copies. Reasoning comes from event_msg.agent_reasoning because the response-item reasoning payload is not display-friendly. Tool results come from function-call output records, while parallel exec lifecycle messages are skipped. Token counts use the authoritative event_msg.token_count when available.
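The rules above can be sketched as a small lookup table. This is an illustration, not the parser's actual code. It assumes each JSONL line is an object with a top-level "type" and a "payload" that carries its own "type". The subtype name function_call_output is also an assumption, and the "when available" fallback for token counts is not modeled.

```python
import json

# (record type, payload type) -> display kind. Pairs not listed are dropped,
# which covers mirrored event_msg messages, response_item reasoning, and
# exec lifecycle events. The payload type names are assumed for illustration.
SOURCE_OF_TRUTH = {
    ("response_item", "message"): "message",
    ("event_msg", "agent_reasoning"): "reasoning",
    ("response_item", "function_call_output"): "tool_result",
    ("event_msg", "token_count"): "tokens",
}


def select_display_events(lines):
    """Keep one authoritative record per kind of content; skip the rest."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        payload = record.get("payload")
        if not isinstance(payload, dict):
            continue
        kind = SOURCE_OF_TRUTH.get((record.get("type"), payload.get("type")))
        if kind is not None:
            kept.append((kind, payload))
    return kept
```

Keeping the rules in one table makes each source-of-truth decision a single reviewable line.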
That is the difference between a visualization toy and a usable trace viewer. If the parser gets source-of-truth rules wrong, the graph misleads the reviewer.
Use It to Ask Better Review Questions
A reviewer should not open Agent Flow just to watch nodes move. The practical workflow is a checklist:
Which files did the agent read before editing?
Which file or tool call consumed the most time?
Did subagents return evidence or just summaries?
Did the agent retry the same failing command?
Did token use spike before the final decision?
Does the transcript show a source-backed reason for the patch?
Those questions turn the visual surface into a review aid. If the graph shows a patch was written before the relevant file was read, the reviewer has a concrete reason to distrust the result.
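Two of those checks are mechanical enough to sketch in code. The event shape here is hypothetical: a list of dicts with "tool", "path", "command", and "ok" keys. It is not Agent Flow's internal model, so treat this as a way to reason about the questions and not as an integration.

```python
from collections import Counter


def edits_before_read(events):
    """Return paths that were edited before the agent had read them."""
    seen_reads = set()
    flagged = []
    for event in events:
        tool, path = event.get("tool"), event.get("path")
        if tool == "read":
            seen_reads.add(path)
        elif tool == "edit" and path not in seen_reads and path not in flagged:
            flagged.append(path)
    return flagged


def repeated_failures(events, threshold=2):
    """Return shell commands that failed at least `threshold` times."""
    failures = Counter(
        e["command"]
        for e in events
        if e.get("tool") == "shell" and not e.get("ok", True)
    )
    return [cmd for cmd, count in failures.items() if count >= threshold]
```

A newly created file will also show up in `edits_before_read`, so a flagged path is a prompt to look at the trace, not a verdict.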
Hooks Are Power, So Treat Setup as a Change
Agent Flow can configure Claude Code hooks automatically. That is convenient, but it is still a settings change. The hook configuration writes into ~/.claude/settings.json, preserves existing hooks, and filters out prior Agent Flow entries before adding new ones. The runtime uses discovery files so the settings do not need to hardcode a live port.
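The preserve-and-replace behavior can be sketched as a pure function over the settings dict. The marker string and command are hypothetical, and the real tool may identify its own entries differently. The only thing assumed about Claude Code is its documented hooks layout: event name, then a list of groups, each holding a list of command hooks.

```python
import copy

# Hypothetical marker; the real tool may tag its entries another way.
MARKER = "agent-flow-hook"


def _is_ours(group):
    """True if any command in this hook group carries the marker."""
    return any(MARKER in h.get("command", "") for h in group.get("hooks", []))


def merge_hooks(settings, events, command):
    """Return new settings with one marked hook per event.

    Foreign hook groups are kept in order; earlier marked groups are dropped.
    `command` must contain MARKER, or repeated merges will pile up entries.
    """
    merged = copy.deepcopy(settings)
    hooks = merged.setdefault("hooks", {})
    for event in events:
        kept = [g for g in hooks.get(event, []) if not _is_ours(g)]
        kept.append({"hooks": [{"type": "command", "command": command}]})
        hooks[event] = kept
    return merged
```

Because the function filters before it appends, running it twice produces the same settings as running it once, which is the property to look for in any tool that edits shared configuration.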
That is sensible engineering. It still deserves an adoption step: inspect the settings diff, know how to remove the hook, and run one session before turning it on for shared work. Hooks sit in the tool lifecycle. A broken hook can drop or distort the events the viewer depends on, slow a run, or create duplicate events.
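One way to inspect the settings diff is to copy the file before setup and compare the two snapshots. This sketch uses only the standard library and normalizes key order so the diff shows real changes and not reordering. The snapshot file names are placeholders.

```python
import difflib
import json


def settings_diff(before_text, after_text):
    """Return a unified diff of two settings.json snapshots."""
    def normalize(text):
        # Re-serialize with sorted keys so key reordering does not show up.
        return json.dumps(json.loads(text), indent=2, sort_keys=True).splitlines()

    return "\n".join(difflib.unified_diff(
        normalize(before_text),
        normalize(after_text),
        fromfile="settings.before.json",
        tofile="settings.after.json",
        lineterm="",
    ))
```

An empty result means setup changed nothing. Any added line under "hooks" is what the team would need to remove to back out.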
Codex Replay Is the Safer First Use
For a Codex-heavy team, the lowest-risk starting point is replay. Point Agent Flow at an existing JSONL event log or let it watch ~/.codex/sessions. This avoids changing Claude Code hook settings and lets the team compare the visual trace against the raw rollout file.
Start with a completed session that produced a real patch. Check whether the graph lines up with what actually happened: user request, tool calls, outputs, reasoning, compaction, token count, and final message. If the viewer hides an important failure or merges two separate tool calls, do not use it as a review surface yet.
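For that comparison, a raw tally of the rollout file gives a baseline to hold against the graph. The sketch below assumes one JSON object per line with a top-level "type" and an optional "payload" object that has its own "type". That shape is an assumption about the rollout format, not something taken from the Agent Flow source.

```python
import json
from collections import Counter


def tally_rollout(lines):
    """Count records by (type, payload type); collect unparseable line numbers."""
    counts = Counter()
    bad_lines = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad_lines.append(number)
            continue
        if not isinstance(record, dict):
            bad_lines.append(number)
            continue
        payload = record.get("payload")
        subtype = payload.get("type") if isinstance(payload, dict) else None
        counts[(record.get("type"), subtype)] += 1
    return counts, bad_lines
```

Pass it an open file handle, for example `with open(path) as f: counts, bad = tally_rollout(f)`. If the file shows more function-call outputs than the graph shows tool results, that gap is the first thing to investigate.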
Telemetry Must Be Part of the Recommendation
The README includes the right privacy caveat: telemetry is on by default (opt-out) for the published npx agent-flow-app binary and disabled for pnpm run dev and the VS Code extension. The implementation serializes aggregate fields such as session duration, event count, OS, arch, source, model IDs, runtimes, and error class. The project states that prompts, file paths, tool calls, user info, and environment variables are not sent.
That is not a reason to reject the tool. It is a reason to write the adoption command carefully:
```shell
AGENT_FLOW_TELEMETRY=false npx agent-flow-app
# or
DO_NOT_TRACK=1 npx agent-flow-app
```
For teams reviewing private code, this disclosure belongs in the first-run checklist.
Where It Does Not Help
Agent Flow does not prove a patch is correct. It does not replace tests, source review, or a reproducible command log. It can also create false confidence if a reviewer treats a clean graph as evidence that the agent understood the code.
Skip it for small one-shot edits where terminal scrollback is enough. Use it when the session has multiple agents, repeated tool calls, unclear timing, token pressure, or a final result that needs forensic review. The higher the cost of a wrong patch, the more useful the trace becomes.
A Practical Rollout
I would introduce it in three steps:
1. Replay one completed Codex session from JSONL and compare the graph to the raw rollout file.
2. Run `AGENT_FLOW_TELEMETRY=false npx agent-flow-app` during one live local session.
3. If Claude Code hooks are needed, inspect `~/.claude/settings.json` before and after running setup.
After that, decide what the tool changes in the team's review process. A good rule: use Agent Flow for postmortems and high-risk multi-step runs, and attach trace observations to the handoff only when they explain a review decision.
Verdict
The Agent Flow article is publishable because the tool gives Codex and Claude Code users a concrete observability pattern. The source code has enough substance: a Codex rollout parser with source-of-truth decisions, Claude hook configuration, duplicate-event handling, multi-runtime selection, JSONL replay, and an explicit telemetry model.
The article should not claim that visualization makes agents reliable. It should say the opposite: reliable agent workflows need traces that reviewers can inspect. Agent Flow is interesting because it makes those traces easier to read.
Save Agent Flow as an agent-observability article: use it to review how Claude Code and Codex sessions reached a result, not as proof that the result is correct.
Practical takeaway
Start with Codex JSONL replay, compare Agent Flow's graph against the raw rollout file, run the npx app with telemetry disabled for private work, and only configure Claude Code hooks after inspecting the settings diff. Use the trace to answer review questions about files read, tools called, token pressure, subagent evidence, and repeated failures.