Workflow · June 1, 2026 · 10 min read

Test the Facts Your Agents Read with ctxharness

ctxharness is worth saving as a gate for CLAUDE.md, AGENTS.md, and .cursorrules drift, not as another generic documentation checker.


EDITOR'S NOTE: Use ctxharness as a fact-test gate for agent instruction files, not as a behavior eval.

The first ctxharness article failed because it framed the project as generic documentation consistency. That misses the real risk. Claude Code, Codex, Cursor, and similar agents do not merely read docs for reference. They use CLAUDE.md, AGENTS.md, .cursorrules, skills, hooks, and local rules as operating context.

When that context lies, the agent can follow a dead path, run an old command, skip a moved module, or spend review time on the wrong boundary. ctxharness is useful because it treats those claims as facts to test before the agent acts.

The useful frame is fact testing

The README examples are small, but they expose the right problem. One example says CLAUDE.md still points at src/config/auth.ts after the real auth config moved to src/modules/auth/config.ts. Another says the doc still instructs npm run typecheck after the package script changed to npm run type-check.

That is not a copyediting issue. In an agent workflow, those are bad inputs. A Codex task that starts from the wrong auth path can produce a confident but irrelevant patch. A Claude Code session that runs the stale typecheck command can report results that say nothing about the repo's real checks. The article is worth rescuing because the tool gives maintainers a way to test the facts their agents are about to trust.

Start with scan before policy

The adoption path should start with discovery, not a blocking CI gate. Run the zero-config path first:

npx ctxharness scan CLAUDE.md

The scanner detects semver claims, file paths, and package-manager scripts in markdown or text files. That matters because a team can inspect the first drift report without writing config. If scan finds old paths or renamed scripts, use those as the first assertions. If it finds mostly low-value trivia, do not make the check mandatory yet.
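
To picture what a zero-config discovery pass looks for, here is a minimal Python sketch. The regexes, the `Claim` tuple, and the `find_claims` helper are hypothetical illustrations, not ctxharness's actual detection rules; they only mirror the three claim types the scanner is described as detecting: semver strings, file paths, and package-manager script invocations.

```python
import re
from typing import NamedTuple

class Claim(NamedTuple):
    line: int  # 1-based line number in the scanned text
    kind: str  # "semver", "path", or "script"
    text: str  # the matched claim as written in the doc

# Hypothetical patterns; a real scanner needs far more care with edge cases.
SEMVER = re.compile(r"\bv?\d+\.\d+\.\d+\b")
SCRIPT = re.compile(r"\b(?:npm run|pnpm run|yarn)\s+([\w:-]+)")
PATH = re.compile(r"\b(?:[\w.-]+/)+[\w.-]+\.\w+\b")

def find_claims(text: str) -> list[Claim]:
    """Collect candidate factual claims from markdown or plain text."""
    claims = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in (("semver", SEMVER), ("script", SCRIPT), ("path", PATH)):
            for match in pattern.finditer(line):
                claims.append(Claim(lineno, kind, match.group(0)))
    return claims
```

Like the real scan step, a pass like this only surfaces candidates. Deciding which of them deserve an assertion is still a human call.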

Turn only high-risk claims into config

The config schema is intentionally narrow. A version 1 config contains a files block and an assertions list. By default, it covers CLAUDE.md, AGENTS.md, and .cursorrules and excludes node_modules/**. Each assertion pairs an extractor, which reads the repo truth, with a scanner, which checks the docs.

version: 1
files:
  include:
    - 'CLAUDE.md'
    - 'AGENTS.md'
    - '.cursorrules'
  exclude:
    - 'node_modules/**'
assertions:
  - id: typecheck-script
    extractor: packageScript
    extractorArgs:
      script: type-check
    scanner: literalInMd
    scannerArgs:
      literal: npm run type-check
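
The shape of that config can be checked with a small Python sketch. It assumes the YAML has already been parsed into a plain dict (for example with a YAML library), and `validate_config` is a hypothetical helper that checks only the fields named in this article: `version`, `files.include`/`files.exclude`, and assertions with `id`, `extractor`, `scanner`, and optional `extractorArgs`/`scannerArgs`. It is not ctxharness's own validator, and it deliberately ignores any other keys.

```python
from typing import Any

REQUIRED_ASSERTION_KEYS = ("id", "extractor", "scanner")
OPTIONAL_ARG_KEYS = ("extractorArgs", "scannerArgs")

def validate_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems: list[str] = []
    if config.get("version") != 1:
        problems.append("version must be 1")
    files = config.get("files", {})
    for key in ("include", "exclude"):
        if key in files and not isinstance(files[key], list):
            problems.append(f"files.{key} must be a list of globs")
    seen_ids: set[str] = set()
    for index, assertion in enumerate(config.get("assertions", [])):
        for key in REQUIRED_ASSERTION_KEYS:
            if key not in assertion:
                problems.append(f"assertions[{index}] is missing '{key}'")
        for key in OPTIONAL_ARG_KEYS:
            if key in assertion and not isinstance(assertion[key], dict):
                problems.append(f"assertions[{index}].{key} must be a mapping")
        assertion_id = assertion.get("id")
        if assertion_id in seen_ids:
            problems.append(f"duplicate assertion id '{assertion_id}'")
        if assertion_id is not None:
            seen_ids.add(assertion_id)
    return problems

# The example config above, as it would look once parsed.
EXAMPLE = {
    "version": 1,
    "files": {"include": ["CLAUDE.md", "AGENTS.md", ".cursorrules"],
              "exclude": ["node_modules/**"]},
    "assertions": [{
        "id": "typecheck-script",
        "extractor": "packageScript",
        "extractorArgs": {"script": "type-check"},
        "scanner": "literalInMd",
        "scannerArgs": {"literal": "npm run type-check"},
    }],
}
```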

This is where editorial judgment matters. Do not assert every path in the repo. Assert claims that change agent behavior: auth entry points, migration commands, test commands, package versions that affect builds, MCP server setup, hook files, rule globs, and skill locations.

The extractor and scanner split is the design

The core runner first runs an extractor for the expected value, then runs a scanner against either the global file set or assertion-level scopeFiles. That split is why ctxharness is more valuable than a grep wrapper. fileExists can tell whether a path exists. packageScript can read package scripts. packageJson, nvmrc, tsconfigPaths, prismaModel, trpcRouter, pyprojectToml, cargoToml, and goMod can pull facts from language-specific files.
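
A minimal Python sketch of the extractor-then-scanner split, using the `typecheck-script` assertion from the config above. `package_script` and `literal_in_md` are hypothetical stand-ins named after the `packageScript` extractor and `literalInMd` scanner, and the repo is modeled as an in-memory path-to-contents dict; the real implementations may behave differently, for example in how they report a missing literal. The point is the shape: one function reads the truth from the repo, another checks whether the docs repeat it.

```python
import json
from typing import Optional

# A repo snapshot as path -> contents, standing in for a real checkout.
Repo = dict[str, str]

def package_script(repo: Repo, script: str) -> Optional[str]:
    """Extractor: the command behind a package.json script, or None if it is gone."""
    scripts = json.loads(repo.get("package.json", "{}")).get("scripts", {})
    return scripts.get(script)

def literal_in_md(repo: Repo, files: list[str], literal: str) -> list[str]:
    """Scanner: the scoped files that exist but never mention the literal."""
    return [path for path in files if path in repo and literal not in repo[path]]

def run_assertion(repo: Repo, files: list[str], script: str, literal: str) -> tuple[str, str]:
    """Run the extractor first, then the scanner over `files` (global set or scopeFiles)."""
    if package_script(repo, script) is None:
        return ("fail", f"package.json has no '{script}' script")
    missing = literal_in_md(repo, files, literal)
    if missing:
        return ("fail", f"'{literal}' not found in: {', '.join(missing)}")
    return ("pass", "")
```

Keeping the two halves separate is what lets the same scanner check a literal produced by a different extractor, which a single grep cannot do.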

Scanners then decide whether those facts appear where agents read them. literalInMd, pathReference, inlineRegex, codeBlockRegex, yamlField, and jsonField cover factual claims. vaguenessPattern, negativeConstraintDensity, and contextBudget cover instruction quality. ruleGlobValidity, hookValidity, skillValidity, freshnessScore, and coverageRatio cover context assembly. That map is the article's main point: test context at the same layer where it can fail.
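
The instruction-quality scanners are harder to picture than the fact checks. This Python sketch shows one plausible way to score two of them. The word list, the roughly four-characters-per-token estimate, and the budget parameter are all assumptions for illustration, not the heuristics that `negativeConstraintDensity` or `contextBudget` actually use.

```python
import re

# Hypothetical markers of a negative constraint ("don't do X").
NEGATIVE = re.compile(r"\b(?:never|don't|do not|must not|avoid)\b", re.IGNORECASE)

def negative_constraint_density(text: str) -> float:
    """Share of non-empty lines that contain a negative constraint."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    return sum(1 for line in lines if NEGATIVE.search(line)) / len(lines)

def estimate_tokens(text: str) -> int:
    """Rough estimate of about 4 characters per token; an assumption, not a tokenizer."""
    return (len(text) + 3) // 4

def context_budget_ok(files: dict[str, str], budget_tokens: int) -> bool:
    """True if the combined instruction files fit under the assumed token budget."""
    return sum(estimate_tokens(body) for body in files.values()) <= budget_tokens
```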

Use scopeFiles and allowlist before blaming the tool

False positives are not a side issue in this category. Agent instructions often include historical notes, changelog fragments, examples, or migration guides that intentionally mention old paths. ctxharness has two controls that should appear in any serious setup: scopeFiles and allowlist.

Use scopeFiles.include when an assertion should only apply to CLAUDE.md and not to every doc file. Use scopeFiles.exclude to keep examples out of enforcement. Use allowlist when a mismatch in a specific file is intentional. The runner marks allowlisted failures as skip, and it tracks warn separately from fail and error. That lets a team keep useful signal without turning every historical mention into a broken build.
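
A minimal Python sketch of how scope filtering and allowlisting could combine. The glob matching via `fnmatch`, the allowlist shape (a set of file paths for one assertion), and the `in_scope`, `resolve`, and `summarize` helpers are assumptions; only the behavior of turning an allowlisted failure into `skip` while keeping `warn` and `error` distinct comes from the runner described above.

```python
from fnmatch import fnmatch

def in_scope(path: str, include: list[str], exclude: list[str]) -> bool:
    """Scanned if it matches an include glob and no exclude glob.
    Note: fnmatch's '*' also matches '/', so 'docs/*' covers nested files."""
    return any(fnmatch(path, g) for g in include) and not any(fnmatch(path, g) for g in exclude)

def resolve(path: str, raw_status: str, allowlist: set[str]) -> str:
    """Downgrade an allowlisted failure to 'skip'; other statuses pass through."""
    if raw_status == "fail" and path in allowlist:
        return "skip"
    return raw_status

def summarize(results: dict[str, str], include: list[str],
              exclude: list[str], allowlist: set[str]) -> dict[str, str]:
    """Map each in-scope file to its final status for one assertion."""
    return {
        path: resolve(path, status, allowlist)
        for path, status in results.items()
        if in_scope(path, include, exclude)
    }
```

With `include=["CLAUDE.md", "AGENTS.md", "docs/*"]`, `exclude=["docs/examples/*"]`, and `docs/MIGRATION.md` allowlisted, a stale path in the migration guide becomes `skip`, an example file drops out of enforcement entirely, and only CLAUDE.md can still break the build.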

CI should move in stages

The GitHub Actions template points to the right rollout sequence. It installs ctxharness, runs ctxharness scan --exit-zero when no .ctxharness.yml exists, and leaves ctxharness run --format gha commented out until a config exists. That is the right default for a living repo.

A practical rollout has four stages. First, run scan --exit-zero and save the report. Second, create .ctxharness.yml with the five to ten claims most likely to mislead an agent. Third, run ctxharness run in CI and fail only on drift that the team has agreed is real. Fourth, add ctxharness snapshot, ctxharness diff, or ctxharness trend if score movement itself becomes useful in review.
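
The gate between the first and third stages can be sketched as a small Python helper that picks the command the way the template is described: a non-blocking `scan --exit-zero` until `.ctxharness.yml` exists, then `run --format gha`. The `config_present` and `ci_command` helpers are hypothetical; the subcommands and flags are the ones named in this article. Stage 2 (writing the config) and stage 4 (snapshot, diff, trend) stay human decisions and are not modeled.

```python
from pathlib import Path

CONFIG_NAME = ".ctxharness.yml"

def config_present(repo_root: Path) -> bool:
    """True once the team has committed a ctxharness config at the repo root."""
    return (repo_root / CONFIG_NAME).is_file()

def ci_command(has_config: bool) -> list[str]:
    """Pick the ctxharness invocation for the current rollout stage."""
    if not has_config:
        # Stage 1: discovery only. Report drift, never fail the build.
        return ["ctxharness", "scan", "--exit-zero"]
    # Stage 3 onward: enforce the agreed assertions with GitHub Actions output.
    return ["ctxharness", "run", "--format", "gha"]
```

In a CI step this would be passed to something like `subprocess.run(ci_command(config_present(Path("."))), check=True)`, so the exit code only matters once a config exists.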

Treat fix as a patch proposal

ctxharness fix is useful because the CLI prints line-level changes when it can map expected values to actual ones. It writes files only when --apply is passed, and applying should remain a branch-local action.
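
The propose-then-apply split can be sketched with Python's `difflib`. The `propose_fix` and `fix` helpers, the single stale/expected string pair, and the `apply` flag are hypothetical stand-ins for `ctxharness fix` and `--apply`; the real CLI's mapping and output format may differ. What the sketch preserves is the safety property: nothing touches disk unless the caller explicitly asks.

```python
import difflib
from pathlib import Path

def propose_fix(path: str, text: str, actual: str, expected: str) -> tuple[str, str]:
    """Naively replace every stale mention; return (new_text, unified_diff)."""
    new_text = text.replace(actual, expected)
    diff = "".join(difflib.unified_diff(
        text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))
    return new_text, diff

def fix(path: Path, actual: str, expected: str, apply: bool = False) -> str:
    """Return the proposed diff; write the file only when apply=True (the --apply analogue)."""
    text = path.read_text()
    new_text, diff = propose_fix(str(path), text, actual, expected)
    if apply and new_text != text:
        path.write_text(new_text)
    return diff
```

Because the replacement is a blind substring swap, it would also rewrite a deliberate migration note, which is exactly why the diff should be reviewed before anything is applied.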

In a Codex or Claude Code workflow, ask the agent to run ctxharness fix first and paste the proposed lines into the task. If the changes are simple path or script replacements, apply them on a branch and review the diff. Do not put fix --apply into an unattended merge hook. The tool can repair stale text, but it cannot know whether the stale text was a deliberate migration note, an example, or a product decision.

What ctxharness does not test

The boundary is clear: ctxharness tests context facts, not agent behavior. It can tell whether CLAUDE.md points at an existing auth file. It can tell whether a package script appears in docs. It can flag vague instructions, broken hook paths, oversized context, or invalid rule globs. It cannot tell whether the agent will make the right design choice after reading correct docs.

That means it complements, rather than replaces, output evals. Use Promptfoo or Braintrust for agent response quality. Use code review and tests for implementation behavior. Use ctxharness earlier in the chain: before the agent reads a stale map and starts working from the wrong premise.

A sane first config for agent-heavy repos

For a Codex-heavy repo, the first config should be small. Test the primary setup files: AGENTS.md, CLAUDE.md, .cursorrules, .claude/settings.json, and any checked-in skill index. Add one assertion for the main typecheck command, one for the test command, one for the auth or data-access path, one for MCP or tool configuration, one for hook validity, and one for context budget.

Run ctxharness doctor after that. The doctor view groups issues into L1 doc drift, L2 quality, and L3 assembly. That grouping is more useful than a flat failure list because it tells the team whether the problem is a false fact, weak instruction language, or a broken context-loading path.
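
The doctor view's three layers line up with the scanner map earlier in this article. A minimal Python sketch of that grouping, assuming each result carries the name of the scanner that produced it. The `LAYER` table follows the article's own categorization; the result tuples and the `group_by_layer` helper are hypothetical.

```python
from collections import defaultdict

# Scanner -> layer, following the article's map of scanners.
LAYER = {
    **dict.fromkeys(["literalInMd", "pathReference", "inlineRegex",
                     "codeBlockRegex", "yamlField", "jsonField"], "L1 doc drift"),
    **dict.fromkeys(["vaguenessPattern", "negativeConstraintDensity",
                     "contextBudget"], "L2 quality"),
    **dict.fromkeys(["ruleGlobValidity", "hookValidity", "skillValidity",
                     "freshnessScore", "coverageRatio"], "L3 assembly"),
}

def group_by_layer(results: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Group (assertion_id, scanner, status) results by layer, keeping only problems."""
    groups: dict[str, list[str]] = defaultdict(list)
    for assertion_id, scanner, status in results:
        if status in ("fail", "warn", "error"):
            groups[LAYER.get(scanner, "unknown")].append(f"{assertion_id}: {status}")
    return dict(groups)
```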

Save ctxharness as a fact-test gate for agent context. It is valuable when a team turns a few high-risk claims in CLAUDE.md, AGENTS.md, and .cursorrules into assertions; it is weak when used as a broad doc linter or mistaken for an agent behavior eval.

Practical takeaway

Run npx ctxharness scan CLAUDE.md, convert only the claims that can mislead an agent into .ctxharness.yml, start CI with scan --exit-zero, move to ctxharness run after the signal is trusted, and keep fix --apply as a reviewed branch action.

