The first ctxharness article failed because it framed the project as a generic documentation-consistency tool. That framing misses the real risk. Claude Code, Codex, Cursor, and similar agents do not merely read docs for reference. They use CLAUDE.md, AGENTS.md, .cursorrules, skills, hooks, and local rules as operating context.

When that context lies, the agent can follow a dead path, run an old command, skip a moved module, or spend review time on the wrong boundary. ctxharness is useful because it treats those claims as facts to test before the agent acts.

The useful frame is fact testing

The README examples are small, but they expose the right problem. One example says CLAUDE.md still points at src/config/auth.ts after the real auth config moved to src/modules/auth/config.ts. Another says the doc still instructs npm run typecheck after the package script changed to npm run type-check.

These are not copyediting issues. In an agent workflow, they are bad inputs. A Codex task that starts from the wrong auth path can produce a confident but irrelevant patch. A Claude Code session told to run npm run typecheck may hit a missing-script error, or move on without a type check having run, and still report success. ctxharness is worth the attention because it gives maintainers a way to test the facts their agents are about to trust.

Start with scan before policy

The adoption path should start with discovery, not a blocking CI gate. Run the zero-config path first:

npx ctxharness scan CLAUDE.md

The scanner detects semver claims, file paths, and package-manager scripts in markdown or text files. That matters because a team can inspect the first drift report without writing config. If scan finds old paths or renamed scripts, use those as the first assertions. If it finds mostly low-value trivia, do not make the check mandatory yet.
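To make the three claim types concrete, here is a minimal sketch of what detecting them in a text file could look like. It is an illustration only: the regular expressions, the Claim type, and the detectClaims function are assumptions made for this example, not the rules or API that ctxharness ships.

```typescript
// Hypothetical sketch of claim detection; the patterns are assumptions,
// not the rules ctxharness actually uses.
type ClaimKind = "semver" | "path" | "script";

interface Claim {
  kind: ClaimKind;
  text: string;
  line: number; // 1-based line number in the scanned file
}

const DETECTORS: { kind: ClaimKind; pattern: RegExp }[] = [
  { kind: "semver", pattern: /\bv?\d+\.\d+\.\d+\b/g },
  { kind: "path", pattern: /\b(?:[\w.-]+\/)+[\w.-]+\.[A-Za-z]+\b/g },
  { kind: "script", pattern: /\b(?:npm|pnpm|yarn) run [\w:-]+/g },
];

// Walk the file line by line and record every match of every detector.
function detectClaims(text: string): Claim[] {
  const claims: Claim[] = [];
  text.split("\n").forEach((lineText, index) => {
    for (const { kind, pattern } of DETECTORS) {
      const matches: string[] = lineText.match(pattern) || [];
      for (const match of matches) {
        claims.push({ kind, text: match, line: index + 1 });
      }
    }
  });
  return claims;
}

const sampleDoc = [
  "Use Node 20.11.0.",
  "Config: src/config/auth.ts",
  "Run `npm run typecheck` before pushing.",
].join("\n");

// Yields three claims: semver "20.11.0" (line 1),
// path "src/config/auth.ts" (line 2), script "npm run typecheck" (line 3).
const sampleClaims = detectClaims(sampleDoc);
```

Each detected claim is only a candidate. Whether it deserves an assertion depends on whether an agent would act on it.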

Turn only high-risk claims into config

The config schema is intentionally narrow. A version 1 config has a files block and an assertions list. By default, the files block includes CLAUDE.md, AGENTS.md, and .cursorrules and excludes node_modules/**. Each assertion pairs an extractor, which reads the repo truth, with a scanner, which checks the docs. A minimal config with one assertion:

version: 1
files:
  include:
    - 'CLAUDE.md'
    - 'AGENTS.md'
    - '.cursorrules'
  exclude:
    - 'node_modules/**'
assertions:
  - id: typecheck-script
    extractor: packageScript
    extractorArgs:
      script: type-check
    scanner: literalInMd
    scannerArgs:
      literal: npm run type-check

This is where editorial judgment matters. Do not assert every path in the repo. Assert claims that change agent behavior: auth entry points, migration commands, test commands, package versions that affect builds, MCP server setup, hook files, rule globs, and skill locations.

The extractor and scanner split is the design

The core runner first runs an extractor for the expected value, then runs a scanner against either the global file set or assertion-level scopeFiles. That split is why ctxharness is more valuable than a grep wrapper. fileExists can tell whether a path exists. packageScript can read package scripts. packageJson, nvmrc, tsconfigPaths, prismaModel, trpcRouter, pyprojectToml, cargoToml, and goMod can pull facts from language-specific files.

Scanners then decide whether those facts appear where agents read them. literalInMd, pathReference, inlineRegex, codeBlockRegex, yamlField, and jsonField cover factual claims. vaguenessPattern, negativeConstraintDensity, and contextBudget cover instruction quality. ruleGlobValidity, hookValidity, skillValidity, freshnessScore, and coverageRatio cover context assembly. That map is the article's main point: test context at the same layer where it can fail.
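The extractor-then-scanner flow described above can be modeled in a few lines. This is a sketch under stated assumptions: the Repo, Doc, and Assertion types, the exact-path form of scopeFiles, and the behavior given to packageScript and literalInMd are all hypothetical stand-ins that borrow names from this article, not the real ctxharness internals.

```typescript
// Hypothetical model of the extractor/scanner split. The type and function
// names echo the article; they are not ctxharness's real API.
interface Repo {
  packageScripts: { [name: string]: string };
}

interface Doc {
  path: string;
  text: string;
}

// An extractor reads the repo and returns the expected fact,
// or undefined when the fact cannot be established.
type Extractor = (repo: Repo) => string | undefined;

// A scanner returns the paths of docs that do not state the expected fact.
type Scanner = (expected: string, docs: Doc[]) => string[];

interface Assertion {
  id: string;
  extractor: Extractor;
  scanner: Scanner;
  scopeFiles?: string[]; // assumed shape: exact paths that narrow the file set
}

type Status = "pass" | "fail" | "error";

// Assumed behavior: yields "npm run <script>" only if the script is defined.
const packageScript = (script: string): Extractor => (repo) =>
  repo.packageScripts[script] !== undefined ? `npm run ${script}` : undefined;

const literalInMd: Scanner = (expected, docs) =>
  docs.filter((doc) => doc.text.indexOf(expected) === -1).map((doc) => doc.path);

// Extract first, then scan either the global file set or the scoped subset.
function runAssertion(assertion: Assertion, repo: Repo, docs: Doc[]) {
  const expected = assertion.extractor(repo);
  if (expected === undefined) {
    return { id: assertion.id, status: "error" as Status, files: [] as string[] };
  }
  const scope = assertion.scopeFiles;
  const scoped = scope ? docs.filter((doc) => scope.indexOf(doc.path) !== -1) : docs;
  const missing = assertion.scanner(expected, scoped);
  const status: Status = missing.length === 0 ? "pass" : "fail";
  return { id: assertion.id, status, files: missing };
}

const demoRepo: Repo = { packageScripts: { "type-check": "tsc --noEmit" } };
const demoDocs: Doc[] = [
  { path: "CLAUDE.md", text: "Run npm run typecheck before committing." },
  { path: "AGENTS.md", text: "Run npm run type-check before committing." },
];

// demoResult.status is "fail" and demoResult.files is ["CLAUDE.md"].
const demoResult = runAssertion(
  { id: "typecheck-script", extractor: packageScript("type-check"), scanner: literalInMd },
  demoRepo,
  demoDocs,
);
```

The point of the split shows up in the error branch: if the repo itself no longer defines the script, the sketch reports an error rather than blaming the docs.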

Use scopeFiles and allowlist before blaming the tool

False positives are not a side issue in this category. Agent instructions often include historical notes, changelog fragments, examples, or migration guides that intentionally mention old paths. ctxharness has two controls that should appear in any serious setup: scopeFiles and allowlist.

Use scopeFiles.include when an assertion should only apply to CLAUDE.md and not to every doc file. Use scopeFiles.exclude to keep examples out of enforcement. Use allowlist when a mismatch in a specific file is intentional. The runner marks allowlisted failures as skip, and it tracks warn separately from fail and error. That lets a team keep useful signal without turning every historical mention into a broken build.
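The interaction between scoping and allowlisting is easier to see in code. The sketch below is a simplified model, not the runner itself: it matches exact file paths where the real config uses globs, it assumes an allowlist entry is an assertion id plus a file, and every type and function name is made up for the example.

```typescript
// Hypothetical sketch: exact path matching stands in for glob matching,
// and every name here is illustrative rather than ctxharness's real API.
interface Finding {
  assertionId: string;
  file: string;
  severity: "warn" | "fail";
}

interface ScopeFiles {
  include?: string[];
  exclude?: string[];
}

interface AllowEntry {
  assertionId: string;
  file: string;
}

type Outcome = "warn" | "fail" | "skip";

// Exclusions win; an absent include list means every file is in scope.
function isInScope(file: string, scope: ScopeFiles): boolean {
  if (scope.exclude && scope.exclude.indexOf(file) !== -1) return false;
  return !scope.include || scope.include.indexOf(file) !== -1;
}

// Out-of-scope findings are dropped; allowlisted ones are kept but marked skip.
function resolveFindings(
  findings: Finding[],
  scope: ScopeFiles,
  allowlist: AllowEntry[],
): { file: string; outcome: Outcome }[] {
  return findings
    .filter((finding) => isInScope(finding.file, scope))
    .map((finding) => {
      const allowed = allowlist.some(
        (entry) => entry.assertionId === finding.assertionId && entry.file === finding.file,
      );
      const outcome: Outcome = allowed ? "skip" : finding.severity;
      return { file: finding.file, outcome };
    });
}

const resolved = resolveFindings(
  [
    { assertionId: "auth-path", file: "CLAUDE.md", severity: "fail" },
    { assertionId: "auth-path", file: "docs/MIGRATION.md", severity: "fail" },
    { assertionId: "auth-path", file: "docs/examples/old.md", severity: "fail" },
    { assertionId: "auth-path", file: "AGENTS.md", severity: "warn" },
  ],
  { exclude: ["docs/examples/old.md"] },
  [{ assertionId: "auth-path", file: "docs/MIGRATION.md" }],
);
// resolved: CLAUDE.md -> fail, docs/MIGRATION.md -> skip, AGENTS.md -> warn
```

Scoping removes a file from consideration entirely, while allowlisting keeps the mismatch visible as a skip, which is the more auditable choice for deliberate historical mentions.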

CI should move in stages

The GitHub Actions template points to the right rollout sequence. It installs ctxharness, runs ctxharness scan --exit-zero when no .ctxharness.yml exists, and leaves ctxharness run --format gha commented until config exists. That is the right default for a living repo.

A practical rollout has four stages. First, run scan --exit-zero and save the report. Second, create .ctxharness.yml with the five to ten claims most likely to mislead an agent. Third, run ctxharness run in CI and fail only on drift that the team has agreed is real. Fourth, add ctxharness snapshot, ctxharness diff, or ctxharness trend if score movement itself becomes useful in review.
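For teams that wrap the CLI in their own CI script, the first three stages reduce to a small decision. The helper below is a hypothetical sketch: the RolloutState shape and the ciCommand function are invented for illustration, and only the subcommands and flags quoted in this article are used.

```typescript
// Hypothetical helper for a CI wrapper script. Only the subcommands and flags
// quoted in the article appear; the stage logic itself is an assumption.
interface RolloutState {
  hasConfig: boolean; // does .ctxharness.yml exist in the repo?
  teamTrustsSignal: boolean; // has the team agreed the reported drift is real?
}

// Stay in report-only mode until a config exists and its signal is trusted.
function ciCommand(state: RolloutState): string[] {
  if (state.hasConfig && state.teamTrustsSignal) {
    return ["ctxharness", "run", "--format", "gha"];
  }
  return ["ctxharness", "scan", "--exit-zero"];
}

const reportOnly = ciCommand({ hasConfig: false, teamTrustsSignal: false });
const enforcing = ciCommand({ hasConfig: true, teamTrustsSignal: true });
// reportOnly joins to "ctxharness scan --exit-zero"
// enforcing joins to "ctxharness run --format gha"
```

Keeping the switch explicit makes the move from reporting to enforcement a reviewed change rather than a side effect of adding a config file.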

Treat fix as a patch proposal

ctxharness fix is useful because the CLI prints line-level changes when it can map expected and actual values. It only writes files when --apply is passed. Applying those changes should remain a branch-local action.

In a Codex or Claude Code workflow, ask the agent to run ctxharness fix first and paste the proposed lines into the task. If the changes are simple path or script replacements, apply them on a branch and review the diff. Do not put fix --apply into an unattended merge hook. The tool can repair stale text, but it cannot know whether the stale text was a deliberate migration note, an example, or a product decision.
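The propose-then-apply separation can be illustrated with two small functions. This is a sketch of the idea, not how ctxharness fix is implemented: the Proposal type and both function names are hypothetical, and the replacement is a plain substring swap.

```typescript
// Hypothetical sketch of "propose first, apply separately".
// Nothing here touches the filesystem; callers decide whether to write.
interface Proposal {
  line: number; // 1-based line number
  before: string;
  after: string;
}

// Build line-level proposals without modifying the input text.
function proposeFixes(text: string, stale: string, current: string): Proposal[] {
  const proposals: Proposal[] = [];
  text.split("\n").forEach((before, index) => {
    if (before.indexOf(stale) !== -1) {
      proposals.push({ line: index + 1, before, after: before.split(stale).join(current) });
    }
  });
  return proposals;
}

// Apply only the proposals a reviewer has accepted.
function applyFixes(text: string, accepted: Proposal[]): string {
  const lines = text.split("\n");
  for (const proposal of accepted) {
    lines[proposal.line - 1] = proposal.after;
  }
  return lines.join("\n");
}

const original = "Run npm run typecheck before pushing.\nSee src/config/auth.ts.";
const proposals = proposeFixes(original, "npm run typecheck", "npm run type-check");
// proposals has one entry, for line 1:
//   before: "Run npm run typecheck before pushing."
//   after:  "Run npm run type-check before pushing."
const patched = applyFixes(original, proposals);
```

Because proposeFixes returns data instead of writing, a reviewer can drop any proposal that targets a deliberate migration note before anything changes on disk.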

What ctxharness does not test

The boundary is clear: ctxharness tests context facts, not agent behavior. It can tell whether CLAUDE.md points at an existing auth file. It can tell whether a package script appears in docs. It can flag vague instructions, broken hook paths, oversized context, or invalid rule globs. It cannot tell whether the agent will make the right design choice after reading correct docs.

That means it complements, rather than replaces, output evals. Use Promptfoo or Braintrust for agent response quality. Use code review and tests for implementation behavior. Use ctxharness earlier in the chain: before the agent reads a stale map and starts working from the wrong premise.

A sane first config for agent-heavy repos

For a Codex-heavy repo, the first config should be small. Test the primary setup files: AGENTS.md, CLAUDE.md, .cursorrules, .claude/settings.json, and any checked-in skill index. Add one assertion for the main typecheck command, one for the test command, one for the auth or data-access path, one for MCP or tool configuration, one for hook validity, and one for context budget.

Run ctxharness doctor after that. The doctor view groups issues into L1 doc drift, L2 quality, and L3 assembly. That grouping is more useful than a flat failure list because it tells the team whether the problem is a false fact, weak instruction language, or a broken context-loading path.
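A rough model of that grouping is shown below. It assumes the three doctor layers line up with the three scanner families listed earlier (factual claims, instruction quality, context assembly); that mapping is an inference made for this example, and the function and type names are hypothetical.

```typescript
// Hypothetical grouping of failed assertions by layer. The scanner-to-layer
// mapping is an assumption inferred from the scanner families in this article.
type Layer = "L1" | "L2" | "L3"; // L1 doc drift, L2 quality, L3 assembly

const LAYER_BY_SCANNER: { [scanner: string]: Layer } = {
  literalInMd: "L1",
  pathReference: "L1",
  inlineRegex: "L1",
  codeBlockRegex: "L1",
  yamlField: "L1",
  jsonField: "L1",
  vaguenessPattern: "L2",
  negativeConstraintDensity: "L2",
  contextBudget: "L2",
  ruleGlobValidity: "L3",
  hookValidity: "L3",
  skillValidity: "L3",
  freshnessScore: "L3",
  coverageRatio: "L3",
};

interface Failure {
  id: string;
  scanner: string;
}

// Collect assertion ids per layer; scanners missing from the map are ignored.
function groupByLayer(failures: Failure[]): { [layer in Layer]: string[] } {
  const groups: { [layer in Layer]: string[] } = { L1: [], L2: [], L3: [] };
  for (const failure of failures) {
    const layer: Layer | undefined = LAYER_BY_SCANNER[failure.scanner];
    if (layer !== undefined) {
      groups[layer].push(failure.id);
    }
  }
  return groups;
}

const grouped = groupByLayer([
  { id: "typecheck-script", scanner: "literalInMd" },
  { id: "vague-rules", scanner: "vaguenessPattern" },
  { id: "hooks", scanner: "hookValidity" },
  { id: "auth-path", scanner: "pathReference" },
]);
// grouped.L1 is ["typecheck-script", "auth-path"]
// grouped.L2 is ["vague-rules"], grouped.L3 is ["hooks"]
```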

Treat ctxharness as a fact-test gate for agent context. It is valuable when a team turns a few high-risk claims in CLAUDE.md, AGENTS.md, and .cursorrules into assertions; it is weak when used as a broad doc linter or mistaken for an agent behavior eval.

Practical takeaway

Run npx ctxharness scan CLAUDE.md, convert only the claims that can mislead an agent into .ctxharness.yml, start CI with scan --exit-zero, move to ctxharness run after the signal is trusted, and keep fix --apply as a reviewed branch action.