Workflow · June 1, 2026 · 8 min read

Run a CLAUDE.md Break-Even Test for Claude Code

claude-token-efficient is useful when repeated output waste is real, but the 63% figure is a directional warning to measure total cost, not a universal promise.


EDITOR'S NOTE: Use concise CLAUDE.md rules only when they reduce repeated output enough to beat their input-token cost.

A bad token-saving article would put the 63% number in the headline and stop there. That would miss the most useful part of claude-token-efficient: the repo repeatedly warns that instruction files cost input tokens on every turn.

The salvageable idea is narrower and stronger. A short CLAUDE.md can reduce repeated output waste in Claude Code when the same project or automation loop keeps producing pleasantries, repeated context, unsolicited alternatives, or oversized explanations. The configuration is worth publishing only if the article teaches the break-even point: where concise output rules save more than the file costs.

The adoption boundary

Use this method when output waste is repeated and measurable. Good targets are automation pipelines, coding loops, routine reviews, resume bots, and team workflows where response shape matters on every run. Skip it for one-off questions, short chats, and exploratory architecture sessions where long-form reasoning and alternatives are the point. Also skip it when the downstream system needs guaranteed JSON or schema compliance. Prompt rules can nudge behavior; structured outputs and tools enforce contracts.

What the source actually proves

The README proves a practical mechanism: a project-level CLAUDE.md or pasted chat rule can tell Claude Code to read files first, keep output concise, avoid sycophantic openers, avoid unnecessary abstractions, and test before declaring done. The benchmark proves a directional response-side word reduction across five prompts, not a controlled universal token study. The before-after examples prove the kind of waste being removed: a 120-word code review becomes a 30-word off-by-one fix. Issue #1 proves the caution: shorter configurations can beat longer ones on total cost even when the longer rules are reasonable.

Do not sell the 63% number

The benchmark says the quiet part directly. The sample is five prompts, with no repeated runs and no variance controls, even though Claude's output length varies from run to run. The reported savings are response-side only, while the CLAUDE.md file itself loads as input tokens on every message. That means the number is useful as a signal, not as a promise. In a serious workflow, the metric is not words saved on one answer. It is total tokens or cost to a correct, tested result across repeated tasks.
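The break-even arithmetic behind that warning can be sketched in a few lines. This is a simplified model, not something the repo provides: it assumes the file adds a fixed number of input tokens per turn, ignores any caching discounts, and uses hypothetical relative prices (`INPUT_PRICE`, `OUTPUT_PRICE`) and a hypothetical file size (`FILE_TOKENS`).

```python
def break_even_saved_tokens(file_tokens: int, input_price: float, output_price: float) -> float:
    """Output tokens the rules must save per turn just to pay for the file."""
    return file_tokens * input_price / output_price


def net_saving_per_turn(file_tokens: int, saved_output_tokens: int,
                        input_price: float, output_price: float) -> float:
    """Positive: the file pays for itself on this turn. Negative: it costs more than it saves."""
    return saved_output_tokens * output_price - file_tokens * input_price


# Hypothetical values for illustration only; substitute your own measurements.
FILE_TOKENS = 400      # assumed size of CLAUDE.md in input tokens
INPUT_PRICE = 3.0      # assumed relative price per input token
OUTPUT_PRICE = 15.0    # assumed relative price per output token

threshold = break_even_saved_tokens(FILE_TOKENS, INPUT_PRICE, OUTPUT_PRICE)
short_chat = net_saving_per_turn(FILE_TOKENS, 50, INPUT_PRICE, OUTPUT_PRICE)
verbose_loop = net_saving_per_turn(FILE_TOKENS, 200, INPUT_PRICE, OUTPUT_PRICE)

print(f"break-even: save at least {threshold:.0f} output tokens per turn")
print(f"saving 50 tokens per turn:  net {short_chat:+.0f}")
print(f"saving 200 tokens per turn: net {verbose_loop:+.0f}")
```

Under these assumed numbers, a turn that trims only 50 output tokens loses money, which is why short chats and one-off questions are poor targets.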

Start from the smallest file

The safest install path is to read the file first, then copy it:

curl -o CLAUDE.md https://raw.githubusercontent.com/drona23/claude-token-efficient/main/CLAUDE.md

Do not blindly drop it into a repo and call the pipeline fixed. The root file is intentionally short: read existing files before writing, keep reasoning thorough but output concise, avoid re-reading unchanged files, skip files over 100KB unless required, remove sycophantic openers, and verify APIs or package names before asserting them. If one of those rules does not match your actual failure modes, remove it.

Use profiles only when the task needs them

The repo includes profiles for coding, benchmarks, agents, analysis, and versioned strategies. Treat those as starting points, not upgrades. The coding profile adds rules such as code first, simplest working solution, no speculative features, no blind edits, and state the bug then show the fix. That is useful for review and debugging. It may be too restrictive for product design, incident analysis, or architecture tradeoff work. The best CLAUDE.md is not the most complete one. It is the shortest file that corrects behavior you actually see.

Measure the break-even point

Run a before-after test in your own repo. Pick five representative tasks: one code review, one small bug fix, one test failure, one refactor, and one explanation request. Run each without the file, then with the file. Record total input tokens, output tokens, tool calls, retries, and whether the final result passed tests. If output drops but retries rise, you did not save anything. If short answers make humans ask follow-up questions, count that too. The article-worthy metric is cost to accepted result, not a pretty word-count table.
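One way to turn that record into a single number is tokens per accepted result. The sketch below is an illustration under stated assumptions: the `Run` record and its field names are hypothetical, the sample figures are invented, and the recorded token counts are assumed to already include any retries and human follow-ups.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Run:
    """One task attempt. Field names are hypothetical, not from any tool's log format."""
    task: str
    input_tokens: int
    output_tokens: int
    tool_calls: int
    retries: int
    follow_ups: int   # human follow-up questions caused by an answer that was too short
    passed: bool      # did the final result pass tests?

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def tokens_per_accepted_result(runs: list[Run]) -> float:
    """All tokens spent, divided by the number of runs that ended in a passing result."""
    accepted = sum(1 for r in runs if r.passed)
    if accepted == 0:
        return float("inf")  # nothing was accepted, so the cost per result is unbounded
    return sum(r.total_tokens for r in runs) / accepted


# Invented sample data: output shrinks with the file, but a retry inflates input tokens.
baseline = [
    Run("code review", 12_000, 900, 3, 0, 0, True),
    Run("bug fix", 15_000, 1_200, 5, 0, 0, True),
]
with_file = [
    Run("code review", 12_800, 400, 3, 0, 0, True),
    Run("bug fix", 21_000, 700, 8, 1, 0, True),
]

baseline_cost = tokens_per_accepted_result(baseline)
with_file_cost = tokens_per_accepted_result(with_file)
print(baseline_cost, with_file_cost)
```

In this invented sample the output column looks better with the file, yet the cost per accepted result is higher, which is exactly the case the word-count table would hide.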

Use this measurement recipe

A simple trial is enough to avoid cargo-culting the file:

1. Create a branch named claude-md-test.
2. Run a normal task without CLAUDE.md and copy the final transcript stats from Claude Code or your provider logs.
3. Add the root file with curl -o CLAUDE.md https://raw.githubusercontent.com/drona23/claude-token-efficient/main/CLAUDE.md
4. Run the same class of task, not the exact same already-solved prompt.
5. Record input_tokens, output_tokens, tool calls, retries, elapsed time, and test command result.
6. Remove any rule that did not address an observed local failure.

The pass condition is lower total cost with equal or better acceptance, not lower output alone.
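The pass condition at the end of the recipe can be written down as a small check. This is a minimal sketch, assuming cost is approximated by raw input plus output tokens; the `Trial` type and the sample numbers are hypothetical, and weighting tokens by price would be a reasonable substitution.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    """Aggregated stats for one side of the comparison (hypothetical structure)."""
    input_tokens: int
    output_tokens: int
    accepted: int   # tasks whose test command passed
    tasks: int      # tasks attempted


def passes_break_even(before: Trial, after: Trial) -> bool:
    """True only if total cost fell AND the acceptance rate did not get worse."""
    cost_before = before.input_tokens + before.output_tokens
    cost_after = after.input_tokens + after.output_tokens
    rate_before = before.accepted / before.tasks
    rate_after = after.accepted / after.tasks
    return cost_after < cost_before and rate_after >= rate_before


# Invented numbers for illustration.
without_file = Trial(input_tokens=60_000, output_tokens=5_000, accepted=4, tasks=5)
shorter_output_only = Trial(input_tokens=66_000, output_tokens=2_000, accepted=4, tasks=5)
cheaper_and_accepted = Trial(input_tokens=61_000, output_tokens=2_500, accepted=5, tasks=5)

print(passes_break_even(without_file, shorter_output_only))   # output fell, total cost rose
print(passes_break_even(without_file, cheaper_and_accepted))  # total cost fell, acceptance held
```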

Start from a seven-line local file

For most teams I would start even shorter than the repo profile. A practical first version is: Read existing files before writing. Keep output concise. Prefer targeted edits. Do not re-read unchanged files. Test before saying done. No sycophantic openers or closings. Keep solutions simple. That mirrors the lesson of the independent benchmark in Issue #1: short, high-impact rules often beat long permanent context. Add the repo's stricter coding profile only after the seven-line version fails in a specific way.
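To keep the file honest about its size, the seven rules can be rendered and sized before they are committed. The sketch below only builds the text in memory; the four-characters-per-token figure in `rough_token_estimate` is a crude assumption, not a real tokenizer, so treat the estimate as an order of magnitude and prefer the input-token counts from your own logs.

```python
# The seven rules quoted above, one per line.
RULES = [
    "Read existing files before writing.",
    "Keep output concise.",
    "Prefer targeted edits.",
    "Do not re-read unchanged files.",
    "Test before saying done.",
    "No sycophantic openers or closings.",
    "Keep solutions simple.",
]


def render(rules: list) -> str:
    """Join the rules into the file body, one rule per line with a trailing newline."""
    return "\n".join(rules) + "\n"


def rough_token_estimate(text: str) -> int:
    """ASSUMPTION: about four characters per token. A heuristic, not a tokenizer."""
    return max(1, len(text) // 4)


claude_md = render(RULES)
print(claude_md)
print(f"{len(RULES)} rules, {len(claude_md)} characters, "
      f"roughly {rough_token_estimate(claude_md)} tokens per turn (estimate)")
```

Writing `claude_md` to `CLAUDE.md` is left out on purpose so the sketch has no side effects.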

Layer rules by scope

The README notes that Claude can read multiple CLAUDE.md files: global, project, and subdirectory-level. Use that instead of one bloated file. Put broad style preferences in the global file. Put repo safety rules in the project file. Put local constraints near risky directories. A payment integration folder can say never change webhook verification without tests. A generated-code folder can say do not edit files manually. This keeps token overhead closer to the task that benefits from it.
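When rules are spread across directories, it helps to audit which files sit between the repo root and the folder being worked in. The helper below is a hypothetical audit script, not a description of how Claude Code itself resolves instruction files; the function name and the demo directory layout are assumptions, and the global file is out of scope here.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def scoped_instruction_files(repo_root: Path, target_dir: Path,
                             name: str = "CLAUDE.md") -> list[Path]:
    """List instruction files from the repo root down to target_dir, outermost first."""
    root = repo_root.resolve()
    target = target_dir.resolve()
    relative = target.relative_to(root)  # raises ValueError if target is outside the repo
    found = []
    current = root
    for part in ("", *relative.parts):
        if part:
            current = current / part
        candidate = current / name
        if candidate.is_file():
            found.append(candidate)
    return found


# Demo on a throwaway directory tree (layout is invented for illustration).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "payments").mkdir()
    (root / "generated").mkdir()
    (root / "CLAUDE.md").write_text("Test before saying done.\n")
    (root / "payments" / "CLAUDE.md").write_text(
        "Never change webhook verification without tests.\n")

    payment_scope = [p.relative_to(root).as_posix()
                     for p in scoped_instruction_files(root, root / "payments")]
    generated_scope = [p.relative_to(root).as_posix()
                       for p in scoped_instruction_files(root, root / "generated")]

print(payment_scope)
print(generated_scope)
```

Summing the sizes of the listed files gives a rough picture of how much permanent context a given folder carries.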

Use prompt rules for one-off sessions

The README's quick rule is useful: "Rules: Read files first. Write complete solution. Test once. No over-engineering." Paste that into a temporary session when you do not want a persistent instruction file. This is often better for a short task because the rule is visible, scoped, and disappears after the conversation. Persistent CLAUDE.md is for repeated work. Chat rules are for one-off compression.

Keep parser-critical output out of CLAUDE.md

The README explicitly says prompt-based formatting is not enough when parser reliability matters at scale. That is the correct boundary. If a pipeline needs JSON, use JSON mode, schema-backed tools, or an API contract. A CLAUDE.md line that says 'return parseable JSON' may reduce chatter, but it is still a request to the model, not an enforcement layer. Use the file for tone, scope, and recurring workflow habits. Use structured interfaces for machine contracts.
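The difference between a request and an enforcement layer is easiest to see in code. The sketch below is a downstream validation gate using only the standard library; it does not replace JSON mode or schema-backed tools, and the `REQUIRED_FIELDS` contract and the `parse_review` name are hypothetical.

```python
import json

# Hypothetical contract for a code-review result; adapt to your own pipeline.
REQUIRED_FIELDS = {"file": str, "line": int, "issue": str}


def parse_review(raw: str) -> dict:
    """Return the parsed object, or raise ValueError if the contract is not met."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
    return data


good = parse_review('{"file": "a.py", "line": 3, "issue": "off-by-one"}')
print(good["issue"])

try:
    parse_review("Sure! Here is the JSON you asked for: {}")
except ValueError as err:
    print(f"rejected: {err}")
```

A concise-output rule may make the second case rarer, but only a check like this, or a structured interface upstream, makes it impossible for chatter to reach the parser unnoticed.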

Issue #1 is the important warning

The independent benchmark in Issue #1 is more valuable than it first looks. It found that the underlying ideas were sound, but an older 61-line version of the rules performed worse than shorter summaries on token-to-green cost. That does not invalidate the project. It clarifies the method. A long instruction file can spend tokens describing behavior the model already follows. The win comes from high-impact rules, not from converting every preference into permanent context.

A team rollout I would accept

For a team, I would not start by copying every profile. I would start with a two-week audit. Collect examples of waste: preambles, repeated explanations, speculative alternatives, blind edits, no-test completion, and parser-hostile formatting. Convert only the top five into rules. Keep the file under a short, reviewable threshold. Add a comment explaining what each rule prevents. Re-run the before-after tasks monthly. Delete rules that no longer change behavior. The file should earn its place every turn.
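The audit step reduces to counting observed failures and keeping the most frequent ones. A minimal sketch, assuming each observation was logged as a plain label; the labels follow the examples above and the counts are invented.

```python
from __future__ import annotations

from collections import Counter


def top_waste_patterns(observations: list[str], limit: int = 5) -> list[str]:
    """Return the most frequently observed waste patterns, most common first."""
    return [label for label, _count in Counter(observations).most_common(limit)]


# Invented two-week tally: label -> number of times the team saw it.
audit_counts = {
    "preamble": 9,
    "repeated explanation": 7,
    "speculative alternatives": 5,
    "blind edit": 4,
    "no-test completion": 3,
    "parser-hostile formatting": 1,
}
observations = [label for label, n in audit_counts.items() for _ in range(n)]

rules_to_write = top_waste_patterns(observations)
for label in rules_to_write:
    print(f"write a rule for: {label}")
```

Re-running the same tally each month shows which rules still correspond to a failure the team actually sees.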

Where it fits in the Codex ecosystem

This is a configuration method, not an MCP server or a new agent. Its place is beside AGENTS.md, CLAUDE.md, and repo-level workflow rules. For Codex-style work, the same principle applies: persistent instructions are expensive context, so they should encode durable constraints and recurring failure fixes. If a rule is only useful for one task, put it in the prompt. If it protects a repo every day, it belongs in the project instruction file.

Save claude-token-efficient as a measurement-first CLAUDE.md method. It is valuable when short, local rules reduce repeated output waste, but the rules must stay smaller than the problem they solve.

Practical takeaway

Read the root CLAUDE.md, copy it only if its rules match your real failures, and measure five repeated tasks before and after. Track total input tokens, output tokens, retries, tool calls, and test result. Keep permanent rules short, put local constraints in scoped CLAUDE.md files, and use structured outputs instead of prompt rules when downstream parsing must be guaranteed.

SOURCES

[1] Primary source (github.com)
[2] Universal CLAUDE.md (github.com)
[3] Benchmark (github.com)
[4] Before and after examples (github.com)
[5] Coding profile (github.com)
[6] Independent benchmark issue (github.com)
[7] Core rules (github.com)
[8] Reference patterns (github.com)
