mcp-token-saver deserves to be saved, but not as another article about reducing tokens. The stronger version is about putting a usage gate in front of Claude Code before a long agent task starts. That is useful because Claude Code's visible usage bar is a human signal, not a model-readable operating rule.

The current project is also a good test of the rescue pipeline itself. The README describes v0.2.1 tools that read live Claude.ai subscription utilization, forecast burn rate, decide whether a task should proceed, measure real task delta, and inspect cache hit rate. But the repository's root CLAUDE.md still describes old v0.1 tools named estimate_tokens, optimize_context, and check_budget. That mismatch is not a reason to drop the source. It is the editorial angle: if a usage MCP can influence whether an agent proceeds, the first job is to audit the protocol that tells the agent how to use it.

What v0.2 Actually Exposes

server.ts registers five MCP tools: usage_status, usage_forecast, should_proceed, usage_delta, and cache_stats. That makes the project different from a tokenizer plugin. It is trying to turn Claude.ai subscription state into a decision surface for coding agents.

usage_status reads current 5-hour, 7-day, Sonnet weekly, and extra-usage fields, then appends a snapshot to local history. usage_forecast uses that history to estimate burn rate and whether 100% arrives before reset, but only after enough samples exist. should_proceed is the gate: it maps a planned task size to a projected usage increase and returns proceed, downgrade, or abort. usage_delta marks and measures one task's real usage delta. cache_stats reads Claude Code JSONL logs and reports cache hit rate.

That set is article-worthy because it changes the question from 'how many tokens might this prompt use?' to 'should this agent task run in this session right now?'

Why Claude Code Needs a Tool, Not a Reminder

The specific Claude Code problem is visibility. A human can glance at a usage bar and decide not to start a broad refactor. The model cannot see that bar unless usage enters the session as tool output or hook-injected context. That is why this MCP matters to agent work: it makes session pressure available at the moment Claude Code is about to read files, generate a long answer, or keep running a multi-step plan.

Claude Code's MCP docs also change the operational check. A configured server is not enough. claude mcp list, claude mcp get token-saver, and the /mcp panel tell the user whether the server is approved, connected, and exposing tools. With tool search enabled, the tool definitions may be deferred until needed, so the prompt protocol has to name the decision point clearly: before large reads and long responses, call should_proceed.

That is the Codex-style lesson. A usage gate is only useful if it is tied to a repeatable work phase: understand scope, check budget, run the task, measure delta, then report what changed.

The Credential and Endpoint Boundary

The implementation is local, but it is not risk-free. anthropicUsage.ts reads ~/.claude/.credentials.json, checks the Claude OAuth token expiry, and calls https://api.anthropic.com/api/oauth/usage with the anthropic-beta: oauth-2025-04-20 header and x-app: vscode. The README is direct that this endpoint is not an official public Anthropic API and may change without notice.

That means the tool should not be installed as a shared project dependency by default. It belongs in user-local MCP configuration first. If it fails, the failure mode should be small: one developer loses a usage signal, not a whole team losing a required gate. The tool does not send code to a new service, but it does depend on a local OAuth token and an internal usage endpoint. Those are the boundaries the article needs to name.

Install It as a User-Local Trial

A reversible trial is better than committing .mcp.json into a repo. The package exposes mcp-token-saver as a Node >=18 binary, so a user-local Claude Code MCP entry can run it through npx:

{
  "mcpServers": {
    "token-saver": {
      "command": "npx",
      "args": ["-y", "mcp-token-saver"]
    }
  }
}

Then verify the server, not just the config file:

claude login
claude mcp list
claude mcp get token-saver

In Claude Code, the /mcp panel should show that the server exposes tools. If the server is pending approval, rejected, or has no tools, stop there. The usage gate should not be treated as active until the client can see the current v0.2 tool names.

Copy the Current Protocol, Not the Stale One

This is the part the old article would have missed. The repository root CLAUDE.md is stale. It says the server ships estimate_tokens, optimize_context, and check_budget, and it describes a daily USD budget file. The README says v0.2 dropped those tools because they were estimates and fake local cost accounting. The current protocol lives in src/prompts/system.md and names the five live tools.

That drift matters because Claude Code memory is context, not enforcement. Official docs say CLAUDE.md guides behavior but does not guarantee compliance. If the copied instructions name tools that no longer exist, the model may ignore the protocol or waste turns trying to call dead tools.

Before adding this to ~/.claude/CLAUDE.md, run this check:

Ask Claude Code to list the token-saver tools it can see.
It must name usage_status, usage_forecast, should_proceed, usage_delta, and cache_stats.
If it names estimate_tokens, optimize_context, or check_budget, the protocol is stale.

A One-Task Acceptance Test

The first real test should be boring and measurable. Start with a small task, mark a baseline, perform the work, and measure the delta:

Call usage_status and summarize the current 5h and 7d usage.
Call usage_delta with action="mark" and label="small-doc-edit".
Before editing, call should_proceed with task_size="small".
After the edit and verification, call usage_delta with action="measure".

The result should show three things: current usage exists, the gate decision matches the task size, and the final delta is plausible. If usage_forecast says there is not enough history yet, that is expected until enough snapshots have been collected. If any tool returns an OAuth error, run claude login and continue without usage gating for that turn.

Where Hooks Fit

The README also includes a UserPromptSubmit hook that prints [claude-usage] and an optional [claude-usage-directive] before each prompt. Claude Code's hook docs support that pattern: a UserPromptSubmit hook can add stdout or additionalContext before the model processes the prompt.

That hook is stronger than hoping the model remembers to call usage_status, but it should still be treated as an advisory signal. Keep the timeout short. Make it fail open. Confirm it does not print sensitive token material. If the hook injects compression directives at 60%, 80%, or 95% usage, the team should decide whether that style override is desirable before installing it globally.

Hooks are the right layer when the signal must appear every turn. CLAUDE.md is the right layer when the model should know how to interpret that signal.

Cache Stats Are a Separate Decision

cache_stats is valuable for a different reason. It does not call Anthropic. It reads local Claude Code session logs under ~/.claude/projects, finds recent assistant usage records, and reports the share of prompt input served from cache. If the hit rate drops below 40%, the tool warns that prompt cache may be invalidating frequently.

That can help a long Codex-style task, but it should not be mixed up with subscription gating. High cache hit rate does not mean the next generation is cheap enough to run. Low cache hit rate does not prove the task should stop. Treat it as a debugging signal for session shape: tool lists, system prompt changes, and context churn.

When the Gate Should Say No

The most useful adoption rule is negative. Do not let this tool become a fake safety guarantee. should_proceed uses rough task-size constants: small is 0.5 percentage points, medium is 2, large is 6, and huge is 12. Those are useful heuristics, not a contract with Anthropic billing.

Use the gate when the alternative is silently starting a huge scan at 92% session usage. Do not use it to promise exact cost, skip human judgment, or hide usage from the user. If the planned task is broad, first split the task. If the endpoint is unavailable, state that usage is unknown. If a team needs enforceable budget policy, the local MCP is not enough by itself.

Adoption Verdict

mcp-token-saver is worth publishing because it captures a real agent-operations problem: the model cannot see the usage bar unless usage becomes a tool or hook signal. The implementation is concrete enough to inspect, and the move from v0.1 estimates to v0.2 real usage is the right direction.

But the article should be strict. Install it user-locally first. Verify the current tool list. Copy src/prompts/system.md, not the stale root CLAUDE.md. Treat /api/oauth/usage as an internal endpoint. Measure one task before trusting the gate. Remove the server if it fails in a way the user cannot explain.

The source is not a finished operating policy. It is a good local usage gate that needs a verification checklist before it earns a place in a serious Claude Code setup.

mcp-token-saver is publishable as a Claude Code usage-gate audit: useful for local visibility and pre-flight decisions, unsafe when copied as a stale or shared protocol without checking the live tool contract.

Practical takeaway

Install it user-locally, run claude mcp list and claude mcp get token-saver, verify the five current tools, copy src/prompts/system.md, run one usage_status plus usage_delta trial, inspect ~/.mcp-token-saver/usage_history.jsonl, and only then decide whether the gate belongs in regular Claude Code work.