Tools · June 1, 2026 · 9 min read

Reviewing Claude Code on Steroids for Runtime Safety

The useful review is not whether the skill count is larger. It is whether the installer, hooks, overrides, tests, and token evidence are safe enough to become part of Claude Code's operating layer.


EDITOR'S NOTE: The useful lesson is not that a larger Claude Code skill pack is automatically better. Treat it as agent infrastructure: audit installer side effects, pinning behavior, hooks, tests, and local cost evidence before adoption.

The original article seed is salvageable, but not as a cheerful upgrade note. For Claude Code practitioners, a skill pack is not just content once it writes into ~/.claude, adds startup hooks, injects boot context, pins overrides, and changes which workflow the agent sees before a task begins. At that point it is developer infrastructure.

GadaaLabs' Claude Code on Steroids is interesting because it touches that runtime directly. The repository does not only publish skill text. It ships an installer, writes into ~/.claude, copies 24 skill directories, adds a /tokenburn command, creates a memory template, builds a local CLI, and uses a SessionStart hook plus apply.sh to reapply pinned overrides when the upstream plugin changes.

That is the article. Treat the project as agent infrastructure. Useful, but only after review.

Start with the side effects

The README's install path is convenient:

curl -fsSL https://raw.githubusercontent.com/GadaaLabs/claude-code-on-steroids/main/install.sh | bash

Convenience is not the main point. The inspected install.sh changes user-level Claude state. It checks for the claude CLI and looks for an existing superpowers plugin under ~/.claude/plugins/cache/claude-plugins-official; if none is found, it creates a standalone skills directory under ~/.claude/superpowers/skills. It then copies skill directories, installs command files, creates examples and memory templates, and builds tokenburn when Node.js 20+ is present.

That is enough scope to deserve a clone-first recommendation. A team should inspect the script locally before piping it into a shell, especially on a workstation that already has project-specific Claude behavior.

Run this first-pass audit

A reader does not need to trust the article or the README. Run a local audit before installation:

git clone https://github.com/GadaaLabs/claude-code-on-steroids
cd claude-code-on-steroids
git rev-parse HEAD
rg -n "CLAUDE_HOME|PLUGINS_BASE|SKILLS_DIR|COMMANDS_DIR|TOKENBURN_SRC|npm link" install.sh
rg -n 'for skill in|installed_plugins|cp "\$src"' apply.sh
rg -n "additionalContext|hookSpecificOutput|ascend" hooks/session-start
rg -n "SessionStart|startup|compact" hooks/hooks.json
bash tests/claude-code/run-skill-tests.sh
bash tests/skill-triggering/run-all.sh

Interpret the output before proceeding. If install.sh cannot find claude, stop and install Claude Code deliberately. If the apply.sh pinned list includes skills your team has not reviewed, do not enable override behavior yet. If the tests fail before installation, do not treat the skill pack as a stable dependency. If the audit passes, install from the clone with bash install.sh rather than from the pipe so the reviewed source is the source you run.
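The stop conditions above can be turned into a small gate so that nobody proceeds on a half-read result. This is a minimal sketch, not something the repository ships: the function name preflight and its second argument are hypothetical, and the only details taken from the source are the claude CLI name and the two test script paths.

```bash
# Preflight gate: a sketch, not part of the repository.
# Usage: preflight <repo_dir> [cli_name]
# cli_name defaults to "claude"; the two script paths are the ones named in the audit.
preflight() {
  local repo_dir=$1 cli_name=${2:-claude} script
  if ! command -v "$cli_name" >/dev/null 2>&1; then
    echo "STOP: '$cli_name' not found on PATH" >&2
    return 1
  fi
  for script in tests/claude-code/run-skill-tests.sh tests/skill-triggering/run-all.sh; do
    if [ ! -f "$repo_dir/$script" ]; then
      echo "STOP: missing $script" >&2
      return 1
    fi
    # Run each suite from the repo root in a subshell so the caller's cwd is unchanged.
    if ! (cd "$repo_dir" && bash "$script"); then
      echo "STOP: $script failed" >&2
      return 1
    fi
  done
  echo "preflight passed"
}
```

A typical call from the parent directory of the clone would be preflight ./claude-code-on-steroids && (cd claude-code-on-steroids && bash install.sh). The gate does not replace reading apply.sh; it only refuses to continue when the CLI or the tests are not in the expected state.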

Verify the install before using the skills

After a clone-based install, run a separate smoke check. The point is not to prove every workflow. The point is to verify that the expected files landed and that the user-level configuration now matches the behavior you agreed to adopt:

claude --version
find ~/.claude -path '*skills/oracle/SKILL.md' -print
find ~/.claude -path '*skills/forge/SKILL.md' -print
find ~/.claude -path '*commands/tokenburn.md' -print
find ~/.claude -maxdepth 4 -path '*superpowers-overrides*' -type f | sort | head -40

Then open one fresh Claude Code session in a low-risk repo and run only non-mutating checks:

/oracle
/tokenburn week

The expected result is boring: oracle should describe task intake behavior, /tokenburn should either open the dashboard or clearly report that tokenburn is missing, and startup should not inject duplicate warnings. If any of those checks are surprising, stop before using the skills on production work.
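The file checks can also be scripted so that a missing skill is reported loudly instead of as an empty find result. The sketch below assumes only what the smoke check already assumes, that each skill lands at some path ending in skills/<name>/SKILL.md; the function name check_installed_skills is hypothetical.

```bash
# Usage: check_installed_skills <claude_home> <skill>...
# Prints one status line per skill and returns non-zero if any skill is missing.
check_installed_skills() {
  local claude_home=$1 skill missing=0
  shift
  for skill in "$@"; do
    # Same -path pattern as the manual check; head -n 1 keeps this portable
    # across GNU and BSD find.
    if [ -n "$(find "$claude_home" -type f -path "*skills/$skill/SKILL.md" 2>/dev/null | head -n 1)" ]; then
      echo "ok       $skill"
    else
      echo "MISSING  $skill"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, check_installed_skills ~/.claude oracle forge pathfinder returns a failing status as soon as one of the three is absent, which makes it usable as a step in a setup script.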

The skill count is not the adoption argument

The README comparison table says the package moves from 14 skills to 24, adds 10 new skills, adds domain expertise for ML, AI, embedded, and UI work, adds an intelligence layer, adds multi-agent templates, adds API pre-verification in TDD, adds skill chains, and adds token analytics. Those are useful claims, but they are not enough to justify adoption.

The review question is narrower: which part changes a team's daily failure mode? oracle can reduce cold starts by forcing task classification. pathfinder can make codebase entry less random. forge can make test writing less imaginary by requiring API verification. vector can prevent expensive models from handling mechanical work. horizon can make long-session context drift visible. Evaluate those behaviors, not the skill count.

Override protection is power with ownership

The most important file after the installer is apply.sh. It reads ~/.claude/plugins/installed_plugins.json, finds superpowers@claude-plugins-official, and copies pinned files from ~/.claude/plugins/superpowers-overrides into the live plugin directory. The pinned list is specific: ascend, chronicle, blueprint, forge, pathfinder, phantom, vector, legion, and commander.

That is useful when a team needs stable local behavior across plugin updates. It is also an ownership line. Once pinned overrides win at session start, the team is no longer simply using upstream superpowers. It is carrying a local operating contract. The safe practice is to keep a changelog for pinned skills and review upstream changes before letting overrides mask them.
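One way to make that review concrete is to compare each pinned file against a clean copy of the upstream plugin before the next session reapplies the pins. This is a sketch under a stated assumption: both trees are taken to store a skill at skills/<name>/SKILL.md, which should be checked against the real directory layout. The function name report_override_drift and the clean upstream checkout are hypothetical.

```bash
# Usage: report_override_drift <overrides_dir> <upstream_dir> <skill>...
# Assumes both trees keep each skill at skills/<name>/SKILL.md (hypothetical layout).
# Returns non-zero when any pinned skill differs from, or is absent in, upstream.
report_override_drift() {
  local overrides_dir=$1 upstream_dir=$2 skill pinned upstream drift=0
  shift 2
  for skill in "$@"; do
    pinned="$overrides_dir/skills/$skill/SKILL.md"
    upstream="$upstream_dir/skills/$skill/SKILL.md"
    if [ ! -f "$pinned" ]; then
      echo "no-pin       $skill"
    elif [ ! -f "$upstream" ]; then
      echo "no-upstream  $skill"; drift=1
    elif cmp -s "$pinned" "$upstream"; then
      echo "same         $skill"
    else
      echo "DIFFERS      $skill"; drift=1
    fi
  done
  return "$drift"
}
```

Called with the pinned names listed above (ascend, chronicle, blueprint, forge, pathfinder, phantom, vector, legion, commander), every DIFFERS line is a candidate changelog entry: either the pin is intentional and documented, or upstream has moved and the pin is now hiding a change.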

SessionStart turns skills into boot context

hooks/session-start is not a minor helper. It reads skills/ascend/SKILL.md, escapes it into JSON, and injects it as additional context for Cursor, Claude Code, or SDK-standard clients. hooks/hooks.json registers that behavior for startup, clear, and compact through hooks/run-hook.cmd session-start.

This is the right layer to inspect because it changes what the agent sees before the user asks anything. It can improve skill discovery and keep the workflow rules present. It also consumes boot context and makes the first response depend on a generated hook payload. A team adopting it should verify one fresh Claude Code session and one compacted session, then confirm the expected ascend content appears without duplicate or stale warnings.
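To see what such a hook has to do, it helps to write the escaping step by hand. The sketch below is not the repository's hooks/session-start; it is a minimal bash illustration of reading a skill file, escaping it into a JSON string, and printing a payload. The hookSpecificOutput and additionalContext field names come from the audit patterns earlier in this article, while the exact payload shape is an assumption to verify against the Claude Code hooks documentation and the inspected script.

```bash
# Requires bash (uses ${var//pattern/replacement} and $'...' quoting).

# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return, and tab only;
# other control characters are NOT escaped in this sketch.
escape_for_json() {
  local s="$1"
  s="${s//\\/\\\\}"
  s="${s//\"/\\\"}"
  s="${s//$'\n'/\\n}"
  s="${s//$'\r'/\\r}"
  s="${s//$'\t'/\\t}"
  printf '%s' "$s"
}

# Usage: emit_session_start_context <path/to/SKILL.md>
# Prints one JSON object on stdout; the payload shape is an assumption.
emit_session_start_context() {
  local skill_file=$1 content escaped
  content=$(cat "$skill_file") || return 1
  escaped=$(escape_for_json "$content")
  printf '{"hookSpecificOutput":{"hookEventName":"SessionStart","additionalContext":"%s"}}\n' "$escaped"
}
```

Running emit_session_start_context skills/ascend/SKILL.md | wc -c in the clone gives a rough size for the boot context, a useful number to record before deciding whether the hook is worth its cost on every startup, clear, and compact.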

Ambient rules are the low-risk first step

The Karpathy guidelines example is a better first adoption step than the full installer for many teams. It adds two ambient rules to ~/.claude/CLAUDE.md or a project CLAUDE.md: Think Before Coding and Goal-Driven Execution. Those rules do not need a skill invocation. They apply every session.

That matters because many skill systems fail at the invocation boundary. The user asks for a fix, the agent forgets to call the right skill, and the workflow never starts. Ambient rules are not as rich as forge or sentinel, but they cover the gap between no workflow and an invoked workflow. A careful rollout can start there, then add /tokenburn, then add selected skills, then decide whether global override hooks are justified.
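Because ambient rules live in a file that people edit by hand, appending them twice is an easy mistake. The following sketch makes the append repeatable by writing a marker line first and skipping the append when the marker is already present. The marker text and the function name append_guidelines_once are hypothetical; the source only says that the example adds two rules to a CLAUDE.md file.

```bash
# Usage: append_guidelines_once <guidelines_file> <target_claude_md>
# Appends the guidelines under a marker comment, at most once, after a backup.
append_guidelines_once() {
  local src=$1 target=$2
  local marker='<!-- karpathy-guidelines: managed block -->'   # hypothetical marker
  if [ ! -f "$src" ]; then
    echo "missing source: $src" >&2
    return 1
  fi
  mkdir -p "$(dirname "$target")"
  if [ -f "$target" ] && grep -qF -- "$marker" "$target"; then
    echo "already present in $target"
    return 0
  fi
  if [ -f "$target" ]; then
    cp "$target" "$target.backup"
  fi
  { printf '\n%s\n' "$marker"; cat "$src"; } >> "$target"
  echo "appended to $target"
}
```

The same function works for a project-level file, for example append_guidelines_once examples/karpathy-guidelines.md ./CLAUDE.md, which keeps the first rollout scoped to one repository.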

Choose one adoption path

There are three practical paths, and they should not be mixed on day one.

Path A is ambient policy only. Use it when the team wants better default behavior but does not want startup hooks or pinned overrides:

cd claude-code-on-steroids
sed -n '1,220p' examples/karpathy-guidelines.md
cp ~/.claude/CLAUDE.md ~/.claude/CLAUDE.md.backup 2>/dev/null || true
cat examples/karpathy-guidelines.md >> ~/.claude/CLAUDE.md

Path B is observability first. Use it when the team wants token evidence before workflow changes. Install from the clone, run /tokenburn week, and record the highest-cost projects, activities, and shell commands before enforcing any new skill chain.

Path C is full runtime adoption. Use it only after reviewing apply.sh and deciding who owns pinned overrides. The acceptance check is explicit: after install, find ~/.claude -path '*skills/forge/SKILL.md' -print and find ~/.claude -path '*skills/pathfinder/SKILL.md' -print must return the expected files, and a new Claude Code session must show the expected startup behavior without duplicate warnings.

TokenBurn should be used as evidence, not marketing

The README's token-savings table is the easiest part to overstate. It describes repeated debugging, context-window management, mechanical tasks, codebase exploration, and bootstrap overhead. Those are plausible categories, but they are maintainer-side estimates unless readers measure their own sessions.

The useful tool is /tokenburn. The command opens a native terminal dashboard on macOS and falls back to a direct shell command elsewhere. The reference script parses local ~/.claude/projects/**/*.jsonl, groups usage by day, project, model, activity, tools, shell commands, and MCP servers, and calculates cost from baked-in Claude pricing. That gives teams a way to verify whether their own habits changed. It does not prove that a specific skill caused savings by itself.
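Before trusting any dashboard, it is worth confirming that the raw logs exist and seeing which projects dominate them. The sketch below is deliberately cruder than tokenburn: it counts JSONL records per project directory and makes no attempt to read token or cost fields, because this article does not document the record schema. The function name records_per_project is hypothetical, and record count is only a rough proxy for activity.

```bash
# Usage: records_per_project [projects_dir]
# Prints "<project>\t<record count>" sorted by count, highest first.
# Counts lines in *.jsonl files only; it does not parse tokens or cost.
records_per_project() {
  local projects_dir=${1:-"$HOME/.claude/projects"} dir count
  for dir in "$projects_dir"/*/; do
    [ -d "$dir" ] || continue
    count=$(find "$dir" -type f -name '*.jsonl' -exec cat {} + | wc -l)
    # $((count)) strips the leading spaces some wc implementations print.
    printf '%s\t%s\n' "$(basename "$dir")" "$((count))"
  done | sort -t "$(printf '\t')" -k2,2nr
}
```

If the heaviest projects in this list do not match the ones the team believes are expensive, that mismatch is itself a finding worth recording next to the /tokenburn baseline.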

The workflow chain is a review checklist

The README's six workflow chains are useful because they name order. A feature path starts with oracle and chronicle, moves through domain skills, architect, blueprint, horizon, vector, legion, phantom, sentinel, tribunal, then stores the result through chronicle. A debug path starts with memory, root-cause investigation, a failing test, verification, and memory storage.

That chain should not be treated as ceremony. It should become a review checklist. For a non-trivial feature, did the agent classify the task? Did it search past patterns? Did it pick a domain skill only when relevant? Did it define an implementation plan? Did it run verification before claiming completion? If the answer is no, the skill pack is installed but not operating.

Forge is the clearest behavior change

forge is worth calling out because it changes a concrete agent mistake. Before writing tests that reference an external library, internal export, or API, the skill requires verification that the import path, method, or function exists. It then applies the TDD rule: no production code before a failing test.

For Codex readers, that is not philosophical. It prevents a common failure where an agent writes tests against functions that do not exist, then spends the next hour making the repo match the invented API. A good adoption rule can be small: for behavior changes, run the existing test command first, verify the function or library surface, write one failing test, then implement. If the team adopts only one skill behavior from the pack, this is a defensible one.
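The verification step can be as small as a grep that must succeed before a test file is written. This is a sketch of the idea, not forge's own mechanism: the function name require_symbol is hypothetical, and a whole-word text match only shows that the name appears somewhere in the source tree, not that it is exported with the expected signature.

```bash
# Usage: require_symbol <name> <source_dir>
# Succeeds and prints up to five matching lines if the name appears as a whole
# word under source_dir; fails with a message otherwise.
require_symbol() {
  local name=$1 source_dir=$2 hits
  # -r recurses, -n shows line numbers, -w matches whole words, -F is literal.
  hits=$(grep -rnwF -- "$name" "$source_dir" 2>/dev/null | head -n 5)
  if [ -z "$hits" ]; then
    echo "NOT FOUND: '$name' under $source_dir; do not write a test against it yet" >&2
    return 1
  fi
  printf '%s\n' "$hits"
}
```

In practice the rule reads as require_symbol parseConfig src/ && write the failing test, where parseConfig stands in for whatever function the test is about to reference.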

Pathfinder protects inherited codebases

pathfinder is valuable because it tells the agent not to full-read large files on the first pass. Its progressive strategy starts with directory shape, then file headers, then function signatures, then full source only after relevance is confirmed. That matches how a senior engineer enters an unfamiliar codebase.

This is a practical Codex use case. Add a project rule: before editing an inherited or unfamiliar module, map entry points, tests, conventions, traps, environment dependencies, and large files. The payoff is not that the skill sounds structured. The payoff is fewer patches that fight the codebase's established patterns.
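The first two steps of that progressive strategy, directory shape and knowing which files are too large to read whole, need nothing more than find and wc. The sketch below is an illustration of the approach rather than pathfinder's implementation; the function names directory_shape and largest_files and the default depth and count are hypothetical.

```bash
# Usage: directory_shape <dir> [max_depth]
# Lists directories down to max_depth (default 2), skipping .git.
directory_shape() {
  find "$1" -maxdepth "${2:-2}" -type d ! -path '*/.git' ! -path '*/.git/*' | sort
}

# Usage: largest_files <dir> [count]
# Prints "<line count>\t<path>" for the largest files, biggest first,
# so they can be skimmed by header or signature instead of read in full.
largest_files() {
  local dir=$1 count=${2:-10}
  find "$dir" -type f ! -path '*/.git/*' -exec sh -c '
    for f do
      printf "%s\t%s\n" "$(wc -l < "$f" | tr -d " ")" "$f"
    done' sh {} + | sort -rn | head -n "$count"
}
```

A reasonable project rule is to run both before the first edit in an unfamiliar module and to open anything near the top of largest_files only after a signature-level search has confirmed it is relevant.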

Vector and horizon need local policy

vector routes tasks by complexity: no LLM for mechanical transforms, cheap models for obvious work, standard models for multi-file or investigative work, and high-capability models for architecture, security, cross-system, or novel tasks. In Claude Code itself, that routing may be advisory unless the workflow dispatches subagents, but the policy is still useful.

horizon handles context health: compress before dispatching subagents, prepare a fresh-session handoff when context is critical, and avoid sending bloated history into focused workers. Together, these skills need a local rule. Teams should define which tasks are Tier 0 shell work, when escalation is required, and when a fresh handoff is safer than continuing a long session.
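A local rule is easier to enforce when it is written down as a lookup rather than as prose. The sketch below encodes the four levels described above as a case statement. The task-kind labels on the left and the tier names on the right are hypothetical placeholders for a team's own vocabulary, not names taken from vector.

```bash
# Usage: route_task <task_kind>
# Prints the tier a task kind maps to; fails for anything unclassified.
# Task kinds and tier labels are hypothetical examples of a local policy.
route_task() {
  case $1 in
    rename|format|bulk-replace)
      echo "tier0-shell" ;;                  # mechanical transform, no LLM
    typo|single-file-fix)
      echo "tier1-cheap-model" ;;            # obvious work
    multi-file|investigation)
      echo "tier2-standard-model" ;;         # multi-file or investigative work
    architecture|security|cross-system|novel)
      echo "tier3-high-capability-model" ;;  # architecture, security, novel tasks
    *)
      echo "unclassified: '$1' needs a human decision" >&2
      return 1 ;;
  esac
}
```

The value is less in the function than in the argument it forces: each label a team adds to the table is a decision about when escalation is required.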

Tests exist, but they are not total proof

The repository has test scripts, which is a strong signal for a skill pack. tests/claude-code/run-skill-tests.sh runs fast tests for phantom, forge, and hunter by default, and supports integration tests. tests/skill-triggering/run-all.sh checks prompting for forge, hunter, blueprint, phantom, tribunal, and commander.

That does not prove every chain works in every host. It does prove the maintainer thought about skill behavior as something that can regress. A reader should run the fast tests before editing skills, run triggering tests when changing descriptions or invocation language, and treat integration tests as the bar before publishing a forked skill pack to other users.

There are small consistency warnings

The repository also has minor but real inconsistencies. The plugin metadata names the package claude-code-superpowers and points homepage/repository fields at GadaaLabs/claude-code-superpowers, while the README and installer use GadaaLabs/claude-code-on-steroids. The installer's final next-step commands mention /task-intake, /ml-engineering, and /ai-engineering, while the inspected skill names and README quick start point readers to /oracle, /gradient, and /nexus.

These are not fatal problems. They are review findings. They tell adopters to verify commands from the installed command and skill directories rather than trusting every README line blindly.
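Listing what is actually installed takes a few lines. The sketch below assumes the conventions already visible in this article, that a command is a Markdown file such as commands/tokenburn.md and that a skill is a directory containing SKILL.md; the function names are hypothetical.

```bash
# Usage: list_installed_commands <commands_dir>
# Prints one "/name" per Markdown file in the directory.
list_installed_commands() {
  local commands_dir=$1 file
  for file in "$commands_dir"/*.md; do
    [ -f "$file" ] || continue          # no matches leaves the glob unexpanded
    printf '/%s\n' "$(basename "$file" .md)"
  done
}

# Usage: list_installed_skills <skills_root>
# Prints the directory name of every SKILL.md found under skills_root.
list_installed_skills() {
  find "$1" -type f -name SKILL.md | while IFS= read -r file; do
    basename "$(dirname "$file")"
  done | sort
}
```

If /task-intake does not appear in the first list and oracle appears in the second, the installed state agrees with the README quick start rather than with the installer's closing message.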

The rollout I would accept

I would not install the full stack on a production workstation first. Use a staged rollout and make each stage observable:

git clone https://github.com/GadaaLabs/claude-code-on-steroids
cd claude-code-on-steroids
sed -n '1,260p' install.sh
sed -n '1,220p' apply.sh
sed -n '1,220p' hooks/session-start
sed -n '1,120p' hooks/hooks.json
sed -n '1,120p' .claude-plugin/plugin.json
bash tests/claude-code/run-skill-tests.sh
bash tests/skill-triggering/run-all.sh

If that review passes, back up user-level Claude state or use a disposable account before installation:

cp -R ~/.claude ~/.claude.backup.$(date +%Y%m%d-%H%M%S)
bash install.sh
find ~/.claude -path '*skills/oracle/SKILL.md' -print
find ~/.claude -path '*commands/tokenburn.md' -print
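The backup is also the cheapest way to observe the install: a recursive diff between the backup and the live directory lists every file the installer added or changed. This is a minimal sketch with a hypothetical function name; it relies only on standard diff behavior, where exit status 1 means the trees differ and 2 means an error.

```bash
# Usage: show_install_changes <backup_dir> <current_dir>
# Prints one line per added, removed, or changed path.
# Succeeds whether or not differences exist; fails only if diff itself errors.
show_install_changes() {
  local backup_dir=$1 current_dir=$2 status=0
  diff -rq "$backup_dir" "$current_dir" || status=$?
  # diff exits 1 when the trees differ, which is expected here; 2 or more is an error.
  [ "$status" -le 1 ]
}
```

Saving that output next to the reviewed commit hash gives the team a record of what the installer actually touched on this machine, which is more reliable than the README's description of what it should touch.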

Then adopt one layer at a time. Add only the Karpathy CLAUDE.md section if the team needs baseline behavior. Run /tokenburn week for a baseline before changing workflows. Pick either forge for TDD or pathfinder for inherited codebases as the first enforced skill. Enable pinned overrides only after someone owns the local fork, upstream review, and skill tests.

What to put in AGENTS.md

A Codex team can borrow the system without adopting the installer. The project-level rule can be short:

For non-trivial work, classify the task before implementation, verify APIs before writing tests, map unfamiliar code before editing, and record the verification command before claiming completion. Use token logs to identify expensive repeated workflows; do not cite token savings without local evidence.

If the team does use the GadaaLabs pack, add the operational details too: where overrides live, who reviews pinned skill changes, which tests must pass after skill edits, and whether startup hooks are allowed on shared machines.

When I would skip it

Skip the full installer if the team cannot own global Claude configuration. Skip pinned overrides if upstream plugin behavior should remain canonical. Skip the multi-skill chains for one-off scripts, small docs edits, or repos where a project-local AGENTS.md already encodes the needed behavior. Skip token-savings claims until /tokenburn has enough local history to show a before-and-after pattern.

The full pack is strongest for teams that repeatedly use Claude Code across complex repos, care about workflow discipline, and are willing to operate their agent environment like developer infrastructure. It is too much for a user who only wants a single prompt improvement.

Save the GadaaLabs article, but save it as an adoption audit. The source is useful when readers learn how to inspect installer side effects, startup hooks, pinned overrides, tests, and token evidence before treating a skill pack as part of their Claude Code runtime.

Practical takeaway

Do not start by piping the installer into a shell. Clone the repo, inspect install.sh, apply.sh, hooks/session-start, hooks/hooks.json, and .claude-plugin/plugin.json, run bash tests/claude-code/run-skill-tests.sh and bash tests/skill-triggering/run-all.sh, then choose one adoption layer: ambient CLAUDE.md rules, /tokenburn, forge, pathfinder, or the pinned override system. Measure token patterns locally and only enable global overrides when a specific owner will review future upstream drift.

