Workflow · June 1, 2026 · 10 min read

Using gstack for AI Workflow Boundaries in Claude Code

The practical lesson is not that gstack turns an agent into a project manager. It shows how to put planning, review, browser QA, release, memory, and host-specific install rails around AI coding.


EDITOR'S NOTE: The useful story is not that gstack makes Claude Code a project manager. It is a full workflow boundary for agent work: planning, review, browser QA, release, memory, and host-specific skill installation.

The source is much stronger than the project-management angle suggests. gstack is a workflow boundary around AI coding: product framing, architecture review, design critique, code review, browser QA, security, shipping, deployment, and retrospectives, all encoded as slash skills.

That matters for Claude Code and Codex users because most agent failures happen between tools, not inside one prompt. The agent writes code before the product question is clear. It skips browser verification. It says tests passed without checking the real path. It opens a PR without a review boundary. gstack tries to put named stops between those moments.

The right question is not 'does gstack make Claude Code agile?' It is this: when should a team accept this much workflow machinery, and how should it adopt that machinery without turning every small edit into ceremony?

The core idea is a boundary, not a manager

The README describes gstack as a virtual engineering team: CEO reviewer, eng manager, designer, reviewer, QA lead, security officer, release engineer, and more. That phrasing can sound like hype if repeated directly.

The practical interpretation is sharper. gstack creates boundaries around phases where AI agents commonly skip work: office-hours before scope hardens, plan-eng-review before implementation, review before landing, qa against a real browser, ship before PR creation, land-and-deploy before production verification, and retro after the week ends. The value is not personality labels. It is forcing a state transition before the next expensive step.
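Here is a minimal sketch of a phase gate that makes "forcing a state transition" concrete. It is not gstack code. The state directory, the .done marker files, and the extra implement phase are hypothetical. A phase may start only after the phase before it has left a marker. That is the same shape as refusing to open a PR before review has run.

```bash
#!/usr/bin/env bash
# Hypothetical phase gate, not part of gstack. Each finished phase leaves a
# marker file in a state directory; a phase may start only if the previous
# phase in the ordered list has finished.

# Ordered phases, loosely mirroring the boundaries above ("implement" is invented).
PHASES="office-hours plan-eng-review implement review qa ship"

# Print the phase that must finish before $1 (empty for the first phase).
previous_phase() {
  local target=$1 prev="" p
  for p in $PHASES; do
    if [ "$p" = "$target" ]; then
      printf '%s' "$prev"
      return 0
    fi
    prev=$p
  done
  return 1   # unknown phase
}

# Succeed only if the prerequisite of phase $2 has a marker in state dir $1.
can_start() {
  local state_dir=$1 target=$2 prev
  prev=$(previous_phase "$target") || { echo "unknown phase: $target" >&2; return 2; }
  [ -z "$prev" ] && return 0        # the first phase has no prerequisite
  [ -f "$state_dir/$prev.done" ]    # marker written when the previous phase finished
}

# Record that phase $2 finished, in state dir $1.
mark_done() {
  mkdir -p "$1" && touch "$1/$2.done"
}
```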

Run the five-command trial first

The README's own quick start is conservative: install, run /office-hours, run /plan-ceo-review, run /review, run /qa, then stop. That is the right adoption posture. Do not begin with the full skill catalog.

A serious trial should look like this:

git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/gstack-audit
cd ~/gstack-audit
git rev-parse HEAD
sed -n '1,260p' setup
sed -n '1,220p' package.json
bun install
bun test
bun run skill:check

Then install only for the host you actually use. A Codex user should inspect ./setup --host codex behavior before assuming Claude instructions map cleanly to Codex skill discovery.

Host installation is part of the product

gstack is not just a directory of Markdown files. The setup script parses host flags, supports Claude, Codex, Kiro, Factory, OpenCode, and auto-detection, and treats OpenClaw, Hermes, and GBrain as different integration models. For Codex, it generates .agents/skills/gstack-* output and creates a minimal ~/.codex/skills/gstack runtime root so Codex does not discover both raw source skills and generated skills.

That detail matters. A portable skill pack has to answer where instructions live, which paths are rewritten, which runtime assets are available, and how duplicate discovery is avoided. gstack is worth covering because it exposes those packaging decisions in source instead of pretending all agents read skills the same way.
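Duplicate discovery can be checked mechanically. The sketch below is a hypothetical helper, not part of gstack. It assumes that every directory containing a SKILL.md is one discoverable skill, named after that directory; real hosts may key skills on frontmatter instead. It prints any skill name found under more than one root.

```bash
#!/usr/bin/env bash
# Hypothetical check, not gstack code: list skill names an agent would
# discover more than once across the given skill roots.
# Assumption: each SKILL.md is one skill, named after its parent directory.

duplicate_skills() {
  local root
  for root in "$@"; do
    [ -d "$root" ] || continue
    find "$root" -name SKILL.md | while IFS= read -r f; do
      basename "$(dirname "$f")"
    done
  done | sort | uniq -d
}

# Usage (example roots; substitute the ones your agent actually scans):
#   dups=$(duplicate_skills ~/.codex/skills ./.agents/skills)
#   [ -z "$dups" ] || { echo "duplicate skills: $dups"; exit 1; }
```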

Team mode is a runtime decision

The README recommends team mode for shared repos. The setup script shows what that means: it sets auto_upgrade and team_mode, registers a SessionStart hook through gstack-settings-hook, and tells users to run gstack-team-init required.

That can be valuable for teams that want consistent agent behavior. It is also no longer just documentation. A startup hook that updates or routes skills at session start changes the runtime environment for every contributor using that repo. The rule should be explicit: team mode needs an owner, a rollback path, and a review policy for gstack upgrades. Without that, a team can accidentally outsource its agent operating policy to the latest upstream version.
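One way to give team mode a rollback path is to pin the checkout to a commit someone reviewed and to notice when it drifts. This is plain git, not a gstack feature. The pin file and the helper names are hypothetical. The sketch assumes the install is a git checkout, like the ones created by this article's clone commands.

```bash
#!/usr/bin/env bash
# Hypothetical pin/rollback helpers for a gstack checkout (not gstack features).
# The pin file records the commit a human reviewed; upgrades change it on purpose.

# Record the commit currently checked out in repo $1 into pin file $2.
pin_current() {
  git -C "$1" rev-parse HEAD > "$2"
}

# Succeed if repo $1 is still at the commit recorded in pin file $2.
matches_pin() {
  [ "$(git -C "$1" rev-parse HEAD)" = "$(cat "$2")" ]
}

# Move repo $1 back to the pinned commit (detached HEAD; branches untouched).
rollback_to_pin() {
  git -C "$1" checkout --quiet --detach "$(cat "$2")"
}

# Usage:
#   pin_current ~/.claude/skills/gstack ./gstack.pin        # after review
#   matches_pin ~/.claude/skills/gstack ./gstack.pin || echo "gstack drifted"
#   rollback_to_pin ~/.claude/skills/gstack ./gstack.pin
```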

Generated skills change the contribution model

AGENTS.md repeats an important convention: SKILL.md files are generated from .tmpl templates. Contributors should edit the template, run bun run gen:skill-docs, and commit both source and generated output.

That is a mark of maturity because one skill system now emits host-specific variants. It is also a trap for casual forks. If a team edits a generated SKILL.md directly, the next generation pass can erase the change. If they forget to run generation, the template and published skill drift. Readers should know this before forking, because it decides whether gstack is a safe fork target or only a user-level install.
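A cheap guard against template and output drift is to run the generator and fail if the working tree changes. bun run gen:skill-docs is the command AGENTS.md names. The wrapper function is a hypothetical helper, and it assumes the checkout is clean before it runs.

```bash
#!/usr/bin/env bash
# Hypothetical drift check: run the generator command (remaining arguments)
# inside repo $1, then fail if the working tree changed.
# Assumes the checkout was clean before the check.

generated_in_sync() {
  local repo=$1
  shift
  ( cd "$repo" && "$@" ) || { echo "generator failed" >&2; return 2; }
  local changes
  # --porcelain also reports untracked files the generator may have created.
  changes=$(git -C "$repo" status --porcelain)
  if [ -n "$changes" ]; then
    printf 'generated output drifted:\n%s\n' "$changes" >&2
    return 1
  fi
}

# Usage (in CI, or before committing a template change):
#   generated_in_sync ~/gstack-audit bun run gen:skill-docs
```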

Browser QA is the strongest concrete subsystem

The most concrete technical asset is the browser layer. The README and AGENTS.md describe /browse as a fast headless browser surface. The package depends on Playwright and Puppeteer Core. The browse skill says it can navigate URLs, click, diff state, take screenshots, test forms, handle dialogs, and check responsive layouts.

This is where gstack becomes more than prompt discipline. /qa can run against a staging URL, find bugs, fix them, commit each fix atomically, and re-verify. Whether a team wants that much autonomy is a product decision, but the loop itself is exactly what AI coding needs: a real browser as evidence, not a text-only confidence report.

Review is where the workflow earns its keep

The review skill is positioned as pre-landing review. Its source says it analyzes diffs against the base branch for SQL safety, LLM trust-boundary violations, conditional side effects, and structural issues. That list is better than a generic 'code review' promise because it names classes of bugs that can pass ordinary CI.

For Codex readers, the adoption path is straightforward. Run /review on a branch that already has tests. Compare the findings to human review. If it catches real issues and does not flood the team with noise, keep it. If it only restates style preferences, do not promote the whole stack.

Ship is a release protocol, not a button

/ship is described as detecting and merging the base branch, running tests, reviewing the diff, bumping VERSION, updating CHANGELOG, committing, pushing, and opening a PR. That is powerful because release work is where AI agents often skip boring but important details.

It is also why teams should not start here. A release skill must match the repo's branch model, version policy, changelog style, test commands, and PR rules. The safe rollout is to use /ship first in report mode or on a low-risk repo, then decide which parts become local policy.
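Turning parts of /ship into local policy is easier when the policy is executable. This sketch checks two of the release details listed above against a base ref: a bumped VERSION and a matching CHANGELOG entry. It is not /ship itself. It assumes that VERSION holds a single string and that CHANGELOG.md mentions each version verbatim, which may not match your repo.

```bash
#!/usr/bin/env bash
# Hypothetical pre-ship policy check (not /ship).
# Assumptions: VERSION holds one version string; CHANGELOG.md mentions each
# released version verbatim somewhere in its text.

release_ready() {
  local repo=$1 base=$2 old new
  new=$(tr -d '[:space:]' < "$repo/VERSION")
  old=$(git -C "$repo" show "$base:VERSION" 2>/dev/null | tr -d '[:space:]')
  if [ "$new" = "$old" ]; then
    echo "VERSION not bumped (still $new)" >&2
    return 1
  fi
  if ! grep -qF -- "$new" "$repo/CHANGELOG.md"; then
    echo "CHANGELOG.md has no entry for $new" >&2
    return 1
  fi
}

# Usage: release_ready . origin/main && echo "ready to hand off to /ship"
```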

Use the productivity story as context only

The README opens with a strong productivity narrative. The LOC controversy document is more useful because it admits the critique: LOC is a weak metric, AI inflates lines, greenfield work is different from maintenance, and quality-adjusted productivity is not fully proven.

That honesty should shape how the headline numbers are read. The point is not to repeat a 240x or 810x claim as if it proves gstack will work for every team. The point is that the maintainer connected high AI output to tests, reviews, browser QA, slop scans, and release discipline. That connection is the transferable lesson. Speed without verification is not leverage. It is risk.

OpenClaw shows the right portability boundary

The OpenClaw document is useful because it does not pretend every runtime should receive the same install. It says OpenClaw integration uses gstack as a methodology source, not a ported codebase. OpenClaw spawns Claude Code sessions, and routing decides whether the task is simple, medium, heavy, full, or plan-only.

That is the right mental model for Codex too. Some tasks need no workflow stack. Some need a lightweight planning discipline. Some need /qa or /review. Full gstack should be reserved for work that benefits from multiple boundaries, not typed before every small change.
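A team can state its own routing as a small, explicit rule. The tier names below come from the OpenClaw document. The inputs and thresholds are invented for illustration and are not OpenClaw's routing logic.

```bash
#!/usr/bin/env bash
# Hypothetical routing rule. Tier names follow the OpenClaw document; the
# thresholds and inputs are illustrative assumptions, not OpenClaw's logic.
#   $1 = number of files the change is expected to touch
#   $2 = "web" if the change affects a browser-facing surface
#   $3 = "new" if this is new product work with unsettled scope

route_task() {
  local files=$1 surface=$2 kind=$3
  if [ "$kind" = "new" ]; then
    echo plan-only      # settle scope first (/office-hours, /plan-eng-review)
  elif [ "$files" -le 2 ] && [ "$surface" != "web" ]; then
    echo simple         # no workflow stack
  elif [ "$files" -le 10 ] && [ "$surface" != "web" ]; then
    echo medium         # /review before the PR
  elif [ "$files" -le 10 ]; then
    echo heavy          # /review plus /qa-only against staging
  else
    echo full           # every boundary, including release discipline
  fi
}

# Usage: route_task 3 web fix   # → heavy
```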

The host config system is the quiet lesson

docs/ADDING_A_HOST.md explains that each supported AI coding host is a typed config object. The config defines global roots, local roots, frontmatter handling, path rewrites, generated metadata, skipped skills, runtime assets, and install behavior. The generator, setup script, health checks, and tests consume those configs.

This is the design pattern other skill authors should copy. Do not fork the entire generator for each agent. Put host differences in configuration, validate them, and make path leakage a testable failure. That is more useful to Codex readers than another list of available slash commands.
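"Make path leakage a testable failure" can start as a grep. This hypothetical check fails if any file under a generated output directory contains a forbidden host path. The directory and patterns in the usage line are examples. The check stands in for whatever gstack's own tests do; it is not a copy of them.

```bash
#!/usr/bin/env bash
# Hypothetical leak check: fail if any file under directory $1 contains one of
# the forbidden fixed-string patterns given as the remaining arguments.

no_path_leaks() {
  local dir=$1
  shift
  local pattern hits status=0
  for pattern in "$@"; do
    hits=$(grep -rlF -- "$pattern" "$dir" 2>/dev/null)
    if [ -n "$hits" ]; then
      printf 'leaked "%s" in:\n%s\n' "$pattern" "$hits" >&2
      status=1
    fi
  done
  return $status
}

# Usage: Codex output should not point at Claude install paths (example patterns;
# the quotes keep the shell from expanding ~).
#   no_path_leaks ./.agents/skills "~/.claude/skills" ".claude/skills/gstack"
```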

Safety skills need a local policy

The /guard skill combines destructive command warnings with a directory edit boundary. It depends on sibling /careful and /freeze hooks, asks the user which directory should be protected, writes a freeze boundary state file, and blocks edits outside that path.

That is a good pattern for production debugging and live systems, but it should not be treated as magic. A team still needs to define when guard mode is mandatory, who can override destructive warnings, and how to remove stale freeze boundaries. The skill gives a mechanism; the repo must supply policy.
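A freeze boundary rests on a path-containment test. This is a minimal sketch of such a test, not the /guard or /freeze source. It assumes GNU realpath, whose -m flag canonicalizes paths that need not exist. That way, .. segments and symlinked components are resolved before the prefix comparison.

```bash
#!/usr/bin/env bash
# Hypothetical freeze-boundary check (not /guard or /freeze source).
# Succeeds if path $1 is inside (or equal to) boundary directory $2.
# Assumes GNU realpath: -m canonicalizes even paths that do not exist yet.

inside_boundary() {
  local path boundary
  path=$(realpath -m -- "$1") || return 2
  boundary=$(realpath -m -- "$2") || return 2
  case "$path/" in
    "$boundary"/*) return 0 ;;   # the slash stops /srv/app matching /srv/app2
    *) return 1 ;;
  esac
}

# Usage: refuse an edit outside the frozen directory.
#   inside_boundary "$target_file" /srv/app/payments || { echo "blocked"; exit 1; }
```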

A selective rollout I would trust

Start with one concrete loop, not the entire catalog:

Loop A: planning quality
/office-hours -> /plan-ceo-review -> /plan-eng-review -> written plan

Loop B: landing quality
existing tests -> /review -> fix real findings -> tests again

Loop C: product behavior
staging URL -> /qa-only -> triage -> /qa on approved fixes

Loop D: release discipline
manual release checklist -> /ship dry run or low-risk repo -> PR review

After one week, keep only the loops that improved outcomes. If /qa-only finds real bugs, promote /qa. If /review finds production risks, keep it before merge. If /office-hours produces better scope, use it before new projects. If a skill adds ceremony without changing decisions, do not keep it just because it is part of the stack.
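A week of scorekeeping does not need tooling; a text log is enough. This sketch reads a hypothetical CSV with one row per skill run (loop,changed_decision) and prints keep or drop for each loop. The default threshold is a changed decision in at least a third of runs. That value is an arbitrary example to tune, not a gstack recommendation.

```bash
#!/usr/bin/env bash
# Hypothetical weekly review of workflow loops.
# Input (stdin, no header): "loop,changed_decision" with changed_decision yes|no,
# e.g. "review,yes" when a /review finding changed what got merged.
# Output: "<loop> <runs> <changed> keep|drop", sorted by loop name.

loop_verdicts() {
  local threshold=${1:-0.33}   # example: share of runs that changed a decision
  awk -F, -v t="$threshold" '
    NF == 2 { runs[$1]++; if ($2 == "yes") changed[$1]++ }
    END {
      for (l in runs) {
        c = changed[l] + 0
        printf "%s %d %d %s\n", l, runs[l], c, (c / runs[l] >= t + 0 ? "keep" : "drop")
      }
    }' | sort
}

# Usage: loop_verdicts < week1.csv
```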

Pick the path that matches the user

A solo Claude Code user should start with the global Claude path and one workflow:

git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack
cd ~/.claude/skills/gstack
./setup --no-team
bun test

Then try only this sequence on a real branch:

/plan-eng-review
/review
/qa-only https://staging.example.com

A Codex CLI user should inspect the generated skill output instead of assuming Claude paths apply:

cd ~/gstack-audit
./setup --host codex
ls ~/.codex/skills | grep gstack
find ~/.codex/skills/gstack -maxdepth 3 -type f | sort | head -40

A team should not start with automatic updates. First run the same branch through manual review and QA for a week. Only then consider:

(cd ~/.claude/skills/gstack && ./setup --team)
~/.claude/skills/gstack/bin/gstack-team-init required
git diff -- .claude CLAUDE.md

If the diff adds routing the team does not understand, stop and rewrite the local policy before committing it.
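Part of that check can be automated by listing any changed path outside an allowlist. In this hypothetical helper, the allowlist mirrors the paths in the git diff command above (.claude and CLAUDE.md). It only catches unexpected files, not unexpected routing inside allowed ones, so a human still has to read the diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read changed paths (one per line) on stdin, print any
# that fall outside the allowed prefixes given as arguments, and fail if any do.

unexpected_paths() {
  local path prefix ok status=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    ok=1
    for prefix in "$@"; do
      case "$path" in
        "$prefix"|"$prefix"/*) ok=0; break ;;
      esac
    done
    if [ "$ok" -ne 0 ]; then
      echo "$path"
      status=1
    fi
  done
  return $status
}

# Usage, after team init (lists modified and untracked files):
#   git ls-files --modified --others --exclude-standard | unexpected_paths .claude CLAUDE.md
```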

What to write into CLAUDE.md or AGENTS.md

The local repo should own the final policy. A practical rule can be short:

Use gstack selectively. For new product work, run /office-hours and /plan-eng-review before implementation. For branches with meaningful diffs, run /review before PR. For web-facing changes, run /qa-only first and only allow /qa to modify code after triage. For releases, use /ship only after local tests and human approval. Do not enable team mode or auto-update hooks without an owner and rollback plan.

That keeps the useful method while avoiding blind adoption of every command.

When not to use it

Skip gstack for tiny edits, quick docs fixes, one-off scripts, or repos where a project-local AGENTS.md already gives enough structure. Skip team mode if the team is not ready to own SessionStart behavior. Skip browser-fixing autonomy if the app has high-risk state or weak rollback. Skip productivity claims unless your own repo metrics, tests, and bug reports show the loop is helping.

The strong version of gstack is a workflow operating layer. Operating layers have costs. The value appears when the work is complex enough that missing a boundary is more expensive than running the boundary.

Read gstack as a field guide for adopting a workflow boundary in Claude Code. It is valuable when you study host installation, generated skills, browser QA, review, release, safety, and team-mode tradeoffs; it is weak if reduced to a project-management or productivity headline.

Practical takeaway

Do not install gstack because the README lists many skills. Clone it, inspect setup, run bun test and bun run skill:check, then adopt one loop: planning, review, QA, or release. Codex users should inspect ./setup --host codex and verify generated .agents/skills/gstack-* output. Teams should avoid --team until someone owns auto-updates, SessionStart hooks, rollback, and generated-skill drift. Keep the loops that improve real decisions; drop the ones that only add ceremony.

SOURCES

[1] Primary source (github.com)
[2] gstack AGENTS (github.com)
[3] setup script (github.com)
[4] package metadata (github.com)
[5] OpenClaw integration (github.com)
[6] host extension guide (github.com)
[7] LOC controversy methodology (github.com)
[8] qa skill (github.com)
[9] review skill (github.com)
[10] skill validation tests (github.com)
[11] gstack CLAUDE (github.com)
[12] browse skill (github.com)
[13] ship skill (github.com)
[14] guard skill (github.com)
[15] context-save skill (github.com)
[16] plan-eng-review skill (github.com)

Tags: claude code, codex, skills, workflow, browser qa
