Tools · June 1, 2026 · 8 min read

CI Smoke Tests for MCP Servers with k6

The k6 example is useful when framed as a repeatable contract test for Dockerized stdio MCP servers, not as a broad load-testing story.


EDITOR'S NOTE: This rescue reframes the source as CI contract testing for MCP servers. k6 is the harness; the value is repeatable stdio tool validation before publishing an agent-facing server.

The old article had a useful seed but the wrong promise. It made the k6 setup sound like a general integration win. The better article is narrower and more valuable: MCP servers need CI smoke tests.

If a Codex or Claude Code workflow depends on an MCP server, that server should be tested before an agent discovers it. It should start through the same transport shape, list the expected tools, accept valid arguments, return a stable response shape, and fail the build when the contract breaks. The Renato Groffe example is small, but it shows that loop end to end.

Call It Validation, Not Load Testing

The workflow uses k6, but the example is not a load test. It runs one virtual user and one iteration. That is a feature, not a flaw, if the goal is CI validation.

The point is to catch obvious MCP regressions before humans or agents hit them: the server does not start, ping fails, the tool list is wrong, a tool returns malformed JSON, or a schema change breaks a client. Treat the harness as a smoke and contract test first. Add load later only after the basic contract is trustworthy.

A concrete first target: tests/mcp-fakedata_test.js must start the server, call the tools, and fail when any check() fails. The workflow must run that script on pull requests before the MCP server image or client config is treated as usable.

The Source Pattern

The test repo has three useful moving parts. First, the MCP server is packaged as a Docker image. Second, the k6 script starts that image through stdio with docker run -i --rm. Third, GitHub Actions builds a k6 binary with the MCP extension and runs the script on push, pull request, or manual dispatch.

That is exactly the shape an MCP publisher needs. The test should exercise the same path a local client uses, not a special testing endpoint.

What the k6 Script Proves

The script imports k6/x/mcp, creates an StdioClient, and launches Docker as the child process:

const client = new mcp.StdioClient({
  path: 'docker',
  args: ['run', '-i', '--rm', 'renatogroffe/dotnet10-consoleapp-mcp-fakedata'],
});

Then it calls ping, lists tools, calls four fake-data tools, parses each JSON response, and checks both isSuccess and expected record count. The k6 threshold requires checks to hit rate==1.0, so one broken assertion fails the run.

That is a good minimum bar. It proves the server is callable as an MCP process and that the response body still matches what an agent workflow expects.
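The parse-then-check step described above can be pulled into two small helpers. This is a minimal sketch, not the source script: parseToolBody and hasRecords are hypothetical names, and the `data` field holding the records is an assumption, since the article does not name the array field the fake-data server returns. Returning a status object instead of throwing means a malformed payload shows up as a failed check rather than an interrupted iteration.

```javascript
// Parse the first text content item of an MCP tool result as JSON.
// Returns { ok, body, error } instead of throwing.
function parseToolBody(result) {
  const first = result && Array.isArray(result.content) ? result.content[0] : undefined;
  if (!first || typeof first.text !== 'string') {
    return { ok: false, body: null, error: 'no text content' };
  }
  try {
    return { ok: true, body: JSON.parse(first.text), error: null };
  } catch (err) {
    return { ok: false, body: null, error: `invalid JSON: ${err.message}` };
  }
}

// Shape check: isSuccess is true and the record array has the expected length.
// ASSUMPTION: `recordsField` defaults to a hypothetical 'data' field;
// replace it with whatever array field your server actually returns.
function hasRecords(parsed, expectedCount, recordsField = 'data') {
  if (!parsed.ok || parsed.body === null || typeof parsed.body !== 'object') return false;
  const records = parsed.body[recordsField];
  return parsed.body.isSuccess === true
    && Array.isArray(records)
    && records.length === expectedCount;
}
```

Inside a k6 script, the boolean from hasRecords can be returned directly from a check() predicate, so the rate==1.0 threshold still decides pass or fail.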

The Server Gives You Negative Cases

The paired .NET MCP server is more than a placeholder. Program.cs registers stdio transport and four tool classes: contacts, companies, products, and messages. Each tool returns fake Brazilian data through Bogus. The validator rejects numberOfRecords <= 0 and numberOfRecords > 20.

That validator is an invitation to strengthen the CI suite. The sample validates happy paths with 3, 4, 5, and 6 records. A production article should tell readers to add failure cases:

numberOfRecords: 0 should fail
numberOfRecords: 21 should fail
unknown tool name should fail cleanly
malformed arguments should fail cleanly
startup timeout should fail the build

Those checks matter because agents tend to recover from vague failures by guessing. A clean tool error is part of the product contract.
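The failure cases listed above fit a table-driven helper. The sketch below is an assumption-heavy illustration: `your_tool` and `no_such_tool` are placeholder names, and I do not know which signal the fake-data server uses for a rejection, so failsCleanly accepts any of three: the client throws, the result carries the MCP `isError` flag, or the JSON body reports `isSuccess === false`. Tighten it to the one signal your server is supposed to use.

```javascript
// Hypothetical negative cases; tool and argument names must match your server.
const negativeCases = [
  { label: 'zero records rejected', name: 'your_tool', arguments: { numberOfRecords: 0 } },
  { label: 'over limit rejected', name: 'your_tool', arguments: { numberOfRecords: 21 } },
  { label: 'unknown tool rejected', name: 'no_such_tool', arguments: {} },
];

// Returns true when the call is rejected by an exception, an isError flag,
// or a parseable body with isSuccess === false.
function failsCleanly(client, testCase) {
  let result;
  try {
    result = client.callTool({ name: testCase.name, arguments: testCase.arguments });
  } catch (err) {
    return true; // the client surfaced the rejection as an exception
  }
  if (result && result.isError === true) return true;
  try {
    const body = JSON.parse(result.content[0].text);
    return body !== null && body.isSuccess === false;
  } catch (err) {
    return false; // no error flag and no parseable failure body
  }
}

// Build a { label: boolean } map that a test harness can assert on.
function negativeResults(client, cases) {
  const out = {};
  for (const c of cases) out[c.label] = failsCleanly(client, c);
  return out;
}
```

Each entry in the returned map can back one k6 check, so a validator that silently starts accepting 21 records fails the same threshold as a broken happy path.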

Why k6 Is a Reasonable Harness

xk6-mcp exposes MCP clients inside k6: stdio, SSE, and Streamable HTTP. It can ping, list tools, list resources, list prompts, call tools, read resources, and fetch prompts. The implementation also emits MCP request duration, count, and error metrics.

That makes k6 useful even before load testing. You get JavaScript test code, checks, thresholds, summaries, and an HTML report artifact. If the MCP server later moves from stdio to Streamable HTTP, the same style of script can exercise that transport too.
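One way to keep the script transport-agnostic is a small factory that picks the client from config. This is a sketch under stated assumptions: only `StdioClient` with `path` and `args` is confirmed by the source script; the `SSEClient` and `StreamableHTTPClient` constructor names and the `base_url` option are my recollection of the xk6-mcp README and should be checked against the extension version you pin. The `mcp` module is passed in as a parameter so the helper carries no import of its own.

```javascript
// Create an MCP client for the configured transport.
// `mcp` is the module a k6 script imports from 'k6/x/mcp'.
// ASSUMPTION: SSEClient, StreamableHTTPClient, and base_url are unverified here;
// confirm them against the xk6-mcp version pinned in your build.
function createClient(mcp, config) {
  switch (config.transport) {
    case 'stdio':
      return new mcp.StdioClient({ path: config.path, args: config.args });
    case 'sse':
      return new mcp.SSEClient({ base_url: config.url });
    case 'http':
      return new mcp.StreamableHTTPClient({ base_url: config.url });
    default:
      throw new Error(`unsupported transport: ${config.transport}`);
  }
}
```

If the server later moves off stdio, only the config object changes; the ping, list, and call assertions stay as they are.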

The CI Build Is Part of the Contract

The workflow builds a custom k6 binary during CI using the grafana/xk6 Docker image and pins the MCP extension version. It then runs ./k6 as a sanity check, executes the script, lists generated files, and uploads results-report.html.

The minimum command shape is explicit:

cd tests
docker run --rm -u "$(id -u):$(id -g)" -v "${PWD}:/xk6" grafana/xk6 build v1.5.0 \
  --with github.com/dgzlopes/xk6-mcp@v0.0.3
./k6 run mcp-fakedata_test.js --vus 1 --iterations 1

That build step is more than plumbing. It keeps the repository from committing a platform-specific binary, and it makes extension version drift visible in CI. The tradeoff is slower setup and dependence on an experimental extension, so version pinning belongs in the article.
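Pinning can itself be checked mechanically. The helper below is a hypothetical lint, not part of the source workflow: it extracts every `--with` argument from a build command string and reports whether each one carries an explicit `vX.Y.Z` tag. It is a sketch that assumes semver-style tags; extensions pinned by commit hash would need a looser pattern.

```javascript
// Extract every --with argument from an xk6 build command string.
function withArgs(command) {
  const found = [];
  const pattern = /--with\s+(\S+)/g;
  let match;
  while ((match = pattern.exec(command)) !== null) found.push(match[1]);
  return found;
}

// True when a --with argument pins an explicit version tag such as
// "github.com/org/ext@v1.2.3". Rejects a missing tag and "@latest".
function isPinned(withArg) {
  return /^[^\s@]+@v\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?$/.test(withArg);
}

// True only when the command has at least one extension and all are pinned.
function allExtensionsPinned(command) {
  const args = withArgs(command);
  return args.length > 0 && args.every(isPinned);
}
```

Run against the build command shown above, allExtensionsPinned returns true; dropping the @v0.0.3 suffix flips it to false.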

Do Not Overstate the Example

The repo is small. It does not test auth, permissions, resources, prompts, prompt-injection behavior, concurrent callers, long-running tools, or tool cancellation. xk6-mcp itself is marked experimental and not officially supported by Grafana Labs.

Those are not reasons to drop the source. They are boundaries. The source gives a practical first gate for MCP publishers. It should not be sold as complete reliability proof.

A CI Checklist I Would Accept

For a real MCP server, I would require a small matrix before publishing:

1. Start the server with the same transport users configure.
2. Assert ping succeeds within a startup timeout.
3. Assert the required tool names exist.
4. Call one happy path per core tool.
5. Call one negative path per core validator.
6. Assert response shape, not just non-empty text.
7. Upload the k6 report artifact.
8. Pin xk6-mcp and k6 versions.

If the server exposes resources or prompts, add listAllResources, readResource, listAllPrompts, and getPrompt checks. If the server is remote HTTP, add latency and error-rate thresholds after the contract suite is stable.
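The core of that checklist (ping, required tool names, one happy path, response shape) can be sketched as a single function that returns named booleans. The `spec` object is a hypothetical shape of my own, and the `isSuccess` field mirrors the fake-data server's response rather than anything MCP requires. The client methods used are the ones the smoke script in this article already calls.

```javascript
// Run the core contract checks and return a { name: boolean } map.
// ASSUMPTION: `spec` is a hypothetical shape:
//   { requiredTools: string[], happyPath: { name, arguments } }
function runContract(client, spec) {
  // Any exception inside a step counts as a failed step, not a crash.
  const safely = (fn) => {
    try { return fn() === true; } catch (err) { return false; }
  };
  const results = {};

  results['ping succeeds'] = safely(() => client.ping() === true);

  results['required tools exist'] = safely(() => {
    const names = client.listAllTools().tools.map((tool) => tool.name);
    return spec.requiredTools.every((name) => names.includes(name));
  });

  results['happy path returns isSuccess'] = safely(() => {
    const result = client.callTool(spec.happyPath);
    const body = JSON.parse(result.content[0].text);
    return body.isSuccess === true;
  });

  return results;
}
```

Because every step is wrapped, a server that fails to start produces three false entries instead of an exception, which keeps the failure visible to a checks threshold.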

The Smallest Patch

For a new MCP server repo, the smallest useful patch is two files.

First, add tests/mcp-smoke_test.js:

import { check } from 'k6';
import mcp from 'k6/x/mcp';

export const options = { thresholds: { checks: ['rate==1.0'] } };

export default function () {
  const client = new mcp.StdioClient({
    path: 'docker',
    args: ['run', '-i', '--rm', 'your-org/your-mcp-server:ci'],
  });

  check(client, { ping: () => client.ping() === true });
  const tools = client.listAllTools().tools.map((tool) => tool.name);
  check(tools, { 'expected tool exists': (names) => names.includes('your_tool') });

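  const result = client.callTool({ name: 'your_tool', arguments: { sample: true } });
  // Parse inside the check so malformed JSON fails the check instead of interrupting the iteration.
  check(result, { 'shape is stable': (r) => {
    try { return JSON.parse(r.content[0].text).isSuccess === true; } catch (e) { return false; }
  } });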
}

Second, add .github/workflows/mcp-smoke.yml with the same build-and-run shape as the source workflow:

name: mcp-smoke
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - run: |
          cd tests
          docker run --rm -u "$(id -u):$(id -g)" -v "${PWD}:/xk6" grafana/xk6 build v1.5.0 \
            --with github.com/dgzlopes/xk6-mcp@v0.0.3
          ./k6 run mcp-smoke_test.js --vus 1 --iterations 1

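Those two files are enough to catch transport, tool-discovery, and response-shape regressions before the agent path breaks, provided the image tag referenced in the script is available to the runner, either built earlier in the job or pullable from a registry.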

Where This Fits in Codex Work

This pattern is useful when Codex is maintaining an MCP server or when a repo's AGENTS.md tells agents to use one. The CI suite becomes a guardrail: after a change to tool schemas, Dockerfile, dependencies, or response models, Codex can run the same validation before handing the patch back.

It is also a good release gate. Before asking users to add an MCP server to Claude Code, Cursor, VS Code, or Codex CLI, publish the CI result that proves the transport and core tools still work.

Verdict

This candidate is salvageable because the practical method is real. It is not a long essay source, but it has enough implementation evidence: a Dockerized stdio MCP server, a k6 script that calls tools, a GitHub Actions workflow that builds xk6, and a paired server repo with validators.

The right title is not about making validation effortless. It is about responsibility. If agents are going to rely on MCP tools, those tools need repeatable CI smoke tests.

Save this as an MCP CI article: use k6/x/mcp to prove a server starts, lists tools, returns expected response shapes, and fails CI before an agent depends on it.

Practical takeaway

Build a custom k6 binary with xk6-mcp, start the MCP server through the same stdio or HTTP transport users configure, assert ping/list/call behavior, add negative cases from your validators, pin extension versions, and upload the k6 HTML report on every pull request.

