Tools · June 1, 2026 · 8 min read

Implement Source-Gated Web Research in Claude Code

web-researcher-mcp is useful when a research answer needs chosen source sets, full-page reads, and visible failure states.


EDITOR'S NOTE: Use web-researcher-mcp when source choice and recovery behavior matter more than a quick web answer.

The worst research answer from a coding agent is not obviously wrong. It is almost right, backed by links nobody checked, with one thin snippet doing the work of a source.

web-researcher-mcp is worth a closer look because its code is built around a stricter contract. It lets Claude Code search through chosen source sets, read full pages and documents, recover multi-step sessions, expose status resources, and return structured errors that tell the agent what failed. This article teaches that contract rather than repeating the README's promise of better scraping.

The decision rule

Use web-researcher-mcp only when the research answer needs a trail: a source set chosen before search, full-page or document reads instead of snippets, visible scrape failures, and a final answer that cites real URLs. If the task does not need that trail, use built-in search or a normal browser. If the answer will influence a PR, security note, legal-style claim, client memo, or architecture decision, make Claude Code run through the source gate first.
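The decision rule is small enough to write down. The sketch below is a hypothetical helper, not part of web-researcher-mcp. `ResearchTask`, `needs_source_gate`, and the destination labels are illustrative names. It assumes you tag each task with the place its answer will land.

```python
from dataclasses import dataclass

# Hypothetical labels for destinations where the article says the source gate
# is mandatory (PR, security note, legal-style claim, client memo, architecture).
HIGH_STAKES_DESTINATIONS = {
    "pr", "security note", "legal claim", "client memo", "architecture decision",
}


@dataclass
class ResearchTask:
    destination: str          # where the answer ends up, e.g. "pr" or "chat"
    cites_sources: bool       # will the final answer cite URLs?
    explains_failures: bool   # will it report scrape or provider failures?


def needs_source_gate(task: ResearchTask) -> bool:
    """Return True when the task should go through web-researcher-mcp."""
    if task.destination.lower() in HIGH_STAKES_DESTINATIONS:
        return True  # high-stakes output always runs through the gate first
    # Otherwise the gate only pays off when the answer needs a trail.
    return task.cites_sources or task.explains_failures
```

A quick lookup such as `ResearchTask("chat", False, False)` falls through to built-in search. Anything headed for a PR goes through the gate even if nobody asked for citations.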

Use lenses before broad search

The main idea is source gating. The README calls these curated source lists search lenses; the tool docs say web_search accepts a lens parameter and that site cannot be combined with lens. That is exactly the boundary a Claude Code workflow needs. For a legal memo, search court and government sources first. For a package audit, search official docs, changelogs, and security advisories before blogs. If the agent starts with an open web query, it is already asking ranking systems to choose the evidence. Start with a lens, then broaden only when the lens fails.
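Here is a minimal Python sketch of lens-first planning. The tool docs name only `lens` and `site`, so the `query` argument name is an assumption. The lens names in the usage are made up. The sketch enforces the documented rule that `site` cannot be combined with `lens`, and it puts the open-web query last.

```python
from typing import Optional


def web_search_args(query: str, lens: Optional[str] = None,
                    site: Optional[str] = None) -> dict:
    """Build arguments for one web_search call.

    Assumption: the argument names are "query", "lens", and "site".
    The tool docs say site cannot be combined with lens, so reject it early.
    """
    if lens and site:
        raise ValueError("web_search: 'site' cannot be combined with 'lens'")
    args = {"query": query}
    if lens:
        args["lens"] = lens
    if site:
        args["site"] = site
    return args


def lens_first_plan(query: str, lenses: list[str]) -> list[dict]:
    """Try each curated lens in order, then fall back to an open query last."""
    plan = [web_search_args(query, lens=lens) for lens in lenses]
    plan.append(web_search_args(query))  # broaden only after every lens fails
    return plan
```

For a package audit, `lens_first_plan("libfoo 2.x advisory", ["security-advisories", "official-docs"])` yields two gated calls before the open query. The lens names here are hypothetical.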

Install it as a user-scoped MCP server

The clean Claude Code path is brew install zoharbabin/tap/web-researcher-mcp, then claude mcp add --scope user web-researcher -- web-researcher-mcp. The one-command installer is curl -fsSL https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.sh | sh, and other clients can use an mcpServers.web-researcher config with command: "web-researcher-mcp" plus provider keys in env. Keep the first install local and on the STDIO transport. The security docs state STDIO has no auth and trusts the calling process; that is fine for one developer, but not for a shared service.
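Clients other than Claude Code can generate the `mcpServers.web-researcher` entry instead of editing it by hand. This sketch assumes the JSON shape described above: a `command` plus an `env` map. `EXAMPLE_PROVIDER_API_KEY` is a placeholder, not a real variable name. Look up the key your provider expects in the deployment guide.

```python
import json
import os


def mcp_config(provider_env: dict[str, str]) -> dict:
    """Return an mcpServers entry that launches web-researcher over STDIO."""
    return {
        "mcpServers": {
            "web-researcher": {
                "command": "web-researcher-mcp",
                "env": dict(provider_env),  # copy so callers can't mutate it
            }
        }
    }


if __name__ == "__main__":
    # Placeholder name (assumption): replace with your provider's real key name.
    key_name = "EXAMPLE_PROVIDER_API_KEY"
    env = {key_name: os.environ.get(key_name, "")}
    print(json.dumps(mcp_config(env), indent=2))
```

Reading the key from the environment keeps it out of version control. The entry only names a local command, so it stays on STDIO.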

Treat each tool as a different trust level

The core list includes web_search, scrape_page, search_and_scrape, image_search, news_search, academic_search, patent_search, sequential_search, and get_research_session. Do not lump them into one permission bucket. web_search selects candidate URLs. scrape_page reads a specific URL. search_and_scrape lets the server choose and read results in one flow. sequential_search keeps state across a multi-step investigation. For Claude Code or a Codex-style agent, that means a good prompt should ask for the query, lens, provider, source list, and scrape failures before the agent turns findings into advice.
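One way to make the trust levels concrete is to classify each tool. You can then check an agent's report for the fields listed above before it gives advice. The `Trust` labels, the `TOOL_TRUST` map, and the report field names are hypothetical. Only the four tools the paragraph describes are classified.

```python
from enum import Enum


class Trust(Enum):
    SELECTS = "selects candidate URLs"
    READS_CHOSEN = "reads a URL someone picked"
    SERVER_CHOOSES = "server chooses and reads results"
    STATEFUL = "keeps state across steps"


# Hypothetical classification that follows the article's reading of each tool.
TOOL_TRUST = {
    "web_search": Trust.SELECTS,
    "scrape_page": Trust.READS_CHOSEN,
    "search_and_scrape": Trust.SERVER_CHOOSES,
    "sequential_search": Trust.STATEFUL,
}

# Fields the agent should report before turning findings into advice.
REQUIRED_REPORT_FIELDS = ("query", "lens", "provider", "sources", "scrape_failures")


def missing_report_fields(report: dict) -> list[str]:
    """List the fields the agent still has to report, in a stable order."""
    return [field for field in REQUIRED_REPORT_FIELDS if field not in report]
```

An empty `scrape_failures` list still counts as reported. That is deliberate: "no failures" is an answer, while a missing field is not.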

Scraper tiers are fallback, not proof

The tool docs describe an ordered page extraction path: markdown negotiation, stealth HTTP, HTML extraction, then headless browser via go-rod and stealth. That stack is a recovery ladder. It does not make the content true. It only raises the chance that the server can read a page that a simpler scraper would miss. A browser-tier scrape of a vendor page is still vendor text. A successful PDF parse is still only one document. Ask Claude Code to compare sources, not to treat the highest-effort scrape as the highest-trust evidence.
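The recovery ladder can be modeled as an ordered list of extractors. This is a simplified sketch, not the server's implementation. Each extractor is a stand-in callable, and tier names such as `markdown`, `stealth_http`, `html`, and `browser` are illustrative. The result records which tier succeeded and which failed, so the audit trail shows effort without implying trust.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


@dataclass
class ScrapeResult:
    url: str
    text: Optional[str]
    tier: Optional[str]   # name of the extractor that succeeded, if any
    failures: list[str]   # "tier: reason" for every tier that failed, in order


def scrape_with_ladder(url: str, ladder: list[tuple[str, Extractor]]) -> ScrapeResult:
    """Try each extractor in order and stop at the first that returns text.

    The tier name is kept for the audit trail only: a later tier means more
    effort was spent, not that the content is more trustworthy.
    """
    failures: list[str] = []
    for name, extract in ladder:
        try:
            text = extract(url)
        except Exception as exc:  # a tier may fail on network or parse errors
            failures.append(f"{name}: {exc}")
            continue
        if text:
            return ScrapeResult(url, text, name, failures)
        failures.append(f"{name}: empty result")
    return ScrapeResult(url, None, None, failures)
```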

Raw mode belongs in inspection tasks

scrape_page has full, preview, and raw modes. The docs say raw mode returns fetched bytes verbatim and skips the content processing pipeline, while still using URL validation, SSRF protection, allowlist checks, and length bounds. That is useful when a developer needs to inspect JSON, HTML, JavaScript, or odd markup. It is the wrong default for reading. Raw output may contain scripts, markup, and prompt-injection text. In a Claude Code workflow, use full for normal research and reserve raw for explicit source inspection where the agent treats every byte as data.
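Here is a sketch of the mode policy. It assumes scrape_page takes `url` and `mode` arguments with the documented values `full`, `preview`, and `raw`. The `quote_raw_output` helper is hypothetical. It labels raw bytes as untrusted data for the agent. That helps review but does not replace the server's own URL and SSRF checks.

```python
def scrape_page_args(url: str, purpose: str) -> dict:
    """Pick a scrape_page mode: "full" for reading, "raw" only for inspection.

    Assumption: the tool accepts "url" and "mode" arguments.
    """
    mode = "raw" if purpose == "inspect" else "full"
    return {"url": url, "mode": mode}


def quote_raw_output(url: str, raw: str) -> str:
    """Wrap raw bytes in explicit markers before they reach the model's context.

    This is a labeling convention, not a security boundary: it reminds the
    agent that scripts or instructions inside are content to inspect, not obey.
    """
    fence = "=" * 8
    return (f"{fence} BEGIN UNTRUSTED RAW CONTENT from {url} {fence}\n"
            f"{raw}\n"
            f"{fence} END UNTRUSTED RAW CONTENT {fence}")
```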

Security is transport-specific

The security doc is unusually useful because it separates local STDIO from HTTP mode. STDIO has no authentication because the local Claude Code or Cursor process is trusted. HTTP mode can add OAuth 2.1, scope checks, rate limiting, and metrics. The same document names SSRF as the highest-severity risk for a scraper and describes hostname blocking, DNS resolution, private-range validation, direct IP connection, and redirect re-validation. That should shape adoption. Use STDIO for a personal research tool. Use HTTP only when you can also operate auth, quotas, logs, and tenant boundaries.
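Python's standard `ipaddress` and `socket` modules can illustrate the SSRF defenses the security doc names. This is a teaching sketch, not the server's own code. The blocked-hostname set is an example, and the resolver is injectable so the checks can be tested without network access.

```python
import ipaddress
import socket
from typing import Callable
from urllib.parse import urlsplit

Resolver = Callable[[str], list[str]]

# Example blocklist (assumption); a real deployment would maintain its own.
BLOCKED_HOSTNAMES = {"localhost", "metadata.google.internal"}


def system_resolver(host: str) -> list[str]:
    """Resolve a hostname to all of its addresses with the OS resolver."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def validate_url(url: str, resolve: Resolver = system_resolver) -> str:
    """Return a public IP to connect to, or raise ValueError for unsafe targets."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url}")
    host = parts.hostname.lower()
    if host in BLOCKED_HOSTNAMES:
        raise ValueError(f"blocked hostname: {host}")
    addresses = resolve(host)
    if not addresses:
        raise ValueError(f"no addresses for {host}")
    for addr in addresses:
        # Reject private, loopback, link-local, and other non-global ranges.
        if not ipaddress.ip_address(addr).is_global:
            raise ValueError(f"{host} resolves to non-public address {addr}")
    # Connect to this exact IP, not the hostname, so a second DNS answer
    # cannot swap in a private address between the check and the request.
    return addresses[0]


def validate_redirect_chain(hops: list[str], resolve: Resolver = system_resolver) -> None:
    """Re-validate every hop, since a redirect can point at an internal host."""
    for hop in hops:
        validate_url(hop, resolve)
```

Returning the validated IP is the design point. A fetcher that re-resolved the hostname after the check would reopen the DNS-rebinding gap the check was meant to close.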

Make failures part of the answer

The error handling docs are the strongest reason to prefer this over a throwaway scraper. Tool failures return a natural-language message plus a JSON block with kind, retryable, retryAfterSeconds, suggestedAction, provider data, alternatives, and sometimes recoveryHint. That gives Claude Code a recovery path: wait on rate limits, try a different provider, broaden a query, inform the user, or file a bug only when the server could plausibly improve. A serious research prompt should require the agent to include scrape failures and stats://rate-limits or stats://providers when the answer depends on external data.
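Here is a minimal sketch that turns the structured error block into a next step. The field names come from the error handling docs. Everything else is hypothetical: the `kind` and `suggestedAction` values in the examples, the `Recovery` type, and the action labels.

```python
import json
from dataclasses import dataclass


@dataclass
class Recovery:
    action: str           # hypothetical labels: "wait", "retry", "try_alternative", ...
    wait_seconds: int = 0
    detail: str = ""


def plan_recovery(error_json: str) -> Recovery:
    """Map a structured tool error to the agent's next step.

    Assumption: the JSON block uses the documented field names (kind,
    retryable, retryAfterSeconds, suggestedAction, alternatives, recoveryHint).
    """
    err = json.loads(error_json)
    hint = err.get("recoveryHint", "")
    if err.get("retryable"):
        wait = int(err.get("retryAfterSeconds") or 0)
        return Recovery("wait" if wait else "retry", wait, hint)
    alternatives = err.get("alternatives") or []
    if alternatives:
        return Recovery("try_alternative", detail=str(alternatives[0]))
    suggested = err.get("suggestedAction")
    if suggested:
        return Recovery(suggested, detail=hint)
    # Nothing actionable: surface the failure instead of hiding it.
    return Recovery("inform_user", detail=err.get("kind", "unknown"))
```

Whatever the action, the original error should still appear in the final note. Recovery decides what to try next; it does not erase the failure from the trail.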

Run a first research pass like a checklist

A safe first run is short and auditable. Install with brew install zoharbabin/tap/web-researcher-mcp, register with claude mcp add --scope user web-researcher -- web-researcher-mcp, and configure one provider before adding routing. Ask Claude Code to use web_search with a named lens or a narrow site filter. Before any summary, have it print the candidate URLs and explain why each source belongs in the set. Approve the source set, then call search_and_scrape for the query or scrape_page for specific URLs. Finish by checking stats://providers and stats://rate-limits, and include any structured errors in the final note. That workflow turns the MCP server into a reviewable research gate instead of an invisible citation machine.
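The checklist can be enforced as a small state object. Then the agent cannot scrape before the source set is approved, or write the final note before checking the status resources. `ResearchPass` and its methods are hypothetical scaffolding for a review script, not an API of the server. Only the `stats://` resource names come from the article.

```python
from typing import Optional

STATUS_RESOURCES = {"stats://providers", "stats://rate-limits"}


class ResearchPass:
    """Enforce the order: candidates, approval, scrape, status check, note."""

    def __init__(self) -> None:
        self.candidates: list[tuple[str, str]] = []  # (url, why it belongs)
        self.approved: set[str] = set()
        self.scraped: dict[str, str] = {}
        self.errors: list[str] = []
        self.status_checked: set[str] = set()

    def add_candidate(self, url: str, reason: str) -> None:
        if not reason:
            raise ValueError(f"explain why {url} belongs in the source set")
        self.candidates.append((url, reason))

    def approve(self, urls: list[str]) -> None:
        unknown = set(urls) - {u for u, _ in self.candidates}
        if unknown:
            raise ValueError(f"not among the candidates: {sorted(unknown)}")
        self.approved.update(urls)

    def record_scrape(self, url: str, text: Optional[str], error: str = "") -> None:
        if url not in self.approved:
            raise PermissionError(f"{url} was not approved before scraping")
        if text is None:
            self.errors.append(f"{url}: {error or 'unknown failure'}")
        else:
            self.scraped[url] = text

    def check_status(self, resource: str) -> None:
        self.status_checked.add(resource)

    def final_note(self) -> str:
        missing = STATUS_RESOURCES - self.status_checked
        if missing:
            raise RuntimeError(f"check status resources first: {sorted(missing)}")
        lines = [f"Source: {url}" for url in self.scraped]
        lines += [f"Failed: {err}" for err in self.errors]
        return "\n".join(lines)
```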

Use it where source choice changes the outcome

The server is overkill for a casual lookup. It earns its keep when the source set changes the answer. For a dependency upgrade, start with official docs, changelogs, release notes, and security advisories; use search_and_scrape only after Claude Code shows the candidate URLs. For a package risk note, pair news_search with vendor docs and ask for scrape failures beside the conclusion. For research that cites papers, configure OPENALEX_EMAIL or CROSSREF_EMAIL and prefer academic_search before the open web. For a patent or market check, separate patent_search from general news. Each case uses the same rule: pick the source class first, then let the MCP server fetch it.
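Here is the same rule written as a routing table. The task keys, ordering, and notes are an illustrative reading of the examples above, not server behavior. Only the tool names and the `OPENALEX_EMAIL` / `CROSSREF_EMAIL` variables come from the article.

```python
# Hypothetical routing table: source class first, then the tool that fetches it.
SOURCE_PLANS = {
    "dependency_upgrade": [
        ("web_search", "official docs, changelogs, release notes, advisories"),
        ("search_and_scrape", "only after candidate URLs are shown"),
    ],
    "package_risk": [
        ("news_search", "recent incidents and coverage"),
        ("scrape_page", "vendor docs for the same package"),
    ],
    "paper_citations": [
        ("academic_search", "scholarly sources first"),
        ("web_search", "open web only if academic sources are thin"),
    ],
    "patent_or_market": [
        ("patent_search", "patent records"),
        ("news_search", "market coverage, reported separately"),
    ],
}

# The article says to configure OPENALEX_EMAIL or CROSSREF_EMAIL for papers.
REQUIRED_ENV = {"paper_citations": ("OPENALEX_EMAIL", "CROSSREF_EMAIL")}


def plan_for(task: str, env: dict[str, str]) -> list[tuple[str, str]]:
    """Return the ordered tool plan, checking academic contact emails if needed."""
    if task not in SOURCE_PLANS:
        raise KeyError(f"no source plan for {task!r}; pick a source class first")
    needed = REQUIRED_ENV.get(task, ())
    if needed and not any(env.get(name) for name in needed):
        raise RuntimeError(f"set one of {needed} before running {task}")
    return SOURCE_PLANS[task]
```

Pass `dict(os.environ)` as `env` in a real session. Passing it explicitly keeps the function easy to test.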

Skip it when the answer does not need a trail

The comparison with built-in search is not about speed. Claude's built-in search or a plain browser query is enough for quick package names, docs URLs, or one-off background checks. A single scraper script is enough when you already know the exact URL and only need its text. web-researcher-mcp starts to make sense when the research result will be quoted in a PR, client note, security review, grant draft, or architecture decision. That is where lenses, provider routing, get_research_session, structured errors, and status resources become editorial controls rather than extra plumbing. If the final answer will not cite sources or explain failures, do not add this server yet.

Save web-researcher-mcp for source-gated research. It is valuable when Claude Code must justify where evidence came from, not when the task only needs a quick web answer.

Practical takeaway

Install locally with brew install zoharbabin/tap/web-researcher-mcp and claude mcp add --scope user web-researcher -- web-researcher-mcp. Configure one provider, run a narrow web_search with a lens, review candidate URLs, then use search_and_scrape or scrape_page. Ask the agent to report stats://providers, stats://rate-limits, source links, and any structured errors before it writes conclusions.

SOURCES

[1] Primary source (github.com)

[2] Tool specifications (github.com)

[3] Security architecture (github.com)

[4] Error handling (github.com)

[5] Deployment guide (github.com)

[6] Privacy notes (github.com)

