The old article had the right source and the wrong shape. It talked about large datasets in MCP servers as if the main story were token efficiency. The more useful story is interface design.

An MCP tool call is a control message. It should say what to do and which artifact to use. It should not carry 10,000 rows of CSV as JSON just because the schema technically allows it. FutureSearch's writeup is worth saving because it shows a practical boundary: move the bytes through a side channel, create a server-side artifact, and keep the model context focused on decisions.

Inline Rows Are a Smell After the Demo

FutureSearch starts with the obvious implementation: a tool accepts a task plus data, where data is a list of row objects. That is a good demo. It proves the model can call the tool and the server can deserialize the payload.

It is also where an MCP builder should stop extending the inline approach. A few rows are fine. A few hundred rows can spend the context budget before the model has reasoned about the job. A 10,000-row CSV is no longer tool input in the ordinary sense. It is bulk data, and bulk data needs a transfer path.
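To make the scale concrete, here is a rough sketch that compares the serialized size of inline rows with the size of a reference-only call. The row shape and the make_rows helper are invented for illustration, and character count is only a loose proxy for tokens, so read the output as an order-of-magnitude comparison rather than a measurement.

```python
import json


def payload_chars(payload):
    """Return how many characters the payload occupies as compact JSON."""
    return len(json.dumps(payload, separators=(",", ":")))


def make_rows(n):
    """Build n synthetic rows (hypothetical shape, for illustration only)."""
    return [
        {
            "company": f"Company {i}",
            "url": f"https://example.com/{i}",
            "industry": "software",
            "founded": 2000 + i % 20,
        }
        for i in range(n)
    ]


# The reference-style call carries an ID instead of the rows themselves.
reference_call = {
    "task": "Find the CEO of each company",
    "artifact_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
}

for n in (10, 500, 10_000):
    print(f"{n} inline rows:", payload_chars(make_rows(n)), "chars")
print("reference call:", payload_chars(reference_call), "chars")
```

The reference call stays the same size no matter how large the file is, which is the property the rest of the pattern relies on.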

The Pattern Is Reference, Not Copy

The rescue-worthy pattern is simple: Claude asks for an upload URL, transfers the file to the server, and receives an artifact_id. Every later tool call uses that ID.

The control path stays small:

{"task":"Find the CEO of each company","artifact_id":"a1b2c3d4-e5f6-7890-abcd-ef1234567890"}

The server owns the bytes, parsing, schema detection, and lifecycle. The model owns the plan, questions, and interpretation. That split is what makes the idea portable beyond FutureSearch. Any MCP server that handles exports, logs, spreadsheets, traces, embeddings, benchmark tables, or generated reports can use the same shape.

Map the Client Before Designing the Tool

FutureSearch's useful detail is that the upload path changes by client while the downstream interface stays the same.

In Claude Code over stdio, the MCP server can read a local file path directly because the server process runs on the user's machine. In Claude Code over HTTP, the server is remote, so the client needs a signed upload URL and a terminal command. In Claude.ai and Claude Desktop, the file can sit inside a code execution sandbox, so Claude runs curl from that sandbox after the user allows the MCP server domain.

That means the MCP tool contract should not pretend every client has the same filesystem. Write separate ingestion tools if needed, but converge on one artifact object before the actual processing tool runs.

A Good Upload URL Is a Narrow Capability

The signed URL in the source is not just a convenience. It is a scoped capability. The example response includes an upload_url, an upload_id, a 300-second expiry, a 50 MB max size, and a ready-made curl command.

That shape is right because it limits what Claude can do next. The URL should expire quickly, accept one upload, enforce size and content type, and bind the request to an upload ID. After the upload, the endpoint should parse the file, record rows and columns, and return an artifact reference. If the URL expires before the upload succeeds, the model should request a fresh URL rather than retry the old one.
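The checks in that paragraph can be sketched with the standard library. This is a minimal illustration, not FutureSearch's implementation: the HMAC signing scheme, the function names, the in-memory used-ID set, and the error strings are all assumptions. Only the 300-second expiry and the 50 MB cap come from the source example.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-side signing key
TTL_SECONDS = 300                     # expiry from the source example
MAX_BYTES = 50 * 1024 * 1024          # 50 MB cap from the source example
ALLOWED_TYPES = {"text/csv"}          # assumed allow-list
_used_upload_ids = set()              # in-memory; a real server would persist this


def _sign(upload_id, expires_at):
    """Bind the signature to both the upload ID and its expiry time."""
    message = f"{upload_id}:{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_upload(now=None):
    """Create a short-lived, single-use upload grant."""
    now = time.time() if now is None else now
    upload_id = secrets.token_hex(8)
    expires_at = int(now) + TTL_SECONDS
    return {
        "upload_id": upload_id,
        "expires_at": expires_at,
        "signature": _sign(upload_id, expires_at),
        "max_bytes": MAX_BYTES,
    }


def check_upload(upload_id, expires_at, signature, content_type, size, now=None):
    """Return 'accepted' or a rejection message the model can act on."""
    now = time.time() if now is None else now
    if not hmac.compare_digest(signature, _sign(upload_id, expires_at)):
        return "rejected: bad signature"
    if now > expires_at:
        return "rejected: URL expired, request a new upload URL"
    if upload_id in _used_upload_ids:
        return "rejected: upload URL already used"
    if content_type not in ALLOWED_TYPES:
        return "rejected: unsupported content type"
    if size > MAX_BYTES:
        return "rejected: file too large"
    _used_upload_ids.add(upload_id)  # consume the grant only on success
    return "accepted"
```

Each rejection names its cause, so the model can tell "ask for a new URL" apart from "this file will never be accepted".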

Security Is Part of the Interface

The source calls out a key distinction: presigned upload and URL fetch are not the same risk. With presigned upload, the server receives bytes pushed by the client. With URL fetch, the server makes an outbound request on the user's behalf.

That second path creates SSRF risk. A malicious URL can try to reach cloud metadata endpoints or internal services. FutureSearch lists the right kinds of defenses: DNS validation, IP blocklists, connection pinning, and redirect validation. A serious MCP article needs those details in the main body because agents are good at following instructions and bad at knowing which URL is safe.

The adoption rule is plain: default to client-pushed uploads for private files. Add URL fetch only when the product needs it, and treat it as a separate security surface.
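For the URL fetch path, the DNS validation and IP blocklist steps might look like the sketch below. It is an illustration under assumptions: resolve_for_fetch and is_public_ip are hypothetical names, the https-only rule is a choice made here, and the resolver parameter exists only so the check can be exercised without network access. It does not perform the fetch itself.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(address):
    """True only for globally routable addresses."""
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.is_global


def resolve_for_fetch(url, resolver=socket.getaddrinfo):
    """Resolve the URL's host and reject it if any address is non-public.

    Returns the validated addresses. To pin the connection, the caller
    should connect to one of these instead of resolving the name again,
    and should run every redirect target through this same function.
    """
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("only https URLs with a hostname are allowed")
    infos = resolver(parts.hostname, parts.port or 443, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    if not addresses:
        raise ValueError("hostname did not resolve")
    for address in addresses:
        if not is_public_ip(address):
            raise ValueError(f"blocked non-public address: {address}")
    return addresses
```

Rejecting the whole hostname when any one resolved address is non-public is deliberately strict. A name that resolves to both a public and an internal address is treated as hostile.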

Write the Artifact Contract

The server-side artifact should be more than a random ID. It needs enough metadata for later tools to behave predictably:

{
  "artifact_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "source_filename": "companies.csv",
  "content_type": "text/csv",
  "rows": 10000,
  "columns": ["company", "url", "industry", "founded"],
  "created_at": "2026-06-01T00:00:00Z"
}

That metadata lets the model ask better follow-up questions without loading the file. It can say, 'use the company and url columns,' or 'sample 20 rows first,' while the tool performs actual data access on the server. If the user later asks for provenance, the artifact ID ties the answer back to the uploaded file.
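A server could derive that record at upload time without ever returning the rows. The sketch below assumes a CSV with a header row and uses the field names from the example above. The Artifact class and build_artifact function are hypothetical names, not part of any MCP SDK.

```python
import csv
import io
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    source_filename: str
    content_type: str
    rows: int
    columns: list
    created_at: str


def build_artifact(source_filename, csv_text):
    """Parse a CSV once on the server and keep only its metadata."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        raise ValueError("empty CSV")
    # csv.reader yields [] for blank lines, so skip those when counting.
    row_count = sum(1 for row in reader if row)
    return Artifact(
        artifact_id=str(uuid.uuid4()),
        source_filename=source_filename,
        content_type="text/csv",
        rows=row_count,
        columns=header,
        created_at=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    )


sample = "company,url\nAcme,https://acme.example\nGlobex,https://globex.example\n"
artifact = build_artifact("companies.csv", sample)
print(asdict(artifact))  # metadata only; no row contents are included
```

Only the result of asdict(artifact) would go back to the model. The parsed rows stay in server storage, keyed by artifact_id.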

Do Not Use It for Every Input

This pattern has a cost. It adds an upload endpoint, storage, cleanup, retry behavior, authentication, observability, and security review. Do not build it for a tool that only receives small configuration objects or ten lines of text.

Use inline arguments when the model should see and reason over the whole payload. Use artifact upload when the data is too large, repetitive, sensitive, or operationally awkward to place in context. Use pagination or server-side filtering when the dataset already lives in your system. The artifact boundary is not a replacement for good data APIs; it is the bridge when the user brings a file to the agent.

A Local Acceptance Test

Before shipping this pattern, run it like an integration test instead of a blog demo:

1. Upload a valid 10,000-row CSV and confirm the model context only sees the artifact metadata.
2. Reuse the same signed URL and confirm the server rejects it.
3. Wait past the TTL and confirm the failure message tells Claude to request a new URL.
4. Try an oversized file and a wrong content type.
5. Try URL fetch against 169.254.169.254 and an internal hostname.
6. Run the same task from Claude Code stdio, Claude Code HTTP, Claude.ai, and Claude Desktop.

The pass condition is not just that upload works once. The pass condition is that every failure case produces a clear error on each client, the server never copies rows into the model context, and the artifact ID is enough for downstream tools to finish the task.

How I Would Put It in an MCP Server

Expose ingestion and processing as separate tools:

{"tool":"request_upload_url","arguments":{"filename":"companies.csv","content_type":"text/csv"}}
{"tool":"process_dataset","arguments":{"artifact_id":"...","task":"find CEOs"}}

For Claude Code stdio, add a direct local-path variant:

{"tool":"upload_local_file","arguments":{"path":"/Users/me/data/companies.csv"}}

Keep the processing tool boring. It should not care whether the artifact came from a local path, a sandbox curl command, or a future URL fetch mode. That separation is what prevents every new client from becoming a new processing API.
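One way to picture that separation is two ingestion paths writing to the same store while the processing function reads only by ID. This is a sketch with an in-memory dictionary standing in for server storage. complete_signed_upload, the origin field, and the line-count "processing" are placeholders invented here; the other two function names mirror the tool names above.

```python
import uuid

ARTIFACTS = {}  # artifact_id -> record; in-memory stand-in for server storage


def _store(data, origin):
    """Save the bytes and return a fresh artifact ID."""
    artifact_id = str(uuid.uuid4())
    ARTIFACTS[artifact_id] = {"bytes": data, "origin": origin}
    return artifact_id


def upload_local_file(path):
    """Ingestion for stdio clients: the server reads the local path itself."""
    with open(path, "rb") as handle:
        return _store(handle.read(), origin="local_path")


def complete_signed_upload(body):
    """Ingestion for remote clients: bytes arrive through the upload endpoint."""
    return _store(body, origin="signed_upload")


def process_dataset(artifact_id, task):
    """Processing never inspects how the artifact arrived."""
    record = ARTIFACTS.get(artifact_id)
    if record is None:
        return {"error": "unknown artifact_id, upload the file again"}
    # Placeholder work: count newline characters instead of running the task.
    line_count = record["bytes"].count(b"\n")
    return {"artifact_id": artifact_id, "task": task, "lines": line_count}
```

Adding a URL fetch mode later would mean one more function that calls _store. process_dataset would not change.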

Verdict

This source should be published because it gives MCP builders a real design pattern. The old title made it sound like another prompt-era efficiency trick. The better article says something more durable: large inputs need an artifact boundary.

For Codex and Claude Code practitioners, the practical value is direct. If you are building an MCP server for research, analytics, QA logs, spreadsheet review, benchmark interpretation, or repository exports, do not push the dataset through the model. Move it beside the model, validate it like an API upload, and give the agent a small reference it can use safely.

Save the article as an MCP interface pattern: use side-channel ingestion for bulk files, store server-side artifacts, and keep tool calls focused on instructions plus references.

Practical takeaway

When an MCP tool needs large files, split the contract into ingestion and processing. Use local file reads for Claude Code stdio, signed upload URLs for remote clients, strict validation on the upload endpoint, SSRF controls for URL fetch, and an artifact_id as the only data reference passed into downstream tools.