Most of the "AI scrapers" you can buy today were designed before MCP existed. It shows. They are APIs built for a human developer sitting in a terminal, reading documentation, copy-pasting an endpoint, and stapling an Authorization header onto a curl command. That human is slowly leaving the building. In their place is an agent: a Claude session, a Cursor composer, a Claude Code worker that needs to find tools, call them, and reason over the result, without a human in the middle explaining what an x-stekpad-api-key header is.
If you have ever tried to give an agent access to a scraper, you know the first problem. The agent does not know the scraper exists. You can paste the base URL into the system prompt, you can put the OpenAPI spec in a retrieval index, you can glue the documentation into a tool-use schema. None of that makes the scraper discoverable — it makes the scraper a trivia question the agent has to remember correctly every time.
This post is about what scraping looks like when you stop designing for humans and start designing for agents. It is not subtle. The architecture changes.
What MCP actually is
The Model Context Protocol is an open spec from Anthropic for letting an AI client list, inspect, and call tools at runtime. It's a thin contract: the client connects to a server, asks "what can you do?", gets back a typed list of tool names with JSON schemas for inputs and outputs, and then calls them by name. The server can also expose resources (static-ish data the model can read) and prompts (pre-baked templates), but tools are the piece that matters here.
The important word is runtime. The agent does not need to know your tool list at compile time. It does not need a README. It does not need a blog post teaching it your naming conventions. It connects, introspects, and discovers. When you add a new tool, the agent sees it the next time it reconnects. When you remove one, it disappears. There is no training step, no prompt engineering, no documentation to keep in sync.
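Concretely, the whole contract fits in a couple of JSON-RPC messages. Here is a sketch of the discovery-then-call exchange; the method names and response shape follow the MCP spec, while the tool itself (get_pricing_watch) is invented for illustration:

```python
# Hypothetical MCP exchange, sketched as JSON-RPC payloads.
# The client first asks the server what it can do.
list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# The server answers with named tools and JSON schemas for their inputs.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_pricing_watch",
                "description": "Return the current rows of the pricing watch robot.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"limit": {"type": "integer"}},
                },
            }
        ]
    },
}

# The client then calls a tool by name, with arguments shaped
# to match the advertised schema.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_pricing_watch", "arguments": {"limit": 10}},
}
```

Nothing in that exchange is known at compile time; the client learns the tool's existence, name, and input shape from the response itself.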
MCP has been shipping in Claude Desktop, Cursor, Claude Code, Continue, Zed, and a handful of other editors for long enough now that the pattern is settled. A user installs a single config line pointing at an MCP URL. Every tool that server exposes becomes a first-class citizen of the agent's mind, the same way a filesystem tool or a shell tool is. The interesting question is what you put on the other end of that URL.
Why scrapers fit MCP unusually well
Not every API is interesting through MCP. A lot of them are just HTTP endpoints wrapped in a new protocol — the shape of the data is the same, the discovery is the same, and the agent is no better off than if you had given it a Postman collection. Scrapers are different, for three reasons.
First, scrapers produce lists. Not one thing. Not a blob of text. A list with a schema: rows with typed columns. Agents are good at reasoning over lists. "How many of these have changed since Tuesday? Which one raised? What's the median price?" Those are the questions users actually ask, and they are pleasant to answer over structured rows.
Second, scrapers update. A Stekpad robot isn't a one-shot — it rescrapes on a schedule, diffs the result against the last run, and emits structured events: row_added, row_removed, field_changed. An agent that can ask "what's new?" and get back a meaningful answer is substantially more useful than one that has to re-read the world every time.
Third, scrapers are per-user. The robots I care about are not the robots you care about. Your Clay replacement tracks a very different list than my YC batch monitor. This is where the standard "one big REST API" model falls over: there is nothing generic to expose. You have to introspect what this specific workspace has configured. MCP was designed for exactly this — dynamic, per-connection tool catalogs — but most people haven't wired their backends that way.
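The diff behavior in particular is easy to picture in code. Here is a minimal sketch of turning two scrape runs into the event types named above; the row structure and the id key are illustrative assumptions, not Stekpad's actual internals:

```python
def diff_runs(old_rows, new_rows, key="id"):
    """Compare two scrape runs and emit structured change events.

    `key` is whatever column uniquely identifies a row; "id" is an
    illustrative default, not Stekpad's real key column.
    """
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    events = []
    # Rows present now but not in the previous run.
    for k in new.keys() - old.keys():
        events.append({"type": "row_added", "row": new[k]})
    # Rows that disappeared since the previous run.
    for k in old.keys() - new.keys():
        events.append({"type": "row_removed", "row": old[k]})
    # Rows present in both runs: compare field by field.
    for k in old.keys() & new.keys():
        for field in new[k]:
            if old[k].get(field) != new[k][field]:
                events.append({
                    "type": "field_changed",
                    "key": k,
                    "field": field,
                    "old": old[k].get(field),
                    "new": new[k][field],
                })
    return events
```

An agent asking "what's new?" gets back this event list instead of a fresh dump of the whole table, which is exactly the shape a model can summarize.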
The dynamic gateway pattern
So here is what we built, and why.
Every Stekpad workspace has one MCP URL. It looks like https://mcp.stekpad.com/stkpd_live_xxxxxxxxxxxx. The key in the URL is the workspace's API key — we trade some elegance for a dead-simple install: paste one URL into Claude Desktop's config, no separate env var, no second secret.
When an MCP client connects, the gateway:
- Validates the workspace key against our auth service.
- Lists the workspace's robots (slug, name, column schema, run cadence).
- Generates, per robot, a set of named tools (get_<slug>, run_<slug>, diff_<slug>, describe_<slug>, search_<slug>) with JSON schemas derived from the robot's column definitions.
- Generates a handful of workspace-level tools (list_robots, get_quota, create_robot(url)) so an agent can bootstrap new scrapers on its own.
- Streams the full tool list back over the MCP handshake.
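To make the generation step concrete, here is a rough sketch of how a robot's metadata could expand into tool definitions. The robot dict shape is invented for illustration, and the real gateway derives richer schemas than this:

```python
PER_ROBOT_VERBS = ["get", "run", "diff", "describe", "search"]

def tools_for_robot(robot):
    """Expand one robot's metadata into MCP tool definitions.

    `robot` is a hypothetical dict such as:
      {"slug": "yc_w26_companies",
       "columns": {"name": "string", "batch": "string"}}
    """
    column_props = {col: {"type": typ} for col, typ in robot["columns"].items()}
    tools = []
    for verb in PER_ROBOT_VERBS:
        tools.append({
            "name": f"{verb}_{robot['slug']}",
            "inputSchema": {
                "type": "object",
                # diff takes a time window; search filters on the
                # robot's own columns; the rest take no arguments here.
                "properties": (
                    {"since": {"type": "string"}} if verb == "diff"
                    else column_props if verb == "search"
                    else {}
                ),
            },
        })
    return tools
```

The point of the sketch is that the column schema the creation agent inferred is reused verbatim as the tool's input schema, so the agent sees typed parameters it can fill in without guessing.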
When you create a new robot in the dashboard, the gateway pushes a tools/list_changed event on open SSE sessions. Clients that support it (Claude Desktop does) refresh transparently. You name a robot YC W26 companies and, thirty seconds later, Claude has get_yc_w26_companies() as a tool. No restart. No redeploy. No prompt edit.
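The push itself is small. A sketch of the broadcast, assuming a hypothetical session object that wraps an open SSE stream with a send method; the notification method name is the one the MCP spec defines:

```python
import json

# JSON-RPC notification defined by the MCP spec: no id, no params needed.
LIST_CHANGED = {
    "jsonrpc": "2.0",
    "method": "notifications/tools/list_changed",
}

def broadcast_list_changed(sessions):
    """Tell every connected MCP client that the tool catalog changed.

    `sessions` is a hypothetical collection of live SSE sessions;
    clients that support the notification re-run tools/list on receipt.
    """
    payload = json.dumps(LIST_CHANGED)
    for session in sessions:
        session.send(payload)
```

The client reaction (re-running tools/list and refreshing its menu) is the client's job, which is why support varies by build, as noted below.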
The moment that pattern started feeling inevitable was the first time a teammate asked Claude to "check if my pricing watch has anything new" and Claude called diff_pricing_watch(since="24h") on the first try — without us having told it the tool existed, without any prompt priming. The tool's name was sufficient. The agent found it.
A concrete walk-through
Here is the exact flow, start to finish, for someone who has never touched Stekpad.
You paste a URL into the Stekpad dashboard — say, https://www.ycombinator.com/companies?batch=W26. A creation agent navigates that URL in a cloud browser, detects the main list, infers columns, handles pagination, and compiles a Robot Program. Thirty seconds later you have a running scraper with 219 rows, 6 columns, and an hourly monitoring schedule. You name it YC W26 companies.
Next, you open the Integrate tab. At the top is your MCP URL. Copy it. Open Claude Desktop's claude_desktop_config.json and add:
```json
{
  "mcpServers": {
    "stekpad": {
      "url": "https://mcp.stekpad.com/stkpd_live_xxxxxxxxxxxx"
    }
  }
}
```

Restart Claude Desktop once. The tool menu now shows five Stekpad tools for that robot: get_yc_w26_companies, run_yc_w26_companies, diff_yc_w26_companies, describe_yc_w26_companies, search_yc_w26_companies. Plus the workspace-level list_robots, get_quota, create_robot.
Now you can have a normal conversation:
You: What's new in YC W26 this week? Group by batch track.

Claude: (calls `diff_yc_w26_companies(since="7d")`)

14 new companies added, 2 removed, 8 profiles updated. New this week:

- AI/Agent (6): Corridor, Alloy, Tangent, Neutrino, Foundry.ai, Patchwork
- Fintech (3): Vault, Piper, Ledgerly
- Healthtech (2): Cadence, Noa
- Hardware (3): Gantry, Muon, Stipple

Want me to pull the full profile for any of these?
At no point did you explain to Claude how Stekpad works, what its API looks like, or where the data lives. The tool was named, typed, and discovered. The model found it and used it the way you would use a well-named function.
That is the product promise: agents on one side, a scraper they can find by name on the other.
Two things we learned building this
A plan like this looks clean on paper. In practice, two details mattered more than we expected.
Slug normalization is a surface. Robot names are user-typed — YC W26 companies, My LinkedIn leads (v2), Best bars ⭐ for date night. Tool names in MCP are [a-zA-Z0-9_]+. You can't just strip and underscore, or you'll collide v-2 and v2 in the same workspace. We ended up implementing a deterministic normalizer plus a dedupe suffix (robot, robot_2) and showing the resulting tool name in the dashboard next to the robot name. The first time a user renamed a robot and their agent "forgot" what the tool was called, we knew we had to surface the normalized name. Don't hide the contract.
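For reference, a normalizer in this spirit is only a few lines. This is a sketch of the approach described above, not Stekpad's exact implementation:

```python
import re

def tool_slug(name, taken):
    """Normalize a user-typed robot name into a valid MCP tool slug.

    Lowercase, collapse every run of characters outside [a-z0-9] into a
    single underscore, then dedupe within the workspace by appending a
    numeric suffix. `taken` is the set of slugs already in use, and is
    updated in place.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_") or "robot"
    candidate, n = slug, 1
    while candidate in taken:
        n += 1
        candidate = f"{slug}_{n}"
    taken.add(candidate)
    return candidate
```

Because the result is deterministic, the dashboard can show the user exactly the tool name their agent will see, which is the "don't hide the contract" point above.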
Claude Desktop caches tool lists between restarts. The SSE tools/list_changed push works in newer builds, but older ones don't refresh until the user quits and relaunches. We spent a week thinking our gateway had a bug before we realized we were testing against a cached list. The fix on our side was to log every introspection request and pin a build version in the handshake — which, ironically, would have been impossible to debug without the very MCP tools we were building. If you ship a dynamic gateway, instrument the introspection path first.
Neither of these is deep. Both of them cost us days. If you're building something similar, write them on a sticky note.
Why a static MCP server isn't enough
You'll see a lot of projects ship "MCP support" by exposing a fixed set of tools — scrape_url, crawl_site, search_web. That's genuinely useful for a generic scraper. It is the floor, not the ceiling. The moment your agent starts doing real work, it will want to say things like "check the list I named pricing watch yesterday", and a static tool list has no answer. The agent has to be told, in the prompt, that pricing watch is a label in some database somewhere, and it has to pass that label as an argument, and it has to hope you typed it right. That's a fragile dance.
Dynamic, per-workspace tool catalogs collapse the dance. The agent doesn't pass a label — it calls get_pricing_watch directly, because that tool exists in its world. The naming is the disambiguation. There's no "which dataset do you mean?" round-trip. There's no drift between what the user called their robot in the dashboard and what the agent thinks it's called. The contract is the name.
Getting this right matters more than the tool implementation itself. A clunky get_pricing_watch() that returns slightly messy JSON is more useful to an agent than a clean query_dataset(dataset_id: str) with a beautiful schema, because the former is findable and the latter is guessable. Findability beats cleanness every time.
The bigger picture
Every generation of developer tool rediscovers the same truth a decade late: the contract is the product. For REST APIs, the contract was OpenAPI. For GraphQL, it was the schema. For SDKs, it was the type definitions. For agents, it's MCP. Anything you ship that doesn't expose a clean MCP surface is — from the agent's perspective — a manual tool, one that needs a human to pattern-match over documentation.
Scrapers are particularly behind. Most of the ones on the market today shipped before MCP was a thing, and they're still designed around a "give the user an API key, point them at a base URL, hope they figure it out" flow. That worked for a human. For an agent, it's static friction that will not go away with better prompt engineering. The fix is architectural: you introspect the workspace, you name the tools, you stream the updates, and you let the agent find them.
That's what we built. That's why we call ourselves MCP-native — not because we ship an MCP server on the side, but because the shape of the product follows the protocol.
Try it yourself — takes 30 seconds
If this post made sense, stop reading it and go see it work.
Paste any URL with a list into Stekpad. Watch the creation agent build a robot in 30 seconds. Open the Integrate tab. Copy the MCP URL. Paste it in Claude Desktop. Ask your agent what's in the data.
You'll know in a minute whether the pattern clicks for you. Most of the people it clicks for do not go back to regular scraping.