
The cookie bridge — architecture deep dive

How Stekpad scrapes pages that require a login without ever storing a single cookie on our servers. Diagrams, sequence flows, and the exact trust boundary.

Stekpad Team · 8 min read

Every scraping vendor that supports authenticated pages eventually has to answer one question: where do the cookies live? The honest answers are few. The dishonest answers are many. Phantombuster stored session cookies on its own servers for years. Apify still does, by default. Firecrawl accepts server-side cookie headers. Most of the "AI scraping" startups that launched in 2024 took the easy road: a cookie jar in a managed database, encrypted at rest, promised to be safe.

We looked at that answer and picked a different one. The Stekpad cookie bridge stores zero cookies on Stekpad servers. Not encrypted. Not in logs. Not in the database. Not even briefly. The only place a session cookie ever exists, during a Stekpad authenticated scrape, is your own Chrome. The bridge is an architecture, not a policy.

This post is the full deep dive: why we made this choice, the exact request and response sequence, the security envelope, what an attacker can and cannot see, what happens when the bridge is offline, how to test it locally, and a clear comparison with server-side cookie jars.

The one-line invariant

No Stekpad server ever receives a session cookie for a domain you scrape with the cookie bridge.

Say it out loud. Write it on a sticky note next to your monitor. Every piece of the design below exists to keep that sentence true under load, under failure, under a compromised backend, and under a motivated attacker.

The architecture, in text

Here is the full path a use_session scrape takes: seven numbered steps, out from your code and back again.

```text
[ your code ]
  |
  | 1. POST /v1/scrape { url, use_session: "stripe.com" }
  v
[ Stekpad API edge ]
  |
  | 2. authenticates the API key, validates the request
  v
[ Stekpad orchestrator ]
  |
  | 3. pushes a fetch job over a workspace-scoped websocket
  v
[ your Chrome extension ]
  |
  | 4. fetches the page, Chrome attaches its own cookies
  v
[ the target site ]
  |
  | 5. responds with rendered HTML
  v
[ your Chrome extension ]
  |
  | 6. posts back HTML + final URL + cookies_used flag
  v
[ Stekpad post-processor ]
  |
  | 7. markdown, JSON extract, storage, enrichment
  v
[ your code ]   (the API response)
```

The important arrows are 3 and 6. The job that flows from Stekpad to your extension contains the target URL, the action list, and a timeout. It does not contain any authentication material, because there is no authentication material on the Stekpad side to include. When the extension fetches the page in step 4, it is your Chrome making a regular authenticated request. Chrome attaches cookies the way it would for any tab. Stekpad never sees them.

The response that flows back from the extension to Stekpad in step 6 contains the rendered HTML, the final URL after redirects, and a cookies_used: false flag in the server-side metadata. The flag is false because, from the server's point of view, the server did not use cookies; the client did.
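The two frames that cross the websocket can be sketched as plain data types. This is a minimal Python model, assuming the field names used in the prose (run_id, url, actions, timeout_s, html, final_url, cookies_used); the real wire format is not published here, and the FORBIDDEN_KEYS guard is a hypothetical check added to show what "no authentication material" means in practice.

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical wire shapes for the frames in steps 3 and 6.
# Field names mirror the prose; the real Stekpad schema may differ.

@dataclass
class BridgeFetchJob:              # step 3: orchestrator -> extension
    run_id: str
    url: str
    actions: list = field(default_factory=list)
    timeout_s: int = 60

@dataclass
class BridgeResponse:              # step 6: extension -> orchestrator
    run_id: str
    html: str
    final_url: str
    cookies_used: bool = False     # the server itself never used cookies

# Top-level keys that would indicate auth material leaking into a frame.
FORBIDDEN_KEYS = {"cookie", "cookies", "set-cookie", "authorization"}

def assert_no_auth_material(frame) -> str:
    """Serialize a frame to JSON, refusing it if a top-level key looks like auth material."""
    payload = asdict(frame)
    for key in payload:
        if key.lower() in FORBIDDEN_KEYS:
            raise ValueError(f"frame carries forbidden key: {key}")
    return json.dumps(payload)
```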

The curl example, with real values

You call the public API the same way you call a normal scrape. One extra field.

```bash
curl -X POST https://api.stekpad.com/v1/scrape \
  -H "Authorization: Bearer stkpd_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://dashboard.stripe.com/test/customers",
    "formats": ["markdown", "html"],
    "use_session": "stripe.com"
  }'
```
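For callers who prefer Python, here is a standard-library sketch of the same request. The endpoint and body come from the curl example; build_scrape_request and run_scrape are hypothetical helper names, and the response fields mentioned in the usage note (run_id, metadata.cookies_used) are taken from the description of the response shape in this post.

```python
import json
import urllib.request

API_URL = "https://api.stekpad.com/v1/scrape"

def build_scrape_request(api_key: str, url: str, session_domain: str) -> urllib.request.Request:
    """Build the same POST as the curl call, without sending it."""
    body = {
        "url": url,
        "formats": ["markdown", "html"],
        "use_session": session_domain,  # the one extra field versus a normal scrape
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def run_scrape(request: urllib.request.Request, timeout_s: float = 90.0) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    # 90 s is an arbitrary client timeout, a bit above the 60-second bridge job TTL.
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

Usage, assuming your key lives in a hypothetical STEKPAD_API_KEY environment variable: `result = run_scrape(build_scrape_request(os.environ["STEKPAD_API_KEY"], "https://dashboard.stripe.com/test/customers", "stripe.com"))`, after which `result["metadata"]["cookies_used"]` should be `False`.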

Three things happen.

  1. Stekpad looks up your workspace. It finds the workspace has one or more Chrome extensions registered. It picks a healthy one — by default, the most recently seen.
  2. It emits a bridge.fetch job on the workspace-scoped websocket. The job has a 60-second TTL and a unique run_id.
  3. Your Chrome extension receives the job, checks that stripe.com is on the authorized domain list for this workspace, opens a tab (or reuses an existing one), fetches the URL, and posts back the rendered HTML.

The response you get on the API call has the same shape as a regular scrape: a run_id, credits_charged: 1, a markdown field, an html field, and a metadata object with cookies_used: false. The Stekpad run log for that run_id shows the fetch step with source: "cookie_bridge" and no auth artifacts.
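The orchestrator side of the three steps above, selecting an extension and emitting a bridge.fetch job, might look like this. It is a sketch under assumptions: RegisteredExtension, pick_extension, make_bridge_job, and the expires_at field are hypothetical names; only the selection rule (the most recently seen healthy extension), the 60-second TTL, and the unique run_id come from the text. Nothing in the job dict is a cookie or a header.

```python
import time
import uuid
from dataclasses import dataclass

JOB_TTL_S = 60  # from the text: each bridge.fetch job has a 60-second TTL

@dataclass
class RegisteredExtension:       # hypothetical registry entry
    device_id: str
    last_seen: float             # unix time of the last heartbeat
    healthy: bool

def pick_extension(extensions):
    """Default selection: the most recently seen healthy extension, or None."""
    candidates = [e for e in extensions if e.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.last_seen)

def make_bridge_job(url: str, actions=None, now=None) -> dict:
    """A bridge.fetch job: URL, action list, timeout, unique run_id. No auth material."""
    now = time.time() if now is None else now
    return {
        "type": "bridge.fetch",
        "run_id": f"run_{uuid.uuid4().hex}",
        "url": url,
        "actions": actions or [],
        "timeout_s": JOB_TTL_S,
        "expires_at": now + JOB_TTL_S,
    }
```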

The security envelope

Five things keep the invariant true.

Mutual authentication. The websocket between the Stekpad orchestrator and your Chrome extension is authenticated in both directions. The extension signs every message with a device key generated at install time, bound to your workspace's API key. The orchestrator verifies the signature before it routes any fetch job to the extension. A compromised backend with no device key cannot impersonate an extension.
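The post does not name the signature scheme, so the sketch below uses HMAC-SHA256 from Python's standard library purely as a stand-in, to show the verify-before-routing step and a canonical byte encoding. The caveat matters: with HMAC the verifier holds the same secret as the signer, so it could forge messages. Keeping the property described above (a backend without the device key cannot impersonate an extension) needs an asymmetric scheme such as Ed25519, where the orchestrator stores only the public key. All function names are hypothetical.

```python
import hashlib
import hmac
import json

def canonical(message: dict) -> bytes:
    # Stable byte encoding so signer and verifier hash identical bytes.
    return json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign(device_key: bytes, message: dict) -> str:
    # HMAC stand-in; a real device key would more likely be an asymmetric key pair.
    return hmac.new(device_key, canonical(message), hashlib.sha256).hexdigest()

def verify(device_key: bytes, message: dict, signature: str) -> bool:
    expected = sign(device_key, message)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)

def route_if_verified(device_key: bytes, message: dict, signature: str, route):
    """Only hand the message to `route` after its signature checks out."""
    if not verify(device_key, message, signature):
        raise PermissionError("bad signature: message dropped")
    return route(message)
```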

Workspace-scoped channels. Each workspace gets its own websocket channel. A fetch job emitted on workspace A can only be received by an extension authorized for workspace A. No cross-workspace routing, no shared queue, no way for an attacker to see jobs meant for another tenant.

Per-domain authorization. The extension keeps a per-workspace list of domains that are allowed to run in bridge mode. The default is empty. You add domains explicitly, one at a time, from the extension popup's Sessions tab. A fetch job for evil.com against an extension that has only authorized stripe.com is rejected at the extension — not at Stekpad, because Stekpad has no way to know the domain list.
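A minimal sketch of the extension-side gate, assuming an authorized domain also covers its subdomains (the curl example scrapes dashboard.stripe.com under use_session: "stripe.com"). The https check reflects the threat-model note that the extension rejects non-TLS target URLs by default. host_is_authorized is a hypothetical name. The dot-boundary comparison is the detail worth copying: a naive endswith("stripe.com") would also accept evilstripe.com.

```python
from urllib.parse import urlsplit

def host_is_authorized(url: str, authorized_domains: set, require_tls: bool = True) -> bool:
    """Extension-side gate: allow only authorized domains and their subdomains."""
    parts = urlsplit(url)
    if require_tls and parts.scheme != "https":
        return False                      # non-TLS target URLs are rejected by default
    host = (parts.hostname or "").lower()
    for domain in authorized_domains:     # an empty set (the default) rejects everything
        domain = domain.lower()
        # Exact match, or a subdomain on a dot boundary.
        if host == domain or host.endswith("." + domain):
            return True
    return False
```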

Three bridge policies. The extension supports always_allow, prompt_per_domain, and strict_whitelist. The default is prompt_per_domain: a first fetch on a new URL under an authorized domain pops a confirmation; everything after that is silent. strict_whitelist takes an explicit list of URL patterns and denies everything else.
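The three policies reduce to a small decision function, applied after the domain check. This is a sketch, not the extension's code: the Policy and Decision enums are hypothetical, we read prompt_per_domain as "confirm once per domain, then stay silent", and we assume strict_whitelist patterns use shell-style wildcards (matched here with fnmatchcase, where * also spans slashes).

```python
from enum import Enum
from fnmatch import fnmatchcase

class Policy(Enum):
    ALWAYS_ALLOW = "always_allow"
    PROMPT_PER_DOMAIN = "prompt_per_domain"
    STRICT_WHITELIST = "strict_whitelist"

class Decision(Enum):
    ALLOW = "allow"
    PROMPT = "prompt"
    DENY = "deny"

def decide(policy, url, domain, confirmed_domains, url_patterns=()):
    """Decide what to do with a fetch that already passed the domain check."""
    if policy is Policy.ALWAYS_ALLOW:
        return Decision.ALLOW
    if policy is Policy.PROMPT_PER_DOMAIN:
        # Our reading: the first fetch under a domain asks the user; later ones are silent.
        return Decision.ALLOW if domain in confirmed_domains else Decision.PROMPT
    if policy is Policy.STRICT_WHITELIST:
        # Shell-style patterns are an assumption about the real pattern syntax.
        if any(fnmatchcase(url, pattern) for pattern in url_patterns):
            return Decision.ALLOW
        return Decision.DENY
    raise ValueError(f"unknown policy: {policy}")
```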

An append-only local journal. Every fetch the extension performs on Stekpad's behalf is logged locally in a SQLite file inside the extension's storage. Each entry has timestamp | url | run_id | status | bytes_returned. The journal is exportable as CSV. It is never transmitted off your machine. It is your receipt.
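One way to get an append-only journal with a CSV export is a SQLite table guarded by triggers. This sketch uses Python's sqlite3 module with an in-memory database by default; the column names come from the post, while the trigger approach, the function names, and last_fetches (mirroring the Journal tab's last-100 view) are our assumptions. Triggers stop accidental edits through SQL; they do not stop someone with file access from deleting the file.

```python
import csv
import io
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS journal (
    timestamp REAL NOT NULL,
    url TEXT NOT NULL,
    run_id TEXT NOT NULL,
    status TEXT NOT NULL,
    bytes_returned INTEGER NOT NULL
);
-- Append-only: any UPDATE or DELETE aborts.
CREATE TRIGGER IF NOT EXISTS journal_no_update BEFORE UPDATE ON journal
BEGIN SELECT RAISE(ABORT, 'journal is append-only'); END;
CREATE TRIGGER IF NOT EXISTS journal_no_delete BEFORE DELETE ON journal
BEGIN SELECT RAISE(ABORT, 'journal is append-only'); END;
"""

def open_journal(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def record(conn, url, run_id, status, bytes_returned, now=None):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO journal VALUES (?, ?, ?, ?, ?)",
            (time.time() if now is None else now, url, run_id, status, bytes_returned),
        )

def last_fetches(conn, n=100):
    return conn.execute("SELECT * FROM journal ORDER BY rowid DESC LIMIT ?", (n,)).fetchall()

def export_csv(conn) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["timestamp", "url", "run_id", "status", "bytes_returned"])
    writer.writerows(conn.execute("SELECT * FROM journal ORDER BY rowid"))
    return buf.getvalue()
```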

The threat model — what can an attacker see?

Four attacker models worth thinking about.

A passive network observer between your Chrome and the target site. Sees the same thing they would see if you opened the target site manually — a TLS connection to the target. No new surface. The cookie bridge does not route traffic through Stekpad at step 4.

A passive network observer between your Chrome and Stekpad. Sees a TLS-encrypted websocket carrying signed binary frames, and none of their contents. Even the plaintext inside the encryption carries no authentication material: a bridge.fetch job holds the URL, the action list, and the timeout, and a response frame holds the rendered HTML. No cookies. No auth headers. The URL itself may be sensitive (an invoice link, for example), which is why the channel is TLS-only and the extension rejects non-TLS target URLs by default.

A compromised Stekpad backend (the worst case). An attacker with full control of Stekpad's orchestrator can emit fetch jobs to your extension, but those jobs are constrained by your extension's domain authorization list. The attacker can scrape stripe.com in your browser as you — which is exactly what a legitimate Stekpad user is asking us to do for them. The attacker cannot exfiltrate the cookies themselves, because the cookies never leave your Chrome. This is the key point. A server-side cookie jar in the same compromise scenario is game over. The cookie bridge is a bad day, not a catastrophe.

A compromised Chrome extension. An attacker who compromises the extension already holds every cookie that extension's permissions can reach, simply by being an extension. The cookie bridge adds no new surface; it relies on the extension's isolation guarantees. This is why the Stekpad extension is open source, and why device-key signing prevents a modified extension from routing jobs into a legitimate channel.

What happens when the bridge is offline

Real users close laptops. Real browsers restart. Real extensions get disabled. The bridge has to degrade gracefully.

Synchronous calls with use_session return a session_unavailable error immediately. The error message is specific: "The cookie bridge for stripe.com isn't connected. Open Chrome with the Stekpad extension active, or remove use_session from the request." The agent or the human knows the exact next step.

Asynchronous runs (crawls, batch extracts) enter a waiting_for_bridge state. Stekpad holds the run for up to 24 hours. When the extension reconnects, the run resumes from where it left off. Your dashboard shows a banner such as "3 pending runs waiting for bridge", listing the affected domains (for example stripe.com and linkedin.com). The same condition is also emitted as a webhook event (session.unavailable), so your own monitoring can act on it.

The 24-hour TTL is deliberate: long enough to survive a laptop closed overnight, short enough that a forgotten run cannot silently drain credits weeks later. Runs that exceed the TTL transition to cancelled_bridge_timeout, with a full explanation in the run log.
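The offline behavior is a small state machine. The sketch below models it with hypothetical names (sync_error, AsyncRun, bridge_lost, bridge_back, tick); the error code, the message text, the state names, and the 24-hour limit are taken from this section, and the shape of the error body is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

BRIDGE_WAIT_TTL_S = 24 * 60 * 60  # async runs wait at most 24 hours for the bridge

def sync_error(domain: str) -> dict:
    """Error body for a synchronous use_session call when no bridge is connected."""
    return {
        "code": "session_unavailable",
        "message": (
            f"The cookie bridge for {domain} isn't connected. Open Chrome with the "
            "Stekpad extension active, or remove use_session from the request."
        ),
    }

@dataclass
class AsyncRun:
    run_id: str
    domain: str
    state: str = "running"
    waiting_since: Optional[float] = None
    log: list = field(default_factory=list)

    def bridge_lost(self, now: float):
        if self.state == "running":
            self.state = "waiting_for_bridge"
            self.waiting_since = now
            self.log.append(f"{now}: waiting_for_bridge ({self.domain})")

    def bridge_back(self, now: float):
        if self.state == "waiting_for_bridge":
            self.state = "running"  # resumes from where it left off
            self.waiting_since = None
            self.log.append(f"{now}: resumed")

    def tick(self, now: float):
        # Called periodically; cancels runs that have waited longer than the TTL.
        if self.state == "waiting_for_bridge" and now - self.waiting_since > BRIDGE_WAIT_TTL_S:
            self.state = "cancelled_bridge_timeout"
            self.log.append(f"{now}: cancelled after waiting more than 24h for the bridge")
```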

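How the bridge compares to server-side cookie jars

Here are two common server-side approaches and what they cost, followed by the cookie bridge on the same terms.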

The "encrypted cookie jar" pattern (Phantombuster-era tooling). You paste your cookies into the vendor's dashboard. The vendor encrypts them at rest, decrypts them for every scrape, and promises to protect the key. Every scrape sends your raw session tokens through a third-party server. Breach surface: total. Revocation: vendor dashboard only. Compliance: your cookies live in a vendor database that must be in your DPA.

The "cookie header injection" pattern (older Apify actors, Firecrawl's session headers). You pass the cookies as a request field on every call. The vendor does not persist them long-term, but they are logged, they live in memory during the request, and they may end up in traces and error reports. Breach surface: anything in the vendor's log pipeline. Revocation: up to the target site, not you. Compliance: every request is a new transfer of authentication material.

The Stekpad cookie bridge. Cookies stay in your Chrome. The vendor never receives them. Breach surface: your Chrome and your extension — the same surface that already protects your browsing. Revocation: close the tab, disable the extension, remove the domain authorization. Compliance: cookies are not a transfer to a data processor, because there is no transfer.

The tradeoff is honest. The cookie bridge adds ~500–1500 ms of round-trip latency compared to a Cloudflare Browser Rendering fetch on a public page. It requires a real Chrome with the extension installed and the user logged in. It does not work on a headless server. For authenticated pages, we think accepting those costs is the only honest path.

Developer tips

A few things we learned running the bridge in production.

Test locally with a staging workspace. Create a second Stekpad workspace called bridge-test and authorize httpbin.org in the extension for that workspace. In your own Chrome, open https://httpbin.org/cookies/set?demo=123 once to set a demo cookie. Then call /v1/scrape with use_session: "httpbin.org" against https://httpbin.org/cookies, which echoes back the cookies it received. If the response HTML shows demo=123, the request carried a cookie that only your Chrome holds, which proves the bridge routed it through your browser.
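httpbin echoes the cookies it received as JSON at https://httpbin.org/cookies, which is also where /cookies/set redirects, so a check on the returned text is enough. The sketch below is hypothetical: scrape_httpbin_cookies and bridge_saw_cookie are illustrative names, and the request mirrors the curl example with the httpbin URL swapped in.

```python
import json
import re
import urllib.request

def scrape_httpbin_cookies(api_key: str) -> dict:
    """Ask Stekpad to fetch httpbin's cookie echo through the bridge (network I/O)."""
    body = json.dumps({
        "url": "https://httpbin.org/cookies",
        "formats": ["markdown", "html"],
        "use_session": "httpbin.org",
    }).encode("utf-8")
    req = urllib.request.Request(
        "https://api.stekpad.com/v1/scrape",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=90) as resp:
        return json.load(resp)

def bridge_saw_cookie(body: str, name: str, value: str) -> bool:
    """True if httpbin's JSON echo inside `body` contains "name": "value"."""
    pattern = rf'"{re.escape(name)}"\s*:\s*"{re.escape(value)}"'
    return re.search(pattern, body) is not None
```

Usage: `result = scrape_httpbin_cookies(api_key)`, then `bridge_saw_cookie(result["html"], "demo", "123")` should be `True`.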

Revoke a domain in one click. Open the extension popup → Sessions → click the red × next to the domain. Revocation is pushed to the orchestrator over the websocket. Any in-flight run for that domain fails with session_unavailable within ~200 ms. You can also revoke from the web app — both paths reach the same state.
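A sketch of how a revocation could flow, with hypothetical names and a hypothetical session.revoked message type: the extension drops the domain locally and pushes a message, and the orchestrator fails any still-running run for that domain with session_unavailable, as described above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InFlightRun:
    run_id: str
    domain: str
    status: str = "running"
    error: Optional[str] = None

def revoke_locally(domain: str, authorized: set) -> dict:
    """Extension side: drop the domain, then build the message pushed over the websocket."""
    authorized.discard(domain)
    return {"type": "session.revoked", "domain": domain}

def on_revocation(message: dict, in_flight: list) -> list:
    """Orchestrator side: fail every still-running run for the revoked domain."""
    failed = []
    for run in in_flight:
        if run.domain == message["domain"] and run.status == "running":
            run.status = "failed"
            run.error = "session_unavailable"
            failed.append(run.run_id)
    return failed
```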

Read the fetch journal when something looks wrong. The extension popup has a Journal tab that opens the local SQLite file and shows the last 100 fetches. Each row links to the corresponding Stekpad run log entry. If a fetch failed, the journal has the error, the URL, and the status code before you ever open the dashboard.

Use `prompt_per_domain` during evaluation, `always_allow` in production. The prompts are noisy during a 1,000-URL crawl. Switch to always_allow once you trust the domain list, and rely on the domain authorization itself as the gate.

Never share an extension install across humans. The bridge is fundamentally per-user. Each human adds their own domains to their own extension. Trying to share an install across a team defeats the security model, and it defeats the audit trail.


Stekpad Team
We build Stekpad. We scrape the web, store it, and enrich it — from an API, from an app, or from Claude.
