POST /v1/crawl

POST /v1/crawl

Walk a site. One credit per page crawled. Async, with webhooks and a run id you can poll.

Get your API keyCost · 1 credit per page
Overview

What this verb does.

crawl walks a site from a seed URL, follows links inside the include_paths globs, skips anything in exclude_paths, and lands every page as a row in a dataset. The call returns a run_id immediately — poll GET /v1/runs/:id or subscribe to webhooks.

Per-page rendering uses the same engine as scrape, so all the scrape options apply: formats, schema-driven extraction, actions, even use_session for authenticated crawls.

Request

Make the call.

Pick your language. Every snippet is a real, runnable example.

curl -X POST https://api.stekpad.com/v1/crawl \
-H "Authorization: Bearer stkpd_live_..." \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"limits": { "max_pages": 500, "max_depth": 5 },
"include_paths": ["/blog/**", "/docs/**"],
"scrape_options": { "formats": ["markdown"] },
"dataset": { "type": "markdown_bundle", "name": "Example docs" }
}'
Parameters

Request body.

NameTypeRequiredDescription
urlstringrequiredSeed URL. Crawl starts here.
limits.max_pagesintoptionalHard cap on pages crawled. Default 100.
limits.max_depthintoptionalLink depth from seed. Default 5.
limits.max_duration_secondsintoptionalWall-clock cap. Default 1800.
include_pathsstring[]optionalGlob patterns. Only matching paths are crawled.
exclude_pathsstring[]optionalGlob patterns. Matching paths are skipped.
respect_robotsbooleanoptionalDefault true. Override gated to Cloud Growth+.
deduplicate_bystringoptionalcanonical_url, content_hash, or a column name.
scrape_optionsobjectoptionalSame shape as /v1/scrape. Applied per page.
use_sessionstringoptionalTriggers the cookie bridge.
datasetobjectrequired{ type, name } to create, or { id } to append.
webhook_urlstringoptionalHTTPS URL for events.
webhook_eventsstring[]optionalrun.completed, row.added, run.failed.
Response

What you get back.

crawl-response.json
json
{
"run_id": "run_01HZ...",
"status": "queued",
"dataset_id": "ds_xyz",
"poll_url": "https://api.stekpad.com/v1/runs/run_01HZ...",
"estimated_credits": 500
}
Errors

Structured, typed, actionable.

CodeWhen
invalid_seed_urlSeed URL is malformed.
crawl_quota_exceededmax_pages reached — partial dataset is still saved.
crawl_duration_exceededmax_duration_seconds reached.
target_blockedSite blocked the crawler globally.
insufficient_creditsWallet hit 0 mid-crawl. The run pauses; top up to resume.

Every error includes a code, a human message, and a guidance field with the exact action to take.

Try /v1/crawl now.

Get an API key, paste the curl, watch the row land in your dataset.

POST /v1/crawl — walk a site | Stekpad API — Stekpad