POST /v1/crawl — walk a site | Stekpad API

POST /v1/crawl

Walk a site. One credit per page crawled. Async, with webhooks and a run id you can poll.

Get your API keyCost · 1 credit per page

Overview

What this verb does.

crawl walks a site from a seed URL, follows links inside the include_paths globs, skips anything in exclude_paths, and lands every page as a row in a dataset. The call returns a run_id immediately — poll GET /v1/runs/:id or subscribe to webhooks.

Per-page rendering uses the same engine as scrape, so all the scrape options apply: formats, schema-driven extraction, actions, even use_session for authenticated crawls.

Request

Make the call.

Pick your language. Every snippet is a real, runnable example.

curl -X POST https://api.stekpad.com/v1/crawl \
  -H "Authorization: Bearer stkpd_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limits": { "max_pages": 500, "max_depth": 5 },
    "include_paths": ["/blog/**", "/docs/**"],
    "scrape_options": { "formats": ["markdown"] },
    "dataset": { "type": "markdown_bundle", "name": "Example docs" }
  }'

Parameters

Request body.

Name	Type	Required	Description
url	string	required	Seed URL. Crawl starts here.
limits.max_pages	int	optional	Hard cap on pages crawled. Default 100.
limits.max_depth	int	optional	Link depth from seed. Default 5.
limits.max_duration_seconds	int	optional	Wall-clock cap. Default 1800.
include_paths	string[]	optional	Glob patterns. Only matching paths are crawled.
exclude_paths	string[]	optional	Glob patterns. Matching paths are skipped.
respect_robots	boolean	optional	Default true. Override gated to Cloud Growth+.
deduplicate_by	string	optional	canonical_url, content_hash, or a column name.
scrape_options	object	optional	Same shape as /v1/scrape. Applied per page.
use_session	string	optional	Triggers the cookie bridge.
dataset	object	required	{ type, name } to create, or { id } to append.
webhook_url	string	optional	HTTPS URL for events.
webhook_events	string[]	optional	run.completed, row.added, run.failed.

Response

What you get back.

crawl-response.json

json

{
  "run_id": "run_01HZ...",
  "status": "queued",
  "dataset_id": "ds_xyz",
  "poll_url": "https://api.stekpad.com/v1/runs/run_01HZ...",
  "estimated_credits": 500
}

Errors

Structured, typed, actionable.

Code	When
invalid_seed_url	Seed URL is malformed.
crawl_quota_exceeded	max_pages reached — partial dataset is still saved.
crawl_duration_exceeded	max_duration_seconds reached.
target_blocked	Site blocked the crawler globally.
insufficient_credits	Wallet hit 0 mid-crawl. The run pauses; top up to resume.

Every error includes a code, a human message, and a guidance field with the exact action to take.

The other four verbs.

scrape1 credit

Fetch one URL. Markdown, JSON, HTML or a screenshot.

POST /v1/scrape
{ "url": "https://example.com" }

map1 / 1k URLs

Discover URLs without rendering pages. Walks sitemaps in seconds.

POST /v1/map
{ "url": "https://example.com" }

extract5 / URL

JSON Schema in, validated structured fields out.

POST /v1/extract
{ "urls": ["https://acme.com/about"], "schema": {...} }

search5 + 1 / result

Web search and scrape, same call. Built for AI agents.

POST /v1/search
{ "query": "best RAG frameworks", "scrape_results": true }

Try /v1/crawl now.

Get an API key, paste the curl, watch the row land in your dataset.

Get your API key Read the docs

Name

Type

Required

Description

url

string

required

Seed URL. Crawl starts here.

limits.max_pages

int

optional

Hard cap on pages crawled. Default 100.

limits.max_depth

int

optional

Link depth from seed. Default 5.

limits.max_duration_seconds

int

optional

Wall-clock cap. Default 1800.

include_paths

string[]

optional

Glob patterns. Only matching paths are crawled.

exclude_paths

string[]

optional

Glob patterns. Matching paths are skipped.

respect_robots

boolean

optional

Default true. Override gated to Cloud Growth+.

deduplicate_by

string

optional

canonical_url, content_hash, or a column name.

scrape_options

object

optional

Same shape as /v1/scrape. Applied per page.

use_session

string

optional

Triggers the cookie bridge.

dataset

object

required

{ type, name } to create, or { id } to append.

webhook_url

string

optional

HTTPS URL for events.

webhook_events

string[]

optional

run.completed, row.added, run.failed.

Code

When

invalid_seed_url

Seed URL is malformed.

crawl_quota_exceeded

max_pages reached — partial dataset is still saved.

crawl_duration_exceeded

max_duration_seconds reached.

target_blocked

Site blocked the crawler globally.

insufficient_credits

Wallet hit 0 mid-crawl. The run pauses; top up to resume.