Storage is the product.
Every scrape lands in a dataset you own. Re-query it tomorrow. Add a column. Export it. Re-run the source.
The dataset is the unit.
A dataset in Stekpad is a persistent, editable container with one of two shapes:
- `table` — structured rows with typed columns. Use it for products, companies, jobs, articles, leads.
- `markdown_bundle` — a collection of markdown files addressable by canonical URL. Use it for docs sites, blog archives, RAG corpora.
Every dataset has a workspace owner, a name and description, a `source_spec` (the scrape/crawl/map config that built it, persisted so you can re-run any time), a soft-deletable archive state, and a retention policy from the workspace plan.
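Pieced together from the description above, a dataset record might look roughly like this. The field names beyond those listed (name, description, `source_spec`, archive state, retention) are illustrative, not the documented schema:

```json
{
  "id": "ds_abc",
  "workspace_id": "ws_123",
  "name": "Example products",
  "description": "Scraped product pages",
  "type": "table",
  "source_spec": {
    "verb": "scrape",
    "url": "https://example.com/product/42"
  },
  "archived": false,
  "retention_days": 90
}
```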
Other APIs return by value.
Other scraping APIs return data by value. You get a JSON blob, you save it somewhere, you forget the request. Next week, when you want to add a column, you re-scrape the whole list and reconcile.
Stekpad gives you the JSON blob and a row in a dataset you can re-query, edit, enrich, and export. The dataset is re-runnable from its source_spec. The columns are typed. The rows have versions. Storage is not a feature — it is the product.
What rows know about themselves.
Every row in a `table` dataset carries metadata columns:
- `_scraped_at` — when the row was first written
- `_scraped_version` — increments on re-scrape if content changed
- `_changed_at` — when the content hash last changed
- `_source_run_id` — the run that produced the row
- `_content_hash` — used for change detection
This is what makes change monitoring (Cloud Starter+) possible: re-scrape, compare hash, fire `row.changed` webhook.
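The loop above can be sketched locally. This is a minimal sketch of the compare-hash step, assuming SHA-256 over the raw content; the platform's actual hash function isn't documented here:

```shell
# Re-scrape, hash the content, compare against the stored _content_hash.
old_hash=$(printf '%s' "price: 10" | sha256sum | cut -d' ' -f1)
new_hash=$(printf '%s' "price: 12" | sha256sum | cut -d' ' -f1)
if [ "$old_hash" != "$new_hash" ]; then
  # Platform would bump _scraped_version, set _changed_at,
  # and fire the row.changed webhook.
  status=changed
else
  status=unchanged
fi
echo "$status"
```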
What you can do with a dataset.
- Query rows — REST `GET /v1/datasets/:id/rows?filter=...`, MCP `query_dataset`. Free.
- Add a column manually — paste a value, write a formula, run an enricher.
- Re-enrich — kick off any of the 19 enrichers on the rows.
- Re-run the source — replay the original scrape/crawl with one click.
- Export — CSV, JSON, Markdown bundle zip, Google Sheets live sync.
- Pipe to a webhook — `row.added`, `row.changed`, `enrichment.completed` events.
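A webhook consumer mostly dispatches on the event type. A sketch, assuming a `"type"` field in the payload (the exact event schema is not documented here):

```shell
# Hypothetical event payload; field names are an assumption.
event='{"type":"row.changed","dataset_id":"ds_abc","row_id":"row_1"}'
# Pull out the event type (a real consumer would use a JSON parser).
event_type=$(printf '%s' "$event" | sed -n 's/.*"type":"\([^"]*\)".*/\1/p')
case "$event_type" in
  row.added)            action="index the new row" ;;
  row.changed)          action="refresh the cached copy" ;;
  enrichment.completed) action="pull the enriched columns" ;;
esac
echo "$action"
```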
```shell
# Create a dataset implicitly by scraping into one
curl -X POST https://api.stekpad.com/v1/scrape \
  -H "Authorization: Bearer stkpd_live_..." \
  -d '{
    "url": "https://example.com/product/42",
    "dataset": { "type": "table", "name": "Example products" }
  }'

# Append more rows to the same dataset
curl -X POST https://api.stekpad.com/v1/scrape \
  -H "Authorization: Bearer stkpd_live_..." \
  -d '{
    "url": "https://example.com/product/43",
    "dataset": { "id": "ds_abc", "mode": "append" }
  }'

# Query the dataset
curl "https://api.stekpad.com/v1/datasets/ds_abc/rows?limit=10" \
  -H "Authorization: Bearer stkpd_live_..."
```

By plan.
| Plan | Retention |
|---|---|
| Free | 7 days |
| Packs | 30 days |
| Cloud Starter | 90 days |
| Cloud Growth | 1 year |
| Cloud Scale | Unlimited |
Retention is per-dataset, inherited from the workspace plan at the time of creation.
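Since retention is fixed at creation, client code that tracks expiry only needs the plan table above. A sketch; using `0` to stand in for "unlimited" is an encoding choice of this example, not the API's:

```shell
# Plan-to-retention mapping, from the table above.
plan="cloud_starter"
case "$plan" in
  free)          retention_days=7 ;;
  packs)         retention_days=30 ;;
  cloud_starter) retention_days=90 ;;
  cloud_growth)  retention_days=365 ;;
  cloud_scale)   retention_days=0 ;;   # unlimited
esac
echo "$retention_days"
```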
Common questions.
Can I disable storage?
Yes — pass `persist: false` on any verb. The response still contains the data; nothing is stored.
Can I rename a column?
Yes, from the dashboard. The underlying type stays.
Can I convert a `table` to a `markdown_bundle`?
No — type is immutable. A `table` can have a `content_markdown` column, which is the path for users who want both.
How are rows deduplicated?
By canonical URL by default. Override with `primary_key: ["sku", "region"]` at dataset creation.
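The effect of a composite `primary_key` is first-write-wins per key. Illustrative only, since Stekpad does this server-side at ingest: rows arrive as `sku,region,price`, and the second `A1,eu` row is dropped as a duplicate:

```shell
# Keep only the first row seen for each (sku, region) pair.
rows=$(printf '%s\n' \
  "A1,eu,10.00" \
  "A1,us,12.00" \
  "A1,eu,10.50" \
  | awk -F, '!seen[$1 FS $2]++')
echo "$rows"
```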
Every scrape, in a dataset you own.
Sign up free. 300 credits a month. Re-runnable from the source_spec.