POST /v1/extract

POST /v1/extract

Pass a JSON Schema and an optional prompt. Get back validated structured fields from any URL. Five credits per URL.

Get your API keyCost · 5 credits per URL
Overview

What this verb does.

extract scrapes each URL with the same engine as scrape, then passes the rendered markdown plus your schema (and optional prompt) through the LLM cascade: Gemma 4 by default, Claude Haiku on schema fail, Claude Sonnet for premium opt-in.

The output is validated against your schema. On failure, the model is asked to repair the JSON; if it still fails, the call escalates one tier and retries. After 2 retries, the URL is marked failed. You only pay 5 credits per URL, regardless of retries.

Request

Make the call.

Pick your language. Every snippet is a real, runnable example.

curl -X POST https://api.stekpad.com/v1/extract \
-H "Authorization: Bearer stkpd_live_..." \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://acme.com/about"],
"schema": {
"type": "object",
"properties": {
"company_name": { "type": "string" },
"founded_year": { "type": "integer" }
}
},
"prompt": "Extract the company name and founding year."
}'
Parameters

Request body.

NameTypeRequiredDescription
urlsstring[]requiredOne URL = sync. Two or more = async.
schemaJSON SchemarequiredDrives validation and the LLM call.
promptstringoptionalOptional natural-language hint.
modelstringoptionalgemma-4-e4b (default), claude-haiku-4, claude-sonnet-4 (premium).
use_sessionstringoptionalTriggers the cookie bridge per URL.
datasetobjectoptional{ type, name } to land results as rows.
Response

What you get back.

extract-response.json
json
{
"run_id": "run_01HZ...",
"results": [
{
"url": "https://acme.com/about",
"json": {
"company_name": "Acme Corp",
"founded_year": 1947,
"employees": [
{ "name": "Wile E. Coyote", "role": "CEO" }
]
},
"model_used": "gemma-4-e4b"
}
],
"credits_charged": 5
}
Errors

Structured, typed, actionable.

CodeWhen
schema_invalidYour input schema is itself invalid JSON Schema.
schema_validation_failedTwo retries failed to produce schema-valid output.
model_unavailableThe requested model is not available on your plan.
target_blockedUnderlying scrape failed.

Every error includes a code, a human message, and a guidance field with the exact action to take.

Try /v1/extract now.

Get an API key, paste the curl, watch the row land in your dataset.

POST /v1/extract — LLM-structured extraction | Stekpad API — Stekpad