POST /v1/extract — LLM-structured extraction

POST /v1/extract

Pass a JSON Schema and an optional prompt. Get back validated structured fields from any URL. Five credits per URL.

Get your API keyCost · 5 credits per URL

Overview

What this verb does.

extract scrapes each URL with the same engine as scrape, then passes the rendered markdown plus your schema (and optional prompt) through the LLM cascade: Gemma 4 by default, Claude Haiku on schema fail, Claude Sonnet for premium opt-in.

The output is validated against your schema. On failure, the model is asked to repair the JSON; if it still fails, the call escalates one tier and retries. After 2 retries, the URL is marked failed. You only pay 5 credits per URL, regardless of retries.

Request

Make the call.

Pick your language. Every snippet is a real, runnable example.

curl -X POST https://api.stekpad.com/v1/extract \
  -H "Authorization: Bearer stkpd_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://acme.com/about"],
    "schema": {
      "type": "object",
      "properties": {
        "company_name": { "type": "string" },
        "founded_year": { "type": "integer" }
      }
    },
    "prompt": "Extract the company name and founding year."
  }'

Parameters

Request body.

Name	Type	Required	Description
urls	string[]	required	One URL = sync. Two or more = async.
schema	JSON Schema	required	Drives validation and the LLM call.
prompt	string	optional	Optional natural-language hint.
model	string	optional	gemma-4-e4b (default), claude-haiku-4, claude-sonnet-4 (premium).
use_session	string	optional	Triggers the cookie bridge per URL.
dataset	object	optional	{ type, name } to land results as rows.

Response

What you get back.

extract-response.json

json

{
  "run_id": "run_01HZ...",
  "results": [
    {
      "url": "https://acme.com/about",
      "json": {
        "company_name": "Acme Corp",
        "founded_year": 1947,
        "employees": [
          { "name": "Wile E. Coyote", "role": "CEO" }
        ]
      },
      "model_used": "gemma-4-e4b"
    }
  ],
  "credits_charged": 5
}

Errors

Structured, typed, actionable.

Code	When
schema_invalid	Your input schema is itself invalid JSON Schema.
schema_validation_failed	Two retries failed to produce schema-valid output.
model_unavailable	The requested model is not available on your plan.
target_blocked	Underlying scrape failed.

Every error includes a code, a human message, and a guidance field with the exact action to take.

The other four verbs.

scrape1 credit

Fetch one URL. Markdown, JSON, HTML or a screenshot.

POST /v1/scrape
{ "url": "https://example.com" }

crawl1 / page

Walk an entire site. Async, with webhooks and a run_id you can poll.

POST /v1/crawl
{ "url": "https://example.com" }

map1 / 1k URLs

Discover URLs without rendering pages. Walks sitemaps in seconds.

POST /v1/map
{ "url": "https://example.com" }

search5 + 1 / result

Web search and scrape, same call. Built for AI agents.

POST /v1/search
{ "query": "best RAG frameworks", "scrape_results": true }

Try /v1/extract now.

Get an API key, paste the curl, watch the row land in your dataset.

Get your API key Read the docs

Name

Type

Required

Description

urls

string[]

required

One URL = sync. Two or more = async.

schema

JSON Schema

required

Drives validation and the LLM call.

prompt

string

optional

Optional natural-language hint.

model

string

optional

gemma-4-e4b (default), claude-haiku-4, claude-sonnet-4 (premium).

use_session

string

optional

Triggers the cookie bridge per URL.

dataset

object

optional

{ type, name } to land results as rows.

Code

When

schema_invalid

Your input schema is itself invalid JSON Schema.

schema_validation_failed

Two retries failed to produce schema-valid output.

model_unavailable

The requested model is not available on your plan.

target_blocked

Underlying scrape failed.