Deep dive

19 native enrichers vs third-party vendors — a case for owning the stack

Why Stekpad ships 19 enrichers in the box instead of reselling third-party APIs — and what that means for your data, pricing, and trust.

Stekpad Team · 11 min read

A growth team's enrichment stack usually looks like this. Hunter for email finding. Apollo for titles and phones. Clearbit for company info. Proxycurl for LinkedIn. NumVerify for phone validation. Maybe Clay on top to glue them all together. Six invoices. Six API keys in the same Airtable script. Six copies of your prospect list sitting in six vendor databases.

We picked a different answer. Stekpad ships 19 enrichers in the box and calls zero third-party data vendors. Your scraped data never leaves our stack. There is one invoice. There is one credit budget. There is one place to look when a column is wrong.

This post is the honest case for that decision. Not a manifesto, not a hit piece. The numbers on a realistic 10,000-company enrichment, the privacy and compliance story we hear from customers who used to stitch vendors, the architectural cost of data leaving your stack, and the three places where a vendor-stitched product is still the right call.

The 19 enrichers, one line each

Here is the whole catalog. 1 credit per row per enricher unless marked.

LLM family (6 enrichers)

  • ai_extract — pass a JSON schema and a prompt, get validated fields back, with a Gemma-to-Haiku-to-Sonnet cascade on failure.
  • ai_summary — one-line, paragraph, or bullet summaries of any text column.
  • ai_classify — multi-label classification with labels you provide, returns class and confidence.
  • ai_translate — column-to-column translation, ISO 639-1 target languages.
  • ai_clean — boilerplate and whitespace normalization on scraped text.
  • ai_embed — 1024-dim embeddings from bge-m3, via Cloudflare Workers AI through Vercel AI Gateway.

Web-derived family (6 enrichers)

  • find_emails — scrapes /contact, /about, /legal, footer, filters disposables.
  • find_phones — regex plus schema.org plus libphonenumber, returns an array.
  • find_socials — LinkedIn, Twitter, Facebook, Instagram, GitHub, YouTube, TikTok.
  • find_tech_stack — Wappalyzer fingerprints under the MIT license.
  • find_company_info — name, description, logo, country, language, from OG, schema.org, meta, and TLD heuristics.
  • find_favicon — cached on Stekpad CDN, replaces every "favicon API" service.

Validation family (3 enrichers)

  • email_verify — SMTP handshake, ~85 % precision, honest catch-all flagging.
  • whois — TCP/43 WHOIS with registrant, registrar, created and expires dates.
  • dns — A, MX, TXT, SPF, DMARC, DKIM presence, via Node's DNS resolver.

Specialized family (4 enrichers)

  • phone_enrich — libphonenumber (Google, Apache 2.0), replaces NumVerify.
  • company_enrich — OpenCorporates plus Companies House (UK) plus SIRENE/INSEE (FR) plus EDGAR (US) plus scrape fallback.
  • email_finder — permutation plus pattern detection plus SMTP verification, with an honest confidence field.
  • geocode — Nominatim on OpenStreetMap, attribution required, self-hostable in v1.5.
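
The email_finder entry above names permutation as its first stage. As a rough illustration only, here is what a permutation step can look like in Python. The pattern list and the candidate_emails helper are assumptions made for this sketch, not Stekpad's actual implementation.

```python
def candidate_emails(first: str, last: str, domain: str) -> list[str]:
    """Generate common local-part permutations for one person at one domain.

    The pattern list is an illustrative assumption, not Stekpad's real catalog.
    """
    f, l = first.strip().lower(), last.strip().lower()
    if not f or not l:
        raise ValueError("first and last name are both required")
    local_parts = [
        f"{f}.{l}",     # jane.doe
        f"{f}{l}",      # janedoe
        f"{f[0]}{l}",   # jdoe
        f"{f[0]}.{l}",  # j.doe
        f,              # jane
        l,              # doe
        f"{f}_{l}",     # jane_doe
        f"{l}.{f}",     # doe.jane
    ]
    # dict.fromkeys removes duplicates while keeping the original order
    return [f"{lp}@{domain.lower()}" for lp in dict.fromkeys(local_parts)]


candidates = candidate_emails("Jane", "Doe", "Example.com")
print(candidates[0])  # jane.doe@example.com
```

In a full finder, each candidate would still have to pass pattern detection and SMTP verification before a confidence value means anything.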

Premium add-on (v1.5)

  • linkedin_enrich — 12 credits per row, powered by Proxycurl, explicitly labeled on the row. This is the only place we route to a third-party vendor, and we tell you about it.

The 10,000-company example

Let's run the same enrichment two ways. You have a list of 10,000 company domains — call it your demand-gen pipeline for the next quarter. For each domain you want: the company name, an industry classification, the tech stack, up to three general emails, email validity, and the LinkedIn URL.

The vendor-stitched way

  • Hunter Domain Search for emails. Business plan: ~$149/month for 10,000 credits. One domain costs one credit. You are at the edge of your plan.
  • Apollo for titles and firmographics. Professional plan: ~$99 per seat per month, rate-limited to 10,000 credits a month on bulk enrichment.
  • Clearbit (now HubSpot Breeze Intelligence) for company info. Starter pricing is effectively gated to HubSpot customers; historical list price on a standalone plan was ~$99 per seat with per-call top-ups.
  • BuiltWith or Wappalyzer for tech stack. BuiltWith Pro: $295/month.
  • NumVerify for phone validation (if you got phones out of Apollo). Professional plan: ~$39/month.
  • Clay on top to orchestrate the five APIs above. Starter: $149/month for 2,000 "people credits".

Cash outlay before you have enriched a single row: roughly $830/month in base subscriptions, not counting overage. Time to first row: a week of wiring, testing, and arguing with five sets of rate limits. Number of databases your prospect list now lives in: six. Number of GDPR data processing agreements your legal team has to maintain: six.
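
A quick sanity check on that total, as a minimal Python sketch. It uses the approximate list prices quoted above and assumes one seat each for the per-seat plans (Apollo and the historical Clearbit plan).

```python
# Monthly base prices in USD, as quoted in the list above (all approximate).
# Assumption: one seat each for the per-seat plans.
base_subscriptions = {
    "Hunter Business": 149,
    "Apollo Professional (1 seat)": 99,
    "Clearbit standalone, historical (1 seat)": 99,
    "BuiltWith Pro": 295,
    "NumVerify Professional": 39,
    "Clay Starter": 149,
}

monthly_total = sum(base_subscriptions.values())
vendor_count = len(base_subscriptions)

print(f"Base subscriptions: ${monthly_total}/month")   # Base subscriptions: $830/month
print(f"Vendors holding a copy of the list: {vendor_count}")  # 6
```

Adding seats or overage only pushes the number up, which is why we call $830 the floor rather than the bill.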

And the kicker — Clay is a meta-layer that itself resells the underlying vendors. When a row is wrong, you have a two-hop support chain: Clay blames Hunter, Hunter blames Clay.

The Stekpad way

You load the 10,000 rows into a dataset. You attach a pipeline.

```bash
curl -X POST https://api.stekpad.com/v1/datasets/ds_123/pipeline \
  -H "Authorization: Bearer stkpd_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "steps": [
      { "enricher": "find_company_info", "input": "domain" },
      { "enricher": "ai_classify", "input": "description",
        "labels": ["B2B","B2C","SaaS","eCom","agency","marketplace"] },
      { "enricher": "find_tech_stack", "input": "domain" },
      { "enricher": "find_emails", "input": "domain" },
      { "enricher": "email_verify", "input": "emails" },
      { "enricher": "find_socials", "input": "domain" }
    ]
  }'
```

Six enrichers on 10,000 rows: 60,000 credits. On the PAYG Pro pack (50,000 credits for 99 €) plus a 10,000 credit top-up, that is ~120 € one-time. Credits last 12 months. No subscription. One invoice. One API key. One database (yours — on Stekpad).
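
The credit arithmetic behind that paragraph, as a short Python sketch. The per-enricher costs and pack size come from this post. The top-up price is not listed here, so the sketch only derives what the quoted ~120 € total implies for it. Treat that figure as an inference, not a price.

```python
ROWS = 10_000

# 1 credit per row per enricher, per the catalog above
CREDITS_PER_ROW = {
    "find_company_info": 1,
    "ai_classify": 1,
    "find_tech_stack": 1,
    "find_emails": 1,
    "email_verify": 1,
    "find_socials": 1,
}

PRO_PACK_CREDITS = 50_000
PRO_PACK_PRICE_EUR = 99
QUOTED_TOTAL_EUR = 120  # approximate total quoted in the text

credits_needed = ROWS * sum(CREDITS_PER_ROW.values())
top_up_credits = max(0, credits_needed - PRO_PACK_CREDITS)

# Inference only: what the quoted total leaves for the top-up
implied_top_up_eur = QUOTED_TOTAL_EUR - PRO_PACK_PRICE_EUR

print(credits_needed)       # 60000
print(top_up_credits)       # 10000
print(implied_top_up_eur)   # 21
```

Swapping an enricher in or out changes one dictionary entry; the opt-in linkedin_enrich add-on would enter at 12 credits per row instead of 1.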

The math speaks for itself, but the math is not the point.

Why data leaving your stack is the real problem

Cost is easy to rationalize. You can always find a VC-backed vendor willing to burn margin. The structural cost is harder to see until you are already in it.

Compliance surface area. Every third-party vendor in your enrichment chain is a data processor under GDPR. Every data processor needs a DPA. Every DPA needs to be renewed, audited, and rotated when the vendor gets acquired. Six vendors means six DPAs, six sub-processor lists to reconcile against your own, and six risk reviews from your legal team when a new customer asks. One vendor means one.

Data residency. A lot of our customers are European. When a prospect domain gets sent to a US enrichment vendor, it crosses the Atlantic, lands in a US database, and stays there — often with no honest way to get it deleted. Stekpad runs its enrichers inside our own EU and US infrastructure and we honor residency on request. Your data never touches a third party's database, because there is no third party.

Latency stacking. Each vendor hop adds 200–800 ms and a tail-latency risk. A pipeline that chains Hunter → NumVerify → Clearbit → Proxycurl is a p99 time bomb. In-house enrichers share a single worker pool, a single cache, and a single back-pressure queue. No hop, no tail.
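
To put rough numbers on the stacking, here is a back-of-the-envelope Python sketch. The 200–800 ms range is the one quoted above. The 1 % per-hop tail probability and the independence between hops are assumptions chosen for illustration.

```python
HOPS = ["Hunter", "NumVerify", "Clearbit", "Proxycurl"]
PER_HOP_MS = (200, 800)   # per-hop range quoted above
P_SLOW_PER_HOP = 0.01     # assumption: a hop exceeds its own p99 1% of the time

# Sequential hops add up, so the range scales with the chain length
added_min_ms = len(HOPS) * PER_HOP_MS[0]
added_max_ms = len(HOPS) * PER_HOP_MS[1]

# Chance that at least one hop lands in its tail, assuming hops are independent
p_any_slow = 1 - (1 - P_SLOW_PER_HOP) ** len(HOPS)

print(f"Added latency: {added_min_ms}-{added_max_ms} ms per row")  # 800-3200 ms
print(f"Chance of at least one tail hit: {p_any_slow:.2%}")        # 3.94%
```

Under those assumptions, a four-hop chain hits somebody's tail roughly four times as often as a single hop would.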

Duplicate data in breach notifications. When Clearbit gets breached (they did, in 2017), your prospect list is in the breach. When Apollo gets breached (they did, in 2018 — 125 million records), your prospect list is in the breach. Every vendor you add is another breach you might have to disclose to your own customers.

Support blame-shifting. "That field came from a third-party vendor, please contact them" is the most common excuse in enrichment support tickets. When Stekpad is wrong, we are wrong. The buck stops at one company.

Stekpad vs Clay vs Apollo vs Hunter

Here is how we think about the four products most commonly named in the same sentence.

Stekpad

  • Surface: scraping verbs plus dataset storage plus 19 enrichers, wired into a first-class MCP server.
  • Vendors under the hood: none, except the opt-in linkedin_enrich add-on.
  • Pricing: PAYG credits, no subscription, 12-month expiry. 9 € for 2,000 credits, 99 € for 50,000.
  • Where your data lives: Stekpad only.
  • When it wins: you want one bill, one API key, one trust boundary, and you are okay without the "add 100 vendors through one UI" meta-play.

Clay

  • Surface: a no-code UI on top of 100+ third-party data vendors.
  • Vendors under the hood: Hunter, Apollo, Clearbit, Proxycurl, BuiltWith, and dozens more — you pay in Clay credits, Clay pays the vendors.
  • Pricing: subscription, starts at $149/month for 2,000 "people credits", scales up to thousands of dollars per month for higher volumes.
  • Where your data lives: Clay plus every vendor Clay routes to.
  • When it wins: you want the long tail of weird data sources (person-level firmographics, specific industry databases) and you are okay with the vendor-sprawl tradeoff.

Apollo

  • Surface: a sales engagement platform with a person and company database of ~275 million contacts, plus outbound email sequencing.
  • Vendors under the hood: their own database, rebuilt from public sources and opt-in syncs.
  • Pricing: per-seat subscription, ~$49–$99 per seat per month for enrichment usage, often gated behind a sales call for higher tiers.
  • Where your data lives: Apollo.
  • When it wins: you want a ready-made contact database with sequencing, and you are buying the people product, not the API.

Hunter

  • Surface: email finding and verification, plus a small campaign tool.
  • Vendors under the hood: their own crawl and pattern database.
  • Pricing: subscription, ~$49/month for 500 searches on the Starter plan, scales up.
  • Where your data lives: Hunter.
  • When it wins: you only need email finding on a modest volume and you want one focused tool for it.

The honest verdict: Clay is the "one UI for every vendor" play and wins when you need the long tail. Apollo is the "managed contact database" play and wins when you want to skip building a list. Hunter is the focused email finder. Stekpad is the "own the stack, one trust boundary" play — we are the right choice when your data must stay in your lane, when you care about GDPR surface area, and when you also need the scraping half of the pipeline in the same product.

The things we do not claim

We do not claim 98 % email deliverability. Nobody can. We claim ~85 % on non-catch-all domains and we flag catch-alls honestly. That is what `email_verify` returns in the reason and catch_all fields.

We do not claim 275 million contacts. We do not maintain a contact database at all. If you need "all CTOs at SaaS companies in Berlin" as a ready-made list, buy Apollo. Stekpad enriches the list you already have — or helps you build one from scraping, with search plus extract.

We do not claim full LinkedIn coverage. LinkedIn is a hard target and we are honest about it. linkedin_enrich is a premium, opt-in add-on powered by Proxycurl, labeled on the row, shipped in v1.5. If LinkedIn data is your core need, read our take and make an informed choice.

We do not claim zero overlap with Clay. Clay is a great product for a different job. Our comparison page spells out the overlap and the gap.

When a third-party vendor is still the right call

Three situations where you should not use Stekpad's in-house enrichers.

You already live in Clay and your workflow is five layers deep. Switching a mature Clay workspace to Stekpad is a project. If the vendor-stitched pipeline works and the bill is not the bottleneck, do not move for the sake of moving. Use Stekpad for the scraping half and leave Clay where it is.

You need a very specific vendor signal that we do not replicate. For example, G2 intent data, 6sense intent data, ZoomInfo scoops. These are contract-licensed datasets. We cannot and will not rebuild them in-house. If that is the signal you sell your team on, buy it from the source.

You need a single ready-made contact database with outbound sequencing. Stekpad does not send email. We never will. If you want a tool that finds the contact and sends the email in the same product, buy Apollo or Outreach.

Outside those three cases, the math and the architecture both point the same direction.

The architectural argument, in one diagram you can draw on a napkin

Picture two boxes.

Box A — vendor-stitched. Your prospect list sits in a Google Sheet. An Airtable script fans out to Hunter, Apollo, Clearbit, Proxycurl, NumVerify, and Clay. Each of those services owns a copy of every row you send. Each of those copies has its own retention policy, its own breach history, its own deletion workflow. Six arrows out, six arrows back, six copies of the data, six invoices. The trust surface is the union of six companies.

Box B — in-house. Your prospect list sits in a Stekpad dataset. The dataset is attached to a pipeline. The pipeline is six enrichers. Every enricher runs inside the same Stekpad worker pool. No arrows out. One copy of the data. One invoice. One trust surface.

Box B is strictly smaller. Every security review we have ever been part of ends with the same sentence: "fewer third parties is the point". When a buyer's security team has to tick a box that says "list every subprocessor that touches customer-adjacent data", the difference between six vendors and one is the difference between a two-week review and a same-day sign-off.

The observability argument, because enrichment silently breaks

Enrichment pipelines fail quietly. A vendor changes a field name from industry to categories. Another vendor deprecates their v1 API. A third starts returning null for rows that used to work. Your CSV keeps exporting. Your campaign keeps sending. Three weeks later you notice the open rate is down because half the personalization columns are blank.
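
One cheap guard against this kind of silent failure is a fill-rate check on every export. The Python sketch below is a generic illustration, not a Stekpad feature: the column names, the sample data, and the 0.8 threshold are all assumptions.

```python
import csv
import io


def fill_rates(csv_text: str) -> dict[str, float]:
    """Return the share of non-blank values per column in a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        return {}
    return {
        col: sum(1 for r in rows if (r.get(col) or "").strip()) / len(rows)
        for col in reader.fieldnames
    }


def degraded_columns(rates: dict[str, float], threshold: float = 0.8) -> list[str]:
    """List columns whose fill rate is below the (assumed) threshold."""
    return [col for col, rate in rates.items() if rate < threshold]


# Hypothetical export in which two enriched columns have started going blank
sample = """domain,industry,tech_stack
a.com,SaaS,React
b.com,,Vue
c.com,,
d.com,eCom,Next.js
"""

rates = fill_rates(sample)
print(degraded_columns(rates))  # ['industry', 'tech_stack']
```

Run against each export, a check like this turns "three weeks later" into the same day, whichever stack produced the columns.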

With one vendor, one debug path. You open the Stekpad run log for the enrichment run, see which rows failed, which enricher failed, and what the error code was. You re-run with force: true. Done.

With six vendors, six debug paths. You open Hunter's dashboard, open Apollo's dashboard, open Clearbit's dashboard, open Proxycurl's dashboard, open NumVerify's dashboard, open Clay's run history. You correlate by row. You cross-reference IDs. You file three support tickets. You wait two business days.

The feedback loop is the product. If you cannot answer "which enricher broke my row?" in under 30 seconds, your enrichment pipeline is not a feature, it is a liability.

A note on Proxycurl and linkedin_enrich

We routed one enricher to a third-party vendor. Exactly one. linkedin_enrich is powered by Proxycurl and we say so out loud on the row, in the docs, and on the pricing page. 12 credits per row, opt-in per call, v1.5. We did this because LinkedIn is a hard, contested target and we would rather be honest than pretend we have it covered in-house. If you do not use linkedin_enrich, zero rows leave Stekpad. If you do, the label tells you exactly what happened. That is the bar.


Stekpad Team
We build Stekpad. We scrape the web, store it, and enrich it — from an API, from an app, or from Claude.

