# Website Contact Extractor - Emails & Phones (`harvestlab/contact-extractor`) Actor

Website contact extractor for B2B lead lists and CRM enrichment. Extract emails, phones, social profiles, addresses, contact pages, tech signals, role labels, and optional MCP connector summaries.

- **URL**: https://apify.com/harvestlab/contact-extractor.md
- **Developed by:** [Nick](https://apify.com/harvestlab) (community)
- **Categories:** Lead generation, Business, MCP servers
- **Stats:** 82 total users, 34 monthly users, 98.9% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

from $1.50 / 1,000 "Lead discovered" events

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Website Contact Extractor - Emails & Phones

Use this website contact extractor and email finder to turn public domains or URLs into CRM-ready B2B lead records: emails, phone numbers, social profiles, addresses, contact-page URLs, tech stack signals, optional email verification, phone validation, and role labels. It is built for teams that need fresh contact data from company websites, not another static lead database.

**Best first runs**

Cheapest smoke test:

```json
{
  "domains": ["python.org"],
  "maxWebsites": 1,
  "maxPagesPerSite": 3,
  "includeSubpages": true,
  "detectTechStack": false,
  "verifyEmails": false
}
````

CRM enrichment test:

```json
{
  "domains": ["python.org"],
  "maxWebsites": 1,
  "maxPagesPerSite": 5,
  "includeSubpages": true,
  "detectTechStack": true,
  "verifyEmails": false
}
```

Use this actor when you need a website contact extractor, email finder from website, contact details extractor, B2B lead extractor, or CRM enrichment workflow before outreach, market mapping, recruiting, or data cleanup. Start with a small run to confirm the target domain works, then turn on `verifyEmails`, `deepEmailVerification`, or `enableAiAnalysis` only after the basic run works.
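The smoke-test input can also be submitted from Python with the official `apify-client` package. The following is a minimal sketch rather than a reference integration. It assumes the API token lives in an `APIFY_TOKEN` environment variable, that `maxWebsites` should equal the number of domains submitted, and that the installed client returns the run as a dictionary (as apify-client 1.x does). The helper names `build_smoke_input` and `run_contact_extractor` are hypothetical.

```python
import os


def build_smoke_input(domains):
    """Return the cheapest smoke-test input from this README for the given domains."""
    domains = list(domains)
    return {
        "domains": domains,
        "maxWebsites": len(domains),  # assumption: cap equals the list length
        "maxPagesPerSite": 3,
        "includeSubpages": True,
        "detectTechStack": False,
        "verifyEmails": False,
    }


def run_contact_extractor(run_input):
    """Run the Actor and return its dataset items as a list of dicts."""
    # Imported lazily so the input builder works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # env var name is an assumption
    run = client.actor("harvestlab/contact-extractor").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_contact_extractor(build_smoke_input(["python.org"]))` should return one row per submitted site. Check each row's `result_type` before switching on any paid toggles.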

**Product focus**

- **Best for:** company-domain enrichment, website contact extraction, email finder from website lists, CRM cleanup, ABM account research, recruiting research, and tech-stack segmentation.
- **Not best for:** guaranteed personal emails, logged-in databases, hidden contacts, or bulk spam list building.
- **Why choose it:** one run returns contact details plus context your CRM actually needs: source URL, contact-page URL, validated phones, socials, optional email checks, and tech-stack signals.
- **Resilience:** if a homepage is blocked or empty, the actor tries canonical host variants, sitemap contact URLs, and common contact/about/legal paths before returning a no-charge diagnostic.

**What you get back**

- One row per submitted site with normalized `url`, discovered `emails`, `phones_validated`, `social_profiles`, `address`, `contact_page_url`, and `tech_stack`.
- `result_type` shows whether the row contains `contact_data`, a no-charge `partial_result`, a no-charge `fetch_failed`, `site_timeout`, or `site_error` diagnostic. `fetch_diagnostics` lists attempted/reached URLs, timeout details, or sanitized per-site error details for debugging blocked and slow sites.
- Optional `emails_verified` and `ai_contact_analysis` fields appear only when the paid verification or AI toggles are enabled. `maxEmailsToVerify` caps paid email verification checks across the run.
- Good starter run: 5 pages per site, tech detection on, verification off. Then re-run only promising domains with email verification enabled.

Example output shape:

```json
{
  "url": "https://example.com",
  "domain": "example.com",
  "result_type": "contact_data",
  "billable_contact_found": true,
  "emails": ["sales@example.com", "support@example.com"],
  "phones_validated": [
    {"raw": "+1 415 555 0100", "e164": "+14155550100", "country": "US", "type": "FIXED_LINE", "valid": true}
  ],
  "social_profiles": {
    "linkedin": "https://www.linkedin.com/company/example",
    "github": "https://github.com/example"
  },
  "contact_page_url": "https://example.com/contact",
  "tech_stack": ["Shopify", "Cloudflare", "Google Analytics"],
  "fallback_used": false,
  "fetch_diagnostics": {
    "pages_attempted": ["https://example.com", "https://example.com/contact"],
    "pages_reached": ["https://example.com", "https://example.com/contact"]
  },
  "scraped_at": "2026-05-10T18:15:00Z"
}
```
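Because every row carries a `result_type`, a downstream script can separate contact rows from no-charge diagnostics before anything reaches a CRM. The sketch below assumes rows shaped like the example above; `split_rows` and `unreached_pages` are hypothetical helper names, not part of the Actor.

```python
def split_rows(rows):
    """Separate `contact_data` rows from diagnostic rows."""
    contacts, diagnostics = [], []
    for row in rows:
        if row.get("result_type") == "contact_data":
            contacts.append(row)
        else:
            diagnostics.append(row)
    return contacts, diagnostics


def unreached_pages(row):
    """List URLs that were attempted but not reached, in attempt order."""
    diag = row.get("fetch_diagnostics") or {}
    reached = set(diag.get("pages_reached", []))
    return [url for url in diag.get("pages_attempted", []) if url not in reached]
```

Rows in the second list are useful for debugging blocked or slow sites; the first list is what you would normally import.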

**Best for:** website contact extraction, email finder from website/domain lists, B2B prospect lists, CRM cleanup, tech-stack segmentation, contact-page discovery, and fresh lead enrichment from company domains.

**Not best for:** private databases, logged-in pages, websites that intentionally hide contact details, or guaranteed personal decision-maker emails on every domain.

> **No cookies, no OAuth, no API keys (for the scraping)** - runs on public data only. AI keys are optional and only needed when `enableAiAnalysis: true`.

**Pay-per-event pricing. Verified mailboxes (optional). CRM-ready in seconds.** This actor is pay-as-you-go: pay $0.01 per site enriched, with optional deep mailbox verification billed only when a definitive result is returned. No seats, no contract, no monthly minimum.

### Website Contact Extractor

Give the actor a list of domains or URLs and it crawls each public website for contact details that are visible on the homepage and high-signal subpages such as contact, about, imprint, legal, and team pages. Results are normalized into one CRM-ready row per website, with raw contacts separated from verified emails, validated phones, social profiles, address data, contact-page URLs, and technology signals.

### Contact Details Extractor

Extract public contact details from company websites in a format your CRM can use immediately: email addresses, phone numbers, social links, physical addresses, contact-page URLs, and technology signals. Use it for contact discovery, enrichment, deduplication, and fresh lead records before outreach.

### Email Finder from Website

Use it as an email finder from website and domain lists when you need fresh outreach data before a campaign. The actor finds public role, team, and company addresses, adds optional deliverability checks, keeps contact-page context, and returns each website as a structured row that can move directly into a CRM, spreadsheet, or enrichment pipeline.

### B2B Lead Extractor

Extract emails with optional mailbox verification, validated phone numbers (E.164 + country + line-type), **16 social media profiles**, physical addresses, and **175+ tech stack signals** from any public website. Give the actor a list of domains or URLs and get structured, CRM-ready contact data back in seconds. Built for SDRs, AEs, RevOps, recruiters, ABM marketers, and M\&A researchers who need fast, accurate website enrichment without an annual seat license.

For each domain you provide, the actor checks the homepage and high-signal subpages such as `/contact`, `/about`, and `/imprint`; extracts public emails and business contact data; validates phone numbers; identifies 16 social profiles; parses physical addresses; detects 175+ tech stack signals; and can add email verification plus AI role classification. All output is structured JSON, immediately importable into HubSpot, Salesforce, Pipedrive, Outreach, Salesloft, Apollo, Google Sheets, Airtable, or any CRM via Apify's native integrations and webhooks.

### Features

- **Batch-safe runs** - Each website is handled independently, so one slow or malformed target does not wipe out the whole batch. Completed results are preserved whenever possible.
- **Deep mailbox verification (v1.7)** - Opt in with `deepEmailVerification: true` to classify MX-cleared emails as **`deliverable`**, **`undeliverable`**, **`catchall`**, **`greylisted`**, **`port_blocked`**, **`rate_limited`**, or **`error`**. The check is rate-limited per mail host, includes catch-all detection, and charges $0.02 only for definitive deliverable or undeliverable verdicts. Requires `verifyEmails: true`.
- **Email deliverability verification (v1.6)** - Opt in with `verifyEmails: true` to check whether extracted email domains can receive mail, catch common domain typos, flag free-inbox and disposable hosts, and tag every entry **HIGH / MEDIUM / LOW / UNKNOWN** deliverability. Emits an `emails_verified` output field with practical fields such as MX status, typo suggestion, role-based detection, disposable-host flag, and deliverability tier. Charged $0.01 per MX-cleared email; typo, disposable, no-MX, and unknown results are free.
- **Phone validation (v1.5)** - Extracts phone numbers in international formats, then normalizes valid numbers into `phones_validated` with E.164 format, country code, and line type. Numbers that parse but fail strict regional checks end up in `phones_uncertain` so you can triage them separately.
- **Social media profiles (16 platforms)** - Identifies LinkedIn, Twitter/X, Facebook, Instagram, YouTube, GitHub, TikTok, Pinterest, Telegram, Discord, Threads, **WhatsApp**, **Snapchat**, **Vimeo**, **Twitch**, and **Mastodon** profiles linked from the website. (v1.9: +5 platforms - parity with vdrmota's 15+ claim, plus Mastodon.)
- **Physical address parsing** - Locates public business addresses and returns them alongside the contact record.
- **Contact-page URL capture** - Returns the canonical /contact or /kontakt page URL so you can deep-link prospects straight into the form.
- **Tech stack detection (175+ signals)** - Identifies technologies across CMS platforms, frontend frameworks, analytics tools, chat widgets, CDNs, and e-commerce solutions.
- **Smart subpage crawling** - Automatically visits /contact, /about, /imprint, and similar pages where contact information typically lives.
- **Structured business data capture** - Captures public company details that websites publish for search engines and business directories.
- **Domain input support** - Enter bare domains like "example.com" without needing to add the protocol. HTTPS is added automatically.
- **Duplicate handling** - Automatically deduplicates when the same URL appears in both `urls` and `domains` inputs.
- **Optional AI contact enrichment** - When enabled (`enableAiAnalysis`), AI classifies each email by role (sales, support, hr, legal, executive, general, personal), groups near-duplicate team addresses (`dedup_groups`), flags non-monitored no-reply inboxes (`deliverability_flags`), picks the single best `primary_contact` for B2B outreach, and picks the single best `primary_phone` from the validated set. Supports OpenRouter, Anthropic, Google AI, OpenAI, or self-hosted Ollama.

#### What makes this different from other Apify contact scrapers

- **16 social platforms vs 6** - LinkedIn (with company-profile-over-personal selection), Twitter/X, Facebook, Instagram, YouTube, GitHub, TikTok, Pinterest, Telegram, Discord, Threads, WhatsApp, Snapchat, Vimeo, Twitch, Mastodon. Competitors typically stop at the first six. (v1.9 expanded 11 -> 16.)
- **Carrier-format phone split** - `phones_validated` gives you E.164 + country + line-type per number. Most Apify contact scrapers ship a single raw `phones` list and leave cleanup to you.
- **MX + mailbox deliverability** - every email can be checked against DNS MX records and, when deep verification is enabled, mailbox-level acceptance signals. Outputs include HIGH / MEDIUM / LOW / UNKNOWN deliverability, typo suggestions, disposable-host detection, and mailbox verdicts such as `deliverable`, `undeliverable`, `catchall`, `greylisted`, and `port_blocked`.
- **175+ tech-stack detectors** on the same fetch - one run gets you contacts + CMS + analytics + chat + CRM + payment stacks. Competitors either do contacts *or* tech but rarely both.
- **Integrated AI enrichment** - role classification, dedup, primary-contact + primary-phone pick, and deliverability flags.
- **Multiple AI provider choices** - Use hosted providers or a self-hosted model for teams that need stricter data-control workflows.

#### Head-to-head: vs the dominant Apify contact scraper

This Actor competes directly against `vdrmota/contact-info-scraper` (the most-installed contact actor on the Apify Store). Where each wins:

| Feature | This Actor (harvestlab) | vdrmota/contact-info-scraper |
|---|---|---|
| **Mailbox verification** | Yes - deliverable/undeliverable/catchall/greylisted verdicts on extracted emails | No - third-party "lead enrichment" only |
| **Phone validation** | Yes - E.164 + ISO country + line-type (MOBILE/FIXED\_LINE/TOLL\_FREE/...) | No - raw phone list, no validation |
| **Tech stack detection** | Yes - 175+ signals on the same fetch (CMS, frameworks, CDN, analytics, payment) | No - not offered |
| **AI email-role classifier** | Yes - built-in (sales / support / hr / legal / executive / general / personal) | No - not offered |
| **AI provider choice** | Yes - hosted and self-hosted options | No - no AI layer |
| **Social platforms** | **16** (LinkedIn, X, FB, IG, YT, GitHub, TikTok, Pinterest, Telegram, Discord, Threads, WhatsApp, Snapchat, Vimeo, Twitch, Mastodon) | 15+ |
| **Per-result price** | $0.01 / verified contact | $0.00105 / page |
| **Rating on Apify Store** | New / unrated | 3.4/5 across 77 reviews |

**Why pay more per result?** Because `$0.00105 x 1,000 unvalidated raw pages` is not the same product as `$0.01 x 1,000 verified, phone-normalized, role-classified contact records`. This actor is optimized for outreach-ready CRM rows rather than raw pages that still need cleanup.

**If your priority is raw-page volume and you'll do verification yourself downstream**, vdrmota's actor is cheaper. **If your priority is contacts that don't bounce on the first send**, this actor's optional deep verification can reduce bounce risk for outbound runs.

### Use Cases

#### SDR / AE Outbound Prospecting

Enrich prospect lists with verified company emails before outreach. Upload target company domains and get emails, validated phones, LinkedIn / 16 socials, addresses, and 175+ tech-stack signals - drop straight into HubSpot, Salesforce, Pipedrive, Outreach, or Salesloft via Apify's native integrations. AI role classification helps separate shared inboxes such as `info@`, `support@`, and `noreply@` from better outreach targets.

#### Account-Based Marketing (ABM) Lists - match-rate by domain, not name + company guess

Build enriched ABM target lists from a domain seed (e.g. "all SaaS companies running Segment + Marketo on Shopify"). Tech stack detection (175+ signals) lets you filter by **CMS, analytics, CRM, chat widget, payment processor, CDN, search platform**. Combine with `enableAiAnalysis` for primary-contact selection per account.

#### Lead-Gen Agencies & RevOps - pay-per-event beats per-seat

Run hundreds of client enrichment jobs without buying seats per analyst. **Pay only for sites that returned data**: failed fetches and DNS errors are free. Keep source URLs and contact-page context for QA and client delivery.

#### Recruiter Sourcing & Talent Intel - find decision-makers via tech-stack match

Identify hiring managers at companies running specific stacks, such as SaaS teams using modern frontend frameworks or developer tools. Tech-stack detection reveals what languages and platforms a company appears to use today, which can be more current than profile skill tags. Pair with social profile capture to surface candidate-targeting signals.

#### M\&A Due Diligence & Investor Research - rapid target screening

Run a watchlist of acquisition targets through the Actor every quarter. Output captures **company description, address, contact emails, phone numbers, social footprint, and complete tech stack** - letting M\&A analysts and PE / VC associates screen 500-2,000 companies in an afternoon for tech debt, vendor lock-in, growth signals (e.g. "still on jQuery + Bootstrap", "migrated from Magento to Shopify Plus"), and digital maturity.

#### CRM Data Hygiene & Re-enrichment

Upload domains for any CRM segment with empty `phone`, `email`, `linkedin_url`, or `tech_stack` fields and bulk-update with fresh data from the source website. The dual `phones` + `phones_validated` shape lets you pick raw-extract vs carrier-verified per workflow. AI dedup groups `info@` / `hello@` / `contact@` into a single primary contact per account.

#### Competitive Intelligence & Tech-Stack Tracking

Schedule regular scans on competitor domains. Detect when a target migrates CMS, swaps analytics, adds a CDP, or rolls out a new payment processor. The `tech_stack` array surfaces 175+ technologies, including CDPs (Segment, Pendo, PostHog), analytics, search platforms (Algolia, Typesense), build tools (Vite, Webpack), and feature-flag systems (LaunchDarkly).

#### Market Research & Industry Datasets

Build structured datasets of companies in a niche by feeding a domain list and exporting JSON / CSV / Excel. Combine tech stack adoption with contact data to segment markets by technology profile, company digital maturity, and likely deal size - perfect for analyst reports, investor decks, and trend studies.

### Input

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `urls` | array | `[]` | List of full website URLs (e.g., `https://example.com`). |
| `startUrls` / `url` / `website` | array/string | - | API/CLI aliases for URL inputs. `startUrls` is merged with `urls`; singular aliases are accepted for one-off calls. |
| `domains` | array | `[]` | List of bare domains (e.g., `example.com`, `stripe.com`). HTTPS is added automatically. |
| `maxPagesPerSite` | integer | `5` | Number of pages to crawl per site (1-20). More pages means more contact data found but longer runtime. |
| `includeSubpages` | boolean | `true` | Automatically crawl /contact, /about, /imprint, and similar pages. Highly recommended - many sites only list contact info on subpages. |
| `detectTechStack` | boolean | `true` | Detect CMS, frameworks, analytics, and 175+ other technologies. Set to `false` for faster runs when you only need contact data. |
| `verifyEmails` | boolean | `false` | Run real DNS MX lookups on every extracted email, detect typos / disposable / free-inbox domains, and tag each email HIGH / MEDIUM / LOW / UNKNOWN deliverability. Adds the `emails_verified` output field. Cost: $0.01 per MX-cleared email. |
| `deepEmailVerification` | boolean | `false` | Opt-in mailbox-level verification for every MX-cleared email. Classifies each as `deliverable` / `undeliverable` / `catchall` / `greylisted` / `port_blocked` / `rate_limited` / `error`. Rate-limited per mail host. Charges $0.02 per definitive verdict only. Requires `verifyEmails: true`. |
| `enableAiAnalysis` | boolean | `false` | Enable AI-powered contact enrichment - email role classification, team-address deduplication, primary-contact selection, and deliverability flags. Requires an API key for your chosen AI provider. |
| `llmProvider` | string | `openrouter` | AI provider when enrichment is enabled: `openrouter`, `anthropic`, `google`, `openai`, or `ollama`. |
| `openrouterApiKey` / `anthropicApiKey` / `googleApiKey` / `openaiApiKey` | string | - | API key for the chosen provider. Ollama (self-hosted) uses `ollamaBaseUrl` instead. |
| `outputConnectors` | array | - | Optional MCP connectors for per-website contact summaries. Sends counts plus extracted public emails, phones, social profiles, and contact metadata to authorized Slack, Notion, GitHub, Sheets, CRM, or other connector tools. |
| `connectorAlertTarget` | string | - | Optional connector destination, such as a Slack channel, Notion database ID, sheet name, table, or CRM list. |
| `proxyConfiguration` | object | `{"useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"]}` | Apify proxy settings. Residential is the default because many contact pages block datacenter IPs. |

You can provide input via `urls`, `startUrls`, `url`, `website`, `domains`, or any combination of them. Duplicates across inputs are removed automatically.
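If you want to see how many unique sites a mixed list contains before submitting it, a local pre-check along these lines can help. This is a sketch under assumptions: the Actor's exact normalization rules are not documented here, so the lowercasing and trailing-slash handling below are illustrative choices. `normalize_target` and `merge_targets` are hypothetical names.

```python
from urllib.parse import urlsplit


def normalize_target(value):
    """Turn a bare domain or full URL into a URL with a lowercase host."""
    value = value.strip()
    if "://" not in value:
        value = "https://" + value  # bare domains get HTTPS, as described above
    parts = urlsplit(value)
    host = parts.netloc.lower()
    path = parts.path.rstrip("/")  # assumption: trailing slashes are not significant
    return f"{parts.scheme}://{host}{path}"


def merge_targets(urls=(), domains=()):
    """Merge both input lists, dropping duplicates and keeping first-seen order."""
    seen = {}
    for value in list(urls) + list(domains):
        seen.setdefault(normalize_target(value), None)
    return list(seen)
```

The length of the merged list is a reasonable upper bound on the number of per-website events a run can trigger.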

#### Pricing

This Actor uses **pay-per-event (PPE)** pricing and is **x402-ready**. It also ships a $5 Skyfire bundle for automated payment flows that require a minimum charge.

| Event | Price | Description |
|-------|-------|-------------|
| Lead discovered | $0.0015 | Per website successfully scraped in basic mode (`detectTechStack: false`, `verifyEmails: false`, `deepEmailVerification: false`). |
| Contact extracted | $0.01 | Per website successfully analyzed in enriched mode with tech stack, MX and mailbox checks, phone validation, or AI-ready enrichment. |
| Verified phone extracted | $0.02 | Charged once per validated phone number (E.164 + country + line-type). Only fires on numbers that pass strict phone-format validation. |
| Verified email checked | $0.01 | Charged once per email whose domain passes real MX-record verification (HIGH / MEDIUM tier). Typo / disposable / no-MX emails (LOW tier) and UNKNOWN results are free. Only fires when `verifyEmails` is on. |
| Verified email deep-checked | $0.02 | Charged once per email that receives a **definitive** verdict (`deliverable` or `undeliverable`). `catchall`, `greylisted`, `rate_limited`, `port_blocked`, and `error` verdicts are free. Only fires when `deepEmailVerification` is on. |
| AI analysis completed | $0.05 | Per website when `enableAiAnalysis` is on (role classification, email dedup, primary-contact + primary-phone selection, deliverability flags). |
| MCP connector summary delivered | $0.002 | Per successful per-website summary delivery through an MCP output connector. Failed connector attempts are not charged. |
| Skyfire bundle | $5.00 | `skyfire-bundle-1000-leads`, aligned with Skyfire's $5 minimum for prepaid bulk runs. |

Failed requests (unreachable sites, DNS errors) are not charged.

| Scenario | Websites | Estimated Cost |
|----------|----------|----------------|
| Basic discovery smoke | 100 | ~$0.15 with `detectTechStack: false` |
| Quick test batch | 10 | ~$0.10 |
| Sales prospect list | 100 | ~$1.00 |
| Market research dataset | 500 | ~$5.00 |
| Large-scale enrichment | 2,000 | ~$20.00 |

Plus Apify platform compute costs. A typical batch of 100 websites completes in 3-8 minutes depending on site responsiveness and pages crawled per site.
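The scenario table counts only the per-website event. To budget a run that also triggers phone, email, or AI events, the event prices can be combined as in the sketch below. The dictionary keys are hypothetical labels chosen for this example, not Apify event IDs, and the function assumes every website succeeds, so it overestimates when some fetches fail. MCP connector, Skyfire, and platform compute charges are left out.

```python
from decimal import Decimal

# Event prices in USD, copied from the pricing table above.
PRICES = {
    "lead_discovered": Decimal("0.0015"),
    "contact_extracted": Decimal("0.01"),
    "verified_phone": Decimal("0.02"),
    "verified_email": Decimal("0.01"),
    "deep_checked_email": Decimal("0.02"),
    "ai_analysis": Decimal("0.05"),
}


def estimate_cost(websites, enriched=True, phones=0, mx_emails=0,
                  definitive_emails=0, ai_sites=0):
    """Estimate event charges for a run, assuming every website succeeds."""
    base = PRICES["contact_extracted"] if enriched else PRICES["lead_discovered"]
    total = websites * base
    total += phones * PRICES["verified_phone"]
    total += mx_emails * PRICES["verified_email"]
    total += definitive_emails * PRICES["deep_checked_email"]
    total += ai_sites * PRICES["ai_analysis"]
    return total
```

For example, `estimate_cost(100, enriched=False)` reproduces the basic discovery row, and passing expected phone and email counts shows how quickly the optional events add to the total.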

**vs. commercial alternatives**: this actor is a pay-per-event alternative to subscription lead tools and seat-based enrichers. You only pay for successful events.

##### Skyfire bulk bundle

A `skyfire-bundle-1000-leads` event ships at **$5.00 per 1,000 leads** for Skyfire-paid runs. This is a premium over the raw `lead-discovered` tier because Skyfire requires a $5 minimum charge per actor invocation. Pay-as-you-go users via Apify's standard PPE rail still get the cheaper $0.0015/lead; Skyfire is opt-in for teams that want a single prepaid bulk bundle.

### Automated Contact Enrichment Workflows

Contact-extractor returns structured contact records (`emails`, `email_roles`, `primary_contact`, `phones_validated`, `tech_stack`, `social_profiles`, verified flags) that outbound and enrichment workflows can use to plan, qualify, and personalize.

Common automation patterns:

- Enrich a target-account CSV before an SDR sequence and keep only domains with verified company emails.
- Add phone, social, address, and tech-stack fields to incomplete CRM accounts.
- Re-run priority accounts quarterly to catch changed contact pages, new social profiles, or technology changes.
- Send verified contacts to HubSpot, Salesforce, Pipedrive, Airtable, Google Sheets, or a webhook destination after each run.

#### MCP Quickstart - call this actor from Claude / Cursor / ChatGPT

Open Apify's hosted MCP server with this actor selected:

```text
https://mcp.apify.com?tools=harvestlab/contact-extractor
```

Then prompt the agent:

> "Use harvestlab/contact-extractor through Apify MCP. Enrich these company domains, return emails, phone numbers, social profiles, contact-page URLs, tech stack, and the best public outreach contact. Keep email verification off for the first run unless I ask for it."

Through Apify MCP, the agent can generate the actor input, run the job, and read the typed dataset output back into your conversation.

### Output

Each website produces a structured JSON object:

```json
{
    "url": "https://example.com",
    "domain": "example.com",
    "company_name": "Example Inc",
    "company_description": "Example Inc is a technology company that helps businesses build better software solutions and streamline their operations.",
    "emails": ["info@example.com", "sales@example.com"],
    "emails_verified": [
        {
            "email": "info@example.com",
            "domain": "example.com",
            "mx_valid": true,
            "mx_hosts": ["aspmx.l.google.com", "alt1.aspmx.l.google.com"],
            "catchall": false,
            "disposable": false,
            "free_inbox": false,
            "role_based": true,
            "typo_suggestion": null,
            "deliverability": "MEDIUM",
            "deep_probe": true,
            "probe_result": "deliverable",
            "smtp_probe": {
                "result": "deliverable",
                "response_code": 250,
                "response_text": "2.1.5 OK",
                "mx_host": "aspmx.l.google.com",
                "probed_at": "2026-04-23T03:15:22+00:00",
                "is_catchall": false
            }
        },
        {
            "email": "sales@example.com",
            "domain": "example.com",
            "mx_valid": true,
            "mx_hosts": ["aspmx.l.google.com", "alt1.aspmx.l.google.com"],
            "catchall": null,
            "disposable": false,
            "free_inbox": false,
            "role_based": true,
            "typo_suggestion": null,
            "deliverability": "MEDIUM",
            "deep_probe": true,
            "probe_result": "rate_limited",
            "smtp_probe": {
                "result": "rate_limited",
                "response_code": null,
                "response_text": "daily_cap_reached:10/10",
                "mx_host": "aspmx.l.google.com",
                "probed_at": "2026-04-23T03:15:22+00:00",
                "is_catchall": null
            }
        }
    ],
    "phones": ["+1 888 926 2289", "555-1234"],
    "phones_validated": [
        {
            "raw": "+1 888 926 2289",
            "e164": "+18889262289",
            "country": "US",
            "type": "TOLL_FREE",
            "valid": true,
            "possible": true
        }
    ],
    "phones_uncertain": ["555-1234"],
    "social_profiles": {
        "linkedin": "https://linkedin.com/company/example",
        "twitter": "https://twitter.com/example",
        "facebook": null,
        "instagram": null,
        "youtube": null,
        "github": "https://github.com/example",
        "tiktok": "https://tiktok.com/@example",
        "pinterest": null,
        "telegram": "https://t.me/examplecompany",
        "discord": null,
        "threads": "https://threads.net/@example"
    },
    "address": "123 Main St, City, State, 12345",
    "contact_page_url": "https://example.com/contact",
    "tech_stack": ["WordPress", "PHP", "nginx", "Google Analytics", "jQuery"],
    "pages_crawled": 3,
    "scraped_at": "2026-04-10T12:00:00Z"
}
```

**Which phone field should you use?** For carrier-verified dialing, cold-call campaigns, or CRM import, use **`phones_validated`** - every entry carries a canonical E.164 string, ISO-3166-1 country code, and line-type so your dialer knows whether it's mobile, toll-free, or a PBX fixed line. The legacy **`phones`** list (all raw extracts) and **`phones_uncertain`** (parsed but not carrier-format verified) are kept for backward compatibility and triage workflows.
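When AI analysis is off there is no `primary_phone`, so a dialer import may need its own selection rule over `phones_validated`. The sketch below is one possible rule, not the Actor's logic: it prefers mobile, then fixed-line, then toll-free numbers, optionally restricted to a given country. `TYPE_PRIORITY` and `pick_dial_number` are hypothetical names.

```python
# Assumed preference order; the Actor's own primary_phone selection may differ.
TYPE_PRIORITY = ["MOBILE", "FIXED_LINE", "TOLL_FREE"]


def pick_dial_number(row, country=None):
    """Return the E.164 string of the preferred validated phone, or None."""
    candidates = [p for p in row.get("phones_validated") or [] if p.get("valid")]
    if country:
        # Prefer numbers in the requested country, but fall back to any valid one.
        in_country = [p for p in candidates if p.get("country") == country]
        candidates = in_country or candidates
    if not candidates:
        return None

    def rank(phone):
        line_type = phone.get("type")
        if line_type in TYPE_PRIORITY:
            return TYPE_PRIORITY.index(line_type)
        return len(TYPE_PRIORITY)  # unknown or other line types sort last

    return min(candidates, key=rank)["e164"]
```

Entries from `phones` or `phones_uncertain` are deliberately ignored here, since they have not passed format validation.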

Null values indicate that the Actor searched for the data but did not find it on the crawled pages. Output is available as JSON, CSV, or Excel. Use Apify integrations to push results directly to Google Sheets, HubSpot, Salesforce, Airtable, or any webhook endpoint.

When `enableAiAnalysis` is `true`, each output item additionally includes:

- `email_roles` - per-email classification (sales, support, hr, legal, executive, general, personal)
- `primary_contact` - the single best email for B2B outreach
- `primary_phone` - the single best validated phone for B2B outreach (picked from `phones_validated`, prefers mobile / fixed-line / toll-free in the domain's country over premium-rate / pager / unknown)
- `dedup_groups` - near-duplicate team addresses grouped together (e.g. `info@` and `hello@` pointing to the same inbox)
- `deliverability_flags` - warnings for no-reply and non-monitored addresses

#### Technologies Detected

The Actor recognizes 175+ technologies across these categories:

- **CMS and Platforms** - WordPress, Shopify, Wix, Squarespace, Drupal, Joomla, Ghost, Webflow, HubSpot
- **JavaScript Frameworks** - React, Next.js, Vue.js, Nuxt.js, Angular, Svelte, jQuery
- **CSS Frameworks** - Bootstrap, Tailwind CSS, Foundation, Bulma, Materialize
- **Analytics and Marketing** - Google Analytics, Google Tag Manager, Facebook Pixel, Hotjar, Mixpanel, Segment, Plausible, Matomo, Microsoft Clarity, Amplitude, LinkedIn Insight Tag
- **Product Analytics** - FullStory, LogRocket, PostHog, Pendo, Datadog RUM, Sentry, LaunchDarkly
- **Email Marketing** - Mailchimp, SendGrid, Klaviyo, ActiveCampaign, Marketo, Pardot, ConvertKit
- **CRM and Sales** - Salesforce, HubSpot, Pipedrive
- **Chat and Support** - Intercom, Drift, Zendesk, Crisp, LiveChat, Tawk.to, Freshdesk
- **Authentication** - Auth0, Okta, Firebase, Supabase
- **Search** - Algolia, Elasticsearch, Typesense
- **Infrastructure** - Cloudflare, Fastly, CloudFront, Vercel, Netlify, nginx, Apache, Akamai
- **E-commerce** - WooCommerce, Magento, PrestaShop, BigCommerce, Stripe, PayPal
- **Video and Media** - YouTube Embed, Vimeo, Wistia, Loom
- **Scheduling** - Calendly, Acuity Scheduling
- **Social Proof** - Trustpilot Widget, G2, Yotpo
- **Build Tools** - Webpack, Vite, GraphQL, WebSocket

### Quick Start

The simplest possible input - a list of domains:

```json
{
    "domains": ["stripe.com", "shopify.com", "hubspot.com"]
}
```

Or full URLs with custom crawl depth:

```json
{
    "urls": ["https://example.com", "https://anothersite.org"],
    "maxPagesPerSite": 10,
    "includeSubpages": true,
    "detectTechStack": true
}
```

Enable SMTP-verified emails for CRM import:

```json
{
    "domains": ["stripe.com", "shopify.com"],
    "verifyEmails": true,
    "deepEmailVerification": true
}
```

Enable AI role classification and primary-contact selection:

```json
{
    "domains": ["stripe.com", "shopify.com"],
    "verifyEmails": true,
    "enableAiAnalysis": true,
    "llmProvider": "openrouter",
    "openrouterApiKey": "your-key-here"
}
```

### Troubleshooting

**SMTP verification returns "greylisted" for most emails**
Greylisting is a deliberate delay tactic: mail servers temporarily reject unknown senders and accept on retry. The Actor does not retry SMTP probes itself because that would be too slow. A `greylisted` result means the server gave no conclusive answer, so treat the address as "probably valid, low confidence". Re-run the same URL after 10-15 minutes for a more definitive result.

**Many emails flagged as "catchall" or "undeliverable"**
Catchall domains accept all email addresses regardless of whether the mailbox exists (common with corporate domains using wildcard MX routing). These cannot be verified via SMTP RCPT - treat as "valid domain, unknown mailbox". Filter with `deliverabilityFilter: "deliverable_only"` to exclude them.

**No tech stack signals returned**
Tech detection depends on what the target website publishes publicly. Pages behind login walls, very dynamic apps, or sites that block automated access may return fewer tech signals. The actor detects 175+ technologies from publicly visible website assets only.

**AI role classification returns wrong roles**
The AI role classifier uses the email prefix and page context to infer roles (sales/support/hr/legal/executive/general/personal). It cannot access LinkedIn or internal org data. For ambiguous prefixes (`info@`, `hello@`, `contact@`), it classifies as "unknown" - these are typically shared inboxes.

**Rate limited or IP blocked mid-crawl**
The Actor applies reputation-safe SMTP probe limits (at most 10 probes per MX host per UTC day, with a 10s cooldown between probes to the same host). If a domain blocks the probe IP, the Actor logs `smtp_blocked: true` and falls back to MX-only verification. Use a RESIDENTIAL proxy to rotate IPs between domains.
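
A daily cap plus cooldown of this kind can be sketched as a small in-memory budget tracker. This is an illustration under stated assumptions: the `ProbeBudget` class and its method names are hypothetical, and the state here lives in memory only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

MAX_PROBES_PER_DAY = 10           # daily cap taken from the text above
COOLDOWN = timedelta(seconds=10)  # minimum gap between probes to the same target


@dataclass
class ProbeBudget:
    """Hypothetical tracker for how many probes each target received per UTC day."""
    counts: dict = field(default_factory=dict)      # (target, utc_date) -> probes used
    last_probe: dict = field(default_factory=dict)  # target -> time of last probe

    def try_acquire(self, target: str, now: datetime) -> bool:
        """Record a probe and return True if it is allowed, else return False."""
        day = now.astimezone(timezone.utc).date()
        used = self.counts.get((target, day), 0)
        if used >= MAX_PROBES_PER_DAY:
            return False  # daily cap reached for this target
        last = self.last_probe.get(target)
        if last is not None and now - last < COOLDOWN:
            return False  # still inside the cooldown window
        self.counts[(target, day)] = used + 1
        self.last_probe[target] = now
        return True
```

Passing `now` in explicitly keeps the logic deterministic, which makes the cap and the cooldown easy to test without sleeping.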

**No emails found for a domain**
The site may not publish contact emails publicly. The actor still checks canonical host variants, sitemap-discovered contact URLs, and common paths such as `/contact`, `/about`, `/impressum`, and `/legal`. If no billable contact data is found, the row is marked as `partial_result`, `fetch_failed`, `site_timeout`, or `site_error` and includes `fetch_diagnostics`. The actor only extracts contacts that are publicly visible - it does not guess or generate addresses.

**How many pages should I crawl per website?**
The default of 5 pages works well for most sites. The Actor prioritizes /contact, /about, and /imprint pages. Set `maxPagesPerSite` to 1 if you only need tech stack data and want the fastest possible run. Increase to 10-20 for large corporate sites where contact details may be spread across many sections.
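
The effect of `maxPagesPerSite` combined with page prioritization can be sketched as follows. The function name, the prefix-matching rule, and the relative order of the three hints are assumptions for illustration, not the Actor's real crawl logic.

```python
from urllib.parse import urlparse

# Path prefixes named in the text above; their relative order is an assumption.
PRIORITY_HINTS = ("/contact", "/about", "/imprint")


def select_pages(homepage: str, discovered: list[str], max_pages: int = 5) -> list[str]:
    """Return the homepage first, then contact-like pages, capped at max_pages."""

    def priority(url: str) -> int:
        path = urlparse(url).path.lower()
        for rank, hint in enumerate(PRIORITY_HINTS):
            if path.startswith(hint):
                return rank
        return len(PRIORITY_HINTS)  # everything else comes last

    seen = {homepage}
    unique = []
    for url in discovered:
        if url not in seen:
            seen.add(url)
            unique.append(url)

    # sorted() is stable, so pages with equal priority keep their discovery order.
    return ([homepage] + sorted(unique, key=priority))[:max_pages]
```

With `max_pages=1` only the homepage is fetched, which matches the fastest tech-stack-only setting described above.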

**Can I input just domain names without https://?**
Yes. Use the `domains` field with bare domain names like "stripe.com" or "hubspot.com". The Actor adds HTTPS automatically. You can also mix `urls` (full URLs) and `domains` (bare domains) in the same run.
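
Merging `urls` and `domains` into one target list might look like the sketch below. The deduplication key (lowercased host plus path without a trailing slash) is an assumption; the `max_websites` default of 50 mirrors the `maxWebsites` input.

```python
from urllib.parse import urlparse


def normalize_targets(urls: list[str], domains: list[str], max_websites: int = 50) -> list[str]:
    """Add https:// to bare domains, drop duplicates, and apply the website cap."""
    targets = []
    seen = set()
    for raw in list(urls) + list(domains):
        value = raw.strip()
        if not value:
            continue
        if "://" not in value:
            value = "https://" + value  # bare domain: add the scheme
        parsed = urlparse(value)
        key = parsed.netloc.lower() + parsed.path.rstrip("/")
        if key in seen:
            continue  # same site already queued
        seen.add(key)
        targets.append(value)
    return targets[:max_websites]
```

Applying the cap after deduplication means a domain that appears in both fields counts only once against the limit.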

**How does email deliverability verification work?**
When you set `verifyEmails: true`, every extracted email runs through a four-step pipeline:

1. **Typo check** - Common mail-domain misspellings such as `gmial.com`, `outllok.com`, `hotmial.com`, and `iclould.com` are flagged with a `typo_suggestion` and tagged **LOW** deliverability.
2. **Disposable / burner check** - Known disposable hosts (Mailinator, 10MinuteMail, GuerrillaMail, YOPmail) are hard-flagged **LOW** regardless of MX state.
3. **Real MX lookup** - The Actor runs an actual DNS MX query against the domain. Domains that have no MX record cannot receive mail and are tagged **LOW**.
4. **Tiering** - Emails at free inboxes (Gmail / Yahoo / Outlook / iCloud / ProtonMail) OR with role prefixes (`info@`, `hello@`, `support@`, `sales@`) land in **MEDIUM**. Named addresses at non-free company domains with resolved MX land in **HIGH**.
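
The four steps above can be sketched as one function. This is a minimal illustration: the domain sets are short samples, the corrected spellings other than `gmail.com` are inferred, the returned dictionary shape is hypothetical, and the MX lookup is injected as a callable because Python's standard library has no MX resolver.

```python
from typing import Callable

# Sample data only; a real list would be much longer.
TYPO_DOMAINS = {
    "gmial.com": "gmail.com",
    "outllok.com": "outlook.com",
    "hotmial.com": "hotmail.com",
    "iclould.com": "icloud.com",
}
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com", "guerrillamail.com", "yopmail.com"}
FREE_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "icloud.com", "protonmail.com"}
ROLE_PREFIXES = {"info", "hello", "support", "sales"}


def tier_email(email: str, has_mx: Callable[[str], bool]) -> dict:
    """Classify one email as HIGH / MEDIUM / LOW following the four steps."""
    local, _, domain = email.lower().rpartition("@")

    # 1. Typo check
    if domain in TYPO_DOMAINS:
        suggestion = f"{local}@{TYPO_DOMAINS[domain]}"
        return {"email": email, "tier": "LOW", "typo_suggestion": suggestion}
    # 2. Disposable / burner check (applies regardless of MX state)
    if domain in DISPOSABLE_DOMAINS:
        return {"email": email, "tier": "LOW", "reason": "disposable"}
    # 3. MX lookup, delegated to the injected resolver
    if not has_mx(domain):
        return {"email": email, "tier": "LOW", "reason": "no_mx"}
    # 4. Tiering
    if domain in FREE_DOMAINS or local in ROLE_PREFIXES:
        return {"email": email, "tier": "MEDIUM"}
    return {"email": email, "tier": "HIGH"}
```

In practice `has_mx` would wrap a DNS library; in tests it can be a lambda that returns a fixed value.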

**How does the deep SMTP probe work?**
When `verifyEmails` and `deepEmailVerification` are both on, every MX-cleared email receives a mailbox-level acceptance check. No email content is delivered. Results are classified as `deliverable`, `undeliverable`, `catchall`, `greylisted`, `port_blocked`, `rate_limited`, or `error`, and the checks are capped per mail host to avoid aggressive probing.
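
One plausible way to turn SMTP `RCPT TO` reply codes into these labels is sketched below. The code-to-label mapping is an assumption based on common SMTP reply semantics, not the Actor's documented rules; in particular, using `None` for a failed port 25 connection and a second probe of a random address for catch-all detection are illustrative choices.

```python
from typing import Optional

ACCEPTED = {250, 251}             # recipient accepted
TEMPORARY = {450, 451}            # temporary rejection, typical for greylisting
MAILBOX_REJECTED = {550, 551, 553}  # permanent recipient rejection


def classify_probe(rcpt_code: Optional[int], random_rcpt_code: Optional[int] = None) -> str:
    """Map RCPT TO reply codes to a verdict label.

    rcpt_code:        reply for the real address, or None if port 25 was unreachable
    random_rcpt_code: reply for a random, almost certainly nonexistent address
                      at the same domain (used to spot catch-all servers)
    """
    if rcpt_code is None:
        return "port_blocked"
    if rcpt_code in ACCEPTED:
        # A server that also accepts a random mailbox reveals nothing about this one.
        return "catchall" if random_rcpt_code in ACCEPTED else "deliverable"
    if rcpt_code in TEMPORARY:
        return "greylisted"
    if rcpt_code == 421:
        return "rate_limited"  # assumption: 421 treated as throttling
    if rcpt_code in MAILBOX_REJECTED:
        return "undeliverable"
    return "error"
```

Some servers also return 550 for policy blocks rather than missing mailboxes, so a real classifier would likely inspect the reply text as well.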

**How does phone validation work?**
Every phone candidate is normalized and checked against regional phone-number rules. Numbers that pass validation land in `phones_validated` with E.164, country code, and line type. Numbers that parse but fail strict validation land in `phones_uncertain`. This gives you a cleaner phone field without paying for a separate cleanup service.

**What if a website blocks the scraper?**
Some sites block datacenter IP addresses. The default proxy preset uses Apify `RESIDENTIAL` exits, and the actor auto-adds that group when you enable Apify Proxy without specifying one. Sites behind login walls or with aggressive bot protection may still be inaccessible. Failed requests are not charged.

### Frequently Asked Questions

**Can I extract contacts from a website URL?**
Yes. Provide full URLs in `urls` or bare domains in `domains`. The actor normalizes each target, crawls the homepage plus contact-oriented subpages, and returns emails, phones, socials, addresses, contact-page URLs, and technology signals when they are publicly visible.

**Is this a contact details extractor or an email finder?**
Both. The core workflow extracts public contact details from a website, including emails, phones, social profiles, addresses, and contact-page URLs. Email verification and AI role labeling are optional paid enrichments for teams that need cleaner outreach records.

**What is the best first run for a domain list?**
Start with `maxPagesPerSite: 3`, `detectTechStack: false`, and `verifyEmails: false` on one or two domains. If the output looks useful, run the larger list with tech-stack detection enabled, then re-run only promising domains with email verification.

**Can I use this as a B2B lead extractor?**
Yes. Start with a list of company domains, then export the resulting contact rows to your CRM or outreach tool. The actor is strongest for company-level B2B enrichment, ABM lists, recruiting research, market maps, and CRM cleanup.

**Does it verify email addresses?**
Yes, when `verifyEmails: true` is enabled. The actor checks email domains with MX lookup and deliverability signals. For deeper mailbox checks, enable `deepEmailVerification: true`; you are charged only for probes that return a definitive `deliverable` or `undeliverable` SMTP verdict.

**Does it find personal emails for every company?**
No. It extracts contact details that the website makes publicly visible. Many companies publish shared inboxes like `info@`, `sales@`, or contact forms rather than personal decision-maker emails. AI role classification can help prioritize the best available inbox.

**Can it extract phone numbers and social profiles too?**
Yes. The output includes raw phones, validated phones in E.164 format when possible, LinkedIn and other social profiles, addresses, and a canonical contact-page URL.

#### Pair this with the rest of the portfolio

- **[bol-com-scraper](https://apify.com/harvestlab/bol-com-scraper)** - NL/BE retail products, prices, watch-mode drop alerts
- **[gov-procurement-scraper](https://apify.com/harvestlab/gov-procurement-scraper)** - NL Tenders + EU TED + UK + US procurement (find target accounts before enrichment)
- **[companies-house-scraper](https://apify.com/harvestlab/companies-house-scraper)** - UK Companies House registry for director/PSC discovery

Workflow: discover target companies via gov-procurement-scraper or companies-house-scraper, then run contact-extractor over the resulting websites for SMTP-verified outbound contacts.

***

### Scheduling and webhooks

Schedule weekly contact-refresh runs in Apify Console to keep your outbound pipeline current. Wire a webhookUrl in n8n or Make to push verified email/phone records with role classification and primary-contact scores directly into HubSpot, Pipedrive, or Lemlist the moment a run completes. Typical pipeline: KvK company list -> weekly Contact Extractor run -> n8n -> CRM contact creation + sequence enrollment.

***

### Legal and Compliance

This actor scrapes publicly available data. By using this actor, you agree to the following:

- **Your responsibility**: You are solely responsible for ensuring your use complies with all applicable laws, regulations, and the target website's terms of service. This includes but is not limited to **GDPR** (EU), **CCPA / CPRA** (California), **CAN-SPAM Act** (US), **CASL** (Canada), **PECR** (UK), **LGPD** (Brazil), and other data protection / anti-spam laws in your jurisdiction.
- **No legal advice**: This actor does not constitute legal advice. Consult a qualified attorney if you have questions about the legality of your specific use case.
- **Intended use**: This actor is designed for legitimate business purposes such as market research, competitive analysis, and B2B lead generation using publicly accessible data.
- **Data handling**: You are responsible for how you store, process, and share any data collected. Ensure you have a lawful basis (e.g. **legitimate interest** under GDPR Art. 6(1)(f), or pre-existing business relationship under CASL) for processing any personal data under applicable privacy laws.
- **CAN-SPAM compliance** (US): Any commercial email you send using contacts from this actor must include a clear, conspicuous opt-out mechanism, a valid physical postal address, accurate "From" / "Reply-To" headers, and non-deceptive subject lines. You must also honor unsubscribe requests within 10 business days. Penalties are up to **$53,088 per violation** under the FTC's enforcement schedule.
- **GDPR compliance** (EU/UK): Even publicly available personal data is subject to GDPR. You must have a lawful basis (typically **legitimate interest with documented LIA**), respect Art. 14 transparency requirements (notify the data subject within a reasonable period, and at the latest within one month, when collecting from a third-party source), honor erasure / objection requests promptly, maintain records of processing under Art. 30, and not transfer EU data outside the EU/EEA without an SCC, adequacy decision, or other approved mechanism.
- **Anti-spam / outreach**: Do **NOT** use this tool for unsolicited bulk messaging, spam, scraped-list email blasts, or list-building for resale. Use for permission-tested, targeted, B2B-relevant outreach only.
- **Rate limiting**: This actor implements polite crawling practices including request delays and retry backoff to minimize impact on target servers.
- **No warranty**: This actor is provided "as is" without warranty. Data accuracy depends on the target website's content and structure.
- **Personal data minimization**: Implement data retention policies (typically 6-12 months for cold outreach contacts), encrypt PII at rest and in transit, restrict internal access on a need-to-know basis, and honor opt-out / Do-Not-Contact requests across your entire system, not just the channel that received the request.
- **Mailbox verification**: Deep email verification uses rate-limited mailbox checks and never delivers email content. Some mail server operators may still log or flag verification attempts, so use this option responsibly and only on domains you have explicitly listed in `urls` or `domains`.

#### Related Actors

- **[Indeed Scraper](https://apify.com/harvestlab/indeed-scraper)** - Pipe Indeed job listings into a recruiter sourcing pipeline: scrape hiring companies, then run Contact Extractor on the listing's company URL to get hiring-manager emails and phones for outbound recruiting.
- **[News Monitor](https://apify.com/harvestlab/news-monitor)** - Build account intelligence on prospects you've extracted contacts for. Track funding rounds, leadership changes, and product news so SDRs can time outreach around real-world events instead of cold-pitching.
- **[Google Search Scraper](https://apify.com/harvestlab/google-search-scraper)** - Discover target domains via SEO research before extracting contacts. Scrape SERPs for "best \[category] software" or competitor intent queries, then feed the top-ranking domains into Contact Extractor for ICP-matched lead lists.

# Actor input Schema

## `urls` (type: `array`):

List of full website URLs to analyze (e.g. https://example.com). You can provide either URLs or domains.

## `startUrls` (type: `array`):

Alternative field name for Website URLs, provided for compatibility with Apify's common 'startUrls' convention (e.g. https://example.com). The canonical field is `urls` above - prefer that. If both are supplied, they are merged.

## `url` (type: `string`):

CLI alias for a single website URL. Hidden from Console form.

## `website` (type: `string`):

CLI alias for a single website URL. Hidden from Console form.

## `domains` (type: `array`):

List of domains without protocol (e.g. example.com). HTTPS will be added automatically. Alternative to providing full URLs - use this OR Website URLs above.

## `maxWebsites` (type: `integer`):

Bill-safety cap across Website URLs, Start URLs, URL aliases, and Domains after deduplication. Default 50; raise deliberately for larger lead batches.

## `maxPagesPerSite` (type: `integer`):

Maximum number of pages to crawl per website. Higher values find more contacts but take longer.

## `includeSubpages` (type: `boolean`):

Automatically crawl /contact, /about, /imprint, and similar pages to find additional contact information.

## `detectTechStack` (type: `boolean`):

Analyze the website to detect technologies used (CMS, frameworks, analytics, etc.).

## `verifyEmails` (type: `boolean`):

Look up real MX records for every extracted email, detect common-domain typos (e.g. gmial.com -> gmail.com), flag free/disposable inboxes, and tag each email HIGH / MEDIUM / LOW / UNKNOWN deliverability. Adds a new `emails_verified` field to every output item. Cost: $0.01 per email whose domain passes MX resolution (uncertain / disposable / no-MX emails are free).

## `maxEmailsToVerify` (type: `integer`):

Bill-safety cap for MX and SMTP email verification across the whole run. Raw extracted emails are still returned; only verification is capped. Default 50, maximum 500.

## `deepEmailVerification` (type: `boolean`):

Opt-in: run an async SMTP RCPT TO probe against the primary MX host for every MX-cleared email. Classifies each email as deliverable / undeliverable / catchall / greylisted / port\_blocked. Reputation-safe: per-MX-host probe history persists across runs (named KV store), capped at 10 probes/host/day with 10s cooldown between probes to the same host. Port 25 is frequently blocked on Apify datacenter egress; when it is, every entry returns `port_blocked` (free). Cost: $0.02 per email that receives a definitive verdict (deliverable or undeliverable) - catch-all, greylisted, and non-verdict results are free. Requires `verifyEmails: true`.

## `enableAiAnalysis` (type: `boolean`):

Use an LLM to classify each email by role (sales/support/hr/legal/executive/general/personal), group near-duplicate team addresses, flag non-monitored no-reply inboxes, and pick the single best primary contact for B2B outreach. When `verifyEmails` is also enabled, the AI picks the primary contact only from HIGH/MEDIUM deliverability emails. Requires an API key for your chosen LLM provider. Cost: $0.05 per website analyzed.
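
The per-event prices quoted in the `verifyEmails`, `deepEmailVerification`, and `enableAiAnalysis` descriptions can be combined into a rough estimate. The function below is a hypothetical helper, not part of the Actor; it covers only these three enrichments and excludes the base per-lead charge, connector dispatches, and your own LLM provider costs.

```python
from decimal import Decimal

# Prices as quoted in the input descriptions above (USD per event).
PRICE_MX_VERIFIED = Decimal("0.01")   # per email whose domain passes MX resolution
PRICE_SMTP_VERDICT = Decimal("0.02")  # per definitive deliverable / undeliverable verdict
PRICE_AI_WEBSITE = Decimal("0.05")    # per website analyzed by the LLM


def estimate_enrichment_cost(mx_passed_emails: int, smtp_definitive_verdicts: int, ai_websites: int) -> Decimal:
    """Return the estimated enrichment charge in USD for one run."""
    return (
        PRICE_MX_VERIFIED * mx_passed_emails
        + PRICE_SMTP_VERDICT * smtp_definitive_verdicts
        + PRICE_AI_WEBSITE * ai_websites
    )
```

For example, 40 MX-passing emails, 25 definitive SMTP verdicts, and 10 AI-analyzed websites come to 0.40 + 0.50 + 0.50 = 1.40 USD. `Decimal` avoids the rounding noise that floats would introduce at cent precision.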

## `llmProvider` (type: `string`):

AI backend for email role classification, dedup, and primary-contact selection. 'OpenRouter' (default) is cheapest - Gemini Flash via OpenRouter is ~$0.001 per website. 'Anthropic' = Claude, 'Google AI' = Gemini direct, 'OpenAI' = GPT-4o mini, 'Ollama' = self-hosted (no API cost). Each provider needs its own API key field below.

## `llmModel` (type: `string`):

Specific model to use. Leave empty for the provider default (google/gemini-2.0-flash-001 for OpenRouter, claude-sonnet-4-20250514 for Anthropic, gemini-2.0-flash for Google AI, gpt-4o-mini for OpenAI, llama3.1 for Ollama).

## `openrouterApiKey` (type: `string`):

Your OpenRouter API key. Get one at openrouter.ai/keys

## `anthropicApiKey` (type: `string`):

Your Anthropic API key. Get one at console.anthropic.com

## `googleApiKey` (type: `string`):

API key for Google AI (Gemini). Get one at aistudio.google.com/app/apikey

## `openaiApiKey` (type: `string`):

API key from platform.openai.com (required if using OpenAI provider)

## `ollamaBaseUrl` (type: `string`):

Base URL for Ollama API. Default: http://localhost:11434

## `outputConnectors` (type: `array`):

Optional MCP connectors for contact extraction summaries. Choose Slack, Notion, GitHub, Sheets, CRM, or another compatible connector. At the end of the run, the actor sends a compact contact summary plus structured payload through each connector. Connector failures never fail the scrape. Billed at $0.002 per successful connector dispatch.

## `connectorAlertTarget` (type: `string`):

Optional connector destination such as a Slack channel, Notion database ID, sheet name, table, or CRM list. Leave empty when the selected connector already has a default target.

## `proxyConfiguration` (type: `object`):

Proxy settings. Residential proxy is the default and strongly recommended for reliable results because many websites block datacenter IPs.

## Actor input object example

```json
{
  "urls": [],
  "domains": [
    "python.org"
  ],
  "maxWebsites": 50,
  "maxPagesPerSite": 5,
  "includeSubpages": true,
  "detectTechStack": true,
  "verifyEmails": false,
  "maxEmailsToVerify": 50,
  "deepEmailVerification": false,
  "enableAiAnalysis": false,
  "llmProvider": "openrouter",
  "ollamaBaseUrl": "http://localhost:11434",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```

# Actor output Schema

## `datasetOutput` (type: `string`):

Dataset containing scraped B2B contact records: SMTP-verified emails, libphonenumber-validated phones, social profiles, tech stack, and AI email-role classifications.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [],
    "domains": [
        "python.org"
    ],
    "ollamaBaseUrl": "http://localhost:11434",
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("harvestlab/contact-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": [],
    "domains": ["python.org"],
    "ollamaBaseUrl": "http://localhost:11434",
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

# Run the Actor and wait for it to finish
run = client.actor("harvestlab/contact-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [],
  "domains": [
    "python.org"
  ],
  "ollamaBaseUrl": "http://localhost:11434",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}' |
apify call harvestlab/contact-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=harvestlab/contact-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Website Contact Extractor - Emails & Phones",
        "description": "Website contact extractor for B2B lead lists and CRM enrichment. Extract emails, phones, social profiles, addresses, contact pages, tech signals, role labels, and optional MCP connector summaries.",
        "version": "1.10",
        "x-build-id": "KSlAJqcJ8RdDTGLcG"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/harvestlab~contact-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-harvestlab-contact-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/harvestlab~contact-extractor/runs": {
            "post": {
                "operationId": "runs-sync-harvestlab-contact-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/harvestlab~contact-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-harvestlab-contact-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "urls": {
                        "title": "Website URLs",
                        "type": "array",
                        "description": "List of full website URLs to analyze (e.g. https://example.com). You can provide either URLs or domains.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "startUrls": {
                        "title": "Start URLs (alias for Website URLs)",
                        "type": "array",
                        "description": "Alternative field name for Website URLs, provided for compatibility with Apify's common 'startUrls' convention (e.g. https://example.com). The canonical field is `urls` above - prefer that. If both are supplied, they are merged.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "url": {
                        "title": "URL (CLI alias)",
                        "type": "string",
                        "description": "CLI alias for a single website URL. Hidden from Console form."
                    },
                    "website": {
                        "title": "Website (CLI alias)",
                        "type": "string",
                        "description": "CLI alias for a single website URL. Hidden from Console form."
                    },
                    "domains": {
                        "title": "Domains",
                        "type": "array",
                        "description": "List of domains without protocol (e.g. example.com). HTTPS will be added automatically. Alternative to providing full URLs - use this OR Website URLs above.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxWebsites": {
                        "title": "Max Websites To Process",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Bill-safety cap across Website URLs, Start URLs, URL aliases, and Domains after deduplication. Default 50; raise deliberately for larger lead batches.",
                        "default": 50
                    },
                    "maxPagesPerSite": {
                        "title": "Max Pages Per Site",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Maximum number of pages to crawl per website. Higher values find more contacts but take longer.",
                        "default": 5
                    },
                    "includeSubpages": {
                        "title": "Include Subpages",
                        "type": "boolean",
                        "description": "Automatically crawl /contact, /about, /imprint, and similar pages to find additional contact information.",
                        "default": true
                    },
                    "detectTechStack": {
                        "title": "Detect Tech Stack",
                        "type": "boolean",
                        "description": "Analyze the website to detect technologies used (CMS, frameworks, analytics, etc.).",
                        "default": true
                    },
                    "verifyEmails": {
                        "title": "Verify Email Deliverability (MX-record check)",
                        "type": "boolean",
                        "description": "Look up real MX records for every extracted email, detect common-domain typos (e.g. gmial.com -> gmail.com), flag free/disposable inboxes, and tag each email HIGH / MEDIUM / LOW / UNKNOWN deliverability. Adds a new `emails_verified` field to every output item. Cost: $0.01 per email whose domain passes MX resolution (uncertain / disposable / no-MX emails are free).",
                        "default": false
                    },
                    "maxEmailsToVerify": {
                        "title": "Max Emails To Verify Per Run",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Bill-safety cap for MX and SMTP email verification across the whole run. Raw extracted emails are still returned; only verification is capped. Default 50, maximum 500.",
                        "default": 50
                    },
                    "deepEmailVerification": {
                        "title": "Deep SMTP Probe - Real Mailbox Verification (v1.7)",
                        "type": "boolean",
                        "description": "Opt-in: run an async SMTP RCPT TO probe against the primary MX host for every MX-cleared email. Classifies each email as deliverable / undeliverable / catchall / greylisted / port_blocked. Reputation-safe: per-MX-host probe history persists across runs (named KV store), capped at 10 probes/host/day with 10s cooldown between probes to the same host. Port 25 is frequently blocked on Apify datacenter egress; when it is, every entry returns `port_blocked` (free). Cost: $0.02 per email that receives a definitive verdict (deliverable or undeliverable) - catch-all, greylisted, and non-verdict results are free. Requires `verifyEmails: true`.",
                        "default": false
                    },
                    "enableAiAnalysis": {
                        "title": "Enable AI Contact Enrichment",
                        "type": "boolean",
                        "description": "Use an LLM to classify each email by role (sales/support/hr/legal/executive/general/personal), group near-duplicate team addresses, flag non-monitored no-reply inboxes, and pick the single best primary contact for B2B outreach. When `verifyEmails` is also enabled, the AI picks the primary contact only from HIGH/MEDIUM deliverability emails. Requires an API key for your chosen LLM provider. Cost: $0.05 per website analyzed.",
                        "default": false
                    },
                    "llmProvider": {
                        "title": "LLM Provider",
                        "enum": [
                            "openrouter",
                            "anthropic",
                            "google",
                            "openai",
                            "ollama"
                        ],
                        "type": "string",
                        "description": "AI backend for email role classification, dedup, and primary-contact selection. 'OpenRouter' (default) is cheapest - Gemini Flash via OpenRouter is ~$0.001 per website. 'Anthropic' = Claude, 'Google AI' = Gemini direct, 'OpenAI' = GPT-4o mini, 'Ollama' = self-hosted (no API cost). Each provider needs its own API key field below.",
                        "default": "openrouter"
                    },
                    "llmModel": {
                        "title": "LLM Model",
                        "type": "string",
                        "description": "Specific model to use. Leave empty for the provider default (google/gemini-2.0-flash-001 for OpenRouter, claude-sonnet-4-20250514 for Anthropic, gemini-2.0-flash for Google AI, gpt-4o-mini for OpenAI, llama3.1 for Ollama)."
                    },
                    "openrouterApiKey": {
                        "title": "OpenRouter API Key",
                        "type": "string",
                        "description": "Your OpenRouter API key. Get one at openrouter.ai/keys"
                    },
                    "anthropicApiKey": {
                        "title": "Anthropic API Key",
                        "type": "string",
                        "description": "Your Anthropic API key. Get one at console.anthropic.com"
                    },
                    "googleApiKey": {
                        "title": "Google AI API Key",
                        "type": "string",
                        "description": "API key for Google AI (Gemini). Get one at aistudio.google.com/app/apikey"
                    },
                    "openaiApiKey": {
                        "title": "OpenAI API Key",
                        "type": "string",
                        "description": "Your OpenAI API key. Get one at platform.openai.com. Required when `llmProvider` is `openai`."
                    },
                    "ollamaBaseUrl": {
                        "title": "Ollama Base URL",
                        "type": "string",
                        "description": "Base URL for the Ollama API. Default: http://localhost:11434"
                    },
                    "outputConnectors": {
                        "title": "MCP Output Connectors",
                        "maxItems": 5,
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Optional MCP connectors for contact extraction summaries. Choose Slack, Notion, GitHub, Sheets, CRM, or another compatible connector. At the end of the run, the Actor sends a compact contact summary plus a structured payload through each connector. Connector failures never fail the scrape. Billed at $0.002 per successful connector dispatch."
                    },
                    "connectorAlertTarget": {
                        "title": "Connector Target",
                        "type": "string",
                        "description": "Optional connector destination such as a Slack channel, Notion database ID, sheet name, table, or CRM list. Leave empty when the selected connector already has a default target."
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Proxy settings. Residential proxy is the default and strongly recommended for reliable results because many websites block datacenter IPs."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
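
The optional fields in the input schema above carry a few constraints and per-event prices that are easy to miss when building an input by hand. The following is a minimal Python sketch, not part of the Actor, that checks those constraints locally and estimates an upper bound on the add-on charges before a run is started. The helper names (`check_input`, `worst_case_addon_cost`) and the `website_count` parameter are hypothetical. The prices are copied from the field descriptions; the sketch assumes that the `maxEmailsToVerify` cap bounds MX and SMTP charges independently and that each connector is dispatched once per run. The base per-lead charge is not included.

```python
from typing import Any

# Prices quoted in the input schema descriptions (USD).
PRICE_MX_VERIFIED = 0.01    # per email whose domain passes MX resolution
PRICE_SMTP_VERDICT = 0.02   # per email with a deliverable/undeliverable verdict
PRICE_AI_PER_SITE = 0.05    # per website analyzed when enableAiAnalysis is on
PRICE_CONNECTOR = 0.002     # per successful connector dispatch

# Allowed values of the llmProvider enum.
LLM_PROVIDERS = {"openrouter", "anthropic", "google", "openai", "ollama"}


def check_input(run_input: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in the optional fields (empty if none)."""
    problems: list[str] = []

    cap = run_input.get("maxEmailsToVerify", 50)
    if isinstance(cap, bool) or not isinstance(cap, int) or not 1 <= cap <= 500:
        problems.append("maxEmailsToVerify must be an integer from 1 to 500")

    # The schema states that the SMTP probe requires verifyEmails: true.
    if run_input.get("deepEmailVerification") and not run_input.get("verifyEmails"):
        problems.append("deepEmailVerification requires verifyEmails: true")

    provider = run_input.get("llmProvider", "openrouter")
    if provider not in LLM_PROVIDERS:
        problems.append(f"unknown llmProvider: {provider!r}")

    if len(run_input.get("outputConnectors", [])) > 5:
        problems.append("outputConnectors accepts at most 5 items")

    return problems


def worst_case_addon_cost(run_input: dict[str, Any], website_count: int) -> float:
    """Upper bound on add-on charges, assuming every capped email is billable."""
    cap = run_input.get("maxEmailsToVerify", 50)
    cost = 0.0
    if run_input.get("verifyEmails"):
        cost += cap * PRICE_MX_VERIFIED
        if run_input.get("deepEmailVerification"):
            cost += cap * PRICE_SMTP_VERDICT
    if run_input.get("enableAiAnalysis"):
        cost += website_count * PRICE_AI_PER_SITE
    # Assumption: one dispatch per configured connector per run.
    cost += len(run_input.get("outputConnectors", [])) * PRICE_CONNECTOR
    return round(cost, 4)


if __name__ == "__main__":
    example = {
        "verifyEmails": True,
        "deepEmailVerification": True,
        "maxEmailsToVerify": 100,
        "enableAiAnalysis": True,
    }
    print(check_input(example))
    print(worst_case_addon_cost(example, website_count=10))
```

Because SMTP results such as `port_blocked`, catch-all, and greylisted are free, the real charge is often lower than this bound; the estimate is meant only as a budgeting ceiling.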
