# Lead Finder: Email + Name Extraction (`datavault/lead-finder-email-name-extraction`) Actor

Lead Finder: Email + Name Extraction is a fast, lightweight Apify Actor that extracts emails and related names from websites. It supports single URLs or domain crawling, handles obfuscated and protected emails, and offers flexible controls for deduplication, validation, and crawl behaviour.

- **URL**: https://apify.com/datavault/lead-finder-email-name-extraction.md
- **Developed by:** [Datavault](https://apify.com/datavault) (community)
- **Categories:** Lead generation, Automation
- **Stats:** 20 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

From $5.00 / 1,000 lead results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
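
As a rough worked example of the per-lead price above, the sketch below estimates the lead-result portion of a run's cost. It assumes the listed starting price of $5.00 per 1,000 lead results applies; the function name `estimate_lead_cost` is made up for illustration, and any other charged events are not included because their prices are not listed here.

```python
from decimal import Decimal

# Listed starting price: $5.00 per 1,000 lead results (assumed to apply as-is).
PRICE_PER_1000_LEADS = Decimal("5.00")


def estimate_lead_cost(lead_count: int) -> Decimal:
    """Estimate the lead-result charge in USD for a given number of leads."""
    if lead_count < 0:
        raise ValueError("lead_count must be non-negative")
    return PRICE_PER_1000_LEADS * lead_count / 1000


# 250 leads at $5.00 per 1,000 leads
print(estimate_lead_cost(250))
```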

## What's an Apify Actor?

Actors are software tools running on the Apify platform, built for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Lead Finder - Email + Name Extraction Actor

Lead Finder is a lightweight Dart Actor for Apify that extracts emails (and associated names when available) from one or more URLs. You can restrict it to the provided URLs or enable full domain crawling. The Actor charges per page load and per lead result.

### What It Extracts
- Emails from page text and HTML.
- `mailto:` links (with name extraction).
- Obfuscated emails like `name (at) domain (dot) com`.
- Cloudflare email protection (`data-cfemail` + email-protection scripts).
- vCard blocks with `EMAIL`, `FN`, and `N`.
- Optional name detection from nearby DOM context (parent/sibling elements).
- Next.js data API extraction for the current page (build ID resolved from page or homepage).
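
To make two of the items above concrete, here is a minimal Python sketch of how `(at)`/`(dot)` obfuscation and Cloudflare's `data-cfemail` encoding can be reversed. It is an illustration only, not the Actor's own (Dart) code; the function names `deobfuscate` and `decode_cfemail` are hypothetical, and the sample hex string was constructed for this example.

```python
import re

# Match "(at)" / "[at]" and "(dot)" / "[dot]" with optional surrounding spaces.
_AT = re.compile(r"\s*[\(\[]\s*at\s*[\)\]]\s*", re.IGNORECASE)
_DOT = re.compile(r"\s*[\(\[]\s*dot\s*[\)\]]\s*", re.IGNORECASE)


def deobfuscate(text: str) -> str:
    """Turn 'name (at) domain (dot) com' into 'name@domain.com'."""
    return _DOT.sub(".", _AT.sub("@", text))


def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare `data-cfemail` hex string.

    The first byte is an XOR key; XORing every following byte with it
    yields the characters of the address.
    """
    data = bytes.fromhex(encoded)
    key, payload = data[0], data[1:]
    return bytes(b ^ key for b in payload).decode("utf-8")


print(deobfuscate("name (at) domain (dot) com"))  # name@domain.com
print(decode_cfemail("2a4b6a48044945"))           # a@b.co
```

The regex approach is deliberately narrow: it only handles bracketed `at`/`dot` tokens, so other obfuscation styles would need their own patterns.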

### Input
- `startUrls`: Array of URLs to start from.
- `crawlDomain`: If `true`, follow internal links on the same domain. Default: `false`.
- `maxPagesPerCrawl`: Maximum pages to visit. Default: `20`.
- `maxConcurrency`: Parallel workers. Default: `5`.
- `maxRetries`: Retries per failed request. Default: `3`.
- `minRequestDelay`: Delay between requests in ms. Default: `1000`.
- `allowSubdomains`: If `true`, allow subdomains. Default: `false`.
- `enableSkipping`: Enable skip patterns. Default: `true`.
- `skipPatterns`: Strings or regex patterns to skip, checked against the URL and page title. Default: cart/checkout/login/etc.
- `dedupeByDomain`: Keep only one lead per email domain across the run. Default: `false`.
- `dedupeByEmail`: Keep only one lead per email across the run. Default: `false`.
- `validateEmails`: Apply stricter email validation rules. Default: `true`.
- `followExternalLinks`: Follow external homepage links (e.g., "Hemsida"). Default: `false`.
- `maxExternalLinksPerPage`: Max external links to follow per page. Default: `1`.
- `followHomepageOnly`: If `true`, only follow links labeled as homepage/website. Default: `true`.
- `fetchNextDataApi`: Fetch Next.js data API for the current page. Default: `true`.
- `nextDataDeepMode`: Try additional Next.js data routes derived from the URL path. Default: `false`.
- `maxNextDataCandidates`: Max Next.js data URL candidates to try. Default: `4`.
- `proxyConfiguration`: Apify Proxy configuration.

#### Sample Input
```json
{
  "startUrls": [
    { "url": "https://www.example.com" },
    { "url": "https://www.example.org/contact" }
  ],
  "crawlDomain": true,
  "maxPagesPerCrawl": 200,
  "maxConcurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
````

### Output

Each dataset item is a lead:

- `email`: Extracted email.
- `name`: Optional associated name.
- `domain`: Email domain (e.g., `example.com`).
- `sourceUrl`: Page where the email was found.
- `sourceDomain`: Domain of the source URL.
- `sourceType`: `mailto`, `page`, `data-attr`, `cloudflare`, `vcard`, or `script`.
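
One way to consume these items in Python is to map each one onto a small typed record. The sketch below assumes items carry exactly the fields listed above, with `name` possibly absent; the `Lead` class, `parse_lead` helper, and the sample items are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lead:
    email: str
    domain: str
    source_url: str
    source_domain: str
    source_type: str
    name: Optional[str] = None  # `name` is optional in the output


def parse_lead(item: dict) -> Lead:
    """Build a Lead from one dataset item, tolerating a missing name."""
    return Lead(
        email=item["email"],
        domain=item["domain"],
        source_url=item["sourceUrl"],
        source_domain=item["sourceDomain"],
        source_type=item["sourceType"],
        name=item.get("name"),
    )


# Made-up sample items shaped like the fields listed above.
sample_items = [
    {"email": "jane@example.com", "name": "Jane Doe", "domain": "example.com",
     "sourceUrl": "https://www.example.com/team",
     "sourceDomain": "www.example.com", "sourceType": "mailto"},
    {"email": "info@example.com", "domain": "example.com",
     "sourceUrl": "https://www.example.com/contact",
     "sourceDomain": "www.example.com", "sourceType": "cloudflare"},
]

leads = [parse_lead(item) for item in sample_items]
counts = Counter(lead.source_type for lead in leads)
print(counts)
```

Counting by `source_type` is a quick way to see which extraction paths actually produced results on a given site.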

### Tips

- Start with `crawlDomain: false` and a single URL to validate results quickly.
- Use `maxPagesPerCrawl` and `dedupeByDomain` to control costs and output size.
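
A conservative first-run input along the lines of these tips could be assembled as in the sketch below. The helper name `build_input` and its defaults are assumptions for illustration; only the input field names come from the Input section.

```python
from typing import Iterable


def build_input(urls: Iterable[str], crawl_domain: bool = False,
                max_pages: int = 20, dedupe_by_domain: bool = False) -> dict:
    """Build an Actor input dict, wrapping each URL as {"url": ...}."""
    start_urls = [{"url": u} for u in urls]
    if not start_urls:
        raise ValueError("at least one start URL is required")
    return {
        "startUrls": start_urls,
        "crawlDomain": crawl_domain,
        "maxPagesPerCrawl": max_pages,
        "dedupeByDomain": dedupe_by_domain,
    }


# Single URL, no crawling: a cheap way to check results before scaling up.
print(build_input(["https://www.example.com/contact"]))
```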

# Actor input Schema

## `startUrls` (type: `array`):

List of URLs to start crawling from.

## `crawlDomain` (type: `boolean`):

If checked, the crawler will follow internal links on the same domain. If unchecked, only the Start URLs will be scraped.

## `maxPagesPerCrawl` (type: `integer`):

Maximum number of pages that the crawler will open. The crawl will stop when this limit is reached.

## `maxConcurrency` (type: `integer`):

How many pages to process in parallel. Higher values are faster but risk blocking.

## `maxRetries` (type: `integer`):

Number of times to retry a failed page fetch.

## `minRequestDelay` (type: `integer`):

Wait at least this many milliseconds between requests.

## `enableSkipping` (type: `boolean`):

If checked, pages matching the 'Skip Patterns' will be ignored.

## `skipPatterns` (type: `array`):

List of strings or regex patterns to exclude from crawling (checks URL and Page Title).
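
The exact matching rules are not documented here, so the following is only a sketch of one plausible interpretation: each pattern is tried as a case-insensitive regex against the URL and the page title, falling back to a plain substring check if it is not a valid regex. The function name `should_skip` and the case-insensitivity are assumptions.

```python
import re
from typing import Iterable


def should_skip(url: str, title: str, patterns: Iterable[str]) -> bool:
    """Return True if any pattern matches the URL or the page title."""
    haystacks = (url.lower(), title.lower())
    for pattern in patterns:
        try:
            regex = re.compile(pattern, re.IGNORECASE)
        except re.error:
            # Not a valid regex: fall back to a plain substring check.
            if any(pattern.lower() in h for h in haystacks):
                return True
            continue
        if any(regex.search(h) for h in haystacks):
            return True
    return False


print(should_skip("https://shop.example.com/cart", "", ["cart", "checkout"]))
```

Note that a plain word such as `cart` is also a valid regex, so it matches anywhere in the URL or title under this interpretation.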

## `allowSubdomains` (type: `boolean`):

If checked, the crawler will follow links to subdomains of the start URLs.
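
As an illustration of what a same-domain check with an optional subdomain allowance can look like, here is a small sketch. It compares hostnames literally, which is an assumption; how the Actor itself defines "the same domain" (for example, whether `www.` is ignored) is not stated. The function name `in_scope` is hypothetical.

```python
from urllib.parse import urlsplit


def in_scope(link: str, start_url: str, allow_subdomains: bool = False) -> bool:
    """Check whether `link` is on the start URL's host (or a subdomain of it)."""
    link_host = (urlsplit(link).hostname or "").lower()
    start_host = (urlsplit(start_url).hostname or "").lower()
    if not link_host or not start_host:
        return False
    if link_host == start_host:
        return True
    # The leading dot prevents "notexample.com" from matching "example.com".
    return allow_subdomains and link_host.endswith("." + start_host)


print(in_scope("https://blog.example.com/post", "https://example.com", True))
```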

## `followExternalLinks` (type: `boolean`):

If checked, the crawler will follow external homepage links (e.g., Website/Hemsida).

## `followHomepageOnly` (type: `boolean`):

If checked, only follow external links labeled as homepage/website.

## `maxExternalLinksPerPage` (type: `integer`):

Maximum number of external homepage links to follow per page.

## `dedupeByEmail` (type: `boolean`):

If checked, only one lead per email is stored for the entire run.

## `dedupeByDomain` (type: `boolean`):

If checked, only one lead per email domain is stored for the entire run.
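
The two deduplication options can be pictured with the sketch below, which keeps the first lead seen for each email or email domain. This mirrors the described behaviour on the client side only; the "first one wins" rule, the case-insensitive comparison, and the function name `dedupe` are assumptions rather than documented Actor behaviour.

```python
from typing import Iterable, Iterator


def dedupe(leads: Iterable[dict], by_email: bool = False,
           by_domain: bool = False) -> Iterator[dict]:
    """Yield leads, dropping repeats by email and/or by email domain."""
    seen_emails = set()
    seen_domains = set()
    for lead in leads:
        email = lead["email"].lower()
        domain = email.rsplit("@", 1)[-1]
        if by_email and email in seen_emails:
            continue
        if by_domain and domain in seen_domains:
            continue
        seen_emails.add(email)
        seen_domains.add(domain)
        yield lead


# Made-up sample leads.
sample = [
    {"email": "a@x.com"}, {"email": "A@x.com"},
    {"email": "b@x.com"}, {"email": "c@y.org"},
]
print([lead["email"] for lead in dedupe(sample, by_domain=True)])
```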

## `validateEmails` (type: `boolean`):

If checked, applies stricter email validation rules.

## `fetchNextDataApi` (type: `boolean`):

If checked, tries to fetch the Next.js data endpoint for the current page.

## `nextDataDeepMode` (type: `boolean`):

If checked, tries additional Next.js data routes derived from the URL path.

## `maxNextDataCandidates` (type: `integer`):

Maximum number of Next.js data URL candidates to try per page.
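
For context, Next.js sites that use the Pages Router expose page data at `/_next/data/<buildId>/<route>.json`. The sketch below builds such a candidate URL for a page; whether the Actor derives its candidates in exactly this way is an assumption, and the function name `next_data_url` and the build ID `abc123` are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def next_data_url(page_url: str, build_id: str) -> str:
    """Build the Next.js data URL candidate for a page URL and build ID."""
    parts = urlsplit(page_url)
    # The site root maps to "index"; other routes drop surrounding slashes.
    route = parts.path.strip("/") or "index"
    data_path = f"/_next/data/{build_id}/{route}.json"
    return urlunsplit((parts.scheme, parts.netloc, data_path, parts.query, ""))


print(next_data_url("https://example.com/team/", "abc123"))
```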

## `proxyConfiguration` (type: `object`):

Use Apify Proxy (recommended for blocked sites).

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://webscraper.io/test-sites/e-commerce/allinone"
    }
  ],
  "crawlDomain": false,
  "maxPagesPerCrawl": 20,
  "maxConcurrency": 5,
  "maxRetries": 3,
  "minRequestDelay": 1000,
  "enableSkipping": true,
  "skipPatterns": [
    "cart",
    "checkout",
    "login",
    "signup",
    "register",
    "account",
    "privacy",
    "terms"
  ],
  "allowSubdomains": false,
  "followExternalLinks": false,
  "followHomepageOnly": true,
  "maxExternalLinksPerPage": 1,
  "dedupeByEmail": false,
  "dedupeByDomain": false,
  "validateEmails": true,
  "fetchNextDataApi": true,
  "nextDataDeepMode": false,
  "maxNextDataCandidates": 4
}
```

# Actor output Schema

## `results` (type: `string`):

The extracted leads.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://webscraper.io/test-sites/e-commerce/allinone"
        }
    ],
    "skipPatterns": [
        "cart",
        "checkout",
        "login",
        "signup",
        "register",
        "account",
        "privacy",
        "terms"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("datavault/lead-finder-email-name-extraction").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://webscraper.io/test-sites/e-commerce/allinone" }],
    "skipPatterns": [
        "cart",
        "checkout",
        "login",
        "signup",
        "register",
        "account",
        "privacy",
        "terms",
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("datavault/lead-finder-email-name-extraction").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://webscraper.io/test-sites/e-commerce/allinone"
    }
  ],
  "skipPatterns": [
    "cart",
    "checkout",
    "login",
    "signup",
    "register",
    "account",
    "privacy",
    "terms"
  ]
}' |
apify call datavault/lead-finder-email-name-extraction --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=datavault/lead-finder-email-name-extraction",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Lead Finder: Email + Name Extraction",
        "description": "Lead Finder: Email + Name Extraction is a fast, lightweight Apify actor that extracts emails and related names from websites. It supports single URLs or domain crawling, handles obfuscated and protected emails, and offers flexible controls for deduplication, validation, and crawl behaviour.",
        "version": "1.0",
        "x-build-id": "jZNkDkE3Xp59Cycht"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/datavault~lead-finder-email-name-extraction/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-datavault-lead-finder-email-name-extraction",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/datavault~lead-finder-email-name-extraction/runs": {
            "post": {
                "operationId": "runs-sync-datavault-lead-finder-email-name-extraction",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/datavault~lead-finder-email-name-extraction/run-sync": {
            "post": {
                "operationId": "run-sync-datavault-lead-finder-email-name-extraction",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "List of URLs to start crawling from.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "crawlDomain": {
                        "title": "Crawl Domain",
                        "type": "boolean",
                        "description": "If checked, the crawler will follow internal links on the same domain. If unchecked, only the Start URLs will be scraped.",
                        "default": false
                    },
                    "maxPagesPerCrawl": {
                        "title": "Max pages per crawl",
                        "type": "integer",
                        "description": "Maximum number of pages that the crawler will open. The crawl will stop when this limit is reached.",
                        "default": 20
                    },
                    "maxConcurrency": {
                        "title": "Max Concurrency",
                        "minimum": 1,
                        "type": "integer",
                        "description": "How many pages to process in parallel. Higher values are faster but risk blocking.",
                        "default": 5
                    },
                    "maxRetries": {
                        "title": "Max Retries",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Number of times to retry a failed page fetch.",
                        "default": 3
                    },
                    "minRequestDelay": {
                        "title": "Minimum Request Delay (ms)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Wait at least this many milliseconds between requests.",
                        "default": 1000
                    },
                    "enableSkipping": {
                        "title": "Enable Pattern Skipping",
                        "type": "boolean",
                        "description": "If checked, pages matching the 'Skip Patterns' will be ignored.",
                        "default": true
                    },
                    "skipPatterns": {
                        "title": "Skip Patterns",
                        "type": "array",
                        "description": "List of strings or regex patterns to exclude from crawling (checks URL and Page Title).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "allowSubdomains": {
                        "title": "Allow Subdomains",
                        "type": "boolean",
                        "description": "If checked, the crawler will follow links to subdomains of the start URLs.",
                        "default": false
                    },
                    "followExternalLinks": {
                        "title": "Follow External Links",
                        "type": "boolean",
                        "description": "If checked, the crawler will follow external homepage links (e.g., Website/Hemsida).",
                        "default": false
                    },
                    "followHomepageOnly": {
                        "title": "Homepage Only",
                        "type": "boolean",
                        "description": "If checked, only follow external links labeled as homepage/website.",
                        "default": true
                    },
                    "maxExternalLinksPerPage": {
                        "title": "Max external links per page",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of external homepage links to follow per page.",
                        "default": 1
                    },
                    "dedupeByEmail": {
                        "title": "Dedupe by Email",
                        "type": "boolean",
                        "description": "If checked, only one lead per email is stored for the entire run.",
                        "default": false
                    },
                    "dedupeByDomain": {
                        "title": "Dedupe by Domain",
                        "type": "boolean",
                        "description": "If checked, only one lead per email domain is stored for the entire run.",
                        "default": false
                    },
                    "validateEmails": {
                        "title": "Validate Emails",
                        "type": "boolean",
                        "description": "If checked, applies stricter email validation rules.",
                        "default": true
                    },
                    "fetchNextDataApi": {
                        "title": "Fetch Next.js Data API",
                        "type": "boolean",
                        "description": "If checked, tries to fetch the Next.js data endpoint for the current page.",
                        "default": true
                    },
                    "nextDataDeepMode": {
                        "title": "Next.js Deep Mode",
                        "type": "boolean",
                        "description": "If checked, tries additional Next.js data routes derived from the URL path.",
                        "default": false
                    },
                    "maxNextDataCandidates": {
                        "title": "Max Next.js Candidates",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of Next.js data URL candidates to try per page.",
                        "default": 4
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Use Apify Proxy (recommended for blocked sites)."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
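
If you prefer to call the REST endpoint from the specification directly rather than through a client library, a standard-library-only Python sketch might look like the following. The path, the `token` query parameter, and the JSON request body come from the specification above; the helper names `build_request` and `run_and_fetch` are hypothetical, and the assumption that the response body is a JSON array of dataset items is not confirmed by the specification, which only documents a `200 OK`.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/datavault~lead-finder-email-name-extraction/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(token: str, actor_input: dict) -> list:
    """Send the request and parse the response (needs network and a valid token)."""
    with urllib.request.urlopen(build_request(token, actor_input)) as resp:
        return json.load(resp)


request = build_request("<YOUR_API_TOKEN>", {
    "startUrls": [{"url": "https://webscraper.io/test-sites/e-commerce/allinone"}],
})
print(request.get_method(), request.full_url)
```

Because this endpoint waits for the run to finish, it is best suited to short runs; for longer crawls the `/runs` endpoint plus a later dataset fetch is likely the safer choice.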
