Website Mail Extractor
Pricing
from $3.00 / 1,000 email enrichments
Developer
mikolabs
Maintained by Community
Website Email Scraper
Website Email Scraper is a powerful, lightweight, and stealthy web crawler designed to find and extract public email addresses from any website. Simply provide a list of starting URLs, and the scraper will follow internal links, prioritize key pages, and return a clean list of deduplicated emails.
✅ Key Features
- SPA-Ready (React, Vue, Angular): Scrapes modern Single Page Applications (SPAs) by scanning local JavaScript assets for hidden contact emails.
- Sitemap XML Support: Automatically discovers and parses the website's `sitemap.xml` for lightning-fast page indexing.
- Stealth Mode & Anti-Blocking: Built-in concurrency control, randomized user agents, and Apify Proxy support to keep your scraper undetected.
- Smart Priority Routing: Targets high-value contact pages (like `/contact`, `/about`, `/team`) first to save crawl time.
- Blog & Media Filters: Automatically skips irrelevant blog directories and large media assets to maximize efficiency and control costs.
- Deduplication: Automatically deduplicates email addresses across the entire run to prevent duplicate charges and messy datasets.
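As an illustration of the sitemap support listed above, the sketch below parses a sitemap document with Python's standard library. It is not the Actor's actual implementation: the function name `parse_sitemap` is hypothetical, and the only thing assumed about the input is the standard sitemaps.org XML namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (page_urls, nested_sitemap_urls) found in one sitemap document.

    A <sitemapindex> lists further sitemaps to fetch; a <urlset> lists pages.
    """
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if root.tag == f"{SITEMAP_NS}sitemapindex":
        return [], locs
    return locs, []


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/contact</loc></url>"
    "<url><loc>https://example.com/blog/post</loc></url>"
    "</urlset>"
)
pages, nested = parse_sitemap(sample)
```

A crawler built this way would fetch any nested sitemap URLs and call the same function again until only page URLs remain.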
🏆 Benefits
- Effortless Lead Generation: Turn a list of company websites into verified prospect contact details in seconds.
- High Deliverability: Every email includes the exact source URL where it was found, allowing easy QA and highly personalized outreach.
- Unbeatable Cost Control: Because it uses direct HTTP crawling instead of heavy browser instances, compute unit consumption is minimal: under $0.05 per 1,000 crawled pages.
💳 Pricing
This Actor uses the Pay-per-event pricing model. You are billed only for successful results:
- $3.00 USD per 1,000 unique emails ($0.003 USD per email).
- Deduplication is run-wide; you are never charged twice for the same email address in a single run.
- If a run extracts 0 emails, you pay nothing.
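The pricing rules above reduce to simple arithmetic. The sketch below estimates the per-email charge for a run; the helper name `estimate_email_charge` is hypothetical, and treating addresses as equal after trimming and lowercasing is an assumption about how the Actor normalizes emails.

```python
from decimal import Decimal

# $3.00 per 1,000 unique emails, as stated in the pricing section.
PRICE_PER_EMAIL_USD = Decimal("0.003")


def estimate_email_charge(found_emails: list[str]) -> Decimal:
    """Estimate the pay-per-event charge for the emails found in one run."""
    # Assumption: duplicates are detected after trimming and lowercasing.
    unique = {e.strip().lower() for e in found_emails if e.strip()}
    return PRICE_PER_EMAIL_USD * len(unique)


# Two unique addresses, because the first two differ only by case.
charge = estimate_email_charge(["sales@acme.test", "Sales@acme.test", "hr@acme.test"])
```

`Decimal` is used instead of `float` so that fractions of a cent add up exactly.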
🚀 Quick start
- Go to the Input tab in the Apify Console.
- Enter one or more website URLs in the Seed URLs field.
- Configure the optional boundaries (like Max emails to scrape or Max pages per seed).
- Select your Proxy configuration (Apify US/Residential proxy is recommended).
- Click Start to run the scraper and download your data in JSON, CSV, Excel, or HTML format once the run completes.
⚙️ How it works
Under the hood, the scraper performs the following steps:
- Sitemap Discovery: First, it attempts to fetch the website's XML sitemap to immediately identify all key pages.
- Page Crawling: Starting from your seed URLs, it crawls internal pages up to your configured crawl depth.
- Keyword Prioritizing: It scores page paths using contact-related keywords so that pages containing `/contact`, `/about`, or `/team` are visited first.
- Email Extraction: It searches for plain-text email addresses, `mailto:` anchor tags, HTML-obfuscated entities, and scans local JS resource bundles to locate emails in dynamic components.
- Deduplication & Output: Found emails are normalized, cleaned of CDNs/spam domains/fake placeholders, deduplicated, and appended to the dataset with their corresponding seed and discovery URLs.
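The keyword prioritizing step can be pictured as a priority queue keyed on a path score. The following is a minimal sketch, not the Actor's code: the scoring rule (one point per matching keyword) and the names `path_score` and `order_by_priority` are assumptions, and the real keyword list is probably longer than the three examples given above.

```python
import heapq
from urllib.parse import urlparse

# Only the three keywords named in this README; the full list is unknown.
PRIORITY_KEYWORDS = ("contact", "about", "team")


def path_score(url: str) -> int:
    """Count how many priority keywords appear in the URL path."""
    path = urlparse(url).path.lower()
    return sum(1 for keyword in PRIORITY_KEYWORDS if keyword in path)


def order_by_priority(urls: list[str]) -> list[str]:
    """Return URLs with the highest-scoring paths first, ties in input order."""
    # heapq is a min-heap, so the score is negated; the index breaks ties.
    heap = [(-path_score(url), index, url) for index, url in enumerate(urls)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


queue = order_by_priority([
    "https://example.com/blog/post",
    "https://example.com/about/team",
    "https://example.com/pricing",
    "https://example.com/contact",
])
```

With a page cap in place, visiting high-scoring paths first means the pages most likely to hold contact details are fetched before the budget runs out.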
📊 Output
All scraped emails are saved in a structured dataset, available for download in CSV, JSON, XML, or Excel.
Example JSON output item:
{"email": "contact@example.com","pageUrl": "https://example.com/about-us","seedUrl": "https://example.com"}
Output Fields:
| Field Name | Type | Description |
|---|---|---|
| `email` | String | The normalized and cleaned email address. |
| `pageUrl` | String | The exact webpage URL where this email address was discovered. |
| `seedUrl` | String | The initial seed URL entered in the input from which the crawl started. |
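Once a dataset has been downloaded as JSON, the three fields above are enough to regroup results per website. The sketch below assumes the export is a JSON array of items shaped like the example; the function name `emails_by_seed` and the second sample item are made up for illustration.

```python
import json
from collections import defaultdict


def emails_by_seed(items: list[dict]) -> dict[str, list[str]]:
    """Group extracted emails under the seed URL they were crawled from."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in items:
        grouped[item["seedUrl"]].append(item["email"])
    return dict(grouped)


# Stand-in for the contents of a downloaded JSON export.
exported = json.loads("""
[
  {"email": "contact@example.com", "pageUrl": "https://example.com/about-us", "seedUrl": "https://example.com"},
  {"email": "jobs@example.com", "pageUrl": "https://example.com/team", "seedUrl": "https://example.com"}
]
""")
by_seed = emails_by_seed(exported)
```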
🛠️ Input Parameters
You can configure the scraper with the following fields:
| Field Name | Type | Description | Default / Empty Behavior |
|---|---|---|---|
| `urls` | Array of Objects | Required. List of website URLs to crawl (e.g. `[{"url": "https://example.com"}]`). | Must contain at least 1 URL. |
| `deepSearch` | Boolean | Optional. If enabled, crawls deep into the website structure. If disabled, only crawls the main page and high-priority pages (e.g. contact, about, team). | Default is `false` (highly cost-effective). |
| `maxEmailsToScrape` | Integer | Optional. Stop the run after finding this many unique emails across all seeds. | `0` or empty crawls until finished. |
| `maxCrawlDepth` | Integer | Optional. Link-hops from the seed URL. Depth 1 crawls only the seed page. | Default is 3. Max is 6. |
| `maxPagesPerSeed` | Integer | Optional. Maximum number of pages the scraper will visit per seed URL to control costs. | Default is 50. Max is 1000. |
| `proxyConfiguration` | Object | Optional. Select proxies to bypass blocking. US/Residential proxies are recommended. | Enabled by default. |
Example Input configuration:
{"urls": [{ "url": "https://example.com" }],"maxEmailsToScrape": 20,"maxCrawlDepth": 2,"maxPagesPerSeed": 30,"proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
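When inputs are generated by a script, it can help to check them against the documented limits before starting a run. The helper below is a hypothetical sketch: `build_input` is not part of the Actor, the defaults and maximums come from the table above, and the lower bound of 1 for depth and pages is an assumption.

```python
def build_input(
    urls: list[str],
    deep_search: bool = False,
    max_emails: int = 0,
    max_crawl_depth: int = 3,
    max_pages_per_seed: int = 50,
) -> dict:
    """Build an input object, rejecting values outside the documented limits."""
    if not urls:
        raise ValueError("urls must contain at least 1 URL")
    if max_emails < 0:
        raise ValueError("maxEmailsToScrape must be 0 (no limit) or positive")
    if not 1 <= max_crawl_depth <= 6:  # lower bound of 1 is an assumption
        raise ValueError("maxCrawlDepth must be between 1 and 6")
    if not 1 <= max_pages_per_seed <= 1000:  # lower bound of 1 is an assumption
        raise ValueError("maxPagesPerSeed must be between 1 and 1000")
    return {
        "urls": [{"url": url} for url in urls],
        "deepSearch": deep_search,
        "maxEmailsToScrape": max_emails,
        "maxCrawlDepth": max_crawl_depth,
        "maxPagesPerSeed": max_pages_per_seed,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }


# Reproduces the example configuration, plus an explicit deepSearch default.
run_input = build_input(
    ["https://example.com"],
    max_emails=20,
    max_crawl_depth=2,
    max_pages_per_seed=30,
)
```

The resulting dictionary can be serialized with `json.dumps` and pasted into the Input tab's JSON editor.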
🔍 Error Handling & Ignored Resources
- Skipping Bad URLs: Malformed URL entries are automatically detected, logged as warnings, and skipped so they do not break the crawl.
- Resource Filtering: Non-text files (such as `.zip`, `.pdf`, `.mp4`, `.png`, `.jpg`, etc.) are automatically ignored to save bandwidth and compute units.
- CDNs & Noise Filtering: Extracted emails are checked against blacklists to remove fake/placeholder emails (like `you@example.com` or `info@example.com`) and tracking domains (like `sentry.io` or `google-analytics.com`).
- Resilient Runs: HTTP 403 or 404 pages are skipped gracefully without aborting the crawler.
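The resource and noise filters described above could look roughly like the sketch below. The sets contain only the examples named in this README, since the Actor's real blacklists are not published here, and the function names `should_fetch` and `keep_email` are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Only the examples named in this README; the real lists are longer.
SKIPPED_EXTENSIONS = {".zip", ".pdf", ".mp4", ".png", ".jpg"}
PLACEHOLDER_EMAILS = {"you@example.com", "info@example.com"}
NOISE_DOMAINS = {"sentry.io", "google-analytics.com"}


def should_fetch(url: str) -> bool:
    """Skip URLs whose path ends in a known non-text file extension."""
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    return extension not in SKIPPED_EXTENSIONS


def keep_email(email: str) -> bool:
    """Drop placeholder addresses and addresses on tracking domains."""
    email = email.strip().lower()
    if email in PLACEHOLDER_EMAILS:
        return False
    domain = email.rpartition("@")[2]
    # Match the listed domain itself and any of its subdomains.
    return not any(
        domain == noise or domain.endswith("." + noise) for noise in NOISE_DOMAINS
    )
```

Checking the extension on the parsed path rather than the raw URL keeps query strings such as `?dl=1` from hiding a file type.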
📝 Release Notes
- v1.0.0: Initial release. Built-in support for XML sitemaps, SPA script scanning, crawler caps, and custom blacklists.
🆘 FAQ & Support
- Is it legal to scrape email addresses? Scraping public emails for contact directories and lead indexing is generally legal. However, always ensure you comply with regional regulations such as CAN-SPAM (US) or GDPR (EU) when conducting cold outreach.
- Can it parse JS obfuscation? Yes. The scraper scans page scripts and automatically decodes standard obfuscation schemes: HTML entities, JS strings, and Base64 strings decoded with `atob()`.
- Support: If you run into issues, have questions, or want to suggest new features, please file a ticket in the Issues tab.
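To make the obfuscation answer above more concrete, here is a minimal sketch of decoding HTML entities and `atob("...")` calls before matching emails. It is an illustration under stated assumptions, not the Actor's implementation: the regular expressions are simplified, the function names are hypothetical, and JS string concatenation is not handled.

```python
import base64
import binascii
import html
import re

# Simplified email pattern; real-world validation is stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches atob("...") or atob('...') with a literal Base64 argument.
ATOB_RE = re.compile(r"""atob\(\s*['"]([A-Za-z0-9+/=]+)['"]\s*\)""")


def decode_atob_calls(text: str) -> str:
    """Replace each atob("...") call with the text it would decode to."""

    def _decode(match: re.Match) -> str:
        try:
            return base64.b64decode(match.group(1), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return match.group(0)  # leave undecodable calls untouched

    return ATOB_RE.sub(_decode, text)


def extract_emails(page_source: str) -> list[str]:
    """Return unique, lowercased emails in first-seen order."""
    text = decode_atob_calls(html.unescape(page_source))
    seen: dict[str, None] = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)
```

Because `mailto:` links and plain-text addresses both contain the address verbatim, the same pattern covers them once entities and Base64 payloads have been decoded.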