Pricing

from $0.05 / 1,000 results

Lead List Deduplicator & Normalizer

[💵 $0.05 / 1K] Clean messy B2B lead lists into CRM-ready company/contact records with duplicate clusters, confidence scores, match reasons, normalized domains, emails, and phones.

Developer

Open Web Team

Last modified: 2 days ago

Lead List Deduplicator & Normalizer - CRM-Ready Leads, Not Messy Dumps

Turn messy scraped B2B lead lists into canonical, CRM-ready records - not duplicate-filled dumps.

This Actor takes inline JSON records or an Apify dataset ID, normalizes common lead fields, groups duplicates, and outputs one canonical row per lead/company cluster with confidence scores, match reasons, source row IDs, and warnings. Use it after Google Maps scrapers, directory scrapers, website contact scrapers, exhibitor-list scrapers, Apollo-style lead exports, or any workflow where several sources produce overlapping leads.

✅ What you get / ❌ what this isn't

✅ This Actor gives you	❌ This Actor is not
One canonical row per company/contact cluster	Not a black-box cleanup you can't audit
Confidence scores + match reasons per merge	Not a guess - source row IDs are preserved
Normalized company, domain, email, phone	Not a scraper or enrichment tool (it cleans what you give it)
Deterministic, predictable-cost rules	Not a paid LLM that adjudicates every row

🔎 Why use this Actor

Merge overlapping exports from multiple scrapers.
Remove duplicate companies, domains, emails, and phone numbers before CRM import.
Normalize company names, domains, emails, and phones.
Keep source row IDs so every merge is auditable.
Get confidence scores and match reasons instead of a black-box cleanup.
Use deterministic rules first, so costs stay predictable.
No browser, proxies, or external enrichment APIs.

👥 Who it's for

Anyone importing scraped or exported B2B leads into a CRM. Common jobs:

Merge lead lists from several Apify scrapers.
Clean a CSV before importing into HubSpot, Pipedrive, Salesforce, Clay, Instantly, Smartlead, or Airtable.
Remove duplicate outreach targets before spending credits on email verification or enrichment.
Create a canonical company list from multiple scraped directories.
Audit which rows were merged and why.

⚙️ How to deduplicate a lead list

Open the Actor on Apify.
Paste your records (inline JSON) or provide an Apify datasetId.
Pick a dedupMode: conservative, balanced, or aggressive.
Click Start.
Open the Canonical view for CRM-ready rows, or Duplicate clusters to audit merges.
Download CSV/JSON/Excel or pull from the Apify API.

If no input is provided, the Actor runs with sample records so you can test the output immediately.

📥 Input

{
  "dedupMode": "balanced",
  "records": [
    {
      "id": "1",
      "company": "Acme Inc",
      "website": "https://www.acme.com",
      "email": "sales@acme.com"
    },
    {
      "id": "2",
      "companyName": "ACME LLC",
      "domain": "acme.com",
      "phone": "(415) 555-2671"
    }
  ]
}

You can also provide an Apify datasetId instead of inline records.
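As a minimal sketch of the two input shapes described above, the helper below builds a run input with either inline `records` or a `datasetId`. The function name `build_run_input` and its validation rules are illustrative assumptions, not part of the Actor; the field names `dedupMode`, `records`, and `datasetId` come from this page.

```python
ALLOWED_MODES = {"conservative", "balanced", "aggressive"}


def build_run_input(records=None, dataset_id=None, dedup_mode="balanced"):
    """Return an input dict holding inline records or a dataset ID.

    Hypothetical helper: it insists on exactly one source, which is
    stricter than the Actor itself (the Actor falls back to sample
    records when no input is given).
    """
    if dedup_mode not in ALLOWED_MODES:
        raise ValueError(f"unknown dedupMode: {dedup_mode!r}")
    if (records is None) == (dataset_id is None):
        raise ValueError("provide exactly one of records or dataset_id")

    run_input = {"dedupMode": dedup_mode}
    if records is not None:
        run_input["records"] = list(records)
    else:
        run_input["datasetId"] = dataset_id
    return run_input
```

The resulting dict can be pasted into the Actor's JSON input editor or sent as the run input through the Apify API.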

Deduplication modes

Mode	Best for	Behavior
`conservative`	Avoiding false merges	Requires exact email, phone, or domain match
`balanced`	Most lead lists	Exact email/phone/domain plus strong company-name similarity
`aggressive`	Very messy lists	Looser company-name matching; review warnings before importing
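To make the difference between the modes concrete, here is a rough sketch of how such a rule set could be written. It is not the Actor's implementation: the thresholds, the record keys (`email`, `phone`, `domain`, `company`), and the use of `difflib.SequenceMatcher` are all assumptions, and it reads "plus" in the table as "exact matches, and additionally similar company names".

```python
from difflib import SequenceMatcher

# Hypothetical similarity thresholds; the Actor's real values are not documented.
NAME_THRESHOLDS = {"balanced": 0.9, "aggressive": 0.75}


def is_match(a, b, mode="balanced"):
    """Decide whether two already-normalized records look like duplicates."""
    # Any mode: an exact match on a shared, non-empty identifier is enough.
    for key in ("email", "phone", "domain"):
        if a.get(key) and a.get(key) == b.get(key):
            return True

    # Conservative mode stops here to avoid false merges.
    if mode == "conservative":
        return False

    name_a, name_b = a.get("company", ""), b.get("company", "")
    if not name_a or not name_b:
        return False

    # Balanced and aggressive modes also accept similar company names,
    # with aggressive using the looser threshold.
    ratio = SequenceMatcher(None, name_a, name_b).ratio()
    return ratio >= NAME_THRESHOLDS[mode]
```

Under these assumed thresholds, "northwind traders" and "northwind trading" would merge only in `aggressive` mode, which is the kind of pair worth reviewing before import.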

📤 Output

{
  "recordType": "canonicalLead",
  "clusterId": "cluster_0001",
  "clusterSize": 2,
  "mergeDecision": "merged",
  "mergeConfidence": 0.9,
  "matchReasons": ["same_domain", "similar_company"],
  "sourceRowIds": ["1", "2"],
  "canonicalCompanyName": "Acme Inc",
  "normalizedCompanyName": "acme",
  "normalizedDomain": "acme.com",
  "normalizedEmail": "sales@acme.com",
  "normalizedPhone": "4155552671",
  "warnings": []
}
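The normalized values in the sample output can be reproduced with a few simple rules. The sketch below is an approximation under stated assumptions: the suffix list, the function names, and the field aliases (`company`/`companyName`, `website`/`domain`) are inferred from the examples on this page, and the Actor's actual rules may differ, especially for international phone formats.

```python
import re
from urllib.parse import urlparse

# Illustrative list of legal suffixes; the Actor's real list is not documented.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize_domain(value):
    """Reduce a URL or bare domain to a lowercase host without 'www.'."""
    value = value.strip().lower()
    if "//" not in value:
        value = "//" + value  # lets urlparse treat a bare domain as a host
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def normalize_email(value):
    return value.strip().lower()


def normalize_phone(value):
    """Keep digits only, as in the sample output."""
    return re.sub(r"\D", "", value)


def normalize_company(value):
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def normalize_record(record):
    """Map the field aliases seen in the input example to normalized fields."""
    company = record.get("company") or record.get("companyName") or ""
    site = record.get("domain") or record.get("website") or ""
    return {
        "normalizedCompanyName": normalize_company(company),
        "normalizedDomain": normalize_domain(site),
        "normalizedEmail": normalize_email(record.get("email", "")),
        "normalizedPhone": normalize_phone(record.get("phone", "")),
    }
```

Applied to the two input records shown earlier, both rows normalize to the company name `acme` and the domain `acme.com`, which is consistent with the `same_domain` and `similar_company` match reasons in the sample output.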

Dataset views

View	Best for
`Canonical`	CRM-ready rows after deduplication
`Duplicate clusters`	Auditing source rows, match reasons, and confidence

Output fields

Field	Meaning
`clusterId`	Stable cluster identifier for the canonical row
`clusterSize`	Number of source rows merged into the canonical row
`mergeDecision`	`unique`, `merged`, or `ambiguous`
`mergeConfidence`	Confidence score from 0 to 1
`matchReasons`	Why records matched (`same_email`, `same_domain`, `similar_company`)
`sourceRowIds`	Original row IDs or indexes used in the merge
`normalizedDomain`	Clean domain value such as `acme.com`
`warnings`	Flags such as `low_confidence_merge` or `missing_domain_or_email`
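One way to use these fields before a CRM import is to split the canonical rows into "ready" and "needs review". The sketch below assumes rows shaped like the sample output; the function name `split_for_import` and the 0.8 confidence cutoff are arbitrary choices for illustration, not recommendations from the Actor.

```python
def split_for_import(rows, min_confidence=0.8):
    """Separate canonical rows into (ready, review) lists.

    A row goes to review if it is ambiguous, carries any warning, or is a
    merge whose confidence is missing or below the (assumed) cutoff.
    """
    ready, review = [], []
    for row in rows:
        decision = row.get("mergeDecision")
        confidence = row.get("mergeConfidence")
        low_confidence_merge = decision == "merged" and (
            confidence is None or confidence < min_confidence
        )
        needs_review = (
            decision == "ambiguous"
            or bool(row.get("warnings"))
            or low_confidence_merge
        )
        (review if needs_review else ready).append(row)
    return ready, review
```

The review list still contains `sourceRowIds`, so each questionable merge can be traced back to the original rows.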

💵 How much does it cost?

You pay per cleaned output row plus Apify platform usage. Because the engine is deterministic (no browser, no proxies, no external APIs), cost is predictable and scales with input size. Each run processes up to 5,000 input records; split larger datasets across multiple runs.
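For rough planning, the arithmetic can be sketched as follows. It assumes the listed "from $0.05 / 1,000 results" price applies to every output row and ignores Apify platform usage, so treat the result as a lower bound rather than a quote. The helper names are illustrative.

```python
import math

PRICE_PER_1000_ROWS = 0.05   # listed "from" price; platform usage is extra
MAX_RECORDS_PER_RUN = 5000   # per-run input cap stated on this page


def estimate(input_rows, output_rows):
    """Return (number of runs needed, estimated row charge in USD)."""
    runs = math.ceil(input_rows / MAX_RECORDS_PER_RUN)
    cost = output_rows / 1000 * PRICE_PER_1000_ROWS
    return runs, cost


def batches(records, size=MAX_RECORDS_PER_RUN):
    """Yield consecutive slices of at most `size` records, one per run."""
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

For example, 12,000 input records that collapse to 9,000 canonical rows would need three runs and cost about $0.45 in row charges. Duplicates that land in different batches presumably will not be merged with each other, so it may help to sort by domain before splitting.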

🔁 Run it on the Apify platform

Chain it after any Apify scraper via the API, schedule recurring cleanups, export CSV/JSON/Excel, or wire it into Make, Zapier, or webhooks ahead of your CRM import.

⚠️ Limits and caveats

This MVP uses deterministic rules and fuzzy string similarity, not paid LLM adjudication.
Review ambiguous rows before importing them into a CRM.
Email/phone/domain normalization is conservative and may not cover every country-specific format.
The Actor does not scrape or enrich missing contact data; it cleans the records you provide.
It does not verify email deliverability or MX records in this version.
Runs are currently capped at 5,000 input records until the engine has been optimized for larger files.

Website Contact Extractor - find the emails first, then dedupe them here.
LinkedIn Ads Library Scraper - build the advertiser list this cleans.

❓ FAQ

Does it scrape leads? No. It cleans and dedupes the records you provide (inline or via a dataset ID).

Can it pull from another Actor's output? Yes - pass that run's datasetId as input.

Which mode should I use? Use `balanced` for most lists, `conservative` when avoiding false merges matters most, and `aggressive` only for very messy data (then review the warnings before importing).

🛠️ Support

If a run fails or a field is missing, open an Actor issue with the run URL, the input you used, and the field or behavior you expected.
