# Crunchbase Scraper (`magicfingers/crunchbase-scraper`) Actor

Scrape Crunchbase company profiles, funding rounds, founders/executives, and investor data. Search by keyword, industry, location, or funding stage.

- **URL**: https://apify.com/magicfingers/crunchbase-scraper.md
- **Developed by:** [abdulrahman alrashid](https://apify.com/magicfingers) (community)
- **Categories:** Other
- **Stats:** 118 total users, 43 monthly users, 100.0% runs succeeded, 3 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is billed per platform usage. The Actor itself is free to use; you pay only for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Crunchbase Scraper

Scrape Crunchbase company profiles, funding rounds, founders/executives, and investor data without a Crunchbase Pro subscription.

### What data can you extract?

#### Organization profiles
- Name, description, founded date, headquarters location
- Employee count range, operating status, website
- Social links (LinkedIn, Twitter, Facebook)
- Industry categories and category groups
- Total funding raised, last funding type and date
- IPO status, stock symbol, revenue range
- Crunchbase rank

#### Funding rounds
- Announced date, funding type (Seed, Series A, etc.)
- Money raised (USD), pre-money valuation
- Lead investors with names and Crunchbase links
- All investors for each round
- Number of investors

#### People (founders and executives)
- Name, title, current organization
- LinkedIn and Twitter profiles
- Crunchbase profile URL
- Employment history

#### Investor profiles
- Name, type, location, description
- Number of investments, exits, and funds
- Lead investments count
- Portfolio companies list
- Website and social links

### How to use

#### Search companies by keyword

```json
{
    "action": "searchCompanies",
    "searchQuery": "artificial intelligence",
    "maxResults": 50,
    "includeFunding": true,
    "includePeople": true
}
````

#### Search with filters

```json
{
    "action": "searchCompanies",
    "searchQuery": "fintech",
    "location": "San Francisco",
    "industryFilter": "Financial Services",
    "fundingStage": "early_stage_venture",
    "maxResults": 100
}
```

#### Scrape specific company profiles

```json
{
    "action": "scrapeProfiles",
    "directUrls": [
        "https://www.crunchbase.com/organization/openai",
        "https://www.crunchbase.com/organization/stripe",
        "https://www.crunchbase.com/organization/anthropic"
    ],
    "includeFunding": true,
    "includePeople": true
}
```

#### Scrape investor profiles

```json
{
    "action": "scrapeInvestors",
    "directUrls": [
        "https://www.crunchbase.com/organization/sequoia-capital",
        "https://www.crunchbase.com/organization/andreessen-horowitz"
    ]
}
```

#### Scrape people profiles

```json
{
    "action": "scrapePeople",
    "directUrls": [
        "https://www.crunchbase.com/person/sam-altman",
        "https://www.crunchbase.com/person/elon-musk"
    ]
}
```
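
When a URL list mixes companies and people, it can be split by path before choosing an `action`. The sketch below is a hypothetical helper (`split_crunchbase_urls` is not part of the Actor) and assumes that profile URLs follow the `/organization/...` and `/person/...` patterns used in the examples above. Investor pages also use `/organization/` URLs, so the helper cannot tell `scrapeProfiles` targets from `scrapeInvestors` targets.

```python
from urllib.parse import urlparse


def split_crunchbase_urls(urls):
    """Group Crunchbase URLs by their first path segment (hypothetical helper)."""
    groups = {"organization": [], "person": [], "other": []}
    for url in urls:
        # The path looks like "/organization/openai" or "/person/sam-altman".
        segments = [s for s in urlparse(url).path.split("/") if s]
        kind = segments[0] if segments else "other"
        groups[kind if kind in groups else "other"].append(url)
    return groups


urls = [
    "https://www.crunchbase.com/organization/openai",
    "https://www.crunchbase.com/person/sam-altman",
]
groups = split_crunchbase_urls(urls)

# One input object per action; organization URLs could equally go to scrapeInvestors.
inputs = [
    {"action": "scrapeProfiles", "directUrls": groups["organization"]},
    {"action": "scrapePeople", "directUrls": groups["person"]},
]
print(inputs)
```

Anything that lands in the `other` bucket is probably worth reviewing by hand before a run.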

### Input parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `action` | string | `searchCompanies` | What to scrape: `searchCompanies`, `scrapeProfiles`, `scrapeInvestors`, `scrapePeople` |
| `searchQuery` | string | | Keyword to search for |
| `location` | string | | Filter by HQ location |
| `industryFilter` | string | | Filter by industry category |
| `fundingStage` | string | | Filter by last funding stage (`seed`, `early_stage_venture`, etc.) |
| `directUrls` | array | `[]` | Crunchbase URLs to scrape directly |
| `maxResults` | integer | `5` | Max results (0 = unlimited) |
| `includeFunding` | boolean | `true` | Include funding rounds for profiles |
| `includePeople` | boolean | `true` | Include founders/executives for profiles |
| `includeInvestors` | boolean | `false` | Include detailed investor info |
| `maxConcurrency` | integer | `3` | Parallel pages (lower = safer) |
| `requestTimeout` | integer | `90` | Page load timeout in seconds |
| `proxyConfiguration` | object | Residential | Proxy settings (residential recommended) |
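
Checking an input object locally can catch out-of-range values before a run is started. The following is a minimal sketch, not part of the Actor: `build_input` is a hypothetical helper, the numeric bounds are copied from the OpenAPI specification later in this document, and the rule that direct-URL actions need a non-empty `directUrls` list is an assumption based on the parameter description.

```python
ACTIONS = {"searchCompanies", "scrapeProfiles", "scrapeInvestors", "scrapePeople"}


def build_input(action="searchCompanies", **options):
    """Return an Actor input dict after basic range checks (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    run_input = {"action": action, **options}

    # Bounds copied from the OpenAPI specification in this document.
    if run_input.get("maxResults", 0) < 0:
        raise ValueError("maxResults must be >= 0 (0 means unlimited)")
    if not 1 <= run_input.get("maxConcurrency", 3) <= 10:
        raise ValueError("maxConcurrency must be between 1 and 10")
    if not 30 <= run_input.get("requestTimeout", 90) <= 300:
        raise ValueError("requestTimeout must be between 30 and 300 seconds")

    # Assumption: actions other than searchCompanies need URLs to scrape.
    if action != "searchCompanies" and not run_input.get("directUrls"):
        raise ValueError(f"{action} requires a non-empty directUrls list")
    return run_input


print(build_input(searchQuery="fintech", maxResults=10, maxConcurrency=2))
```

The returned dictionary can be passed as the run input to any of the clients shown in the API section.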

### Output format

Each item in the dataset is a JSON object with a `type` field (`organization`, `investor`, `person`, or `searchResult`).

#### Organization example

```json
{
    "type": "organization",
    "name": "OpenAI",
    "shortDescription": "OpenAI is an AI research and deployment company.",
    "foundedDate": "2015-12-11",
    "headquartersLocation": "San Francisco, California, United States",
    "website": "https://openai.com",
    "employeeCount": "1001-5000",
    "totalFunding": "$11,300.0M",
    "totalFundingUsd": 11300000000,
    "lastFundingType": "Series Unknown",
    "operatingStatus": "Active",
    "industries": ["Artificial Intelligence", "Machine Learning"],
    "linkedin": "https://www.linkedin.com/company/openai",
    "twitter": "https://twitter.com/OpenAI",
    "fundingRounds": [
        {
            "announcedDate": "2023-04-28",
            "fundingType": "Series Unknown",
            "moneyRaised": 10000000000,
            "leadInvestors": [{ "name": "Microsoft" }],
            "numInvestors": 1
        }
    ],
    "people": [
        {
            "name": "Sam Altman",
            "title": "CEO",
            "linkedin": "https://linkedin.com/in/samaltman"
        }
    ],
    "crunchbaseUrl": "https://www.crunchbase.com/organization/openai",
    "scrapedAt": "2024-01-15T10:30:00.000Z"
}
```
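
Because every item carries a `type` field, a dataset can be split by record type and summarized with plain dictionary access. The sketch below works on trimmed sample items shaped like the example above; `summarize_funding` is a hypothetical helper, and it assumes that `fundingRounds` may be missing or empty when funding data is not scraped.

```python
from collections import Counter


def summarize_funding(org):
    """Return (total raised across listed rounds, sorted lead investor names)."""
    rounds = org.get("fundingRounds") or []  # assumed missing when funding is not scraped
    raised = sum(r.get("moneyRaised") or 0 for r in rounds)
    leads = sorted({inv["name"] for r in rounds for inv in r.get("leadInvestors") or []})
    return raised, leads


# Trimmed sample items; real items contain many more fields.
items = [
    {
        "type": "organization",
        "name": "OpenAI",
        "fundingRounds": [
            {
                "fundingType": "Series Unknown",
                "moneyRaised": 10000000000,
                "leadInvestors": [{"name": "Microsoft"}],
            }
        ],
    },
    {"type": "person", "name": "Sam Altman"},
]

# Count items per record type, then summarize each organization.
print(Counter(item["type"] for item in items))
for item in items:
    if item["type"] == "organization":
        print(item["name"], summarize_funding(item))
```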

### Pricing

This Actor uses pay-per-event pricing: **$2.00 per 1,000 results** ($0.002 per result).

Each saved item (organization profile, investor profile, person profile, or search result) counts as one result.
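
As a quick worked example of the per-result rate, the sketch below estimates the charge for a given number of saved items. It assumes the $0.002 rate quoted in this section is the only charge; any platform usage or proxy costs are not included, and `estimate_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_RESULT_USD = Decimal("0.002")  # $2.00 per 1,000 results, as quoted above


def estimate_cost_usd(num_results):
    """Estimated charge in USD for a number of saved items."""
    return PRICE_PER_RESULT_USD * num_results


print(estimate_cost_usd(50))    # 0.100
print(estimate_cost_usd(1000))  # 2.000
```

`Decimal` is used here so that small per-result amounts add up without floating-point rounding noise.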

### Tips for best results

1. **Use residential proxies**: Crunchbase has strong anti-bot detection, so residential proxies are strongly recommended.
2. **Keep concurrency low**: Running 2-3 concurrent pages reduces the risk of blocking.
3. **Start with direct URLs**: Scraping specific profile URLs is more reliable than search-based scraping.
4. **Use reasonable limits**: Start with a small `maxResults` to verify the output before large runs.

### Technical details

- Built with Apify SDK v3 and Crawlee PlaywrightCrawler
- Uses response interception to capture Crunchbase's internal API data
- Falls back to DOM extraction when API data is not available
- Extracts JSON-LD and `__NEXT_DATA__` structured data
- Includes stealth measures (navigator overrides, random delays, human-like scrolling)
- Automatic session rotation and retry logic
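
To illustrate the structured-data step, the sketch below pulls JSON-LD and `__NEXT_DATA__` payloads out of an HTML string using only the Python standard library. It is not the Actor's implementation (which runs in a Playwright-driven browser); the class name, the sample HTML, and the choice of `html.parser` are assumptions made for the example.

```python
import json
from html.parser import HTMLParser


class EmbeddedJsonParser(HTMLParser):
    """Collect JSON from JSON-LD and __NEXT_DATA__ script tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.found = []       # list of (source, parsed JSON) pairs
        self._source = None   # set while inside a script tag of interest
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attrs = dict(attrs)
        if attrs.get("type") == "application/ld+json":
            self._source = "json-ld"
        elif attrs.get("id") == "__NEXT_DATA__":
            self._source = "next-data"

    def handle_data(self, data):
        if self._source:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._source:
            try:
                self.found.append((self._source, json.loads("".join(self._chunks))))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page
            self._source, self._chunks = None, []


html = """
<html><head>
<script type="application/ld+json">{"@type": "Organization", "name": "Example Inc"}</script>
<script id="__NEXT_DATA__" type="application/json">{"props": {"pageProps": {}}}</script>
</head></html>
"""
parser = EmbeddedJsonParser()
parser.feed(html)
print(parser.found)
```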

# Actor input Schema

## `action` (type: `string`):

What type of scraping to perform.

## `searchQuery` (type: `string`):

Keyword to search for companies/organizations (e.g., 'artificial intelligence', 'fintech').

## `location` (type: `string`):

Filter by headquarters location (e.g., 'San Francisco', 'United States', 'Europe').

## `industryFilter` (type: `string`):

Filter by industry category (e.g., 'Artificial Intelligence', 'Financial Services', 'Health Care').

## `fundingStage` (type: `string`):

Filter by last funding stage.

## `directUrls` (type: `array`):

List of Crunchbase URLs to scrape directly (for scrapeProfiles, scrapeInvestors, scrapePeople actions). One URL per line.

## `maxResults` (type: `integer`):

Maximum number of results to return. Use 0 for unlimited.

## `includeFunding` (type: `boolean`):

Scrape detailed funding rounds for each organization profile.

## `includePeople` (type: `boolean`):

Scrape founders and key executives for each organization profile.

## `includeInvestors` (type: `boolean`):

Scrape detailed investor information for each funding round.

## `proxyConfiguration` (type: `object`):

Select proxies for the scraper. Residential proxies are strongly recommended for Crunchbase.

## `maxConcurrency` (type: `integer`):

Maximum number of pages to process in parallel. Lower values reduce chance of blocking.

## `requestTimeout` (type: `integer`):

Maximum time in seconds to wait for a page to load.

## Actor input object example

```json
{
  "action": "searchCompanies",
  "searchQuery": "artificial intelligence",
  "fundingStage": "",
  "maxResults": 5,
  "includeFunding": true,
  "includePeople": true,
  "includeInvestors": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  },
  "maxConcurrency": 3,
  "requestTimeout": 90
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "action": "searchCompanies",
    "searchQuery": "artificial intelligence",
    "maxResults": 5
};

// Run the Actor and wait for it to finish
const run = await client.actor("magicfingers/crunchbase-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "action": "searchCompanies",
    "searchQuery": "artificial intelligence",
    "maxResults": 5,
}

# Run the Actor and wait for it to finish
run = client.actor("magicfingers/crunchbase-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "action": "searchCompanies",
  "searchQuery": "artificial intelligence",
  "maxResults": 5
}' |
apify call magicfingers/crunchbase-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=magicfingers/crunchbase-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Crunchbase Scraper",
        "description": "Scrape Crunchbase company profiles, funding rounds, founders/executives, and investor data. Search by keyword, industry, location, or funding stage.",
        "version": "1.0",
        "x-build-id": "tJcVu4kDYQ6M43nOy"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/magicfingers~crunchbase-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-magicfingers-crunchbase-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/magicfingers~crunchbase-scraper/runs": {
            "post": {
                "operationId": "runs-sync-magicfingers-crunchbase-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/magicfingers~crunchbase-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-magicfingers-crunchbase-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "action"
                ],
                "properties": {
                    "action": {
                        "title": "Action",
                        "enum": [
                            "searchCompanies",
                            "scrapeProfiles",
                            "scrapeInvestors",
                            "scrapePeople"
                        ],
                        "type": "string",
                        "description": "What type of scraping to perform.",
                        "default": "searchCompanies"
                    },
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Keyword to search for companies/organizations (e.g., 'artificial intelligence', 'fintech')."
                    },
                    "location": {
                        "title": "Location Filter",
                        "type": "string",
                        "description": "Filter by headquarters location (e.g., 'San Francisco', 'United States', 'Europe')."
                    },
                    "industryFilter": {
                        "title": "Industry Filter",
                        "type": "string",
                        "description": "Filter by industry category (e.g., 'Artificial Intelligence', 'Financial Services', 'Health Care')."
                    },
                    "fundingStage": {
                        "title": "Funding Stage Filter",
                        "enum": [
                            "",
                            "seed",
                            "early_stage_venture",
                            "late_stage_venture",
                            "private_equity",
                            "debt_financing",
                            "post_ipo_equity",
                            "post_ipo_debt",
                            "secondary_market",
                            "grant",
                            "ipo"
                        ],
                        "type": "string",
                        "description": "Filter by last funding stage.",
                        "default": ""
                    },
                    "directUrls": {
                        "title": "Direct URLs",
                        "type": "array",
                        "description": "List of Crunchbase URLs to scrape directly (for scrapeProfiles, scrapeInvestors, scrapePeople actions). One URL per line.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxResults": {
                        "title": "Max Results",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of results to return. Use 0 for unlimited.",
                        "default": 5
                    },
                    "includeFunding": {
                        "title": "Include Funding Data",
                        "type": "boolean",
                        "description": "Scrape detailed funding rounds for each organization profile.",
                        "default": true
                    },
                    "includePeople": {
                        "title": "Include People Data",
                        "type": "boolean",
                        "description": "Scrape founders and key executives for each organization profile.",
                        "default": true
                    },
                    "includeInvestors": {
                        "title": "Include Investor Details",
                        "type": "boolean",
                        "description": "Scrape detailed investor information for each funding round.",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Select proxies for the scraper. Residential proxies are strongly recommended for Crunchbase.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ]
                        }
                    },
                    "maxConcurrency": {
                        "title": "Max Concurrency",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Maximum number of pages to process in parallel. Lower values reduce chance of blocking.",
                        "default": 3
                    },
                    "requestTimeout": {
                        "title": "Request Timeout (seconds)",
                        "minimum": 30,
                        "maximum": 300,
                        "type": "integer",
                        "description": "Maximum time in seconds to wait for a page to load.",
                        "default": 90
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
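
For projects without a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly over HTTP. The sketch below only builds the request with the Python standard library and leaves the network call commented out; `build_sync_request` is a hypothetical helper, and the commented part assumes the response body is a JSON array of dataset items, as the endpoint summary suggests.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/magicfingers~crunchbase-scraper/run-sync-get-dataset-items"


def build_sync_request(token, run_input):
    """Build (but do not send) the POST request for a synchronous run."""
    query = urlencode({"token": token})  # the spec passes the token as a query parameter
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request(
    "<YOUR_API_TOKEN>",
    {"action": "searchCompanies", "searchQuery": "fintech", "maxResults": 5},
)
print(request.get_method(), request.full_url)

# Sending requires a real token and network access:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

A synchronous call keeps the HTTP connection open until the run finishes, so for long runs the `/runs` endpoint followed by a separate dataset fetch may be a better fit.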
