# Yellow Pages Scraper (`onidivo/yellow-pages-scraper`) Actor

Crawl the Yellow Pages site and extract data about businesses. Scrape business details with unlimited options like search terms, location, sorting options, and many more.

- **URL**: https://apify.com/onidivo/yellow-pages-scraper.md
- **Developed by:** [Onidivo Technologies](https://apify.com/onidivo) (community)
- **Categories:** Lead generation
- **Stats:** 182 total users, 1 monthly users, 87.5% runs succeeded, 4 bookmarks
- **User rating**: 1.00 out of 5 stars

## Pricing

$25.00/month + usage

To use this Actor, you pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for Apify platform usage, which gets cheaper the higher your Apify subscription plan is.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#rental-actors

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Yellow Pages Scraper

<!-- toc start -->

- [Features](#Features)
- [Cost of usage](#Cost-of-usage)
- [Bugs, issues, features, and feedback](#Bugs,-issues,-features,-and-feedback)
- [Input](#Input)
- [Output](#Output)

<!-- toc end -->

Crawl the Yellow Pages site and extract data about businesses. Scrape business details for any search term, location,
and sorting option. Download and use the data in whatever way you want.

### Features

- Multiple business listings and details
- Search by term and location
- Auto-verifying and finding location
- Sorting listing results

### Cost of usage

When running the Actor with **512 MB** of memory:

- Using datacenter proxies, average consumption is about **$0.12** in usage credits and **15 minutes** per **1000
  businesses**.
- Using residential proxies, average consumption is about **$1** in usage credits and **20 minutes** per **1000
  businesses**.
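
As a rough planning aid, the averages above can be scaled to a target number of businesses. The sketch below assumes that cost and time grow linearly with the number of businesses, which the figures above do not guarantee, and it does not include the monthly rental fee. The names `AVERAGES_PER_1000` and `estimate_run` are hypothetical.

```python
# Average figures quoted above for a 512 MB run, per 1000 businesses.
AVERAGES_PER_1000 = {
    "datacenter": {"usd": 0.12, "minutes": 15},
    "residential": {"usd": 1.00, "minutes": 20},
}


def estimate_run(businesses: int, proxy: str = "residential") -> dict:
    """Scale the quoted averages linearly (an assumption) to `businesses` items."""
    if businesses < 0:
        raise ValueError("businesses must be non-negative")
    rates = AVERAGES_PER_1000[proxy]
    factor = businesses / 1000
    return {
        "usd": round(rates["usd"] * factor, 2),
        "minutes": round(rates["minutes"] * factor, 1),
    }


print(estimate_run(5000, "residential"))  # {'usd': 5.0, 'minutes': 100.0}
```

Treat the result as an order-of-magnitude estimate only; actual consumption depends on blocking, retries, and whether `includeAllDetails` is enabled.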

### Bugs, issues, features, and feedback

You can report issues on the Actor's "Issues" tab
or [here](https://github.com/onidivo/apify-actors/issues/new?title=Yellow+Pages+-+), and discuss or leave your
feedback [here](https://github.com/onidivo/apify-actors/discussions).

### Input

You can provide input either through the editor on the Apify platform or as a JSON object.

You must provide at least one of the following: **startUrls**, or **searchTerm** together with **searchLocation**. All other fields are optional.
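
A small pre-flight check can catch an incomplete input before a run is started. The sketch below is one reading of the rule above (start URLs, or both a search term and a location); `has_required_input` is a hypothetical helper, not part of the Actor or the Apify client.

```python
def has_required_input(actor_input: dict) -> bool:
    """Return True if the input has start URLs, or both a search term and a location."""
    has_urls = bool(actor_input.get("startUrls"))
    has_search = bool(actor_input.get("searchTerm")) and bool(
        actor_input.get("searchLocation")
    )
    return has_urls or has_search


# A search term without a location is not enough under this reading.
print(has_required_input({"searchTerm": "Air conditioning"}))
print(has_required_input({"searchTerm": "Air conditioning", "searchLocation": "San Francisco"}))
```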

An example of minimal input:

```json
{
    "startUrls": [
        {
            "url": "https://www.yellowpages.com/search?search_terms=air+conditioning+service+repair&geo_location_terms=San+Francisco%2C+CA"
        },
        {
            "url": "https://www.yellowpages.com/san-francisco-ca/mip/air-conditioning-service-repair-515780833"
        }
    ],
    "searchTerm": "Air conditioning",
    "searchLocation": "San Francisco",
    "sortOption": "NO_SORTING",
    "includeAllDetails": true,
    "maxItems": 1000,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
}

````

**The scraper always uses Apify Proxy so that you don’t get blocked by Yellow Pages. Residential proxies are
recommended to reduce blocking, and a low max concurrency value is advised.**

### Output

The output of each business looks like this:

- [Table format](https://airtable.com/appL3J93Ydw4pvYib/shrdQwqTlY14atXja/tblP4HCspeE7dLCHW?backgroundColor=cyanDusty)
- JSON format

```json
[
    {
        "searchTerm": "Air conditioning",
        "searchLocation": "San Francisco",
        "name": "Schmitt Heating & Air Conditioning Inc.",
        "address": {
            "addressCountry": "US",
            "streetAddress": "1580 Tennessee St",
            "addressLocality": "San Francisco",
            "addressRegion": "CA",
            "postalCode": "94107"
        },
        "phoneNumber": "(415) 527-0730",
        "imageUrl": "https://i2.ypcdn.com/blob/7581113edb7f0c86acf86704af03fe5b49745ed6",
        "openingHours": [
            "Mo-Fr 07:00-16:30"
        ],
        "websiteUrl": "https://www.schmittheating.com",
        "ratingValue": 5,
        "reviewCount": 1,
        "url": "https://www.yellowpages.com/san-francisco-ca/mip/schmitt-heating-air-conditioning-inc-497104287?lid=1001921646556"
    }
]
```
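
Because `address` is nested and `openingHours` is a list, exporting items to a spreadsheet usually needs a flattening step. The following is a minimal sketch that assumes items shaped like the example above; `flatten_business` and `to_csv` are hypothetical helpers, and the column selection is only one possible choice.

```python
import csv
import io


def flatten_business(item: dict) -> dict:
    """Flatten one output item into a single-level dict suitable for a CSV row."""
    address = item.get("address") or {}
    return {
        "name": item.get("name", ""),
        "phoneNumber": item.get("phoneNumber", ""),
        "streetAddress": address.get("streetAddress", ""),
        "addressLocality": address.get("addressLocality", ""),
        "addressRegion": address.get("addressRegion", ""),
        "postalCode": address.get("postalCode", ""),
        # Join multiple opening-hours entries into one cell.
        "openingHours": "; ".join(item.get("openingHours") or []),
        "websiteUrl": item.get("websiteUrl", ""),
        "ratingValue": item.get("ratingValue", ""),
        "reviewCount": item.get("reviewCount", ""),
        "url": item.get("url", ""),
    }


def to_csv(items: list[dict]) -> str:
    """Serialize flattened items to CSV text with a header row."""
    rows = [flatten_business(item) for item in items]
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = {
    "name": "Schmitt Heating & Air Conditioning Inc.",
    "address": {"streetAddress": "1580 Tennessee St", "postalCode": "94107"},
    "phoneNumber": "(415) 527-0730",
    "openingHours": ["Mo-Fr 07:00-16:30"],
}
print(to_csv([sample]))
```

Missing fields become empty cells, so items scraped with `includeAllDetails` disabled can share the same columns.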

# Actor input Schema

## `searchTerm` (type: `string`):

The term to search for

## `searchLocation` (type: `string`):

The location to search in

## `startUrls` (type: `array`):

The start URLs to scrape. They can be listing pages or business pages.

## `sortOption` (type: `string`):

The option to use for sorting the results

## `includeAllDetails` (type: `boolean`):

Get details from direct business pages

## `maxItems` (type: `integer`):

Maximum number of items to scrape per whole run. If you want to scrape all available, set this to `0` or `9999999`.

## `proxyConfiguration` (type: `object`):

Options for configuring the connection to a proxy server.

## `maxConcurrency` (type: `integer`):

Maximum concurrency to run the Actor with.

## Actor input object example

```json
{
  "searchTerm": "Air conditioning",
  "searchLocation": "San Francisco",
  "startUrls": [
    {
      "url": "https://www.yellowpages.com/search?search_terms=air+conditioning+service+repair&geo_location_terms=San+Francisco%2C+CA"
    },
    {
      "url": "https://www.yellowpages.com/san-francisco-ca/mip/air-conditioning-service-repair-515780833"
    }
  ],
  "sortOption": "NO_SORTING",
  "includeAllDetails": true,
  "maxItems": 100,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "US"
  },
  "maxConcurrency": 5
}
```
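
When the input is assembled in code, a small builder keeps the field names and defaults in one place. The sketch below uses the defaults and the `sortOption` values listed in the OpenAPI specification in this document (`NO_SORTING`, `DISTANCE`, `RATING`, `NAME`); `build_search_input` is a hypothetical helper, and the validation it performs is an assumption about what is useful to check client-side.

```python
SORT_OPTIONS = {"NO_SORTING", "DISTANCE", "RATING", "NAME"}


def build_search_input(
    search_term: str,
    search_location: str,
    sort_option: str = "NO_SORTING",
    max_items: int = 100,
    max_concurrency: int = 5,
) -> dict:
    """Build an Actor input for a term/location search (hypothetical helper)."""
    if sort_option not in SORT_OPTIONS:
        raise ValueError(f"sort_option must be one of {sorted(SORT_OPTIONS)}")
    if max_items < 0:
        raise ValueError("max_items must be 0 (scrape all) or a positive integer")
    return {
        "searchTerm": search_term,
        "searchLocation": search_location,
        "sortOption": sort_option,
        "includeAllDetails": True,
        "maxItems": max_items,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
            "apifyProxyCountry": "US",
        },
        "maxConcurrency": max_concurrency,
    }


actor_input = build_search_input("Air conditioning", "San Francisco", sort_option="RATING")
print(actor_input["sortOption"])  # RATING
```

The resulting dictionary can be passed as `run_input` to the Python client call shown in the API section.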

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.yellowpages.com/search?search_terms=air+conditioning+service+repair&geo_location_terms=San+Francisco%2C+CA"
        },
        {
            "url": "https://www.yellowpages.com/san-francisco-ca/mip/air-conditioning-service-repair-515780833"
        }
    ],
    "maxItems": 100,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ],
        "apifyProxyCountry": "US"
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("onidivo/yellow-pages-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [
        { "url": "https://www.yellowpages.com/search?search_terms=air+conditioning+service+repair&geo_location_terms=San+Francisco%2C+CA" },
        { "url": "https://www.yellowpages.com/san-francisco-ca/mip/air-conditioning-service-repair-515780833" },
    ],
    "maxItems": 100,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
        "apifyProxyCountry": "US",
    },
}

# Run the Actor and wait for it to finish
run = client.actor("onidivo/yellow-pages-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.yellowpages.com/search?search_terms=air+conditioning+service+repair&geo_location_terms=San+Francisco%2C+CA"
    },
    {
      "url": "https://www.yellowpages.com/san-francisco-ca/mip/air-conditioning-service-repair-515780833"
    }
  ],
  "maxItems": 100,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "US"
  }
}' |
apify call onidivo/yellow-pages-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=onidivo/yellow-pages-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Yellow Pages Scraper",
        "description": "Crawl the Yellow Pages site and extract data about businesses. Scrape business details with unlimited options like search terms, location, sorting options, and many more.",
        "version": "0.0",
        "x-build-id": "R7sl8YaAVv4Z7ZnwY"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/onidivo~yellow-pages-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-onidivo-yellow-pages-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/onidivo~yellow-pages-scraper/runs": {
            "post": {
                "operationId": "runs-sync-onidivo-yellow-pages-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/onidivo~yellow-pages-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-onidivo-yellow-pages-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchTerm": {
                        "title": "Search term",
                        "type": "string",
                        "description": "The term to search for"
                    },
                    "searchLocation": {
                        "title": "Search location",
                        "type": "string",
                        "description": "The location to search in"
                    },
                    "startUrls": {
                        "title": "Start URLs",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "The start URLs to scrape. It can be listening or businesses pages.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "sortOption": {
                        "title": "Sort Option",
                        "enum": [
                            "NO_SORTING",
                            "DISTANCE",
                            "RATING",
                            "NAME"
                        ],
                        "type": "string",
                        "description": "The option to use for sorting the results",
                        "default": "NO_SORTING"
                    },
                    "includeAllDetails": {
                        "title": "Include All Details",
                        "type": "boolean",
                        "description": "Get details from direct business pages",
                        "default": true
                    },
                    "maxItems": {
                        "title": "Maximum Items",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of items to scrape per whole run. If you want to scrape all available, set this to `0` or `9999999`.",
                        "default": 100
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Provided options to configure connection with a proxy server.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ],
                            "apifyProxyCountry": "US"
                        }
                    },
                    "maxConcurrency": {
                        "title": "Maximum Concurrency",
                        "type": "integer",
                        "description": "Maximum concurrency that you want to run actor",
                        "default": 5
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
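
For projects that cannot use the official clients, the `run-sync-get-dataset-items` path from the specification above can be called directly. The following is a minimal sketch using only the Python standard library. It assumes the response body is a JSON array of dataset items, which the specification's summary suggests but its `200` response does not spell out; `build_request` and `run_and_fetch_items` are hypothetical helper names, and the timeout value is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/onidivo~yellow-pages-scraper/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request: token as a query parameter, input as the JSON body."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch_items(token: str, actor_input: dict, timeout: float = 600):
    """Run the Actor synchronously and return the parsed response body."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the body is JSON (dataset items).
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a real, billable run, so it is left commented out):
# items = run_and_fetch_items("<YOUR_API_TOKEN>", {"searchTerm": "Air conditioning",
#                                                 "searchLocation": "San Francisco",
#                                                 "maxItems": 10})
```

A synchronous call holds the HTTP connection open for the whole run, so for large `maxItems` values the asynchronous `/runs` path followed by a separate dataset read may be the safer choice.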
