# Yelp Scraper (`widbox/yelp-scraper`) Actor

Yelp-Reviews-Scraper is an Apify Actor that collects reviews directly from Yelp business pages. It retrieves business information, user ratings, and customer feedback without requiring Yelp's official API.

- **URL**: https://apify.com/widbox/yelp-scraper.md
- **Developed by:** [TrustHero Prod](https://apify.com/widbox) (community)
- **Categories:** Other
- **Stats:** 92 total users, 1 monthly user, 0.0% runs succeeded, 3 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

$20.00/month + usage

To use this Actor, you pay a monthly rental fee to the developer. After the free trial period, the rent is subtracted from your prepaid usage every month. You also pay for Apify platform usage, which gets cheaper on higher Apify subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#rental-actors

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

### **What does Yelp-Reviews-Scraper do?**

Yelp-Reviews-Scraper is an Apify Actor that collects reviews directly from Yelp business pages. It retrieves business information, user ratings, and customer feedback without requiring Yelp's official API, which makes it an alternative way to extract data for insights into customer experiences and business performance.

### **How to scrape Yelp Reviews**

Provide the input as JSON, as in the example below, and start the Actor.

```json
{
    "bizUrl": "https://www.yelp.com/biz/girl-and-the-goat-chicago",
    "proxy": {
        "useApifyProxy": true,
        "apifyProxyGroups": []
    },
    "rating": "1, 2, 3",
    "reviewsCount": 100,
    "sortType": "DATE_DESC",
    "timestamp": 1682090094
}
````

Alternatively, set the same options in the Actor's input settings editor.

![UI settings](https://i.postimg.cc/wTWgJW1m/Screenshot-2024-04-21-at-18-19-39.png)
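When you generate this input from code, the fields most easily gotten wrong are `rating` (one comma-separated string, not a list) and `timestamp` (Unix time in seconds). The sketch below rebuilds the JSON example above from Python values. `build_input` is a hypothetical helper, not part of the Actor or of `apify-client`, and it assumes that `DATE_DESC` is the newest-first sorting the `timestamp` filter requires.

```python
from datetime import datetime, timezone


def build_input(biz_url, ratings=None, reviews_count=None, since=None):
    """Build an input dict for widbox/yelp-scraper (hypothetical helper)."""
    run_input = {
        "bizUrl": biz_url,
        "proxy": {"useApifyProxy": True, "apifyProxyGroups": []},
        "sortType": "RELEVANCE_DESC",  # the schema's default sorting
    }
    if ratings:
        # The Actor expects one string such as "1, 2, 3", not a JSON array.
        run_input["rating"] = ", ".join(str(r) for r in sorted(set(ratings)))
    if reviews_count is not None:
        run_input["reviewsCount"] = reviews_count
    if since is not None:
        # timestamp only takes effect with newest-first sorting (assumed DATE_DESC).
        run_input["sortType"] = "DATE_DESC"
        run_input["timestamp"] = int(since.timestamp())
    return run_input


example = build_input(
    "https://www.yelp.com/biz/girl-and-the-goat-chicago",
    ratings=[3, 1, 2],
    reviews_count=100,
    since=datetime(2023, 4, 21, 15, 14, 54, tzinfo=timezone.utc),
)
print(example)
```

Pass a timezone-aware `datetime` for `since`; a naive one is interpreted in the machine's local time zone, which silently shifts the cutoff.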

### **How many results can you scrape with Yelp-Reviews-Scraper?**

The number of results depends on how many reviews the business page has. For example, more than 10,000 reviews were scraped in 4.29 minutes in the test run below.
![Test 10000 run](https://i.postimg.cc/8z9GXjBk/Screenshot-2024-04-21-at-17-57-20.png)

### **How much will scraping Yelp reviews cost you?**

When it comes to scraping, it can be challenging to estimate the resources needed to extract data as use cases may vary significantly. That's why the best course of action is to run a test scrape with a small sample of input data and limited output. You’ll get your price per scrape, which you’ll then multiply by the number of scrapes you intend to do.

[Watch this video](https://www.youtube.com/watch?v=-wyz2iscZ30) for a few helpful tips. And don't forget that choosing a higher plan will save you money in the long run.
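The test-run approach described above is simple linear arithmetic. A minimal sketch, assuming you take the platform cost from the finished run's `usageTotalUsd` field (listed in the run schema in the API section) and the number of reviews from its dataset. The figures in the example are made up for illustration, and the $20.00 monthly rental is added separately because it is billed independently of usage.

```python
def estimate_usage_cost(test_usage_usd, test_reviews, target_reviews):
    """Extrapolate platform usage cost linearly from a small test run."""
    if test_reviews <= 0:
        raise ValueError("the test run must return at least one review")
    cost_per_review = test_usage_usd / test_reviews
    return cost_per_review * target_reviews


def estimate_monthly_cost(test_usage_usd, test_reviews, reviews_per_month, rental_usd=20.0):
    """Add the Actor's monthly rental fee to the extrapolated usage cost."""
    usage = estimate_usage_cost(test_usage_usd, test_reviews, reviews_per_month)
    return rental_usd + usage


# Hypothetical test run: 500 reviews cost $0.10 of platform usage.
print(round(estimate_usage_cost(0.10, 500, 20_000), 2))    # → 4.0
print(round(estimate_monthly_cost(0.10, 500, 20_000), 2))  # → 24.0
```

Treat the result as a rough estimate: the cost per review can differ between businesses, so test with a page similar to the ones you plan to scrape.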

### **Output**

Here's an example of Yelp-Reviews-Scraper's output:

```json
{
    "authorEncid": "qhyRxP8lBB2OZA6a_mn0Wg",
    "authorDisplayName": "Stella G.",
    "authorPhotoUrl": "https://s3-media0.fl.yelpcdn.com/photo/GhgQeJcizxevJu6TFjmXqg/30s.jpg",
    "authorDisplayLocation": "New York, NY",
    "authorFriendCount": 206,
    "authorReviewCount": 23,
    "reviewEncId": "DgTscDUA21jxyHL-9E6Lqg",
    "reviewRating": 5,
    "reviewText": "Adding this to the list of amazing smash burgers in the city! I was surprised to find a line when I got there at 4PM on a Saturday, but thankfully the line was short and moved quickly. While we were waiting to order, we were told that our seats were at the counter. Interesting system not being able to pick your own seats, but I guess it's to ensure that they're using their table space efficiently. Besides counter seats, there are also some booths and hightops.\n\nWe ordered the double gotham smash burger and the loaded tots to share. The bun is toasted, the cheese melty, the perfect amount of sauce with thick pickles... it's a solid burger! If I were to compare it to 7th Street (another great smash burger), it's definitely more substantial and less greasy. The tots were nice and crispy, and the chopped cheese is a perfect topping for them. But for $16, I don't think I need to get them again.\n\nThe only complaint I have is that the burger was served in a paper carton, so the burger kept sliding apart and was just really hard to eat. If it was served in a normal burger paper wrap, it would be perfect! We had a chance to chat with the guy that was handling the line at the door, and he said the owner is super nice and that he loved working there, which is always awesome to hear!",
    "reviewLanguage": "en",
    "reviewPhotos": [
        "https://s3-media0.fl.yelpcdn.com/bphoto/U6m-9LKDDYkNhqIM3n0Yrw/348s.jpg",
        "https://s3-media0.fl.yelpcdn.com/bphoto/SOrOhUgcDUFj2IQgcqryOA/348s.jpg"
    ],
    "reviewCreatedAtLocal": "2024-04-14T20:39:50-04:00",
    "reviewTimestamp": 1713141590,
    "reviewReviewLink": "https://www.yelp.com/biz/gotham-burger-social-club-new-york?hrid=DgTscDUA21jxyHL-9E6Lqg",
    "business": {
        "alias": "gotham-burger-social-club-new-york",
        "encid": "DQYQw31j3LxXv_viOZx7Lg",
        "name": "Gotham Burger Social Club",
        "photoUrl": "https://s3-media0.fl.yelpcdn.com/bphoto/jTX48X9EOLpfwvy4xkFrkQ/90s.jpg",
        "rating": 4.5,
        "reviewCount": 59,
        "reviewCountsByLanguage": [
            {
                "language": "en",
                "count": 58,
                "__typename": "BusinessReviewLanguageCount"
            },
            {
                "language": "zh",
                "count": 1,
                "__typename": "BusinessReviewLanguageCount"
            }
        ],
        "reviewCountsByRating": [2, 1, 3, 11, 42],
        "address": {
            "streetAddress": "131 Essex St",
            "addressLocality": "New York",
            "addressRegion": "NY",
            "postalCode": "10002",
            "addressCountry": "US",
            "phone": ""
        }
    }
}
```
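A short sketch of post-processing one output item in Python. It assumes `reviewCountsByRating` is ordered from 1 star to 5 stars. That ordering is not documented here, but it is consistent with the example above, where the counts sum to `reviewCount` (59) and their weighted mean (about 4.53) rounds to the displayed `rating` of 4.5. `summarize` is a hypothetical helper and the item is trimmed to the fields it reads.

```python
from datetime import datetime, timezone


def summarize(item):
    """Derive a few convenient fields from one Yelp review item."""
    business = item["business"]
    counts = business["reviewCountsByRating"]  # assumed order: 1 star ... 5 stars
    total = sum(counts)
    mean = sum(stars * n for stars, n in enumerate(counts, start=1)) / total
    return {
        "review_id": item["reviewEncId"],
        "stars": item["reviewRating"],
        # reviewTimestamp is Unix time in seconds; convert it to an aware UTC datetime.
        "created_utc": datetime.fromtimestamp(item["reviewTimestamp"], tz=timezone.utc),
        "business": business["name"],
        "business_mean_rating": round(mean, 2),
        "business_total_reviews": total,
    }


item = {
    "reviewEncId": "DgTscDUA21jxyHL-9E6Lqg",
    "reviewRating": 5,
    "reviewTimestamp": 1713141590,
    "business": {
        "name": "Gotham Burger Social Club",
        "reviewCountsByRating": [2, 1, 3, 11, 42],
    },
}
print(summarize(item))
```

The converted `created_utc` value is the same instant as `reviewCreatedAtLocal`, just expressed in UTC instead of the business's local offset.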

### **Is it legal to scrape Yelp?**

Note that personal data is protected by GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers. We also recommend that you read our blog post: [is web scraping legal?](https://blog.apify.com/is-web-scraping-legal/)

# Actor input schema

## `bizUrl` (type: `string`):

URL of the Yelp business page whose reviews you want to scrape. Example: https://www.yelp.com/biz/gotham-burger-social-club-new-york.
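A quick pre-flight check can catch a malformed `bizUrl` before you start a paid run. This sketch assumes the `https://www.yelp.com/biz/<alias>` form shown above; the alias it extracts matches the `business.alias` field in the example output. `business_alias` is a hypothetical helper, and it deliberately rejects other Yelp hosts, so relax the host check if you scrape Yelp's country sites.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"www.yelp.com", "yelp.com"}  # assumption: US Yelp domain only


def business_alias(biz_url):
    """Return the alias segment of a Yelp business URL (.../biz/<alias>)."""
    parsed = urlparse(biz_url)
    parts = [p for p in parsed.path.split("/") if p]
    if (
        parsed.scheme != "https"
        or parsed.netloc not in ALLOWED_HOSTS
        or len(parts) != 2
        or parts[0] != "biz"
    ):
        raise ValueError(f"not a Yelp business page URL: {biz_url!r}")
    return parts[1]


print(business_alias("https://www.yelp.com/biz/gotham-burger-social-club-new-york"))
# → gotham-burger-social-club-new-york
```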

## `sortType` (type: `string`):

Yelp's review sorting method: `RELEVANCE_DESC` (default), `DATE_DESC`, `DATE_ASC`, `RATING_DESC`, `RATING_ASC`, or `ELITES_DESC`.

## `reviewsCount` (type: `integer`):

Maximum number of reviews to scrape. If omitted, the Actor tries to scrape all available reviews.

## `rating` (type: `string`):

Star ratings of the reviews to scrape, as a comma-separated list. Format: 1, 2, 3, 4, 5. Leave empty to scrape reviews with any rating.
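The OpenAPI specification in the API section constrains `rating` with the pattern `^(?:[1-5],\s?){0,4}[1-5]?$`. A minimal sketch that applies the same pattern client-side and expands the string into the star values it selects; `parse_rating` is a hypothetical helper, and treating an empty string as "all five ratings" follows the description above.

```python
import re

# Pattern copied from the Actor's input schema (OpenAPI "rating" property).
RATING_PATTERN = re.compile(r"^(?:[1-5],\s?){0,4}[1-5]?$")


def parse_rating(value):
    """Validate a rating filter string and return the star values it selects."""
    if not RATING_PATTERN.match(value):
        raise ValueError(f"invalid rating filter: {value!r}")
    stars = {int(part) for part in value.split(",") if part.strip()}
    # An empty filter means reviews with any rating are scraped.
    return sorted(stars) if stars else [1, 2, 3, 4, 5]


print(parse_rating("3, 1"))  # → [1, 3]
```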

## `timestamp` (type: `integer`):

Unix timestamp in seconds. Only reviews newer than this timestamp are scraped; older reviews are skipped. The filter applies only when `sortType` is "Newest first" (`DATE_DESC`); with any other sorting, the timestamp is ignored.
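Because a `timestamp` combined with the wrong `sortType` is ignored rather than rejected, it can be worth checking the combination before starting a run. A minimal sketch, assuming "Newest first" corresponds to `DATE_DESC`; `check_timestamp_filter` is a hypothetical helper, and the allowed sort values are copied from the schema's enum.

```python
SORT_TYPES = {"RELEVANCE_DESC", "DATE_DESC", "DATE_ASC", "RATING_DESC", "RATING_ASC", "ELITES_DESC"}


def check_timestamp_filter(run_input):
    """Return warnings for sortType/timestamp combinations the Actor would ignore."""
    warnings = []
    sort_type = run_input.get("sortType", "RELEVANCE_DESC")  # schema default
    if sort_type not in SORT_TYPES:
        warnings.append(f"unknown sortType {sort_type!r}")
    if "timestamp" in run_input and sort_type != "DATE_DESC":
        warnings.append("timestamp is ignored unless sortType is DATE_DESC")
    return warnings


print(check_timestamp_filter({"timestamp": 1682090094}))
# → ['timestamp is ignored unless sortType is DATE_DESC']
```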

## `proxy` (type: `object`):

Select proxies to be used by your crawler.

## Actor input object example

```json
{
  "bizUrl": "https://www.yelp.com/biz/gotham-burger-social-club-new-york",
  "sortType": "RELEVANCE_DESC",
  "rating": "1, 2, 3, 4, 5",
  "proxy": {
    "useApifyProxy": true
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "bizUrl": "https://www.yelp.com/biz/gotham-burger-social-club-new-york",
    "rating": "1, 2, 3, 4, 5",
    "proxy": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("widbox/yelp-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "bizUrl": "https://www.yelp.com/biz/gotham-burger-social-club-new-york",
    "rating": "1, 2, 3, 4, 5",
    "proxy": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("widbox/yelp-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "bizUrl": "https://www.yelp.com/biz/gotham-burger-social-club-new-york",
  "rating": "1, 2, 3, 4, 5",
  "proxy": {
    "useApifyProxy": true
  }
}' |
apify call widbox/yelp-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=widbox/yelp-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Yelp Scraper",
        "description": "The Yelp-Reviews-Scraper, Apify platform based scraper, enables the collection of reviews directly from Yelp's business pages. This tool is designed to retrieve comprehensive business information, user ratings, and customer feedback without the necessity of utilizing Yelp's official API.",
        "version": "0.0",
        "x-build-id": "5ccFNCXDep4MAygQU"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/widbox~yelp-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-widbox-yelp-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/widbox~yelp-scraper/runs": {
            "post": {
                "operationId": "runs-sync-widbox-yelp-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/widbox~yelp-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-widbox-yelp-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "bizUrl",
                    "sortType",
                    "proxy"
                ],
                "properties": {
                    "bizUrl": {
                        "title": "Yelp business page url",
                        "type": "string",
                        "description": "Provide url of YELP place, to scrape reviews. Example: https://www.yelp.com/biz/gotham-burger-social-club-new-york."
                    },
                    "sortType": {
                        "title": "Yelp reviews sorting method",
                        "enum": [
                            "RELEVANCE_DESC",
                            "DATE_DESC",
                            "DATE_ASC",
                            "RATING_DESC",
                            "RATING_ASC",
                            "ELITES_DESC"
                        ],
                        "type": "string",
                        "description": "Select one of yelp sorting methods like a date-descending, date-ascending or other.",
                        "default": "RELEVANCE_DESC"
                    },
                    "reviewsCount": {
                        "title": "Reviews count",
                        "type": "integer",
                        "description": "Provide reviews count to scrape, for limit reviews. If reviews count is not filled, we will try to get all existing reviews"
                    },
                    "rating": {
                        "title": "Yelp reviews rating",
                        "pattern": "^(?:[1-5],\\s?){0,4}[1-5]?$",
                        "type": "string",
                        "description": "Provide review rating numbers you want to scrape. If you want to scrape all reviews, leave it empty. Format: 1, 2, 3, 4, 5."
                    },
                    "timestamp": {
                        "title": "Timestamp filter (seconds)",
                        "type": "integer",
                        "description": "Provide timestamp in seconds to scrape reviews newer than this timestamp. Reviews older than this timestamp will be ignored. Use only with \"Newest first\" sortType, or timestamp will be ignored."
                    },
                    "proxy": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Select proxies to be used by your crawler."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
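For projects that cannot use `apify-client`, the first path in the specification above can be called with the Python standard library alone. This is a minimal sketch, not an official client: it builds a POST to `/acts/widbox~yelp-scraper/run-sync-get-dataset-items` with the `token` query parameter and the input as the JSON body, exactly as the specification describes. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "widbox~yelp-scraper"  # "username~actor-name" form used in API paths


def build_run_sync_request(token, run_input):
    """Prepare a POST to run-sync-get-dataset-items (the request is not sent here)."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def run_and_get_items(token, run_input, timeout=600):
    """Send the request and return the dataset items as a list of dicts."""
    request = build_run_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (makes a real, billed API call):
# items = run_and_get_items("<YOUR_API_TOKEN>", {
#     "bizUrl": "https://www.yelp.com/biz/gotham-burger-social-club-new-york",
#     "rating": "1, 2, 3, 4, 5",
#     "proxy": {"useApifyProxy": True},
# })
```

Synchronous endpoints wait only a limited time for the run to finish. For large scrapes, such as the 10,000-review run mentioned earlier, the asynchronous `/runs` endpoint from the same specification is likely the safer choice.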
