# Web Scraper (`apify/web-scraper`) Actor

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

- **URL**: https://apify.com/apify/web-scraper.md
- **Developed by:** [Apify](https://apify.com/apify) (Apify)
- **Categories:** Developer tools, Open source
- **Stats:** 119,418 total users, 3,310 monthly users, 99.6% runs succeeded, 1,366 bookmarks
- **User rating**: 4.49 out of 5 stars

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you only pay for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Web Scraper

### What is Web Scraper?

Web Scraper is a tool for extracting data from any website. It can navigate pages, render JavaScript, and extract structured data using a few simple commands. Whether you need to scrape product prices, real estate data, or social media profiles, this Actor turns any web page into an API.

- Configurable with an **intuitive user interface**
- Can handle almost **any website** and can scrape dynamic content
- Scrape a list of **URLs or crawl an entire website** by following links
- Runs entirely on the **Apify platform**; no need to manage servers or proxies
- Set your scraper to **run on a schedule** and get data delivered automatically
- Can be used as a template to **create your own scraper**

### What can Web Scraper data be used for?

Web Scraper can extract almost any data from any site, effectively turning any site into a data source. All data can be exported into **JSON, CSV, HTML, and Excel** formats.

Here are some examples:

- **Extract reviews** from sites like Yelp or Amazon
- Gather **real estate data** from Zillow or local realtor pages
- Get **contact details** and social media accounts from local businesses
- **Monitor mentions** of a brand or person on specific sites
- **Collect and monitor product prices** on e-commerce websites

As a generic tool, Web Scraper can also serve as a template to **build your own scraper** which you can then market on Apify Store.

### How much does the Web Scraper cost?

Web Scraper is free to use, but you do pay for Apify platform usage, which is calculated in [compute units](https://help.apify.com/en/articles/3490384-what-is-a-compute-unit?ref=apify) (CU). On the free plan, these are charged at $0.04 per CU. CUs get cheaper with higher subscription plans - [see our pricing page](https://apify.com/pricing) for more details.

With our free plan, you get **$5 in platform credits every month**, which is enough to scrape from 500 to 1,000 **web pages**. If you sign up to our Starter plan, you can expect to scrape thousands.

### How to use Web Scraper

1. [Create](https://console.apify.com/actors/moJRLRc85AitArpNN?addFromActorId=moJRLRc85AitArpNN) a free Apify account using your email and open [Web Scraper](https://apify.com/apify/web-scraper)
2. Add one or more URLs you want to scrape
3. Set paths that you’d like to include or exclude from crawling by configuring glob patterns or pseudo-URLs
4. Configure the page function that determines the data that needs to be scraped
5. Click the “Start” button and wait for the data to be extracted
6. Download your data in JSON, XML, CSV, Excel, or HTML

For more in-depth instructions, please read our article on [scraping with Web Scraper](https://docs.apify.com/tutorials/apify-scrapers/web-scraper), which features step-by-step instructions on how to use Web Scraper on the basis of real-life examples. We also have a video tutorial you can follow along with:

https://www.youtube.com/watch?v=5kcaHAuGxmY

### Using Web Scraper with the Apify API

The Apify API gives you programmatic access to the Apify platform. The API is organized around RESTful HTTP endpoints that enable you to manage, schedule, and run Apify Actors. The API also lets you access any datasets, monitor Actor performance, fetch results, create and update versions, and more.

To access the API using Node.js, use the `apify-client` [NPM package](https://apify.com/apify/web-scraper/api/javascript). To access the API using Python, use the `apify-client` [PyPI package](https://apify.com/apify/web-scraper/api/python).

Click on the [API tab](https://apify.com/apify/web-scraper/api/python) for code examples, or check out the [Apify API reference](https://docs.apify.com/api/v2) docs for all the details.

### Web Scraper and MCP Server

With the Apify API, you can use almost any Actor in conjunction with an MCP server. You can connect to the MCP server using clients like Claude Desktop and LibreChat, or even build your own. Read all about how you can [set up Apify Actors with MCP](https://blog.apify.com/how-to-use-mcp/).

For Web Scraper, go to the [MCP tab](https://apify.com/apify/web-scraper/api/mcp) and then go through the following steps:

1. Start a Server-Sent Events (SSE) session to receive a `sessionId`
2. Send API messages using that `sessionId` to trigger the scraper
3. The message starts the Web Scraper with the provided input
4. The response should be: `Accepted`

### Integrating Web Scraper into your workflows

You can integrate Web Scraper with almost any cloud service or web app. We offer integrations with **Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive**, [and plenty more](https://docs.apify.com/integrations).

Alternatively, you could use [webhooks](https://docs.apify.com/integrations/webhooks) to carry out an action whenever an event occurs, such as getting a notification whenever Web Scraper successfully finishes a run.

### Advanced configuration settings

Below you’ll find detailed instructions on more advanced configuration settings for Web Scraper.

### Input configurations

On input, the Web Scraper Actor accepts a number of configuration settings. These can be entered either manually in the user interface in [Apify Console](https://console.apify.com/), or programmatically in a JSON object using the [Apify API](https://docs.apify.com/api/v2#/reference/actors/run-collection/run-actor).

For a complete list of input fields and their type, please see the [input tab](https://apify.com/apify/web-scraper/input-schema).
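
As a rough illustration of the programmatic route, the sketch below assembles an input object in JavaScript. The field names `startUrls`, `linkSelector`, `globs`, `pageFunction` and `proxyConfiguration` come from this README; the item shapes (`{ url }`, `{ glob }`), the `runMode` key and the example URLs are assumptions that should be checked against the input schema.

```javascript
// The page function is sent as source code, so it is serialized to a string.
async function pageFunction(context) {
    return { url: context.request.url, title: document.title };
}

const input = {
    runMode: 'PRODUCTION', // assumed key for the "Run mode" setting
    startUrls: [{ url: 'https://www.example.com/pages/' }], // assumed item shape
    linkSelector: 'a[href]',
    globs: [{ glob: 'https://www.example.com/pages/**/*' }], // assumed item shape
    pageFunction: pageFunction.toString(),
    proxyConfiguration: { useApifyProxy: true },
};

// The input has to be plain JSON before it is sent to the API.
const payload = JSON.stringify(input, null, 2);
console.log(payload);
```

Keeping the page function as a real function and serializing it with `toString()` lets your editor lint it, while the Actor still receives a string.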

#### Run mode

Run mode allows you to switch between two modes of operation for Web Scraper.

**PRODUCTION** mode gives you full control and full performance. You should always switch Web Scraper to production mode once you're done making changes to your scraper.

When you start developing your scraper, you'll want to inspect what's happening in the browser and debug your code. You can do that with the scraper's **DEVELOPMENT** mode, which lets you control the browser directly using Chrome DevTools. Open the Live View tab to access the DevTools. This mode also limits concurrency and prevents timeouts to improve your DevTools experience. Other debugging-related options can be configured in the **Advanced configuration** section.

#### Start URLs

The **Start URLs** (`startUrls`) field represents the initial list of URLs of pages that the scraper will visit. You can either enter these URLs manually one by one, upload them in a CSV file, or
[link URLs from a Google Sheets](https://help.apify.com/en/articles/2906022-scraping-a-list-of-urls-from-a-google-sheets-document) document. Each URL must start with either the `http://` or `https://` protocol prefix.

The scraper supports adding new URLs to scrape on the fly, either using the [**Link selector**](#link-selector) and [**Glob Patterns**](#glob-patterns)/[**Pseudo-URLs**](#pseudo-urls) options or by calling `await context.enqueueRequest()` inside [**Page function**](#page-function).

Optionally, each URL can be associated with custom user data - a JSON object that can be referenced from your JavaScript code in [**Page function**](#page-function) under `context.request.userData`. This is useful for determining which start URL is currently loaded, in order to perform some page-specific actions. For example, when crawling an online store, you might want to perform different
actions on a page listing the products vs. a product detail page. For details, see our [web scraping tutorial](https://docs.apify.com/tutorials/apify-scrapers/getting-started#the-start-url).
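
The store example above could be sketched roughly as follows. The `pageType` user data key, its `LISTING`/`DETAIL` values and the `a.product` selector are hypothetical names chosen for illustration; the start URL is assumed to carry `{ "pageType": "LISTING" }` as its user data.

```javascript
async function pageFunction(context) {
    const { request, log } = context;
    // `pageType` is a hypothetical key set in the start URL's user data.
    const { pageType } = request.userData || {};

    if (pageType === 'LISTING') {
        // On a listing page, queue every product link and tag it as a detail page.
        const links = Array.from(document.querySelectorAll('a.product')); // hypothetical selector
        for (const link of links) {
            await context.enqueueRequest({
                url: link.href,
                userData: { pageType: 'DETAIL' },
            });
        }
        log.info(`Queued ${links.length} product pages from ${request.url}`);
        return null; // nothing to store for listing pages
    }

    // On any other page, extract the data.
    return { url: request.url, title: document.title };
}
```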


#### Link selector

The **Link selector** (`linkSelector`) field contains a CSS selector that is used to find links to other web pages, i.e. `<a>` elements with the `href` attribute.

On every page loaded, the scraper looks for all links matching **Link selector**, checks that the target URL matches one of the [**Glob Patterns**](#glob-patterns)/[**Pseudo-URLs**](#pseudo-urls), and if so then adds the URL to the request queue, so that it's loaded by the scraper later.

By default, new scrapers are created with the following selector that matches all links:

```
a[href]
```

If **Link selector** is empty, the page links are ignored, and the scraper only loads pages that were specified in [**Start URLs**](#start-urls) or that were manually added to the request queue by calling `await context.enqueueRequest()` in [**Page function**](#page-function).

#### Glob Patterns

The **Glob Patterns** (`globs`) field specifies which types of URLs found by [**Link selector**](#link-selector) should be added to the request queue.

A glob pattern is simply a string with wildcard characters.

For example, a glob pattern `http://www.example.com/pages/**/*` will match all the
following URLs:

- `http://www.example.com/pages/deeper-level/page`
- `http://www.example.com/pages/my-awesome-page`
- `http://www.example.com/pages/something`

Note that you don't need to use the **Glob Patterns** setting at all, because you can completely control which pages the scraper will access by calling `await context.enqueueRequest()` from the [**Page function**](#page-function).
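
To build intuition for how the wildcards behave, here is a deliberately simplified matcher. It is not the matcher Web Scraper uses internally, and real glob libraries support more syntax (braces, `?`, character classes); `globToRegExp` is a hypothetical helper that only understands `*` (any characters within one path segment) and `**` (any characters, including `/`).

```javascript
// Hypothetical helper: convert a simple glob into a regular expression.
function globToRegExp(glob) {
    let pattern = '';
    let i = 0;
    while (i < glob.length) {
        if (glob.startsWith('**/', i)) {
            pattern += '(?:.*/)?'; // zero or more whole path segments
            i += 3;
        } else if (glob.startsWith('**', i)) {
            pattern += '.*'; // anything, including slashes
            i += 2;
        } else if (glob[i] === '*') {
            pattern += '[^/]*'; // anything within a single segment
            i += 1;
        } else {
            // Literal character, escaped if it is special in a RegExp.
            pattern += glob[i].replace(/[.+?^${}()|[\]\\]/g, '\\$&');
            i += 1;
        }
    }
    return new RegExp(`^${pattern}$`);
}

const pagesGlob = globToRegExp('http://www.example.com/pages/**/*');
console.log(pagesGlob.test('http://www.example.com/pages/deeper-level/page')); // true
console.log(pagesGlob.test('http://www.example.com/about')); // false
```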

#### Pseudo-URLs

The **Pseudo-URLs** (`pseudoUrls`) field specifies what kind of URLs found by [**Link selector**](#link-selector) should be added to the request queue.

A pseudo-URL is simply a URL with special directives enclosed in `[]` brackets. Currently, the only supported directive is `[regexp]`, which defines a JavaScript-style regular expression to match against the URL.

For example, a pseudo-URL `http://www.example.com/pages/[(\w|-)*]` will match all the
following URLs:

- `http://www.example.com/pages/`
- `http://www.example.com/pages/my-awesome-page`
- `http://www.example.com/pages/something`

If either `[` or `]` is part of the normal query string, it must be encoded as `[\x5B]` or `[\x5D]`, respectively. For example, the following pseudo-URL:

```
http://www.example.com/search?do[\x5B]load[\x5D]=1
```

will match the URL:

```
http://www.example.com/search?do[load]=1
```

Optionally, each pseudo-URL can be associated with user data that can be referenced from
your [**Page function**](#page-function) using `context.request.label` to determine which kind of page is currently loaded in the browser.

Note that you don't need to use the **Pseudo-URLs** setting at all, because you can completely control which pages the scraper will access by calling `await context.enqueueRequest()` from [**Page function**](#page-function).
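
One way to picture how a pseudo-URL is interpreted is to translate it into a regular expression: text outside the brackets is matched literally, and text inside is used as a regexp. The sketch below does that under simplifying assumptions; `pseudoUrlToRegExp` and `escapeLiteral` are hypothetical helpers, not the Actor's actual implementation, and directives that themselves contain `]` are not handled.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeLiteral(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: turn a pseudo-URL into a RegExp.
function pseudoUrlToRegExp(purl) {
    let pattern = '';
    let position = 0;
    while (position < purl.length) {
        const open = purl.indexOf('[', position);
        if (open === -1) {
            pattern += escapeLiteral(purl.slice(position));
            break;
        }
        const close = purl.indexOf(']', open);
        if (close === -1) throw new Error(`Unclosed "[" in pseudo-URL: ${purl}`);
        pattern += escapeLiteral(purl.slice(position, open)); // literal part
        pattern += purl.slice(open + 1, close); // regexp directive, used as-is
        position = close + 1;
    }
    return new RegExp(`^${pattern}$`);
}

const pagesMatcher = pseudoUrlToRegExp(String.raw`http://www.example.com/pages/[(\w|-)*]`);
console.log(pagesMatcher.test('http://www.example.com/pages/my-awesome-page')); // true
console.log(pagesMatcher.test('http://www.example.com/about')); // false

const searchMatcher = pseudoUrlToRegExp(String.raw`http://www.example.com/search?do[\x5B]load[\x5D]=1`);
console.log(searchMatcher.test('http://www.example.com/search?do[load]=1')); // true
```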

#### Page function

The **Page function** (`pageFunction`) field contains a JavaScript function that is executed in the context of every page loaded in the Chromium browser. The purpose of this function is to extract
data from the web page, manipulate the DOM by clicking elements, add new URLs to the request queue and otherwise control Web Scraper's operation.

Example:

```javascript
async function pageFunction(context) {
    // jQuery is handy for finding DOM elements and extracting data from them.
    // To use it, make sure to enable the "Inject jQuery" option.
    const $ = context.jQuery;
    const pageTitle = $('title').first().text();

    // Print some information to Actor log
    context.log.info(`URL: ${context.request.url}, TITLE: ${pageTitle}`);

    // Manually add a new page to the scraping queue.
    await context.enqueueRequest({ url: 'http://www.example.com' });

    // Return an object with the data extracted from the page.
    // It will be stored to the resulting dataset.
    return {
        url: context.request.url,
        pageTitle,
    };
}
```

The page function accepts a single argument, the `context` object, whose properties are listed in the table below. Since the function is executed in the context of the web page, it can access the DOM, e.g. using the `window` or `document` global variables.

The return value of the page function is an object (or an array of objects) representing the data extracted from the web page. The return value must be stringify-able to JSON, i.e. it can only contain basic types and no circular references. If you don't want to extract any data from the page and skip it in the clean results, simply return `null` or `undefined`.

The page function supports the JavaScript ES6 syntax and is asynchronous, which means you can use the `await` keyword to wait for background operations to finish. To learn more about `async` functions, see <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function">Mozilla documentation</a>.

**Properties of the `context` object:**

- **`customData: Object`**
  Contains the object provided in the **Custom data** (`customData`) input setting. This is useful for passing dynamic parameters to your Web Scraper using API.

- **`enqueueRequest(request, [options]): AsyncFunction`**
  Adds a new URL to the request queue, if it wasn't already there. The `request` parameter is an object containing details of the request, with properties such as `url`, `label`, `userData`, `headers` etc. For the full list of the supported properties, see the <a href="https://crawlee.dev/api/core/class/Request" target="_blank">`Request`</a> object's constructor in Crawlee documentation.
  The optional `options` parameter is an object with additional options. Currently, it only supports the `forefront` boolean flag. If it's `true`, the request is added to the beginning of the queue. By default, requests are added to the end.
  Example:
  ```javascript
  await context.enqueueRequest({ url: 'https://www.example.com' });
  await context.enqueueRequest(
      { url: 'https://www.example.com/first' },
      { forefront: true },
  );
  ```

- **`env: Object`**
  A map of all relevant values set by the Apify platform to the Actor run via the `APIFY_` environment variables. For example, you can find here information such as Actor run ID, timeouts, Actor run memory, etc.
  For the full list of available values, see <a href="https://sdk.apify.com/api/apify/interface/ApifyEnv" target="_blank">`ApifyEnv`</a> interface in Apify SDK.
  Example:
  ```javascript
  console.log(`Actor run ID: ${context.env.actorRunId}`);
  ```

- **`getValue(key): AsyncFunction`**
  Gets a value from the default key-value store associated with the Actor run. The key-value store is useful for persisting named data records, such as state objects, files, etc. The function is very similar to <a href="https://sdk.apify.com/api/apify/class/Actor#getValue" target="_blank">`Actor.getValue()`</a> function in Apify SDK.
  To set the value, use the dual function `context.setValue(key, value)`.
  Example:
  ```javascript
  const value = await context.getValue('my-key');
  console.dir(value);
  ```

- **`globalStore: Object`**
  Represents an in-memory store that can be used to share data across page function invocations, e.g. state variables, API responses or other data. The `globalStore` object has an equivalent interface as JavaScript's <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map" target="_blank">`Map`</a> object, with a few important differences:
  - All functions of `globalStore` are `async`; use `await` when calling them.
  - Keys must be strings and values need to be JSON stringify-able.
  - `forEach()` function is not supported.

  Note that the stored data is not persisted. If the Actor run is restarted or migrated to another worker server, the content of `globalStore` is reset. Therefore, never depend on a specific value to be present in the store.
  Example:
  ```javascript
  let movies = await context.globalStore.get('cached-movies');
  if (!movies) {
      // Parse the response body so that the stored value is JSON stringify-able.
      const response = await fetch('http://example.com/movies.json');
      movies = await response.json();
      await context.globalStore.set('cached-movies', movies);
  }
  console.dir(movies);
  ```

- **`input: Object`**
  An object containing the Actor run input, i.e. the Web Scraper's configuration. Each page function invocation gets a fresh copy of the `input` object, so changing its properties has no effect.

- **`jQuery: Function`**
  A reference to the <a href="https://api.jquery.com/" target="_blank">`jQuery`</a> library, which is extremely useful for DOM traversing, manipulation, querying and data extraction. This field is only available if the **Inject jQuery** option is enabled.
  Typically, the jQuery function is registered under a global variable called <code>$</code>.
  However, the web page might use this global variable for something else. To avoid conflicts, the jQuery object is not registered globally and is only available through the `context.jQuery` property.
  Example:
  ```javascript
  const $ = context.jQuery;
  const pageTitle = $('title').first().text();
  ```

- **`log: Object`**
  An object containing logging functions, with the same interface as provided by the <a href="https://crawlee.dev/api/core/class/Log" target="_blank">`Crawlee.utils.log`</a> object in Crawlee. The log messages are written directly to the Actor run log, which is useful for monitoring and debugging. Note that `log.debug()` only prints messages to the log if the **Enable debug log** input setting is set.
  Example:
  ```javascript
  const log = context.log;
  log.debug('Debug message', { hello: 'world!' });
  log.info('Information message', { all: 'good' });
  log.warning('Warning message');
  log.error('Error message', { details: 'This is bad!' });
  try {
      throw new Error('Not good!');
  } catch (e) {
      log.exception(e, 'Exception occurred', {
          details: 'This is really bad!',
      });
  }
  ```

- **`request: Object`**
  An object containing information about the currently loaded web page, such as the URL, number of retries, a unique key, etc. Its properties are equivalent to the <a href="https://crawlee.dev/api/core/class/Request" target="_blank">`Request`</a> object in Crawlee.

- **`response: Object`**
  An object containing information about the HTTP response from the web server. Currently, it only contains the `status` and `headers` properties. For example:

  ```javascript
  {
    // HTTP status code
    status: 200,

    // HTTP headers
    headers: {
      'content-type': 'text/html; charset=utf-8',
      'date': 'Wed, 06 Nov 2019 16:01:53 GMT',
      'cache-control': 'no-cache',
      'content-encoding': 'gzip',
    },
  }

  ```

- **`saveSnapshot(): AsyncFunction`**
  Saves a screenshot and full HTML of the current page to the key-value store
  associated with the Actor run, under the `SNAPSHOT-SCREENSHOT` and `SNAPSHOT-HTML` keys, respectively. This feature is useful when debugging your scraper.
  Note that each snapshot overwrites the previous one and the `saveSnapshot()` calls are throttled to at most one call in two seconds, in order to avoid excess consumption of resources and slowdown of the Actor.

- **`setValue(key, data, options): AsyncFunction`**
  Sets a value to the default key-value store associated with the Actor run. The key-value store is useful for persisting named data records, such as state objects, files, etc. The function is very similar to <a href="https://crawlee.dev/api/core/class/KeyValueStore#setValue" target="_blank">`KeyValueStore.setValue()`</a> function in Crawlee.
  To get the value, use the dual function `await context.getValue(key)`.
  Example:
  ```javascript
  await context.setValue('my-key', { hello: 'world' });
  ```

- **`skipLinks(): AsyncFunction`**
  Calling this function ensures that page links from the current page will not be added to the request queue, even if they match the [**Link selector**](#link-selector) and/or [**Glob Patterns**](#glob-patterns)/[**Pseudo-URLs**](#pseudo-urls) settings. This is useful to programmatically stop recursive crawling, e.g. if you know there are no more interesting links on the current page to follow.

- **`waitFor(task, options): AsyncFunction`**
  A helper function that waits either a specific amount of time (in milliseconds), for an element specified using a CSS selector to appear in the DOM or for a provided function to return `true`.
  This is useful for extracting data from web pages with dynamic content, where the content might not be available at the time when the page function is called.
  The `options` parameter is an object with the following properties and default values:

  ```javascript
  {
    // Maximum time to wait
    timeoutMillis: 20000,

    // How often to check if the condition changes
    pollingIntervalMillis: 50,
  }
  ```

  Example:

  ```javascript
  // Wait for selector
  await context.waitFor('.foo');
  // Wait for 1 second
  await context.waitFor(1000);
  // Wait for predicate
  await context.waitFor(() => !!document.querySelector('.foo'), {
      timeoutMillis: 5000,
  });
  ```
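
Several of these helpers are often combined in one page function. The sketch below waits for dynamically rendered content, extracts it, and stops following links on the last page of a paginated listing. The `.item` and `a.next` selectors are hypothetical and stand in for whatever the target website uses.

```javascript
async function pageFunction(context) {
    // Wait until the (hypothetical) dynamically rendered items appear.
    await context.waitFor('.item', { timeoutMillis: 10000 });

    const items = Array.from(document.querySelectorAll('.item'))
        .map((element) => element.textContent.trim());

    // No "next" link means this is the last page, so stop recursive crawling here.
    if (!document.querySelector('a.next')) {
        await context.skipLinks();
    }

    return { url: context.request.url, items };
}
```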

### Proxy configuration

The **Proxy configuration** (`proxyConfiguration`) option enables you to set proxies that will be used by the scraper in order to prevent its detection by target websites. You can use both [Apify Proxy](https://apify.com/proxy) and custom HTTP or SOCKS5 proxy servers.

A proxy is required to run the scraper. The following table lists the available options of the proxy configuration setting:

<table class="table table-bordered table-condensed">
    <tbody>
    <tr>
        <th><b>Apify Proxy (automatic)</b></th>
        <td>
            The scraper will load all web pages using <a href="https://apify.com/proxy">Apify Proxy</a> in the automatic mode. In this mode, the proxy uses all proxy groups that are available to the user, and for each new web page it automatically selects the proxy that hasn't been used in the longest time for the specific hostname, in order to reduce the chance of detection by the website. You can view the list of available proxy groups on the <a href="https://console.apify.com/proxy" target="_blank" rel="noopener">Proxy</a> page in Apify Console.
        </td>
    </tr>
    <tr>
        <th><b>Apify Proxy (selected groups)</b></th>
        <td>
            The scraper will load all web pages using <a href="https://apify.com/proxy">Apify Proxy</a> with specific groups of target proxy servers.
        </td>
    </tr>
    <tr>
        <th><b>Custom proxies</b></th>
        <td>
            <p>
                The scraper will use a custom list of proxy servers. The proxies must be specified in the <code>scheme://user:password@host:port</code> format; multiple proxies should be separated by a space or new line. The URL scheme can be either <code>HTTP</code> or <code>SOCKS5</code>. The user and password may be omitted, but the port must always be present.
            </p>
            <p>
                Example:
            </p>
            <pre><code class="language-none">http://bob:password@proxy1.example.com:8000
http://bob:password@proxy2.example.com:8000</code></pre>
        </td>
    </tr>
    </tbody>
</table>

The proxy configuration can be set programmatically when calling the Actor using the API
by setting the `proxyConfiguration` field. It accepts a JSON object with the following structure:

```javascript
{
    // Indicates whether to use Apify Proxy or not.
    "useApifyProxy": Boolean,

    // Array of Apify Proxy groups, only used if "useApifyProxy" is true.
    // If missing or null, Apify Proxy will use the automatic mode.
    "apifyProxyGroups": String[],

    // Array of custom proxy URLs, in "scheme://user:password@host:port" format.
    // If missing or null, custom proxies are not used.
    "proxyUrls": String[],
}
```
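
For illustration, the three table rows could map to `proxyConfiguration` values like the ones below. The `RESIDENTIAL` group name is only an example, and `isValidProxyUrl` is a hypothetical pre-flight check for the custom proxy format described above, not something the Actor provides.

```javascript
// Apify Proxy (automatic)
const automaticProxy = { useApifyProxy: true };

// Apify Proxy (selected groups)
const selectedGroups = {
    useApifyProxy: true,
    apifyProxyGroups: ['RESIDENTIAL'], // example name; use a group available to your account
};

// Custom proxies
const customProxies = {
    useApifyProxy: false,
    proxyUrls: [
        'http://bob:password@proxy1.example.com:8000',
        'http://bob:password@proxy2.example.com:8000',
    ],
};

// Hypothetical check: scheme (http or socks5), optional user:password,
// host, and a mandatory port.
function isValidProxyUrl(proxyUrl) {
    return /^(http|socks5):\/\/(?:[^:@\/\s]+(?::[^@\/\s]*)?@)?[^:@\/\s]+:\d+$/i.test(proxyUrl);
}

console.log(customProxies.proxyUrls.every(isValidProxyUrl)); // true
console.log(isValidProxyUrl('http://proxy.example.com')); // false, the port is missing
```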

#### Logging into websites with Web Scraper

The **Initial cookies** field allows you to set cookies that will be used by the scraper to log into websites. Cookies are small text files that are stored on your computer by your web browser. Various websites use cookies to store information about your current session. By transferring this information to the scraper, it will be able to log into websites using your credentials. To learn more about logging into websites by transferring cookies, check out our [tutorial](https://docs.apify.com/tutorials/log-in-by-transferring-cookies).

Be aware that cookies usually have a limited lifespan and will expire after a certain period of time. This means that you will have to update the cookies periodically in order to keep the scraper logged in. An alternative approach is to make the scraper actively log in to the website in the Page function. For more info about this approach, check out our [tutorial](https://docs.apify.com/tutorials/log-into-a-website-using-puppeteer) on logging into websites using Puppeteer.

The scraper expects the cookies in the **Initial cookies** field to be stored as separate JSON objects in a JSON array, see example below:

```json
[
    {
        "name": "_ga",
        "value": "GA1.1.689972112.1627459041",
        "domain": ".apify.com",
        "hostOnly": false,
        "path": "/",
        "secure": false,
        "httpOnly": false,
        "sameSite": "no_restriction",
        "session": false,
        "firstPartyDomain": "",
        "expirationDate": 1695304183,
        "storeId": "firefox-default",
        "id": 1
    }
]
```
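
Because expired cookies silently log the scraper out, it can help to check them before starting a run. The sketch below assumes that `expirationDate` is a Unix timestamp in seconds, as in the example above; `findExpiredCookies` is a hypothetical helper, not part of the Actor.

```javascript
// Hypothetical helper: return the cookies that have already expired.
// Assumes `expirationDate` is in seconds since the Unix epoch.
function findExpiredCookies(cookies, nowMillis = Date.now()) {
    const nowSeconds = nowMillis / 1000;
    return cookies.filter((cookie) => (
        !cookie.session
        && typeof cookie.expirationDate === 'number'
        && cookie.expirationDate <= nowSeconds
    ));
}

const initialCookies = [
    { name: '_ga', domain: '.apify.com', session: false, expirationDate: 1695304183 },
];

const expired = findExpiredCookies(initialCookies);
if (expired.length > 0) {
    console.log(`Refresh these cookies before running: ${expired.map((c) => c.name).join(', ')}`);
}
```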

### Advanced configuration

#### Pre-navigation hooks

This is an array of functions that will be executed **BEFORE** the main `pageFunction` is run. Each function receives a `context` object similar to the one passed to the `pageFunction`, plus a second "[DirectNavigationOptions](https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#DirectNavigationOptions)" object.

The available options can be seen here:

```javascript
preNavigationHooks: [
    async (
        { id, request, session, proxyInfo },
        { timeout, waitUntil, referer },
    ) => {},
];
```

> Unlike with the Playwright, Puppeteer and Cheerio scrapers, in Web Scraper the Actor object is not available in the hook parameters, as the hook is executed inside the browser.

Check out the docs for [Pre-navigation hooks](https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#preNavigationHooks) and the [PuppeteerHook type](https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerHook) for more info regarding the objects passed into these functions.

#### Post-navigation hooks

An array of functions that will be executed **AFTER** the main `pageFunction` is run. The only available parameter is the `CrawlingContext` object.

```javascript
postNavigationHooks: [
    async ({ id, request, session, proxyInfo, response }) => {},
],
```

> Unlike with the Playwright, Puppeteer and Cheerio scrapers, in Web Scraper the Actor object is not available in the hook parameters, as the hook is executed inside the browser.

Check out the docs for [Post-navigation hooks](https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#postNavigationHooks) and the [PuppeteerHook type](https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerHook) for more info regarding the objects passed into these functions.

#### Insert breakpoint

This property has no effect if [Run mode](#run-mode) is set to **PRODUCTION**. When set to **DEVELOPMENT**, it inserts a breakpoint at the selected location in every page the scraper visits. Execution of code stops at the breakpoint until manually resumed in the DevTools window accessible via the Live View tab or Container URL. Additional breakpoints can be added by adding `debugger;` statements within your Page function.

#### Debug log

When set to true, debug messages will be included in the log. Use `context.log.debug('message')` to log your own debug messages.

#### Browser log

When set to true, console messages from the browser will be included in the Actor's log. This may result in the log being flooded by error messages, warnings and other messages of little value (especially with a high concurrency).

#### Custom data

The input UI has a fixed set of fields, so it cannot cover every use case. If you need to pass arbitrary data to the scraper, use the [Custom data](#custom-data) input field within [Advanced configuration](#advanced-configuration). Its contents will be available as an object under the `customData` context key within the [pageFunction](#page-function).

#### Custom names

With the final three options in the **Advanced configuration**, you can set custom names for the following:

- Dataset
- Key-value store
- Request queue

Leave the storage unnamed if you only want the data within it to be persisted on the Apify platform for a number of days corresponding to your [plan](https://apify.com/pricing) (after which it will expire). Named storages are retained indefinitely. Additionally, using a named storage allows you to share it across multiple runs (e.g. instead of having 10 different unnamed datasets for 10 different runs, all the data from all 10 runs can be accumulated into a single named dataset). Learn more [here](https://docs.apify.com/storage#named-and-unnamed-storages).

### Results

All scraping results returned by [**Page function**](#page-function) are stored in the default dataset associated with the Actor run, and can be saved in several different formats, such as JSON, XML, CSV or Excel. For each object returned by [**Page function**](#page-function), Web Scraper pushes one record into the dataset, and extends it with metadata such as the URL of the web page where the results come from.

For example, if your page function returned the following object:

```javascript
{
    message: 'Hello world!',
}
```

The full object stored in the dataset will look as follows
(in JSON format, including the metadata fields `#error` and `#debug`):

```json
{
    "message": "Hello world!",
    "#error": false,
    "#debug": {
        "requestId": "fvwscO2UJLdr10B",
        "url": "https://www.example.com/",
        "loadedUrl": "https://www.example.com/",
        "method": "GET",
        "retryCount": 0,
        "errorMessages": null,
        "statusCode": 200
    }
}
```

To download the results, call the [Get dataset items](https://docs.apify.com/api/v2#/reference/datasets/item-collection) API endpoint:

```
https://api.apify.com/v2/datasets/[DATASET_ID]/items?format=json
```

where `[DATASET_ID]` is the ID of the Actor run's dataset, which you can find in the Run object returned when starting the Actor. Alternatively, you'll find the download links for the results in Apify Console.

To omit the `#error` and `#debug` metadata fields from the results and exclude empty result records, add the `clean=true` query parameter to the API URL, or select the **Clean items** option when downloading the dataset in Apify Console.

To get the results in other formats, set the `format` query parameter to `xml`, `xlsx`, `csv`, `html`, etc. For more information, see [Datasets](https://apify.com/docs/storage#dataset) in documentation or the [Get dataset items](https://docs.apify.com/api/v2#/reference/datasets/item-collection) endpoint in Apify API reference.
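The query parameters described above can be combined in one URL. The helper below is a minimal sketch for building such a URL; `buildDatasetItemsUrl` is a hypothetical name, not part of any Apify library, and it handles only the `format` and `clean` parameters.

```javascript
// Builds a "Get dataset items" URL from a dataset ID and the two options discussed above.
function buildDatasetItemsUrl(datasetId, { format = 'json', clean = false } = {}) {
    const url = new URL(`https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`);
    url.searchParams.set('format', format);
    // Add clean=true only when requested, so the default URL stays minimal.
    if (clean) url.searchParams.set('clean', 'true');
    return url.toString();
}

// 'DATASET_ID' is a placeholder; use the dataset ID from your own run.
console.log(buildDatasetItemsUrl('DATASET_ID', { format: 'csv', clean: true }));
// prints "https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv&clean=true"
```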

### Additional resources

If you’d like to learn more about Web Scraper or Apify’s other Actors and tools, check out these resources:

- [Cheerio Scraper](https://apify.com/apify/cheerio-scraper), another web scraping Actor that downloads and processes pages in raw HTML for much higher performance.
- [Playwright Scraper](https://apify.com/apify/playwright-scraper), a similar web scraping Actor to Web Scraper, which provides lower-level control of the underlying [Playwright](https://github.com/microsoft/playwright) library and the ability to use server-side libraries.
- [Puppeteer Scraper](https://apify.com/apify/puppeteer-scraper), an Actor similar to Web Scraper, which provides lower-level control of the underlying [Puppeteer](https://github.com/puppeteer/puppeteer) library and the ability to use server-side libraries.
- [Actors documentation](https://apify.com/docs/actor) for the Apify cloud computing platform.
- [Apify SDK documentation](https://sdk.apify.com/), where you can learn more about the tools required to run your own Apify Actors.
- [Crawlee documentation](https://crawlee.dev/), where you can learn how to build a new web scraping project from scratch using the world's most popular web crawling and scraping library for Node.js.

### Frequently asked questions

#### Are there any limitations to using Web Scraper?

Web Scraper is designed to be user-friendly and generic, which may affect its performance and flexibility compared to more specialized solutions. It uses a resource-intensive Chromium browser to support client-side JavaScript code.

#### Is web scraping legal?

It is legal to scrape any non-personal data. Personal data is protected by the [GDPR](https://en.wikipedia.org/wiki/General_Data_Protection_Regulation) in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers. You can also read our blog post on the [legality of web scraping](https://blog.apify.com/is-web-scraping-legal/).

#### Can I control the crawling behavior of Web Scraper?

Yes, you can control the crawling behavior of Web Scraper. You can specify start URLs, define link selectors, enter pseudo-URLs to guide the scraper in following specific page links, and set plenty of other configuration options. This allows recursive crawling of websites or targeted extraction of data.

#### Is it possible to use proxies with Web Scraper?

Yes, you can configure proxies for Web Scraper. You can use [Apify Proxy](https://apify.com/proxy), which is set up for you on the free plan. On paid plans, you can configure the proxies yourself, or even set up your own proxies.

#### How can I access and export the data scraped by Web Scraper?

The data scraped by Web Scraper is stored in a dataset which you can access and export in various formats such as JSON, XML, CSV, or as an Excel spreadsheet. The results can be downloaded using the Apify API or through the Apify Console.

#### Your feedback

We’re always working on improving the performance of our Actors. If you have any technical feedback for Web Scraper or found a bug, please create an issue in the [Issues tab](https://apify.com/apify/web-scraper/issues/open).

# Actor input Schema

## `runMode` (type: `string`):

This property indicates the scraper's mode of operation. In DEVELOPMENT mode, the scraper ignores page timeouts, doesn't use sessionPool, opens pages one by one and enables debugging via Chrome DevTools. Open the live view tab or the container URL to access the debugger. Further debugging options can be configured in the Advanced configuration section. PRODUCTION mode disables debugging and enables timeouts and concurrency. <br><br>For details, see <a href='https://apify.com/apify/web-scraper#run-mode' target='_blank' rel='noopener'>Run mode</a> in README.

## `startUrls` (type: `array`):

A static list of URLs to scrape. <br><br>For details, see <a href='https://apify.com/apify/web-scraper#start-urls' target='_blank' rel='noopener'>Start URLs</a> in README.

## `keepUrlFragments` (type: `boolean`):

Indicates that URL fragments (e.g. <code>http://example.com<b>#fragment</b></code>) should be included when checking whether a URL has already been visited or not. Typically, URL fragments are used for page navigation only and therefore they should be ignored, as they don't identify separate pages. However, some single-page websites use URL fragments to display different pages; in such a case, this option should be enabled.
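To make the effect concrete, here is a simplified sketch of how fragments change which URLs count as the same page. It is an illustration only, not the Actor's actual deduplication code; `dedupKey` and `countDistinctPages` are hypothetical helpers.

```javascript
// Returns the key used to decide whether a URL was already visited (simplified).
function dedupKey(rawUrl, keepUrlFragments) {
    const url = new URL(rawUrl);
    // Without the option, the fragment is dropped before comparing URLs.
    if (!keepUrlFragments) url.hash = '';
    return url.href;
}

// Counts how many distinct pages a list of URLs represents under the given setting.
function countDistinctPages(urls, keepUrlFragments) {
    return new Set(urls.map((rawUrl) => dedupKey(rawUrl, keepUrlFragments))).size;
}

const urls = [
    'https://example.com/app#/products',
    'https://example.com/app#/contact',
];

console.log(countDistinctPages(urls, false)); // 1
console.log(countDistinctPages(urls, true)); // 2
```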

## `respectRobotsTxtFile` (type: `boolean`):

If enabled, the crawler will consult the robots.txt file for the target website before crawling each page. At the moment, the crawler does not use any specific user agent identifier. The crawl-delay directive is also not supported yet.

## `linkSelector` (type: `string`):

A CSS selector saying which links on the page (<code>\<a></code> elements with <code>href</code> attribute) shall be followed and added to the request queue. To filter the links added to the queue, use the <b>Pseudo-URLs</b> and/or <b>Glob patterns</b> setting.<br><br>If <b>Link selector</b> is empty, the page links are ignored.<br><br>For details, see <a href='https://apify.com/apify/web-scraper#link-selector' target='_blank' rel='noopener'>Link selector</a> in README.

## `globs` (type: `array`):

Glob patterns to match links in the page that you want to enqueue. Combine with Link selector to tell the scraper where to find links. Omitting the Glob patterns will cause the scraper to enqueue all links matched by the Link selector.

## `pseudoUrls` (type: `array`):

Specifies what kind of URLs found by <b>Link selector</b> should be added to the request queue. A pseudo-URL is a URL with regular expressions enclosed in <code>\[]</code> brackets, e.g. <code>http://www.example.com/\[.\*]</code>. <br><br>If <b>Pseudo-URLs</b> are omitted, the Actor enqueues all links matched by the <b>Link selector</b>.<br><br>For details, see <a href='https://apify.com/apify/web-scraper#pseudo-urls' target='_blank' rel='noopener'>Pseudo-URLs</a> in README.
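The idea behind pseudo-URLs can be sketched as a conversion to a regular expression: text outside the brackets is matched literally, while text inside the brackets is treated as a regular expression. The function below is a simplified illustration under that assumption, not the parser the Actor uses; `pseudoUrlToRegExp` is a hypothetical name.

```javascript
// Converts a pseudo-URL into a RegExp (simplified sketch).
function pseudoUrlToRegExp(pseudoUrl) {
    let pattern = '';
    let depth = 0; // how many brackets are currently open
    for (const char of pseudoUrl) {
        if (char === '[') {
            depth += 1;
            // The outermost bracket opens a regular-expression group.
            pattern += depth === 1 ? '(' : char;
        } else if (char === ']' && depth > 0) {
            depth -= 1;
            pattern += depth === 0 ? ')' : char;
        } else if (depth > 0) {
            // Inside brackets: keep the regular-expression syntax as written.
            pattern += char;
        } else {
            // Outside brackets: escape special characters so they match literally.
            pattern += char.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        }
    }
    return new RegExp(`^${pattern}$`);
}

const matcher = pseudoUrlToRegExp('http://www.example.com/[.*]');
console.log(matcher.test('http://www.example.com/pages/my-page')); // true
console.log(matcher.test('http://www.example.org/pages/my-page')); // false
```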

## `excludes` (type: `array`):

Glob patterns to match links in the page that you want to exclude from being enqueued.

## `pageFunction` (type: `string`):

JavaScript (ES6) function that is executed in the context of every page loaded in the Chrome browser. Use it to scrape data from the page, perform actions or add new URLs to the request queue.<br><br>For details, see <a href='https://apify.com/apify/web-scraper#page-function' target='_blank' rel='noopener'>Page function</a> in README.

## `injectJQuery` (type: `boolean`):

If enabled, the scraper will inject the <a href='http://jquery.com' target='_blank' rel='noopener'>jQuery</a> library into every web page loaded, before <b>Page function</b> is invoked. Note that the jQuery object (<code>$</code>) will not be registered into global namespace in order to avoid conflicts with libraries used by the web page. It can only be accessed through <code>context.jQuery</code> in <b>Page function</b>.

## `proxyConfiguration` (type: `object`):

Specifies proxy servers that will be used by the scraper in order to hide its origin.<br><br>For details, see <a href='https://apify.com/apify/web-scraper#proxy-configuration' target='_blank' rel='noopener'>Proxy configuration</a> in README.

## `proxyRotation` (type: `string`):

This property indicates the strategy of proxy rotation and can only be used in conjunction with Apify Proxy. The recommended setting automatically picks the best proxies from your available pool and rotates them evenly, discarding proxies that become blocked or unresponsive. If this strategy does not work for you for any reason, you may configure the scraper to either use a new proxy for each request, or to use one proxy as long as possible, until the proxy fails. IMPORTANT: This setting will only use your available Apify Proxy pool, so if you don't have enough proxies for a given task, no rotation setting will produce satisfactory results.

## `sessionPoolName` (type: `string`):

<b>Use only English alphanumeric characters, dashes and underscores.</b> A session is a representation of a user. It has its own IP and cookies, which are then used together to emulate a real user. Usage of the sessions is controlled by the Proxy rotation option. By providing a session pool name, you enable sharing of those sessions across multiple Actor runs. This is very useful when you need specific cookies for accessing the websites or when a lot of your proxies are already blocked. Instead of trying randomly, a list of working sessions will be saved and a new Actor run can reuse those sessions. Note that the IP lock on sessions expires after 24 hours, unless the session is used again in that window.

## `initialCookies` (type: `array`):

A JSON array with cookies that will be set to every Chrome browser tab opened before loading the page, in the format accepted by Puppeteer's <a href='https://pptr.dev/api/puppeteer.cookie' target='_blank' rel='noopener'><code>Page.setCookie()</code></a> function. This option is useful for transferring a logged-in session from an external web browser.
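For orientation, a cookie array might look like the sketch below. The cookie name, value, and domain are placeholders, and the start URL is hypothetical; the field names follow Puppeteer's cookie format.

```javascript
// Placeholder cookie; copy the real name and value from your logged-in browser session.
const initialCookies = [
    {
        name: 'sessionid',
        value: 'REPLACE_WITH_REAL_VALUE',
        domain: '.example.com',
        path: '/',
        httpOnly: true,
        secure: true,
    },
];

// Fragment of the Actor input that carries the cookies (other fields omitted).
const input = {
    startUrls: [{ url: 'https://www.example.com/account' }],
    initialCookies,
};

console.log(JSON.stringify(input, null, 2));
```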

## `useChrome` (type: `boolean`):

If enabled, the scraper will use a real Chrome browser instead of Chromium bundled with Puppeteer. This option may help bypass certain anti-scraping protections, but might make the scraper unstable. Use at your own risk 🙂

## `headless` (type: `boolean`):

By default, browsers run in headless mode. You can toggle this off to run them in headful mode, which can help with certain rare anti-scraping protections but is slower and more costly.

## `ignoreSslErrors` (type: `boolean`):

If enabled, the scraper will ignore SSL/TLS certificate errors. Use at your own risk.

## `ignoreCorsAndCsp` (type: `boolean`):

If enabled, the scraper will ignore Content Security Policy (CSP) and Cross-Origin Resource Sharing (CORS) settings of visited pages and requested domains. This enables you to freely use XHR/Fetch to make HTTP requests from <b>Page function</b>.

## `downloadMedia` (type: `boolean`):

If enabled, the scraper will download media such as images, fonts, videos and sound files, as usual. Disabling this option might speed up the scrape, but certain websites could stop working correctly.

## `downloadCss` (type: `boolean`):

If enabled, the scraper will download CSS files with stylesheets, as usual. Disabling this option may speed up the scrape, but certain websites could stop working correctly, and the live view will not look as cool.

## `maxRequestRetries` (type: `integer`):

The maximum number of times the scraper will retry loading each web page after a page load error or an exception thrown by <b>Page function</b>.<br><br>If set to <code>0</code>, the page will be considered failed right after the first error.

## `maxPagesPerCrawl` (type: `integer`):

The maximum number of pages that the scraper will load. The scraper will stop when this limit is reached. It's always a good idea to set this limit in order to prevent excess platform usage for misconfigured scrapers. Note that the actual number of pages loaded might be slightly higher than this value.<br><br>If set to <code>0</code>, there is no limit.

## `maxResultsPerCrawl` (type: `integer`):

The maximum number of records that will be saved to the resulting dataset. The scraper will stop when this limit is reached. <br><br>If set to <code>0</code>, there is no limit.

## `maxCrawlingDepth` (type: `integer`):

Specifies how many links away from <b>Start URLs</b> the scraper will descend. This value is a safeguard against infinite crawling depths for misconfigured scrapers. Note that pages added using <code>context.enqueueRequest()</code> in <b>Page function</b> are not subject to the maximum depth constraint. <br><br>If set to <code>0</code>, there is no limit. To crawl only the pages specified by the Start URLs, leave <a href="#linkSelector"><code>linkSelector</code></a> empty instead.

## `maxConcurrency` (type: `integer`):

Specifies the maximum number of pages that can be processed by the scraper in parallel. The scraper automatically increases and decreases concurrency based on available system resources. This option enables you to set an upper limit, for example to reduce the load on a target web server.

## `pageLoadTimeoutSecs` (type: `integer`):

The maximum amount of time the scraper will wait for a web page to load, in seconds. If the web page does not load in this timeframe, it is considered to have failed and will be retried (subject to <b>Max page retries</b>), as with other page load errors.

## `pageFunctionTimeoutSecs` (type: `integer`):

The maximum amount of time the scraper will wait for <b>Page function</b> to execute, in seconds. It's a good idea to set this limit, to ensure that unexpected behavior in page function will not get the scraper stuck.

## `waitUntil` (type: `array`):

Contains a JSON array with names of page events to wait for before considering a web page fully loaded. The scraper will wait until <b>all</b> of the events are triggered in the web page before executing <b>Page function</b>. Available events are <code>domcontentloaded</code>, <code>load</code>, <code>networkidle2</code> and <code>networkidle0</code>.<br><br>For details, see <a href='https://pptr.dev/#?product=Puppeteer&show=api-pagegotourl-options' target='_blank' rel='noopener'><code>waitUntil</code> option</a> in Puppeteer's <code>Page.goto()</code> function documentation.

## `preNavigationHooks` (type: `string`):

Async functions that are sequentially evaluated before the navigation. Good for setting additional cookies or browser properties before navigation. The function accepts two parameters, `crawlingContext` and `gotoOptions`, which are passed to the `page.goto()` function the crawler calls to navigate.
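A minimal sketch of such hooks follows. The two hooks adjust `gotoOptions` using Puppeteer's `timeout` and `waitUntil` options; how these interact with the Actor's own timeout and `waitUntil` settings is not stated here, so treat the values as assumptions. The `applyHooks` runner and the stand-in objects are hypothetical and only imitate sequential evaluation for a local dry run.

```javascript
// Array of hooks, in the shape expected by the preNavigationHooks input.
const preNavigationHooks = [
    async (crawlingContext, gotoOptions) => {
        // Navigation timeout in milliseconds (a Puppeteer page.goto() option).
        gotoOptions.timeout = 30000;
    },
    async (crawlingContext, gotoOptions) => {
        // Event to wait for before the navigation is considered finished.
        gotoOptions.waitUntil = 'domcontentloaded';
    },
];

// Hypothetical runner: evaluates the hooks one after another, like the description above.
async function applyHooks(hooks, crawlingContext, gotoOptions) {
    for (const hook of hooks) {
        await hook(crawlingContext, gotoOptions);
    }
    return gotoOptions;
}

// Stand-in crawling context and empty options, for local experimentation only.
applyHooks(preNavigationHooks, { request: { url: 'https://www.example.com/' } }, {})
    .then((options) => console.log(options));
```

In the Actor input, only the array itself is supplied, as a string of source code.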

## `postNavigationHooks` (type: `string`):

Async functions that are sequentially evaluated after the navigation. Good for checking if the navigation was successful. The function accepts `crawlingContext` as the only parameter.

## `breakpointLocation` (type: `string`):

This property has no effect if Run mode is set to PRODUCTION. When set to DEVELOPMENT it inserts a breakpoint at the selected location in every page the scraper visits. Execution of code stops at the breakpoint until manually resumed in the DevTools window accessible via Live View tab or Container URL. Additional breakpoints can be added by adding <code>debugger;</code> statements within your Page function. <br><br>See <a href='https://apify.com/apify/web-scraper#run-mode' target='_blank' rel='noopener'>Run mode</a> in README for details.

## `closeCookieModals` (type: `boolean`):

Uses the [I don't care about cookies](https://addons.mozilla.org/en-US/firefox/addon/i-dont-care-about-cookies/) browser extension. When enabled, the crawler will automatically try to dismiss cookie consent modals. This can be useful when crawling European websites that show cookie consent modals.

## `maxScrollHeightPixels` (type: `integer`):

The crawler will scroll down the page until all content is loaded or the maximum scrolling distance is reached. Setting this to `0` disables scrolling altogether.

## `debugLog` (type: `boolean`):

If enabled, the Actor log will include debug messages. Beware that this can be quite verbose. Use <code>context.log.debug('message')</code> to log your own debug messages from <b>Page function</b>.

## `browserLog` (type: `boolean`):

If enabled, the Actor log will include console messages produced by JavaScript executed by the web pages (e.g. using <code>console.log()</code>). Beware that this may result in the log being flooded by error messages, warnings and other messages of little value, especially with high concurrency.

## `customData` (type: `object`):

A custom JSON object that is passed to <b>Page function</b> as <code>context.customData</code>. This setting is useful when invoking the scraper via API, in order to pass some arbitrary parameters to your code.

## `datasetName` (type: `string`):

Name or ID of the dataset that will be used for storing results. If left empty, the default dataset of the run will be used.

## `keyValueStoreName` (type: `string`):

Name or ID of the key-value store that will be used for storing records. If left empty, the default key-value store of the run will be used.

## `requestQueueName` (type: `string`):

Name of the request queue that will be used for storing requests. If left empty, the default request queue of the run will be used.

## Actor input object example

```json
{
  "runMode": "DEVELOPMENT",
  "startUrls": [
    {
      "url": "https://crawlee.dev/js"
    }
  ],
  "keepUrlFragments": false,
  "respectRobotsTxtFile": true,
  "linkSelector": "a[href]",
  "globs": [
    {
      "glob": "https://crawlee.dev/js/*/*"
    }
  ],
  "pseudoUrls": [],
  "excludes": [
    {
      "glob": "/**/*.{png,jpg,jpeg,pdf}"
    }
  ],
  "pageFunction": "// The function accepts a single argument: the \"context\" object.\n// For a complete list of its properties and functions,\n// see https://apify.com/apify/web-scraper#page-function \nasync function pageFunction(context) {\n    // This statement works as a breakpoint when you're trying to debug your code. Works only with Run mode: DEVELOPMENT!\n    // debugger; \n\n    // jQuery is handy for finding DOM elements and extracting data from them.\n    // To use it, make sure to enable the \"Inject jQuery\" option.\n    const $ = context.jQuery;\n    const pageTitle = $('title').first().text();\n    const h1 = $('h1').first().text();\n    const first_h2 = $('h2').first().text();\n    const random_text_from_the_page = $('p').first().text();\n\n\n    // Print some information to Actor log\n    context.log.info(`URL: ${context.request.url}, TITLE: ${pageTitle}`);\n\n    // Manually add a new page to the queue for scraping.\n   await context.enqueueRequest({ url: 'http://www.example.com' });\n\n    // Return an object with the data extracted from the page.\n    // It will be stored to the resulting dataset.\n    return {\n        url: context.request.url,\n        pageTitle,\n        h1,\n        first_h2,\n        random_text_from_the_page\n    };\n}",
  "injectJQuery": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "proxyRotation": "RECOMMENDED",
  "initialCookies": [],
  "useChrome": false,
  "headless": true,
  "ignoreSslErrors": false,
  "ignoreCorsAndCsp": false,
  "downloadMedia": true,
  "downloadCss": true,
  "maxRequestRetries": 3,
  "maxPagesPerCrawl": 0,
  "maxResultsPerCrawl": 0,
  "maxCrawlingDepth": 0,
  "maxConcurrency": 50,
  "pageLoadTimeoutSecs": 60,
  "pageFunctionTimeoutSecs": 60,
  "waitUntil": [
    "networkidle2"
  ],
  "preNavigationHooks": "// We need to return array of (possibly async) functions here.\n// The functions accept two arguments: the \"crawlingContext\" object\n// and \"gotoOptions\".\n[\n    async (crawlingContext, gotoOptions) => {\n        // ...\n    },\n]\n",
  "postNavigationHooks": "// We need to return array of (possibly async) functions here.\n// The functions accept a single argument: the \"crawlingContext\" object.\n[\n    async (crawlingContext) => {\n        // ...\n    },\n]",
  "breakpointLocation": "NONE",
  "closeCookieModals": false,
  "maxScrollHeightPixels": 5000,
  "debugLog": false,
  "browserLog": false,
  "customData": {}
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "runMode": "DEVELOPMENT",
    "startUrls": [
        {
            "url": "https://crawlee.dev/js"
        }
    ],
    "respectRobotsTxtFile": true,
    "linkSelector": "a[href]",
    "globs": [
        {
            "glob": "https://crawlee.dev/js/*/*"
        }
    ],
    "pseudoUrls": [],
    "excludes": [
        {
            "glob": "/**/*.{png,jpg,jpeg,pdf}"
        }
    ],
    "pageFunction": `// The function accepts a single argument: the "context" object.
// For a complete list of its properties and functions,
// see https://apify.com/apify/web-scraper#page-function 
async function pageFunction(context) {
    // This statement works as a breakpoint when you're trying to debug your code. Works only with Run mode: DEVELOPMENT!
    // debugger; 

    // jQuery is handy for finding DOM elements and extracting data from them.
    // To use it, make sure to enable the "Inject jQuery" option.
    const $ = context.jQuery;
    const pageTitle = $('title').first().text();
    const h1 = $('h1').first().text();
    const first_h2 = $('h2').first().text();
    const random_text_from_the_page = $('p').first().text();


    // Print some information to Actor log
    context.log.info(`URL: ${context.request.url}, TITLE: ${pageTitle}`);

    // Manually add a new page to the queue for scraping.
   await context.enqueueRequest({ url: 'http://www.example.com' });

    // Return an object with the data extracted from the page.
    // It will be stored to the resulting dataset.
    return {
        url: context.request.url,
        pageTitle,
        h1,
        first_h2,
        random_text_from_the_page
    };
}`,
    "proxyConfiguration": {
        "useApifyProxy": true
    },
    "initialCookies": [],
    "waitUntil": [
        "networkidle2"
    ],
    "preNavigationHooks": `// We need to return array of (possibly async) functions here.
// The functions accept two arguments: the "crawlingContext" object
// and "gotoOptions".
[
    async (crawlingContext, gotoOptions) => {
        // ...
    },
]`,
    "postNavigationHooks": `// We need to return array of (possibly async) functions here.
// The functions accept a single argument: the "crawlingContext" object.
[
    async (crawlingContext) => {
        // ...
    },
]`,
    "breakpointLocation": "NONE",
    "customData": {}
};

// Run the Actor and wait for it to finish
const run = await client.actor("apify/web-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "runMode": "DEVELOPMENT",
    "startUrls": [{ "url": "https://crawlee.dev/js" }],
    "respectRobotsTxtFile": True,
    "linkSelector": "a[href]",
    "globs": [{ "glob": "https://crawlee.dev/js/*/*" }],
    "pseudoUrls": [],
    "excludes": [{ "glob": "/**/*.{png,jpg,jpeg,pdf}" }],
    "pageFunction": """// The function accepts a single argument: the \"context\" object.
// For a complete list of its properties and functions,
// see https://apify.com/apify/web-scraper#page-function 
async function pageFunction(context) {
    // This statement works as a breakpoint when you're trying to debug your code. Works only with Run mode: DEVELOPMENT!
    // debugger; 

    // jQuery is handy for finding DOM elements and extracting data from them.
    // To use it, make sure to enable the \"Inject jQuery\" option.
    const $ = context.jQuery;
    const pageTitle = $('title').first().text();
    const h1 = $('h1').first().text();
    const first_h2 = $('h2').first().text();
    const random_text_from_the_page = $('p').first().text();


    // Print some information to Actor log
    context.log.info(`URL: ${context.request.url}, TITLE: ${pageTitle}`);

    // Manually add a new page to the queue for scraping.
   await context.enqueueRequest({ url: 'http://www.example.com' });

    // Return an object with the data extracted from the page.
    // It will be stored to the resulting dataset.
    return {
        url: context.request.url,
        pageTitle,
        h1,
        first_h2,
        random_text_from_the_page
    };
}""",
    "proxyConfiguration": { "useApifyProxy": True },
    "initialCookies": [],
    "waitUntil": ["networkidle2"],
    "preNavigationHooks": """// We need to return array of (possibly async) functions here.
// The functions accept two arguments: the \"crawlingContext\" object
// and \"gotoOptions\".
[
    async (crawlingContext, gotoOptions) => {
        // ...
    },
]
""",
    "postNavigationHooks": """// We need to return array of (possibly async) functions here.
// The functions accept a single argument: the \"crawlingContext\" object.
[
    async (crawlingContext) => {
        // ...
    },
]""",
    "breakpointLocation": "NONE",
    "customData": {},
}

# Run the Actor and wait for it to finish
run = client.actor("apify/web-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "runMode": "DEVELOPMENT",
  "startUrls": [
    {
      "url": "https://crawlee.dev/js"
    }
  ],
  "respectRobotsTxtFile": true,
  "linkSelector": "a[href]",
  "globs": [
    {
      "glob": "https://crawlee.dev/js/*/*"
    }
  ],
  "pseudoUrls": [],
  "excludes": [
    {
      "glob": "/**/*.{png,jpg,jpeg,pdf}"
    }
  ],
  "pageFunction": "// The function accepts a single argument: the \\"context\\" object.\\n// For a complete list of its properties and functions,\\n// see https://apify.com/apify/web-scraper#page-function \\nasync function pageFunction(context) {\\n    // This statement works as a breakpoint when you'\''re trying to debug your code. Works only with Run mode: DEVELOPMENT!\\n    // debugger; \\n\\n    // jQuery is handy for finding DOM elements and extracting data from them.\\n    // To use it, make sure to enable the \\"Inject jQuery\\" option.\\n    const $ = context.jQuery;\\n    const pageTitle = $('\''title'\'').first().text();\\n    const h1 = $('\''h1'\'').first().text();\\n    const first_h2 = $('\''h2'\'').first().text();\\n    const random_text_from_the_page = $('\''p'\'').first().text();\\n\\n\\n    // Print some information to Actor log\\n    context.log.info(`URL: ${context.request.url}, TITLE: ${pageTitle}`);\\n\\n    // Manually add a new page to the queue for scraping.\\n   await context.enqueueRequest({ url: '\''http://www.example.com'\'' });\\n\\n    // Return an object with the data extracted from the page.\\n    // It will be stored to the resulting dataset.\\n    return {\\n        url: context.request.url,\\n        pageTitle,\\n        h1,\\n        first_h2,\\n        random_text_from_the_page\\n    };\\n}",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "initialCookies": [],
  "waitUntil": [
    "networkidle2"
  ],
  "preNavigationHooks": "// We need to return array of (possibly async) functions here.\\n// The functions accept two arguments: the \\"crawlingContext\\" object\\n// and \\"gotoOptions\\".\\n[\\n    async (crawlingContext, gotoOptions) => {\\n        // ...\\n    },\\n]\\n",
  "postNavigationHooks": "// We need to return array of (possibly async) functions here.\\n// The functions accept a single argument: the \\"crawlingContext\\" object.\\n[\\n    async (crawlingContext) => {\\n        // ...\\n    },\\n]",
  "breakpointLocation": "NONE",
  "customData": {}
}' |
apify call apify/web-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=apify/web-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Web Scraper",
        "description": "Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.",
        "version": "3.0",
        "x-build-id": "7vMcHq1ZhcmGsRp8K"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/apify~web-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-apify-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/apify~web-scraper/runs": {
            "post": {
                "operationId": "runs-sync-apify-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/apify~web-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-apify-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls",
                    "pageFunction",
                    "proxyConfiguration"
                ],
                "properties": {
                    "runMode": {
                        "title": "Run mode",
                        "enum": [
                            "PRODUCTION",
                            "DEVELOPMENT"
                        ],
                        "type": "string",
                        "description": "This property indicates the scraper's mode of operation. In DEVELOPMENT mode, the scraper ignores page timeouts, doesn't use sessionPool, opens pages one by one and enables debugging via Chrome DevTools. Open the live view tab or the container URL to access the debugger. Further debugging options can be configured in the Advanced configuration section. PRODUCTION mode disables debugging and enables timeouts and concurrency. <br><br>For details, see <a href='https://apify.com/apify/web-scraper#run-mode' target='_blank' rel='noopener'>Run mode</a> in README.",
                        "default": "PRODUCTION"
                    },
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "A static list of URLs to scrape. <br><br>For details, see <a href='https://apify.com/apify/web-scraper#start-urls' target='_blank' rel='noopener'>Start URLs</a> in README.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "keepUrlFragments": {
                        "title": "URL #fragments identify unique pages",
                        "type": "boolean",
                        "description": "Indicates that URL fragments (e.g. <code>http://example.com<b>#fragment</b></code>) should be included when checking whether a URL has already been visited or not. Typically, URL fragments are used for page navigation only and therefore they should be ignored, as they don't identify separate pages. However, some single-page websites use URL fragments to display different pages; in such a case, this option should be enabled.",
                        "default": false
                    },
                    "respectRobotsTxtFile": {
                        "title": "Respect the robots.txt file",
                        "type": "boolean",
                        "description": "If enabled, the crawler will consult the robots.txt file for the target website before crawling each page. At the moment, the crawler does not use any specific user agent identifier. The crawl-delay directive is also not supported yet.",
                        "default": false
                    },
                    "linkSelector": {
                        "title": "Link selector",
                        "type": "string",
                        "description": "A CSS selector saying which links on the page (<code>&lt;a&gt;</code> elements with <code>href</code> attribute) shall be followed and added to the request queue. To filter the links added to the queue, use the <b>Pseudo-URLs</b> and/or <b>Glob patterns</b> setting.<br><br>If <b>Link selector</b> is empty, the page links are ignored.<br><br>For details, see <a href='https://apify.com/apify/web-scraper#link-selector' target='_blank' rel='noopener'>Link selector</a> in README."
                    },
                    "globs": {
                        "title": "Glob Patterns",
                        "type": "array",
                        "description": "Glob patterns to match links in the page that you want to enqueue. Combine with Link selector to tell the scraper where to find links. Omitting the Glob patterns will cause the scraper to enqueue all links matched by the Link selector.",
                        "default": [],
                        "items": {
                            "type": "object",
                            "required": [
                                "glob"
                            ],
                            "properties": {
                                "glob": {
                                    "type": "string",
                                    "title": "Glob of a web page"
                                }
                            }
                        }
                    },
                    "pseudoUrls": {
                        "title": "Pseudo-URLs",
                        "type": "array",
                        "description": "Specifies what kind of URLs found by <b>Link selector</b> should be added to the request queue. A pseudo-URL is a URL with regular expressions enclosed in <code>[]</code> brackets, e.g. <code>http://www.example.com/[.*]</code>. <br><br>If <b>Pseudo-URLs</b> are omitted, the Actor enqueues all links matched by the <b>Link selector</b>.<br><br>For details, see <a href='https://apify.com/apify/web-scraper#pseudo-urls' target='_blank' rel='noopener'>Pseudo-URLs</a> in README.",
                        "default": [],
                        "items": {
                            "type": "object",
                            "required": [
                                "purl"
                            ],
                            "properties": {
                                "purl": {
                                    "type": "string",
                                    "title": "Pseudo-URL of a web page"
                                }
                            }
                        }
                    },
                    "excludes": {
                        "title": "Exclude Glob Patterns",
                        "type": "array",
                        "description": "Glob patterns to match links in the page that you want to exclude from being enqueued.",
                        "default": [],
                        "items": {
                            "type": "object",
                            "required": [
                                "glob"
                            ],
                            "properties": {
                                "glob": {
                                    "type": "string",
                                    "title": "Glob of a web page"
                                }
                            }
                        }
                    },
                    "pageFunction": {
                        "title": "Page function",
                        "type": "string",
                        "description": "JavaScript (ES6) function that is executed in the context of every page loaded in the Chrome browser. Use it to scrape data from the page, perform actions or add new URLs to the request queue.<br><br>For details, see <a href='https://apify.com/apify/web-scraper#page-function' target='_blank' rel='noopener'>Page function</a> in README."
                    },
                    "injectJQuery": {
                        "title": "Inject jQuery",
                        "type": "boolean",
                        "description": "If enabled, the scraper will inject the <a href='http://jquery.com' target='_blank' rel='noopener'>jQuery</a> library into every web page loaded, before <b>Page function</b> is invoked. Note that the jQuery object (<code>$</code>) will not be registered into global namespace in order to avoid conflicts with libraries used by the web page. It can only be accessed through <code>context.jQuery</code> in <b>Page function</b>.",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Specifies proxy servers that will be used by the scraper in order to hide its origin.<br><br>For details, see <a href='https://apify.com/apify/web-scraper#proxy-configuration' target='_blank' rel='noopener'>Proxy configuration</a> in README.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "proxyRotation": {
                        "title": "Proxy rotation",
                        "enum": [
                            "RECOMMENDED",
                            "PER_REQUEST",
                            "UNTIL_FAILURE"
                        ],
                        "type": "string",
                        "description": "This property indicates the strategy of proxy rotation and can only be used in conjunction with Apify Proxy. The recommended setting automatically picks the best proxies from your available pool and rotates them evenly, discarding proxies that become blocked or unresponsive. If this strategy does not work for you for any reason, you may configure the scraper to either use a new proxy for each request, or to use one proxy as long as possible, until the proxy fails. IMPORTANT: This setting will only use your available Apify Proxy pool, so if you don't have enough proxies for a given task, no rotation setting will produce satisfactory results.",
                        "default": "RECOMMENDED"
                    },
                    "sessionPoolName": {
                        "title": "Session pool name",
                        "pattern": "[0-9A-z-]",
                        "minLength": 3,
                        "maxLength": 200,
                        "type": "string",
                        "description": "<b>Use only English alphanumeric characters, dashes, and underscores.</b> A session is a representation of a user. It has its own IP and cookies, which are then used together to emulate a real user. Usage of the sessions is controlled by the Proxy rotation option. By providing a session pool name, you enable sharing of those sessions across multiple Actor runs. This is very useful when you need specific cookies for accessing the websites or when a lot of your proxies are already blocked. Instead of trying randomly, a list of working sessions will be saved and a new Actor run can reuse those sessions. Note that the IP lock on sessions expires after 24 hours, unless the session is used again in that window."
                    },
                    "initialCookies": {
                        "title": "Initial cookies",
                        "type": "array",
                        "description": "A JSON array with cookies that will be set to every Chrome browser tab opened before loading the page, in the format accepted by Puppeteer's <a href='https://pptr.dev/api/puppeteer.cookie' target='_blank' rel='noopener'><code>Page.setCookie()</code></a> function. This option is useful for transferring a logged-in session from an external web browser.",
                        "default": []
                    },
                    "useChrome": {
                        "title": "Use Chrome",
                        "type": "boolean",
                        "description": "If enabled, the scraper will use a real Chrome browser instead of Chromium bundled with Puppeteer. This option may help bypass certain anti-scraping protections, but might make the scraper unstable. Use at your own risk 🙂",
                        "default": false
                    },
                    "headless": {
                        "title": "Run browsers in headless mode",
                        "type": "boolean",
                        "description": "By default, browsers run in headless mode. You can toggle this off to run them in headful mode, which can help with certain rare anti-scraping protections but is slower and more costly.",
                        "default": true
                    },
                    "ignoreSslErrors": {
                        "title": "Ignore SSL errors",
                        "type": "boolean",
                        "description": "If enabled, the scraper will ignore SSL/TLS certificate errors. Use at your own risk.",
                        "default": false
                    },
                    "ignoreCorsAndCsp": {
                        "title": "Ignore CORS and CSP",
                        "type": "boolean",
                        "description": "If enabled, the scraper will ignore Content Security Policy (CSP) and Cross-Origin Resource Sharing (CORS) settings of visited pages and requested domains. This enables you to freely use XHR/Fetch to make HTTP requests from <b>Page function</b>.",
                        "default": false
                    },
                    "downloadMedia": {
                        "title": "Download media files",
                        "type": "boolean",
                        "description": "If enabled, the scraper will download media such as images, fonts, videos and sound files, as usual. Disabling this option might speed up the scrape, but certain websites could stop working correctly.",
                        "default": true
                    },
                    "downloadCss": {
                        "title": "Download CSS files",
                        "type": "boolean",
                        "description": "If enabled, the scraper will download CSS files with stylesheets, as usual. Disabling this option may speed up the scrape, but certain websites could stop working correctly, and the live view will not look as cool.",
                        "default": true
                    },
                    "maxRequestRetries": {
                        "title": "Max page retries",
                        "minimum": 0,
                        "type": "integer",
                        "description": "The maximum number of times the scraper will retry loading each web page after a page load error or an exception thrown by <b>Page function</b>.<br><br>If set to <code>0</code>, the page will be considered failed right after the first error.",
                        "default": 3
                    },
                    "maxPagesPerCrawl": {
                        "title": "Max pages per run",
                        "minimum": 0,
                        "type": "integer",
                        "description": "The maximum number of pages that the scraper will load. The scraper will stop when this limit is reached. It's always a good idea to set this limit in order to prevent excess platform usage for misconfigured scrapers. Note that the actual number of pages loaded might be slightly higher than this value.<br><br>If set to <code>0</code>, there is no limit.",
                        "default": 0
                    },
                    "maxResultsPerCrawl": {
                        "title": "Max result records",
                        "minimum": 0,
                        "type": "integer",
                        "description": "The maximum number of records that will be saved to the resulting dataset. The scraper will stop when this limit is reached. <br><br>If set to <code>0</code>, there is no limit.",
                        "default": 0
                    },
                    "maxCrawlingDepth": {
                        "title": "Max crawling depth",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Specifies how many links away from <b>Start URLs</b> the scraper will descend. This value is a safeguard against infinite crawling depths for misconfigured scrapers. Note that pages added using <code>context.enqueuePage()</code> in <b>Page function</b> are not subject to the maximum depth constraint. <br><br>If set to <code>0</code>, there is no limit. To crawl only the pages specified by the Start URLs, set <a href=\"#linkSelector\"><code>linkSelector</code></a> empty instead.",
                        "default": 0
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Specifies the maximum number of pages that can be processed by the scraper in parallel. The scraper automatically increases and decreases concurrency based on available system resources. This option enables you to set an upper limit, for example to reduce the load on a target web server.",
                        "default": 50
                    },
                    "pageLoadTimeoutSecs": {
                        "title": "Page load timeout",
                        "minimum": 1,
                        "type": "integer",
                        "description": "The maximum amount of time the scraper will wait for a web page to load, in seconds. If the web page does not load in this timeframe, it is considered to have failed and will be retried (subject to <b>Max page retries</b>), as with other page load errors.",
                        "default": 60
                    },
                    "pageFunctionTimeoutSecs": {
                        "title": "Page function timeout",
                        "minimum": 1,
                        "type": "integer",
                        "description": "The maximum amount of time the scraper will wait for <b>Page function</b> to execute, in seconds. It's a good idea to set this limit, to ensure that unexpected behavior in page function will not get the scraper stuck.",
                        "default": 60
                    },
                    "waitUntil": {
                        "title": "Navigation waits until",
                        "type": "array",
                        "description": "Contains a JSON array with names of page events to wait for before considering a web page fully loaded. The scraper will wait until <b>all</b> of the events are triggered in the web page before executing <b>Page function</b>. Available events are <code>domcontentloaded</code>, <code>load</code>, <code>networkidle2</code> and <code>networkidle0</code>.<br><br>For details, see <a href='https://pptr.dev/#?product=Puppeteer&show=api-pagegotourl-options' target='_blank' rel='noopener'><code>waitUntil</code> option</a> in Puppeteer's <code>Page.goto()</code> function documentation.",
                        "default": [
                            "networkidle2"
                        ]
                    },
                    "preNavigationHooks": {
                        "title": "Pre-navigation hooks",
                        "type": "string",
                        "description": "Async functions that are sequentially evaluated before the navigation. Good for setting additional cookies or browser properties before navigation. The function accepts two parameters, `crawlingContext` and `gotoOptions`, which are passed to the `page.goto()` function the crawler calls to navigate."
                    },
                    "postNavigationHooks": {
                        "title": "Post-navigation hooks",
                        "type": "string",
                        "description": "Async functions that are sequentially evaluated after the navigation. Good for checking if the navigation was successful. The function accepts `crawlingContext` as the only parameter."
                    },
                    "breakpointLocation": {
                        "title": "Insert breakpoint",
                        "enum": [
                            "NONE",
                            "BEFORE_GOTO",
                            "BEFORE_PAGE_FUNCTION",
                            "AFTER_PAGE_FUNCTION"
                        ],
                        "type": "string",
                        "description": "This property has no effect if Run mode is set to PRODUCTION. When Run mode is set to DEVELOPMENT, it inserts a breakpoint at the selected location in every page the scraper visits. Execution of code stops at the breakpoint until manually resumed in the DevTools window accessible via Live View tab or Container URL. Additional breakpoints can be added by adding <code>debugger;</code> statements within your Page function. <br><br>See <a href='https://apify.com/apify/web-scraper#run-mode' target='_blank' rel='noopener'>Run mode</a> in README for details.",
                        "default": "NONE"
                    },
                    "closeCookieModals": {
                        "title": "Dismiss cookie modals",
                        "type": "boolean",
                        "description": "Uses the [I don't care about cookies](https://addons.mozilla.org/en-US/firefox/addon/i-dont-care-about-cookies/) browser extension. When enabled, the crawler will automatically try to dismiss cookie consent modals. This can be useful when crawling European websites that show cookie consent modals.",
                        "default": false
                    },
                    "maxScrollHeightPixels": {
                        "title": "Maximum scrolling distance in pixels",
                        "type": "integer",
                        "description": "The crawler will scroll down the page until all content is loaded or the maximum scrolling distance is reached. Setting this to `0` disables scrolling altogether.",
                        "default": 5000
                    },
                    "debugLog": {
                        "title": "Debug log",
                        "type": "boolean",
                        "description": "If enabled, the Actor log will include debug messages. Beware that this can be quite verbose. Use <code>context.log.debug('message')</code> to log your own debug messages from <b>Page function</b>.",
                        "default": false
                    },
                    "browserLog": {
                        "title": "Browser log",
                        "type": "boolean",
                        "description": "If enabled, the Actor log will include console messages produced by JavaScript executed by the web pages (e.g. using <code>console.log()</code>). Beware that this may result in the log being flooded by error messages, warnings and other messages of little value, especially with high concurrency.",
                        "default": false
                    },
                    "customData": {
                        "title": "Custom data",
                        "type": "object",
                        "description": "A custom JSON object that is passed to <b>Page function</b> as <code>context.customData</code>. This setting is useful when invoking the scraper via API, in order to pass some arbitrary parameters to your code.",
                        "default": {}
                    },
                    "datasetName": {
                        "title": "Dataset name",
                        "type": "string",
                        "description": "Name or ID of the dataset that will be used for storing results. If left empty, the default dataset of the run will be used."
                    },
                    "keyValueStoreName": {
                        "title": "Key-value store name",
                        "type": "string",
                        "description": "Name or ID of the key-value store that will be used for storing records. If left empty, the default key-value store of the run will be used."
                    },
                    "requestQueueName": {
                        "title": "Request queue name",
                        "type": "string",
                        "description": "Name of the request queue that will be used for storing requests. If left empty, the default request queue of the run will be used."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
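
The run object described by the schema above carries the IDs of the run's default storages and a per-item cost breakdown. The following is a minimal sketch of how a caller might read those fields, assuming the object has already been parsed from JSON into a Python dictionary. The helper `summarize_run` and the placeholder IDs in `sample_run` are hypothetical and used only for illustration; the field names and numeric values are taken from the schema and its examples.

```python
from typing import Any


def summarize_run(run: dict[str, Any]) -> dict[str, Any]:
    """Collect storage IDs and cost figures from a parsed run object."""
    usage_usd = run.get("usageUsd") or {}
    # Keep only the usage items with a non-zero cost.
    billed = {name: cost for name, cost in usage_usd.items() if cost}
    return {
        "dataset_id": run.get("defaultDatasetId"),
        "key_value_store_id": run.get("defaultKeyValueStoreId"),
        "request_queue_id": run.get("defaultRequestQueueId"),
        "total_usd": run.get("usageTotalUsd", 0.0),
        "billed_items": billed,
    }


# Hypothetical run object: the IDs are placeholders, the numbers mirror
# the "example" values in the schema.
sample_run = {
    "defaultDatasetId": "example-dataset-id",
    "defaultKeyValueStoreId": "example-store-id",
    "defaultRequestQueueId": "example-queue-id",
    "usageTotalUsd": 0.00005,
    "usageUsd": {
        "ACTOR_COMPUTE_UNITS": 0,
        "DATASET_WRITES": 0,
        "KEY_VALUE_STORE_WRITES": 0.00005,
    },
}

summary = summarize_run(sample_run)
print(summary["dataset_id"], summary["total_usd"], summary["billed_items"])
```

Reading with `.get()` keeps the helper tolerant of runs where a field is absent, since the schema does not mark these properties as required in the part shown here.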
