# Web Scraper Experimental Debug (`mtrunkat/web-scraper-experimental-dbgr`) Actor

Experimental version of Apify Web Scraper with Chrome debugger integrated

- **URL**: https://apify.com/mtrunkat/web-scraper-experimental-dbgr.md
- **Developed by:** [Marek Trunkát](https://apify.com/mtrunkat) (community)
- **Categories:** Developer tools, Open source
- **Stats:** 88 total users, 1 monthly users, 100.0% runs succeeded, 3 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you only pay for the Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Experimental version of [Apify Web Scraper](https://apify.com/apify/web-scraper) with Chrome debugger integrated

<!-- toc -->

- [How it works](#how-it-works)
- [Getting Started](#getting-started)
- [Input](#input)
- [Page function](#page-function)
- [`context`](#context)
  * [Data structures](#data-structures)
  * [Functions](#functions)
  * [Class instances and namespaces](#class-instances-and-namespaces)
    + [Request](#request)
    + [Response](#response)
    + [Global Store](#global-store)
    + [Log](#log)
    + [Underscore](#underscore)
- [Output](#output)
  * [Dataset](#dataset)

<!-- tocstop -->

### How it works
Web Scraper is a ready-made solution for scraping the web using the Chrome browser. It takes away all
the work necessary to set up a browser for crawling, controls the browser automatically and produces
machine-readable results in several common formats.

Underneath, it uses the [Puppeteer](https://github.com/GoogleChrome/puppeteer/) library to control
the browser, but you don't need to worry about that. Using a simple web UI and a little basic
JavaScript, you can tweak it to serve almost any scraping need.

### Getting Started
If you're new to scraping or Apify, be sure to [visit our tutorial](https://apify.com/docs/scraping/web-scraper-tutorial),
which walks you through creating your first scraping task step by step.

### Input
Input is provided via the pre-configured UI. See the tooltips for more info on the available options.

### Page function
The page function is a single JavaScript function that enables the user to control the Scraper's operation,
manipulate the visited pages and extract data as needed. It is invoked with a `context` object
containing the following properties:

```js
const context = {
    // USEFUL DATA
    input, // Unaltered original input as parsed from the UI
    env, // Contains information about the run such as actorId or runId
    customData, // Value of the 'Custom data' scraper option.

    // EXPOSED OBJECTS
    request, // Apify.Request object.
    response, // Response object holding the status code and headers.
    globalStore, // Represents an in memory store that can be used to share data across pageFunction invocations.
    log, // Reference to Apify.utils.log
    underscoreJs, // A reference to the Underscore _ object (if Inject Underscore was used).

    // EXPOSED FUNCTIONS
    setValue, // Reference to the Apify.setValue() function.
    getValue, // Reference to the Apify.getValue() function.
    saveSnapshot, // Saves a screenshot and full HTML of the current page to the key value store.
    waitFor, // Helps with handling dynamic content by waiting for time, selector or function.
    skipLinks, // Prevents enqueueing more links via Pseudo URLs on the current page.
    enqueueRequest, // Adds a page to the request queue.
    jQuery, // A reference to the jQuery $ function (if Inject JQuery was used).

}
````

### `context`

The following tables describe the `context` object in more detail.

#### Data structures

<table>
<thead>
    <tr><td>Argument</td><td>Type</td></tr>
</thead>
<tbody>
    <tr><td><code>input</code></td><td><code>Object</code></td></tr>
    <tr><td colspan="2">
        Input as it was received from the UI. Each <code>pageFunction</code> invocation gets a fresh
        copy, so you cannot modify the input by changing the values in this object.
    </td></tr>
    <tr><td><code>env</code></td><td><code>Object</code></td></tr>
    <tr><td colspan="2">
        A map of all the relevant environment variables that you may want to use. See the
        <a href="https://sdk.apify.com/docs/api/apify#apifygetenv-code-object-code" target="_blank"><code>Apify.getEnv()</code></a>
        function for a preview of the structure and full documentation.
    </td></tr>
    <tr><td><code>customData</code></td><td><code>Object</code></td></tr>
    <tr><td colspan="2">
        Since the input UI is fixed, it cannot accommodate extra fields that a specific use case
        may need. If you need to pass arbitrary data to the scraper, use the Custom data input field;
        its contents will be available under the <code>customData</code> context key.
    </td></tr>
</tbody>
</table>
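
As a rough illustration of the `customData` key, the sketch below reads a `label` field from `context.customData` and attaches it to every result. The `label` field and its fallback value are hypothetical and exist only for this example; they are not built-in options of the scraper.

```js
// Sketch only: `label` is a hypothetical field typed into the Custom data input,
// for example { "label": "batch-7" }.
async function pageFunction(context) {
    const { request, customData } = context;

    // Fall back to a default when the field is missing from Custom data.
    const { label = 'unlabelled' } = customData;

    // Every saved result carries the label, so runs can be told apart later.
    return { url: request.url, label };
}
```

Because `customData` comes from the input, the same page function can be reused across tasks that differ only in their Custom data.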

#### Functions

The `context` object provides several helper functions that make scraping and saving data easier
and more streamlined. All of the functions are `async` so make sure to use `await` with their invocations.

<table>
<thead>
    <tr><td>Function</td><td>Arguments</td></tr>
</thead>
<tbody>
    <tr><td><code>setValue</code></td><td><code>(key: string, data: Object, options: Object)</code></td></tr>
    <tr><td colspan="2">
        To save data to the default key-value store, you can use the <code>setValue</code> function.
        See the full documentation:
        <a href="https://sdk.apify.com/docs/api/apify#apifysetvaluekey-value-options-code-promise-code" target="_blank">
            <code>Apify.setValue()</code>
        </a> function.
    </td></tr>
    <tr><td><code>getValue</code></td><td><code>(key: string)</code></td></tr>
    <tr><td colspan="2">
        To read data from the default key-value store, you can use the <code>getValue</code> function.
        See the full documentation:
        <a href="https://sdk.apify.com/docs/api/apify#apifygetvaluekey-value-options-code-promise-code" target="_blank">
            <code>Apify.getValue()</code>
        </a> function.
    </td></tr>
    <tr><td><code>waitFor</code></td><td><code>(task: number|string|Function, options: Object)</code></td></tr>
    <tr><td colspan="2">
        The <code>waitFor</code> function enables you to wait
        for various events in the scraped page. The first argument determines its behavior.
        If you use a <code>number</code>, such as <code>await waitFor(1000)</code>, it will wait for the provided
        number of milliseconds. The other option is using a CSS selector <code>string</code>
        which will make the function wait until the given selector appears in the page. The final option
        is to use a <code>Function</code>. In that case, it will wait until the provided function returns
        <code>true</code>.
    </td></tr>
    <tr><td><code>saveSnapshot</code></td><td></td></tr>
    <tr><td colspan="2">
        A helper function that saves a snapshot of the current page's HTML and its screenshot
        into the default key-value store. Each snapshot overwrites the previous one, and, to prevent abuse,
        invocations are throttled if the function is called more than once in 2 seconds.
        So make sure you don't call it for every single request. You can find the screenshot under
        the SNAPSHOT-SCREENSHOT key and the HTML under the SNAPSHOT-HTML key.
    </td></tr>
    <tr><td><code>skipLinks</code></td><td></td></tr>
    <tr><td colspan="2">
        With each invocation of the <code>pageFunction</code> the scraper attempts to extract
        new URLs from the page using the Link selector and PseudoURLs provided in the input UI.
        If you want to prevent this behavior in certain cases, call the <code>skipLinks</code>
        function and no URLs will be added to the queue for the given page.
    </td></tr>
    <tr><td><code>enqueueRequest</code></td><td><code>(request: Request|Object, options: Object)</code></td></tr>
    <tr><td colspan="2">
        To enqueue a specific URL manually, rather than automatically through a combination of a Link selector
        and a Pseudo URL, use the <code>enqueueRequest</code> function. It accepts a plain object with
        the structure needed to construct a
        <a href="https://sdk.apify.com/docs/api/request" target="_blank"><code>Request</code></a> object.
        In practice, a URL is enough: <code>{ url: 'https://www.example.com' }</code>
    </td></tr>
    <tr><td><code>jQuery</code></td><td>see jQuery docs</td></tr>
    <tr><td colspan="2">
        To make the DOM manipulation within the page easier, you may choose the Inject jQuery
        option in the UI and all the crawled pages will have an instance of the
        <a href="https://api.jquery.com/" target="_blank"><code>jQuery</code></a> library
        available. However, since we do not want to modify the page in any way, we don't inject it
        into the global <code>$</code> object as you may be used to, but instead we make it available
        in <code>context</code>. Feel free to <code>const $ = context.jQuery</code> to get the familiar notation.
    </td></tr>
</tbody>
</table>
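
The helpers described in the table can be combined in a single page function. The following is a minimal sketch, not a recommended pattern: the `.product-list` selector, the `/archive/` path check and the sitemap URL are all hypothetical and stand in for whatever the target site needs.

```js
// Sketch only: selector, path check and enqueued URL are made up for illustration.
async function pageFunction(context) {
    const { request, waitFor, skipLinks, enqueueRequest } = context;

    // Wait until an element matching the CSS selector appears in the page.
    await waitFor('.product-list');

    if (request.url.includes('/archive/')) {
        // Keep links found on archive pages out of the request queue.
        await skipLinks();
    } else {
        // Queue one extra page that the Link selector would not discover.
        await enqueueRequest({ url: 'https://www.example.com/sitemap' });
    }

    return { url: request.url };
}
```

Each helper call is awaited, in line with the note above that all of these functions are `async`.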

#### Class instances and namespaces

The following are either class instances or namespaces, which is just a way of saying objects
with functions on them.

##### Request

Apify uses a `request` object to represent metadata about the currently crawled page,
such as its URL or the number of retries. See the <a href="https://sdk.apify.com/docs/api/request" target="_blank"><code>Request</code></a>
class for a preview of the structure and full documentation.

##### Response

The `response` object is produced by Puppeteer. Currently, we only pass the HTTP status code
and the response headers to the `context`.

##### Global Store

`globalStore` represents an instance of a very simple in-memory store that is not scoped to an individual
`pageFunction` invocation. This enables you to easily share global data such as API responses or tokens.
Because the stored data need to cross from the browser to the Node.js process, only JSON-stringifiable
values can be stored. You cannot store DOM objects, functions, circular objects and so on.

`globalStore` supports the full <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map" target="_blank"> <code>Map</code> API </a>, with the following limitations:

- All methods of `globalStore` are `async`. Use `await`.
- Only `string` keys can be used and the values need to be JSON stringifiable.
- `map.forEach()` is not supported.
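
One way to use these rules is to compute a value once and reuse it in later `pageFunction` invocations. The sketch below is an assumption-laden illustration: `getOrCreate` and `createStoreStub` are hypothetical helpers written for this example, and the stub only imitates `globalStore` (async methods, JSON round trip) so the snippet can run outside the scraper.

```js
// Hypothetical helper: read a cached value or produce and store it.
async function getOrCreate(globalStore, key, produce) {
    // All globalStore methods are async, so every call is awaited.
    if (await globalStore.has(key)) {
        return globalStore.get(key);
    }
    const value = await produce();
    // The value must be JSON stringifiable (no functions, DOM nodes or cycles).
    await globalStore.set(key, value);
    return value;
}

// Local stand-in for globalStore, for running the sketch outside the scraper.
function createStoreStub() {
    const map = new Map();
    return {
        has: async (key) => map.has(key),
        get: async (key) => map.get(key),
        // The JSON round trip mimics the browser-to-Node.js boundary.
        set: async (key, value) => { map.set(key, JSON.parse(JSON.stringify(value))); },
    };
}
```

Inside a page function this might be called as `await getOrCreate(context.globalStore, 'token', fetchToken)`, where `fetchToken` is whatever function obtains the value. With concurrent pages, two invocations may both miss the cache and both call `produce`, so the producer should be safe to run more than once.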

##### Log

`log` is a reference to <a href="https://sdk.apify.com/docs/api/log" target="_blank"><code>Apify.utils.log</code></a>.
You can use any of the logging methods such as <code>log.info</code> or <code>log.exception</code>. <code>log.debug</code> is special, because those messages appear in the
scraper's log only when the **Debug log** input option is enabled.

##### Underscore

<a href="https://underscorejs.org/" target="_blank">Underscore</a> is a helper library.
You can use it in your `pageFunction` if you use the **Inject Underscore** input option.

### Output

Output is a dataset containing extracted data for each scraped page. To save data into
the dataset, return an `Object` or an `Object[]` from the `pageFunction`.

#### Dataset

For each of the scraped URLs, the dataset contains an object with results and some metadata.
If you were scraping the HTML `<title>` of [Apify](https://apify.com/) and returning
the following object from the `pageFunction`

```js
return {
  title: "Web Scraping, Data Extraction and Automation - Apify"
}
```

it would look like this:

```json
{
  "title": "Web Scraping, Data Extraction and Automation - Apify",
  "#error": false,
  "#debug": {
    "requestId": "fvwscO2UJLdr10B",
    "url": "https://apify.com",
    "loadedUrl": "https://apify.com/",
    "method": "GET",
    "retryCount": 0,
    "errorMessages": null,
    "statusCode": 200
  }
}
```

You can remove the metadata (and results containing only metadata) from the results
by selecting the **Clean items** option when downloading the dataset.

The result will look like this:

```json
{
  "title": "Web Scraping, Data Extraction and Automation - Apify"
}
```
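
To make the effect of **Clean items** concrete, here is a small sketch that approximates it locally. It assumes that metadata keys are exactly those starting with `#`, as in the example above; `cleanItems` is a hypothetical helper, not part of the scraper or the API client.

```js
// Approximation of "Clean items": drop keys starting with '#',
// then drop items that contained nothing but metadata.
function cleanItems(items) {
    return items
        .map((item) => Object.fromEntries(
            Object.entries(item).filter(([key]) => !key.startsWith('#')),
        ))
        .filter((item) => Object.keys(item).length > 0);
}

const raw = [
    { title: 'Web Scraping, Data Extraction and Automation - Apify', '#error': false, '#debug': { statusCode: 200 } },
    { '#error': true, '#debug': { statusCode: 500 } }, // metadata only
];

console.log(cleanItems(raw));
```

The second item disappears entirely because nothing is left once its metadata keys are removed.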

# Actor input Schema

## `startUrls` (type: `array`):

URLs to start with

## `useRequestQueue` (type: `boolean`):

Request queue enables recursive crawling and the use of Pseudo-URLs, Link selector and <code>context.enqueueRequest()</code>.

## `pseudoUrls` (type: `array`):

Pseudo-URLs to match links in the page that you want to enqueue. Combine with Link selector to tell the scraper where to find links. Omitting the Pseudo-URLs will cause the scraper to enqueue all links matched by the Link selector.
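
The Pseudo-URL in the input example further below, `https://apify.com[(/[\w-]+)?]`, mixes a literal URL with a bracketed regular expression. The following sketch shows roughly how such a pattern could be turned into a `RegExp`. It assumes that text inside the outermost `[...]` is a regular expression, that everything else is literal, and that matching is case-insensitive; `pseudoUrlToRegExp` is a hypothetical helper and not the scraper's own implementation.

```js
// Sketch: convert a Pseudo-URL into a RegExp under the assumptions stated above.
function pseudoUrlToRegExp(purl) {
    let pattern = '^';
    let depth = 0; // nesting level of [ ] brackets

    for (const ch of purl.trim()) {
        if (ch === '[') {
            depth += 1;
            // The outermost '[' opens a regex section; nested ones are part of the regex.
            pattern += depth === 1 ? '(' : ch;
        } else if (ch === ']' && depth > 0) {
            depth -= 1;
            pattern += depth === 0 ? ')' : ch;
        } else if (depth > 0) {
            pattern += ch; // inside a regex section: copy as-is
        } else {
            // Literal URL part: escape characters that are special in a RegExp.
            pattern += ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        }
    }
    return new RegExp(pattern + '$', 'i');
}

const matcher = pseudoUrlToRegExp('https://apify.com[(/[\\w-]+)?]');
console.log(matcher.test('https://apify.com/store'));        // true
console.log(matcher.test('https://apify.com/store/actors')); // false
```

The sketch does not validate unbalanced brackets; it is only meant to show why the example pattern matches the home page and its first-level subpages but nothing deeper.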

## `linkSelector` (type: `string`):

CSS selector matching elements with 'href' attributes that should be enqueued. To enqueue URLs from <code>&lt;div class="my-class" href=...&gt;</code> tags, you would enter <strong>div.my-class</strong>. Leave empty to ignore all links.

## `keepUrlFragments` (type: `boolean`):

URL fragments (the parts of URL after a <code>#</code>) are not considered when the scraper determines whether a URL has already been visited. This means that when adding URLs such as <code>https://example.com/#foo</code> and <code>https://example.com/#bar</code>, only the first will be visited. Turn this option on to tell the scraper to visit both.
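
The effect of this option can be pictured as a choice of deduplication key. The sketch below is a simplification that only handles the fragment; the scraper may normalize URLs in other ways as well, and `visitedKey` is a hypothetical helper.

```js
// Sketch: the key used to decide whether a URL was already visited.
function visitedKey(url, keepUrlFragments) {
    const parsed = new URL(url);
    if (!keepUrlFragments) {
        parsed.hash = ''; // ignore everything after '#'
    }
    return parsed.href;
}

// With the option off, both URLs share one key, so only the first is visited.
console.log(visitedKey('https://example.com/#foo', false) === visitedKey('https://example.com/#bar', false)); // true

// With the option on, the keys differ and both URLs are visited.
console.log(visitedKey('https://example.com/#foo', true) === visitedKey('https://example.com/#bar', true)); // false
```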

## `pageFunction` (type: `string`):

Function executed for each request

## `injectJQuery` (type: `boolean`):

The jQuery library will be injected into each page. If the page already uses jQuery, conflicts may arise.

## `injectUnderscore` (type: `boolean`):

The Underscore.js library will be injected into each page. If the page already uses Underscore.js (or other libraries that attach to '\_', such as Lodash), conflicts may arise.

## `proxyConfiguration` (type: `object`):

Choose to use no proxy, Apify Proxy, or provide custom proxy URLs.

## `initialCookies` (type: `array`):

The provided cookies will be pre-set to all pages the scraper opens.

## `useChrome` (type: `boolean`):

The scraper will use a real Chrome browser instead of a Chromium build masquerading as Chrome. Using this option may help with bypassing certain anti-scraping protections, but it carries the risk that the scraper will be unstable or not work at all.

## `useStealth` (type: `boolean`):

The scraper will apply various browser emulation techniques to match a real user as closely as possible. This feature works best in conjunction with the Use Chrome option and also carries the risk of making the scraper unstable.

## `ignoreSslErrors` (type: `boolean`):

Scraper will ignore SSL certificate errors.

## `ignoreCorsAndCsp` (type: `boolean`):

Scraper will ignore CSP (content security policy) and CORS (cross origin resource sharing) settings of visited pages and requested domains. This enables you to freely use XHR/Fetch to make HTTP requests from the scraper.

## `downloadMedia` (type: `boolean`):

Scraper will download media such as images, fonts, videos and sounds. Disabling this may speed up the scrape, but certain websites could stop working correctly.

## `downloadCss` (type: `boolean`):

Scraper will download CSS stylesheets. Disabling this may speed up the scrape, but certain websites could stop working correctly.

## `maxRequestRetries` (type: `integer`):

Maximum number of times the request for the page will be retried in case of an error. Setting it to 0 means that the request will be attempted once and will not be retried if it fails.
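
In other words, the total number of attempts is `maxRequestRetries + 1`. A minimal sketch of that counting rule follows; `runWithRetries` is a hypothetical helper used only to illustrate the arithmetic and does not reflect how the scraper schedules retries internally.

```js
// Sketch: one initial attempt plus up to maxRequestRetries retries.
async function runWithRetries(task, maxRequestRetries) {
    let lastError;
    for (let attempt = 0; attempt <= maxRequestRetries; attempt++) {
        try {
            // `attempt` is 0 for the first try, 1 for the first retry, and so on.
            return await task(attempt);
        } catch (err) {
            lastError = err;
        }
    }
    // All attempts failed: surface the last error.
    throw lastError;
}
```

With `maxRequestRetries` set to 0 the loop body runs exactly once, which matches the description above.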

## `maxPagesPerCrawl` (type: `integer`):

Maximum number of pages that the scraper will open. 0 means unlimited.

## `maxResultsPerCrawl` (type: `integer`):

Maximum number of results that will be saved to dataset. The scraper will terminate afterwards. 0 means unlimited.

## `maxCrawlingDepth` (type: `integer`):

Defines how many links away from the Start URLs the scraper will descend. 0 means unlimited.

## `maxConcurrency` (type: `integer`):

Defines how many pages can be processed by the scraper in parallel. The scraper automatically increases and decreases concurrency based on available system resources. Use this option to set a hard limit.

## `pageLoadTimeoutSecs` (type: `integer`):

Maximum time, in seconds, the scraper will allow a web page to load.

## `pageFunctionTimeoutSecs` (type: `integer`):

Maximum time, in seconds, the scraper will wait for the page function to execute.

## `waitUntil` (type: `array`):

The scraper will wait until the selected events are triggered in the page before executing the page function. Available events are <code>domcontentloaded</code>, <code>load</code>, <code>networkidle2</code> and <code>networkidle0</code>. <a href="https://pptr.dev/#?product=Puppeteer&show=api-pagegotourl-options" target="_blank">See Puppeteer docs</a>.

## `debugLog` (type: `boolean`):

Debug messages will be included in the log. Use <code>context.log.debug('message')</code> to log your own debug messages.

## `browserLog` (type: `boolean`):

Console messages from the Browser will be included in the log. This may result in the log being flooded by error messages, warnings and other messages of little value, especially with high concurrency.

## `chromeDebugger` (type: `boolean`):

Experimental implementation of the Chrome debugger. In this mode, the scraper runs a single browser with concurrency 1. You can place <code>debugger;</code> into your page function to set up a debugger breakpoint.
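
A minimal sketch of an input that uses this option is shown below. Only `startUrls`, `pageFunction`, `chromeDebugger` and `debugLog` come from the input schema; the body of the page function is illustrative.

```js
// Page function with a breakpoint. `debugger;` does nothing unless a debugger is attached.
async function pageFunction(context) {
    const { request, log } = context;

    debugger; // Execution should pause here when the Chrome debugger option is on.

    log.debug(`Inspecting ${request.url}`);
    return { url: request.url };
}

// Input that turns the experimental debugger on.
// The schema expects pageFunction as a string, hence toString().
const input = {
    startUrls: [{ url: 'https://apify.com' }],
    pageFunction: pageFunction.toString(),
    chromeDebugger: true,
    debugLog: true, // makes log.debug messages visible in the log
};

console.log(JSON.stringify(input, null, 2));
```

Since the debugger mode runs one page at a time, it is probably best kept for troubleshooting a page function rather than for full crawls.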

## `customData` (type: `object`):

This object will be available on pageFunction's context as customData.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://apify.com"
    }
  ],
  "useRequestQueue": true,
  "pseudoUrls": [
    {
      "purl": "https://apify.com[(/[\\w-]+)?]"
    }
  ],
  "linkSelector": "a",
  "keepUrlFragments": false,
  "pageFunction": "async function pageFunction(context) {\n    // See README for context properties. If the syntax is unfamiliar see the link\n    // https://javascript.info/destructuring-assignment#object-destructuring\n    const { request, log, jQuery } = context;\n\n    // To be able to use jQuery as $, one needs save it into a variable\n    // and select the inject jQuery option. We've selected it for you.\n    const $ = jQuery;\n    const title = $('title').text();\n\n    // This is yet another new feature of Javascript called template strings.\n    // https://javascript.info/string#quotes\n    log.info(`URL: ${request.url} TITLE: ${title}`);\n\n    // To save data just return an object with the requested properties.\n    return {\n        url: request.url,\n        title\n    };\n}",
  "injectJQuery": true,
  "injectUnderscore": false,
  "proxyConfiguration": {
    "useApifyProxy": false
  },
  "initialCookies": [],
  "useChrome": false,
  "useStealth": false,
  "ignoreSslErrors": false,
  "ignoreCorsAndCsp": false,
  "downloadMedia": true,
  "downloadCss": true,
  "maxRequestRetries": 3,
  "maxPagesPerCrawl": 0,
  "maxResultsPerCrawl": 0,
  "maxCrawlingDepth": 0,
  "maxConcurrency": 50,
  "pageLoadTimeoutSecs": 60,
  "pageFunctionTimeoutSecs": 60,
  "waitUntil": [
    "networkidle2"
  ],
  "debugLog": false,
  "browserLog": false,
  "chromeDebugger": false,
  "customData": {}
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://apify.com"
        }
    ],
    "pseudoUrls": [
        {
            "purl": "https://apify.com[(/[\\w-]+)?]"
        }
    ],
    "linkSelector": "a",
    "pageFunction": async function pageFunction(context) {
        // See README for context properties. If the syntax is unfamiliar see the link
        // https://javascript.info/destructuring-assignment#object-destructuring
        const { request, log, jQuery } = context;
    
        // To be able to use jQuery as $, one needs save it into a variable
        // and select the inject jQuery option. We've selected it for you.
        const $ = jQuery;
        const title = $('title').text();
    
        // This is yet another new feature of Javascript called template strings.
        // https://javascript.info/string#quotes
        log.info(`URL: ${request.url} TITLE: ${title}`);
    
        // To save data just return an object with the requested properties.
        return {
            url: request.url,
            title
        };
    },
    "proxyConfiguration": {
        "useApifyProxy": false
    },
    "initialCookies": [],
    "waitUntil": [
        "networkidle2"
    ],
    "customData": {}
};

// Run the Actor and wait for it to finish
const run = await client.actor("mtrunkat/web-scraper-experimental-dbgr").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://apify.com" }],
    "pseudoUrls": [{ "purl": "https://apify.com[(/[\\w-]+)?]" }],
    "linkSelector": "a",
    "pageFunction": """async function pageFunction(context) {
    // See README for context properties. If the syntax is unfamiliar see the link
    // https://javascript.info/destructuring-assignment#object-destructuring
    const { request, log, jQuery } = context;

    // To be able to use jQuery as $, one needs save it into a variable
    // and select the inject jQuery option. We've selected it for you.
    const $ = jQuery;
    const title = $('title').text();

    // This is yet another new feature of Javascript called template strings.
    // https://javascript.info/string#quotes
    log.info(`URL: ${request.url} TITLE: ${title}`);

    // To save data just return an object with the requested properties.
    return {
        url: request.url,
        title
    };
}""",
    "proxyConfiguration": { "useApifyProxy": False },
    "initialCookies": [],
    "waitUntil": ["networkidle2"],
    "customData": {},
}

# Run the Actor and wait for it to finish
run = client.actor("mtrunkat/web-scraper-experimental-dbgr").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
# A quoted heredoc passes the JSON through unchanged, so the escapes
# behave the same in bash, zsh and sh.
cat <<'EOF' |
{
  "startUrls": [
    {
      "url": "https://apify.com"
    }
  ],
  "pseudoUrls": [
    {
      "purl": "https://apify.com[(/[\\w-]+)?]"
    }
  ],
  "linkSelector": "a",
  "pageFunction": "async function pageFunction(context) {\n    // See README for context properties. If the syntax is unfamiliar see the link\n    // https://javascript.info/destructuring-assignment#object-destructuring\n    const { request, log, jQuery } = context;\n\n    // To be able to use jQuery as $, one needs save it into a variable\n    // and select the inject jQuery option. We've selected it for you.\n    const $ = jQuery;\n    const title = $('title').text();\n\n    // This is yet another new feature of Javascript called template strings.\n    // https://javascript.info/string#quotes\n    log.info(`URL: ${request.url} TITLE: ${title}`);\n\n    // To save data just return an object with the requested properties.\n    return {\n        url: request.url,\n        title\n    };\n}",
  "proxyConfiguration": {
    "useApifyProxy": false
  },
  "initialCookies": [],
  "waitUntil": [
    "networkidle2"
  ],
  "customData": {}
}
EOF
apify call mtrunkat/web-scraper-experimental-dbgr --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=mtrunkat/web-scraper-experimental-dbgr",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Web Scraper Experimental Debug",
        "description": "Experimental version of Apify Web Scraper with Chrome debugger integrated",
        "version": "0.1",
        "x-build-id": "Q3sb7KLmaZ3L46Zyg"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/mtrunkat~web-scraper-experimental-dbgr/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-mtrunkat-web-scraper-experimental-dbgr",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/mtrunkat~web-scraper-experimental-dbgr/runs": {
            "post": {
                "operationId": "runs-sync-mtrunkat-web-scraper-experimental-dbgr",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/mtrunkat~web-scraper-experimental-dbgr/run-sync": {
            "post": {
                "operationId": "run-sync-mtrunkat-web-scraper-experimental-dbgr",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls",
                    "pageFunction"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "URLs to start with",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "useRequestQueue": {
                        "title": "Use request queue",
                        "type": "boolean",
                        "description": "Request queue enables recursive crawling and the use of Pseudo-URLs, Link selector and <code>context.enqueueRequest()</code>.",
                        "default": true
                    },
                    "pseudoUrls": {
                        "title": "Pseudo-URLs",
                        "type": "array",
                        "description": "Pseudo-URLs to match links in the page that you want to enqueue. Combine with Link selector to tell the scraper where to find links. Omitting the Pseudo-URLs will cause the scraper to enqueue all links matched by the Link selector.",
                        "default": [],
                        "items": {
                            "type": "object",
                            "required": [
                                "purl"
                            ],
                            "properties": {
                                "purl": {
                                    "type": "string",
                                    "title": "Pseudo-URL of a web page"
                                }
                            }
                        }
                    },
                    "linkSelector": {
                        "title": "Link selector",
                        "type": "string",
                        "description": "CSS selector matching elements with 'href' attributes that should be enqueued. To enqueue urls from <code><div class=\"my-class\" href=...></code> tags, you would enter <strong>div.my-class</strong>. Leave empty to ignore all links."
                    },
                    "keepUrlFragments": {
                        "title": "Keep URL fragments",
                        "type": "boolean",
                        "description": "URL fragments (the parts of URL after a <code>#</code>) are not considered when the scraper determines whether a URL has already been visited. This means that when adding URLs such as <code>https://example.com/#foo</code> and <code>https://example.com/#bar</code>, only the first will be visited. Turn this option on to tell the scraper to visit both.",
                        "default": false
                    },
                    "pageFunction": {
                        "title": "Page function",
                        "type": "string",
                        "description": "Function executed for each request"
                    },
                    "injectJQuery": {
                        "title": "jQuery",
                        "type": "boolean",
                        "description": "The jQuery library will be injected into each page. If the page already uses jQuery, conflicts may arise.",
                        "default": true
                    },
                    "injectUnderscore": {
                        "title": "Underscore",
                        "type": "boolean",
                        "description": "The Underscore.js library will be injected into each page. If the page already uses Underscore.js (or other libraries that attach to '_', such as Lodash), conflicts may arise.",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Choose to use no proxy, Apify Proxy, or provide custom proxy URLs.",
                        "default": {}
                    },
                    "initialCookies": {
                        "title": "Initial cookies",
                        "type": "array",
                        "description": "The provided cookies will be pre-set to all pages the scraper opens.",
                        "default": []
                    },
                    "useChrome": {
                        "title": "Use Chrome",
                        "type": "boolean",
                        "description": "The scraper will use a real Chrome browser instead of a Chromium masking as Chrome. Using this option may help with bypassing certain anti-scraping protections, but risks that the scraper will be unstable or not work at all.",
                        "default": false
                    },
                    "useStealth": {
                        "title": "Use Stealth",
                        "type": "boolean",
                        "description": "The scraper will apply various browser emulation techniques to match a real user as closely as possible. This feature works best in conjunction with the Use Chrome option and also carries the risk of making the scraper unstable.",
                        "default": false
                    },
                    "ignoreSslErrors": {
                        "title": "Ignore SSL errors",
                        "type": "boolean",
                        "description": "Scraper will ignore SSL certificate errors.",
                        "default": false
                    },
                    "ignoreCorsAndCsp": {
                        "title": "Ignore CORS and CSP",
                        "type": "boolean",
                        "description": "Scraper will ignore CSP (content security policy) and CORS (cross origin resource sharing) settings of visited pages and requested domains. This enables you to freely use XHR/Fetch to make HTTP requests from the scraper.",
                        "default": false
                    },
                    "downloadMedia": {
                        "title": "Download media",
                        "type": "boolean",
                        "description": "Scraper will download media such as images, fonts, videos and sounds. Disabling this may speed up the scrape, but certain websites could stop working correctly.",
                        "default": true
                    },
                    "downloadCss": {
                        "title": "Download CSS",
                        "type": "boolean",
                        "description": "Scraper will download CSS stylesheets. Disabling this may speed up the scrape, but certain websites could stop working correctly.",
                        "default": true
                    },
                    "maxRequestRetries": {
                        "title": "Max request retries",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of times the request for the page will be retried in case of an error. Setting it to 0 means that the request will be attempted once and will not be retried if it fails.",
                        "default": 3
                    },
                    "maxPagesPerCrawl": {
                        "title": "Max pages per run",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of pages that the scraper will open. 0 means unlimited.",
                        "default": 0
                    },
                    "maxResultsPerCrawl": {
                        "title": "Max result records",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of results that will be saved to dataset. The scraper will terminate afterwards. 0 means unlimited.",
                        "default": 0
                    },
                    "maxCrawlingDepth": {
                        "title": "Max crawling depth",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Defines how many links away from the StartURLs will the scraper descend. 0 means unlimited.",
                        "default": 0
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Defines how many pages can be processed by the scraper in parallel. The scraper automatically increases and decreases concurrency based on available system resources. Use this option to set a hard limit.",
                        "default": 50
                    },
                    "pageLoadTimeoutSecs": {
                        "title": "Page load timeout",
                        "minimum": 1,
                        "maximum": 360,
                        "type": "integer",
                        "description": "Maximum time the scraper will allow a web page to load in seconds.",
                        "default": 60
                    },
                    "pageFunctionTimeoutSecs": {
                        "title": "Page function timeout",
                        "minimum": 1,
                        "maximum": 360,
                        "type": "integer",
                        "description": "Maximum time the scraper will wait for the page function to execute in seconds.",
                        "default": 60
                    },
                    "waitUntil": {
                        "title": "Navigation wait until",
                        "type": "array",
                        "description": "The scraper will wait until the selected events are triggered in the page before executing the page function. Available events are <code>domcontentloaded</code>, <code>load</code>, <code>networkidle2</code> and <code>networkidle0</code>. <a href=\"https://pptr.dev/#?product=Puppeteer&show=api-pagegotourl-options\" target=\"_blank\">See Puppeteer docs</a>.",
                        "default": [
                            "networkidle2"
                        ]
                    },
                    "debugLog": {
                        "title": "Debug log",
                        "type": "boolean",
                        "description": "Debug messages will be included in the log. Use <code>context.log.debug('message')</code> to log your own debug messages.",
                        "default": false
                    },
                    "browserLog": {
                        "title": "Browser log",
                        "type": "boolean",
                        "description": "Console messages from the Browser will be included in the log. This may result in the log being flooded by error messages, warnings and other messages of little value, especially with high concurrency.",
                        "default": false
                    },
                    "chromeDebugger": {
                        "title": "Chrome debugger [experimental]",
                        "type": "boolean",
                        "description": "Experimental implementation of Chrome debugger. In this mode the scraper will run with single browser on concurrency 1. You can place <code>debugger;</code> into your page function to set up a debugger breakpoint.",
                        "default": false
                    },
                    "customData": {
                        "title": "Custom data",
                        "type": "object",
                        "description": "This object will be available on pageFunction's context as customData.",
                        "default": {}
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
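
The `run-sync` path and the `inputSchema` above can be combined into a request as in the following minimal sketch. It only builds the URL and JSON body and does not send anything. The base URL `https://api.apify.com/v2` is an assumption (the `servers` entry is not shown in this part of the definition), and `build_input` and `run_sync_url` are hypothetical helper names, not part of any Apify library.

```python
import json
from urllib.parse import urlencode

# Assumption: the standard Apify API base URL; check the "servers" entry
# of the OpenAPI definition before relying on it.
API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "mtrunkat~web-scraper-experimental-dbgr"

# Fields listed under "required" in inputSchema.
REQUIRED_FIELDS = ("startUrls", "pageFunction")


def build_input(start_urls, page_function, **overrides):
    """Build an input object with the two required fields of inputSchema.

    Extra keyword arguments (for example chromeDebugger=True) are copied
    into the payload unchanged; they are not validated here.
    """
    payload = {
        "startUrls": [{"url": url} for url in start_urls],
        "pageFunction": page_function,
    }
    payload.update(overrides)
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError(f"missing required input fields: {missing}")
    return payload


def run_sync_url(token):
    """Return the run-sync endpoint URL with the required token query parameter."""
    return f"{API_BASE}/acts/{ACTOR_ID}/run-sync?{urlencode({'token': token})}"


if __name__ == "__main__":
    # Hypothetical page function; the pageFunction input is passed as a string.
    page_function = "async function pageFunction(context) { debugger; return {}; }"
    body = build_input(["https://example.com"], page_function, chromeDebugger=True)
    print(run_sync_url("<YOUR_API_TOKEN>"))
    print(json.dumps(body, indent=2))
```

The resulting URL and body can be sent as a `POST` with `Content-Type: application/json` using any HTTP client. Keeping the token out of source code (for example in an environment variable) is advisable, since it travels as a query parameter.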

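A response shaped like `runsResponseSchema` can be reduced to the few fields most callers need. The sketch below is one way to do that, assuming the response has already been parsed into a Python dict; `summarize_run` and `parse_timestamp` are hypothetical helper names, and the sample values are made up for illustration.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse a date-time string such as "2025-01-08T00:00:00.000Z".

    The trailing "Z" is rewritten to "+00:00" because older Python versions
    do not accept it in datetime.fromisoformat().
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_run(response):
    """Pick selected fields from a runsResponseSchema-shaped dict."""
    data = response.get("data", {})
    started = data.get("startedAt")
    finished = data.get("finishedAt")
    duration = None
    if started and finished:
        duration = (parse_timestamp(finished) - parse_timestamp(started)).total_seconds()
    return {
        "id": data.get("id"),
        "status": data.get("status"),
        "datasetId": data.get("defaultDatasetId"),
        "durationSecs": duration,
        "usageTotalUsd": data.get("usageTotalUsd"),
    }


if __name__ == "__main__":
    # Made-up sample values, following the field names in the schema.
    sample = {
        "data": {
            "id": "run-123",
            "status": "READY",
            "startedAt": "2025-01-08T00:00:00.000Z",
            "finishedAt": "2025-01-08T00:00:05.000Z",
            "defaultDatasetId": "dataset-456",
            "usageTotalUsd": 0.00005,
        }
    }
    print(summarize_run(sample))
```

Missing fields come back as `None` rather than raising, which suits runs that have not finished yet and therefore have no `finishedAt`.
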