# RAG Browser (`visita/rag-browser`) Actor

This Actor provides essential web browsing and content extraction functionality for AI Agents, LLM applications, and Retrieval-Augmented Generation (RAG) pipelines. It functions similarly to the web search feature in popular LLM chatbots, providing fresh, contextualized data directly from the web.

- **URL**: https://apify.com/visita/rag-browser.md
- **Developed by:** [Visita Intelligence](https://apify.com/visita) (community)
- **Categories:** Developer tools, MCP servers
- **Stats:** 19 total users, 1 monthly users, 100.0% runs succeeded, 3 bookmarks
- **User rating**: No ratings yet

## Pricing

$7.00 / 1,000 pages crawled

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🌐 RAG Web Browser

**Give your AI agent live web access.** This Apify Actor searches Google, scrapes the top result pages, and returns clean Markdown (or plain text / HTML) ready for LLM consumption. Optional **chunked output** splits content into embedding-ready segments for direct ingestion into vector databases.

Built for **OpenAI Assistants, custom GPTs, LangChain, CrewAI, LlamaIndex**, and any RAG pipeline that needs real-time web data.

---

### Quick Start

#### 1. Run via Apify API (one-liner)

```bash
curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~rag-web-browser/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "latest AI news 2026", "maxResults": 3}'
```

#### 2. Run via Apify Client (Node.js)

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('YOUR_USERNAME/rag-web-browser').call({
    query: 'best practices for RAG pipelines',
    maxResults: 3,
    outputFormats: ['markdown'],
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].markdown);
```

#### 3. Run via Apify Client (Python)

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("YOUR_USERNAME/rag-web-browser").call(
    run_input={"query": "best practices for RAG pipelines", "maxResults": 3}
)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["markdown"][:500])
```

***

### Main Features

| Feature | Description |
| :--- | :--- |
| **Real-Time Grounding** | Queries Google Search for up-to-date information — no stale training data. |
| **Clean Markdown Output** | Strips navigation, ads, modals, and scripts. Returns LLM-ready Markdown. |
| **Chunked Output for RAG** | Optionally splits each page into overlapping chunks, perfect for embedding into vector DBs. |
| **Hybrid Scraping** | Fast `raw-http` mode by default; falls back to full Playwright browser for JS-heavy sites. |
| **Standby / HTTP Mode** | Run as a persistent HTTP service with a `/search` endpoint for real-time queries. |
| **MCP Support** | Built-in Model Context Protocol server for native AI tool integration. |
| **OpenAPI Spec Included** | Plug directly into OpenAI custom GPTs as an Action. |

***

### 💰 Pay-per-Event (PPE) Pricing

You pay only for the pages you actually get — no CU charges for the Actor run itself.

| Event Name | Title | Unit | Price | Description |
| :--- | :--- | :--- | :--- | :--- |
| `apify-default-dataset-item` | Page crawled | Per page | **$0.007** | Charged each time a web page is successfully crawled and its content is extracted. Failed or skipped pages are not charged. |

**Example cost:** A search with `maxResults: 3` that successfully scrapes all 3 pages costs **$0.021**.
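
The arithmetic behind this example can be checked with a small helper. This is only a sketch: `estimate_cost_usd` is a hypothetical function name, and the price constant is copied from the pricing table above, so it should be updated if the listed price changes.

```python
from decimal import Decimal

# Price per successfully crawled page, copied from the pricing table above.
PRICE_PER_PAGE_USD = Decimal("0.007")


def estimate_cost_usd(pages_crawled: int) -> Decimal:
    """Return the pay-per-event charge for a number of successfully crawled pages."""
    if pages_crawled < 0:
        raise ValueError("pages_crawled must be non-negative")
    # Failed or skipped pages are not charged, so only successful pages count.
    return PRICE_PER_PAGE_USD * pages_crawled


print(estimate_cost_usd(3))     # 0.021
print(estimate_cost_usd(1000))  # 7.000
```

`Decimal` is used instead of `float` so that small per-page prices add up without rounding noise.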

**Cost comparison vs. alternatives:**

| Service | Typical cost (3 results) | Clean Markdown | Chunking | Proxy included |
| :--- | :--- | :--- | :--- | :--- |
| **This Actor** | **~$0.021** | Yes | Yes | Yes |
| Tavily Search API | ~$0.005 (snippets only) | Partial | No | N/A |
| SerpAPI | ~$0.01 (SERP only) | No | No | Yes |
| Brave Search API | ~$0.005 (snippets only) | No | No | N/A |

***

### ⚙️ Input Parameters

| Parameter | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `query` | string | *(required)* | Google Search keywords **or** a specific URL to scrape. Supports [advanced operators](https://blog.apify.com/how-to-scrape-google-like-a-pro/). |
| `maxResults` | integer | `3` | Number of top SERP results to scrape (1–100). Ignored when `query` is a URL. |
| `outputFormats` | array | `["markdown"]` | One or more of: `text`, `markdown`, `html`. |
| `scrapingTool` | string | `raw-http` | `raw-http` (fast) or `browser-playwright` (handles JS-heavy sites). |
| `requestTimeoutSecs` | integer | `40` | Max seconds for the entire request. |
| `maxRequestRetries` | integer | `1` | Retries per target page on failure. |
| `removeCookieWarnings` | boolean | `true` | Attempt to dismiss cookie consent dialogs. |
| `debugMode` | boolean | `false` | Include timing/debug info in output. |
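
As an illustration of the constraints in this table, the following sketch builds a run input on the client side and rejects values outside the documented ranges. The helper names `is_url` and `build_input` are hypothetical and not part of the Actor or the Apify client. Dropping `maxResults` for URL queries is a choice made in this sketch; the Actor is documented to ignore the field in that case anyway.

```python
from urllib.parse import urlparse

ALLOWED_FORMATS = {"text", "markdown", "html"}
ALLOWED_TOOLS = {"raw-http", "browser-playwright"}


def is_url(query: str) -> bool:
    """Return True when the query looks like an http(s) URL rather than keywords."""
    parsed = urlparse(query.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def build_input(query, max_results=3, output_formats=("markdown",), scraping_tool="raw-http"):
    """Validate the documented parameters and return a run input dict."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if not 1 <= max_results <= 100:
        raise ValueError("maxResults must be between 1 and 100")
    unknown = set(output_formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported output formats: {sorted(unknown)}")
    if scraping_tool not in ALLOWED_TOOLS:
        raise ValueError(f"unsupported scraping tool: {scraping_tool}")

    run_input = {
        "query": query.strip(),
        "outputFormats": list(output_formats),
        "scrapingTool": scraping_tool,
    }
    # maxResults is ignored for URL queries, so this sketch leaves it out.
    if not is_url(query):
        run_input["maxResults"] = max_results
    return run_input


print(build_input("san francisco weather", max_results=5))
print(build_input("https://www.cnn.com"))
```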

***

### 📤 Output Format

Each result in the dataset is a JSON object:

```json
{
  "metadata": {
    "url": "https://example.com/article",
    "title": "Example Article Title",
    "description": "Meta description of the page",
    "author": "Jane Doe",
    "languageCode": "en"
  },
  "searchResult": {
    "title": "Example Article Title",
    "description": "Google snippet for this result",
    "url": "https://example.com/article",
    "resultType": "ORGANIC",
    "rank": 1
  },
  "markdown": "# Example Article Title\n\nThe full content of the page in clean Markdown...",
  "text": null,
  "html": null,
  "query": "example search query"
}
```
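
A dataset item with this shape can be flattened into a simpler record before it is passed to an LLM or an indexer. The sketch below assumes only the fields shown in the example above; `to_record` is a hypothetical helper, and the fallback order (Markdown, then text, then HTML) is an arbitrary choice for illustration.

```python
from typing import Any, Optional


def to_record(item: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Flatten one dataset item; return None when no content format is present."""
    # Formats that were not requested are null in the output, so pick the first one set.
    content = item.get("markdown") or item.get("text") or item.get("html")
    if not content:
        return None
    metadata = item.get("metadata") or {}
    search = item.get("searchResult") or {}
    return {
        "url": metadata.get("url") or search.get("url"),
        "title": metadata.get("title") or search.get("title"),
        "rank": search.get("rank"),
        "content": content,
    }


example_item = {
    "metadata": {"url": "https://example.com/article", "title": "Example Article Title"},
    "searchResult": {"url": "https://example.com/article", "rank": 1},
    "markdown": "# Example Article Title\n\nThe full content...",
    "text": None,
    "html": None,
    "query": "example search query",
}
record = to_record(example_item)
print(record["url"], record["rank"])
```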

***

### 🔗 Integration Examples

#### OpenAI Assistants / Custom GPTs

This Actor ships with an [OpenAPI specification](.actor/openapi.json) you can import directly as a **GPT Action**:

1. In the GPT editor, go to **Configure → Actions → Create new action**.
2. Import the schema from `.actor/openapi.json`.
3. Set the server URL to your Standby endpoint or the Apify API.
4. Your GPT can now call `searchWeb` to get live search results.

#### LangChain (Python)

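```python
from langchain_community.utilities import ApifyWrapper
from langchain_core.documents import Document

# ApifyWrapper reads the API token from the APIFY_API_TOKEN environment variable.
apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="YOUR_USERNAME/rag-web-browser",
    run_input={"query": "LangChain RAG tutorial", "maxResults": 3},
    # The mapping function must return a Document, not a plain string.
    dataset_mapping_function=lambda item: Document(
        page_content=item.get("markdown") or "",
        metadata={"source": item["metadata"]["url"]},
    ),
)
docs = loader.load()
# docs is a list of Document objects ready for your chain
```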

#### CrewAI

```python
from crewai_tools import ApifyActorsTool

# Requires the APIFY_API_TOKEN environment variable.
search_tool = ApifyActorsTool(actor_name="YOUR_USERNAME/rag-web-browser")

# Use search_tool in your CrewAI agent definition, or call it directly:
results = search_tool.run(run_input={"query": "CrewAI RAG tutorial", "maxResults": 3})
```

#### LlamaIndex

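```python
from llama_index.core import Document
from llama_index.readers.apify import ApifyActor

# ApifyActor takes your Apify API token; the Actor ID is passed to load_data().
reader = ApifyActor("YOUR_API_TOKEN")
documents = reader.load_data(
    actor_id="YOUR_USERNAME/rag-web-browser",
    run_input={"query": "vector database comparison 2026", "maxResults": 5},
    dataset_mapping_function=lambda item: Document(text=item.get("markdown") or "", metadata={"url": item["metadata"]["url"]}),
)
# Feed documents into your LlamaIndex pipeline
```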

#### Direct HTTP (Standby Mode)

When the Actor runs in Standby mode, query it like any REST API:

```bash
curl "https://YOUR_STANDBY_URL/search?query=latest+AI+news&maxResults=3"
```
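
When the query comes from user input, the URL should be built with proper encoding rather than by hand. The sketch below reproduces the same request URL with the Python standard library. `build_search_url` is a hypothetical helper, `YOUR_STANDBY_URL` remains a placeholder, and authentication is left out, as in the curl example.

```python
from urllib.parse import urlencode


def build_search_url(standby_url: str, query: str, max_results: int = 3) -> str:
    """Return the /search URL for a Standby endpoint with URL-encoded parameters."""
    params = urlencode({"query": query, "maxResults": max_results})
    return f"{standby_url.rstrip('/')}/search?{params}"


print(build_search_url("https://YOUR_STANDBY_URL", "latest AI news"))
# https://YOUR_STANDBY_URL/search?query=latest+AI+news&maxResults=3
```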

***

### 🤖 Use Cases

- **Ground LLM responses** with fresh web data to reduce hallucinations
- **Build research agents** that autonomously gather and synthesize information
- **Power AI chatbots** with real-time search (like ChatGPT's browse feature)
- **Feed RAG pipelines** with up-to-date documents for question answering
- **Monitor topics** by periodically searching and extracting content
- **Create datasets** of clean web content for fine-tuning or evaluation

***

### License

ISC

# Actor input Schema

## `query` (type: `string`):

Enter Google Search keywords or a URL of a specific web page. The keywords might include the [advanced search operators](https://blog.apify.com/how-to-scrape-google-like-a-pro/). Examples:

- <code>san francisco weather</code>
- <code>https://www.cnn.com</code>
- <code>function calling site:openai.com</code>

## `maxResults` (type: `integer`):

The maximum number of top organic Google Search results whose web pages will be extracted. If `query` is a URL, then this field is ignored and the Actor only fetches the specific web page.

## `outputFormats` (type: `array`):

Select one or more formats to which the target web pages will be extracted and saved in the resulting dataset.

## `requestTimeoutSecs` (type: `integer`):

The maximum time in seconds available for the request, including querying Google Search and scraping the target web pages. For example, OpenAI allows only [45 seconds](https://platform.openai.com/docs/actions/production#timeouts) for custom actions. If a target page loading and extraction exceeds this timeout, the corresponding page will be skipped in results to ensure at least some results are returned within the timeout. If no page is extracted within the timeout, the whole request fails.
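
The skip-or-fail behavior described above can be illustrated with a small `asyncio` sketch. It is not the Actor's implementation: `scrape_within_deadline` and `fake_page` are hypothetical names, and the simulated pages only sleep. It shows pages that finish before the deadline being kept, slower pages being skipped, and an error being raised when nothing finishes in time.

```python
import asyncio


async def scrape_within_deadline(pages, timeout_secs):
    """Run all page coroutines and keep only those that finish before the deadline.

    `pages` maps a URL to a coroutine that returns the page content.
    """
    tasks = {asyncio.create_task(coro): url for url, coro in pages.items()}
    done, pending = await asyncio.wait(tasks.keys(), timeout=timeout_secs)

    # Pages that miss the deadline are cancelled and left out of the results.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    results = {tasks[t]: t.result() for t in done if t.exception() is None}
    if not results:
        raise TimeoutError("no page was extracted within the timeout")
    return results


async def fake_page(delay_secs, content):
    """Simulate a page load that takes delay_secs seconds."""
    await asyncio.sleep(delay_secs)
    return content


async def main():
    pages = {
        "https://fast.example": fake_page(0.01, "fast"),
        "https://slow.example": fake_page(5, "slow"),
    }
    return await scrape_within_deadline(pages, timeout_secs=0.2)


print(asyncio.run(main()))  # {'https://fast.example': 'fast'}
```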

## `serpProxyGroup` (type: `string`):

Enables overriding the default Apify Proxy group used for fetching Google Search results.

## `serpMaxRetries` (type: `integer`):

The maximum number of times the Actor will retry fetching the Google Search results on error. If the last attempt fails, the entire request fails.
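
The retry rule amounts to one initial attempt plus up to `serpMaxRetries` further attempts, with a failure once the last attempt fails. The sketch below illustrates that rule on the client side. `fetch_with_retries` and `flaky_serp_fetch` are hypothetical names, and the backoff parameter is an addition that the schema does not describe.

```python
import time


def fetch_with_retries(fetch, max_retries=2, backoff_secs=0.0):
    """Call fetch() once, then retry up to max_retries times on any exception."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception as error:
            last_error = error
            if attempt < max_retries:
                # Optional linear backoff between attempts (an assumption of this sketch).
                time.sleep(backoff_secs * (attempt + 1))
    # The last attempt failed, so the whole request fails.
    raise RuntimeError(f"failed after {max_retries + 1} attempts") from last_error


attempts = {"count": 0}


def flaky_serp_fetch():
    """Fail twice with a simulated error, then return a result."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary SERP error")
    return ["https://example.com/article"]


print(fetch_with_retries(flaky_serp_fetch, max_retries=2))  # ['https://example.com/article']
print(attempts["count"])  # 3
```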

## `proxyConfiguration` (type: `object`):

Apify Proxy configuration used for scraping the target web pages.

## `scrapingTool` (type: `string`):

Select a scraping tool for extracting the target web pages. The Browser tool (`browser-playwright`) is more powerful and can handle JavaScript-heavy websites, while the Plain HTML tool (`raw-http`) can't handle JavaScript but is about two times faster.

## `removeElementsCssSelector` (type: `string`):

A CSS selector matching HTML elements that will be removed from the DOM, before converting it to text, Markdown, or saving as HTML. This is useful to skip irrelevant page content. The value must be a valid CSS selector as accepted by the `document.querySelectorAll()` function.

By default, the Actor removes common navigation elements, headers, footers, modals, scripts, and inline images. You can disable the removal by setting this value to some non-existent CSS selector like `dummy_keep_everything`.

## `htmlTransformer` (type: `string`):

Specify how to transform the HTML to extract meaningful content without any extra fluff, like navigation or modals. The HTML transformation happens after removing and clicking the DOM elements.

- **None** (default) - Only removes the HTML elements specified via the 'Remove HTML elements' option (`removeElementsCssSelector`).

- **Readable text** - Extracts the main contents of the webpage, without navigation and other fluff.

## `desiredConcurrency` (type: `integer`):

The desired number of web browsers running in parallel. The system automatically scales the number based on the CPU and memory usage. If the initial value is `0`, the Actor picks the number automatically based on the available memory.

## `maxRequestRetries` (type: `integer`):

The maximum number of times the Actor will retry loading the target web page on error. If the last attempt fails, the page will be skipped in the results.

## `dynamicContentWaitSecs` (type: `integer`):

The maximum time in seconds to wait for dynamic page content to load. The Actor considers the web page as fully loaded once this time elapses or when the network becomes idle.

## `removeCookieWarnings` (type: `boolean`):

If enabled, the Actor attempts to close or remove cookie consent dialogs to improve the quality of extracted text. Note that this setting increases the latency.

## `debugMode` (type: `boolean`):

If enabled, the Actor will store debugging information into the resulting dataset under the `debug` field.

## `chunkSize` (type: `integer`):

If set to a value greater than 0, the extracted content will be split into overlapping chunks of approximately this size (in characters). This is useful for preparing content for embedding into vector databases. Set to 0 to disable chunking and return full-page content.

## `chunkOverlap` (type: `integer`):

The number of characters that overlap between consecutive chunks. A small overlap (e.g. 100-200 characters) helps preserve context at chunk boundaries, improving retrieval quality in RAG pipelines. Only used when `chunkSize` is greater than 0.
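
To show how `chunkSize` and `chunkOverlap` interact, here is a minimal fixed-width chunker. It is a sketch under stated assumptions, not the Actor's algorithm. The schema says chunks are "approximately" the requested size, so the Actor may split differently, for example at paragraph boundaries. `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 100) -> list[str]:
    """Split text into chunks of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        # A chunk size of 0 disables chunking and keeps the full content.
        return [text]
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so a larger overlap produces more chunks for the same text.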

## Actor input object example

```json
{
  "query": "web browser for RAG pipelines -site:reddit.com",
  "maxResults": 3,
  "outputFormats": [
    "markdown"
  ],
  "requestTimeoutSecs": 40,
  "serpProxyGroup": "GOOGLE_SERP",
  "serpMaxRetries": 2,
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "scrapingTool": "raw-http",
  "removeElementsCssSelector": "nav, footer, script, style, noscript, svg, img[src^='data:'],\n[role=\"alert\"],\n[role=\"banner\"],\n[role=\"dialog\"],\n[role=\"alertdialog\"],\n[role=\"region\"][aria-label*=\"skip\" i],\n[aria-modal=\"true\"]",
  "htmlTransformer": "none",
  "desiredConcurrency": 5,
  "maxRequestRetries": 1,
  "dynamicContentWaitSecs": 10,
  "removeCookieWarnings": true,
  "debugMode": false,
  "chunkSize": 0,
  "chunkOverlap": 100
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "web browser for RAG pipelines -site:reddit.com",
    "proxyConfiguration": {
        "useApifyProxy": true
    },
    "removeElementsCssSelector": `nav, footer, script, style, noscript, svg, img[src^='data:'],
[role="alert"],
[role="banner"],
[role="dialog"],
[role="alertdialog"],
[role="region"][aria-label*="skip" i],
[aria-modal="true"]`,
    "htmlTransformer": "none"
};

// Run the Actor and wait for it to finish
const run = await client.actor("visita/rag-browser").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "query": "web browser for RAG pipelines -site:reddit.com",
    "proxyConfiguration": { "useApifyProxy": True },
    "removeElementsCssSelector": """nav, footer, script, style, noscript, svg, img[src^='data:'],
[role=\"alert\"],
[role=\"banner\"],
[role=\"dialog\"],
[role=\"alertdialog\"],
[role=\"region\"][aria-label*=\"skip\" i],
[aria-modal=\"true\"]""",
    "htmlTransformer": "none",
}

# Run the Actor and wait for it to finish
run = client.actor("visita/rag-browser").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "query": "web browser for RAG pipelines -site:reddit.com",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "removeElementsCssSelector": "nav, footer, script, style, noscript, svg, img[src^='\''data:'\''],\\n[role=\\"alert\\"],\\n[role=\\"banner\\"],\\n[role=\\"dialog\\"],\\n[role=\\"alertdialog\\"],\\n[role=\\"region\\"][aria-label*=\\"skip\\" i],\\n[aria-modal=\\"true\\"]",
  "htmlTransformer": "none"
}' |
apify call visita/rag-browser --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=visita/rag-browser",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "RAG Browser",
        "description": "This Actor provides essential web browsing and content extraction functionality for AI Agents, LLM applications, and Retrieval-Augmented Generation (RAG) pipelines. It functions similarly to the web search feature in popular LLM chatbots, providing fresh, contextualized data directly from the web.",
        "version": "0.0",
        "x-build-id": "D7V4hORkCo8dcGOEs"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/visita~rag-browser/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-visita-rag-browser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/visita~rag-browser/runs": {
            "post": {
                "operationId": "runs-sync-visita-rag-browser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/visita~rag-browser/run-sync": {
            "post": {
                "operationId": "run-sync-visita-rag-browser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "query"
                ],
                "properties": {
                    "query": {
                        "title": "Search term or URL",
                        "pattern": "[^\\s]+",
                        "type": "string",
                        "description": "Enter Google Search keywords or a URL of a specific web page. The keywords might include the [advanced search operators](https://blog.apify.com/how-to-scrape-google-like-a-pro/). Examples:\n\n- <code>san francisco weather</code>\n- <code>https://www.cnn.com</code>\n- <code>function calling site:openai.com</code>"
                    },
                    "maxResults": {
                        "title": "Maximum results",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "The maximum number of top organic Google Search results whose web pages will be extracted. If `query` is a URL, then this field is ignored and the Actor only fetches the specific web page.",
                        "default": 3
                    },
                    "outputFormats": {
                        "title": "Output formats",
                        "type": "array",
                        "description": "Select one or more formats to which the target web pages will be extracted and saved in the resulting dataset.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "text",
                                "markdown",
                                "html"
                            ],
                            "enumTitles": [
                                "Plain text",
                                "Markdown",
                                "HTML"
                            ]
                        },
                        "default": [
                            "markdown"
                        ]
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout",
                        "minimum": 1,
                        "maximum": 300,
                        "type": "integer",
                        "description": "The maximum time in seconds available for the request, including querying Google Search and scraping the target web pages. For example, OpenAI allows only [45 seconds](https://platform.openai.com/docs/actions/production#timeouts) for custom actions. If a target page loading and extraction exceeds this timeout, the corresponding page will be skipped in results to ensure at least some results are returned within the timeout. If no page is extracted within the timeout, the whole request fails.",
                        "default": 40
                    },
                    "serpProxyGroup": {
                        "title": "SERP proxy group",
                        "enum": [
                            "GOOGLE_SERP",
                            "SHADER"
                        ],
                        "type": "string",
                        "description": "Enables overriding the default Apify Proxy group used for fetching Google Search results.",
                        "default": "GOOGLE_SERP"
                    },
                    "serpMaxRetries": {
                        "title": "SERP max retries",
                        "minimum": 0,
                        "maximum": 5,
                        "type": "integer",
                        "description": "The maximum number of times the Actor will retry fetching the Google Search results on error. If the last attempt fails, the entire request fails.",
                        "default": 2
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify Proxy configuration used for scraping the target web pages.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "scrapingTool": {
                        "title": "Select a scraping tool",
                        "enum": [
                            "browser-playwright",
                            "raw-http"
                        ],
                        "type": "string",
                        "description": "Select a scraping tool for extracting the target web pages. The Browser tool is more powerful and can handle JavaScript heavy websites, while the Plain HTML tool can't handle JavaScript but is about two times faster.",
                        "default": "raw-http"
                    },
                    "removeElementsCssSelector": {
                        "title": "Remove HTML elements (CSS selector)",
                        "type": "string",
                        "description": "A CSS selector matching HTML elements that will be removed from the DOM, before converting it to text, Markdown, or saving as HTML. This is useful to skip irrelevant page content. The value must be a valid CSS selector as accepted by the `document.querySelectorAll()` function. \n\nBy default, the Actor removes common navigation elements, headers, footers, modals, scripts, and inline image. You can disable the removal by setting this value to some non-existent CSS selector like `dummy_keep_everything`.",
                        "default": "nav, footer, script, style, noscript, svg, img[src^='data:'],\n[role=\"alert\"],\n[role=\"banner\"],\n[role=\"dialog\"],\n[role=\"alertdialog\"],\n[role=\"region\"][aria-label*=\"skip\" i],\n[aria-modal=\"true\"]"
                    },
                    "htmlTransformer": {
                        "title": "HTML transformer",
                        "type": "string",
                        "description": "Specify how to transform the HTML to extract meaningful content without any extra fluff, like navigation or modals. The HTML transformation happens after removing and clicking the DOM elements.\n\n- **None** (default) - Only removes the HTML elements specified via 'Remove HTML elements' option.\n\n- **Readable text** - Extracts the main contents of the webpage, without navigation and other fluff.",
                        "default": "none"
                    },
                    "desiredConcurrency": {
                        "title": "Desired browsing concurrency",
                        "minimum": 0,
                        "maximum": 50,
                        "type": "integer",
                        "description": "The desired number of web browsers running in parallel. The system automatically scales the number based on the CPU and memory usage. If the initial value is `0`, the Actor picks the number automatically based on the available memory.",
                        "default": 5
                    },
                    "maxRequestRetries": {
                        "title": "Target page max retries",
                        "minimum": 0,
                        "maximum": 3,
                        "type": "integer",
                        "description": "The maximum number of times the Actor will retry loading the target web page on error. If the last attempt fails, the page will be skipped in the results.",
                        "default": 1
                    },
                    "dynamicContentWaitSecs": {
                        "title": "Target page dynamic content timeout",
                        "type": "integer",
                        "description": "The maximum time in seconds to wait for dynamic page content to load. The Actor considers the web page as fully loaded once this time elapses or when the network becomes idle.",
                        "default": 10
                    },
                    "removeCookieWarnings": {
                        "title": "Remove cookie warnings",
                        "type": "boolean",
                        "description": "If enabled, the Actor attempts to close or remove cookie consent dialogs to improve the quality of extracted text. Note that this setting increases the latency.",
                        "default": true
                    },
                    "debugMode": {
                        "title": "Enable debug mode",
                        "type": "boolean",
                        "description": "If enabled, the Actor will store debugging information into the resulting dataset under the `debug` field.",
                        "default": false
                    },
                    "chunkSize": {
                        "title": "Chunk size (characters)",
                        "minimum": 0,
                        "maximum": 50000,
                        "type": "integer",
                        "description": "If set to a value greater than 0, the extracted content will be split into overlapping chunks of approximately this size (in characters). This is useful for preparing content for embedding into vector databases. Set to 0 to disable chunking and return full-page content.",
                        "default": 0
                    },
                    "chunkOverlap": {
                        "title": "Chunk overlap (characters)",
                        "minimum": 0,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "The number of characters that overlap between consecutive chunks. A small overlap (e.g. 100-200 characters) helps preserve context at chunk boundaries, improving retrieval quality in RAG pipelines. Only used when `chunkSize` is greater than 0.",
                        "default": 100
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
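
The `chunkSize` and `chunkOverlap` descriptions in the schema above mention overlapping chunks. The sketch below illustrates the general idea with a plain fixed-width sliding window. It is an assumption for illustration only: the Actor's real chunking logic is not documented here, and since the schema says chunks are "approximately" the requested size, the Actor may split at different boundaries. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-width chunks that share chunk_overlap characters.

    Illustrative only; not the Actor's actual implementation.
    """
    # Mirror the documented behavior: 0 disables chunking.
    if chunk_size <= 0:
        return [text]
    # Assumption: an overlap as large as the chunk would never advance.
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

With a chunk size of 4 and an overlap of 1, each chunk starts 3 characters after the previous one, so the last character of one chunk is repeated as the first character of the next.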

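The input schema above declares numeric bounds for several fields. As a rough sketch, a caller could check those bounds locally before starting a run. The bounds are copied from the schema; the `validate_input` helper and its error messages are hypothetical, and this check does not replace the validation the platform performs itself.

```python
# Bounds taken from the input schema shown above.
BOUNDS = {
    "maxRequestRetries": (0, 3),
    "chunkSize": (0, 50000),
    "chunkOverlap": (0, 10000),
}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    for field, (lowest, highest) in BOUNDS.items():
        # Fields left out fall back to the schema defaults, so skip them.
        if field not in run_input:
            continue
        value = run_input[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{field} must be an integer")
        elif not lowest <= value <= highest:
            errors.append(f"{field} must be between {lowest} and {highest}")
    return errors


if __name__ == "__main__":
    print(validate_input({"chunkSize": 1000, "chunkOverlap": 100}))
    print(validate_input({"maxRequestRetries": 5}))
```

Returning a list of messages rather than raising on the first problem lets the caller report every out-of-range field at once.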