# Cloudflare Web Scraper (Pay per event) (`ecomscrape/cloudflare-web-scraper-ppe`) Actor

Advanced web scraper designed to extract data from Cloudflare-protected websites with CAPTCHA bypass, proxy rotation, and JavaScript execution capabilities.

- **URL**: https://apify.com/ecomscrape/cloudflare-web-scraper-ppe.md
- **Developed by:** [ecomscrape](https://apify.com/ecomscrape) (community)
- **Categories:** Developer tools, Automation, Other
- **Stats:** 143 total users, 57 monthly users, 99.8% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $0.60 / 1,000 bypass cloudflares

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
This Actor supports Apify Store discounts, so the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
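
As a rough illustration of the event pricing above, the following sketch estimates the event charge for a run. It assumes the listed starting price of $0.60 per 1,000 events applies to your plan; the function name and the `event_count` parameter are hypothetical, and how many events a run actually triggers depends on the Actor. Platform usage is billed separately and is not included.

```python
from decimal import Decimal

# Assumption: the listed starting price applies; higher plans may pay less.
PRICE_PER_1000_EVENTS = Decimal("0.60")


def estimate_event_cost(event_count: int,
                        price_per_1000: Decimal = PRICE_PER_1000_EVENTS) -> Decimal:
    """Return the estimated event charge in USD, excluding platform usage."""
    if event_count < 0:
        raise ValueError("event_count must be non-negative")
    cost = price_per_1000 * event_count / 1000
    # Round to four decimal places for display.
    return cost.quantize(Decimal("0.0001"))


# Estimated event charge for 250 events at the starting price.
print(estimate_event_cost(250))
```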

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Contact

If you encounter any issues or need to exchange information, please feel free to contact us through the following link:
[My profile](https://apify.com/ecomscrape)


## What does Cloudflare Web Scraper do?

### Introduction

Cloudflare protection systems present significant challenges for web scraping, with each website setting custom anti-bot thresholds and verification requirements. Millions of websites rely on Cloudflare's security features, including CAPTCHA challenges, bot detection algorithms, and rate limiting mechanisms that can block legitimate data collection efforts.

The Cloudflare Web Scraper addresses these challenges by providing a comprehensive solution for accessing protected websites. This tool becomes essential when businesses need to collect market data, monitor competitor pricing, gather research information, or perform automated testing on Cloudflare-protected platforms where manual access would be prohibitively time-consuming.

### Scraper Overview

The Cloudflare Web Scraper is a sophisticated data extraction tool specifically engineered to handle modern web protection mechanisms. By utilizing proxy rotation and residential IP addresses, the scraper mimics natural browsing patterns to avoid detection.

Key advantages include automated CAPTCHA handling, JavaScript execution capabilities, and intelligent retry mechanisms. The scraper maintains session persistence, handles dynamic content loading, and provides detailed logging for troubleshooting. It's designed for developers, data analysts, researchers, and businesses requiring reliable access to protected web resources.

The tool excels in scenarios requiring large-scale data collection, real-time monitoring, and automated workflows where manual intervention isn't feasible.

### Input and Output Specifications

Example URL 1: https://gitlab.com

Example URL 2: https://www.manta.com/

Example URL 3: https://www.cardmarket.com/en

Example screenshot of a product information page:

![](https://i.ibb.co/vzWCKnb/Screenshot-from-2025-01-06-17-36-52.png)

#### Input Format

The scraper accepts JSON configuration with the following parameters:


**Input:** 

```json
{
  "max_retries_per_url": 2, // Maximum number of retry attempts for each URL you provide.
  "proxy": { // Add a proxy so that you are not detected as a bot while collecting data.
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "SG" // Choose the country that matches the country you want to collect data from.
  },
  "urls": [ // Links to web pages.
    "https://gitlab.com",
    "https://www.manta.com/",
    "https://www.cardmarket.com/en"
  ],
  "js_script": "return 10 + 10 + 20", // JS script you want to run
  "js_timeout": 10, // Timeout for the JS script run, in seconds
  "retrieve_result_from_js_script": true, // Retrieve result from JS script
  "page_is_loaded_before_running_script": true, // Page is loaded before running script
  "execute_js_async": false, // Execute JS async
  "retrieve_html_from_url_after_loaded": true // Retrieve page HTML from URL after it has loaded
}
````

**Configuration Structure:**

- `max_retries_per_url` (integer): Defines maximum retry attempts when encountering failures or timeouts
- `proxy` (object): Contains proxy configuration for anonymization
  - `useApifyProxy` (boolean): Enables Apify's proxy service integration
  - `apifyProxyGroups` (array): Specifies proxy types, typically "RESIDENTIAL" for better success rates
  - `apifyProxyCountry` (string): Target country code matching data collection requirements
- `urls` (array): List of target URLs for data extraction
- `js_script` (string): Custom JavaScript code executed on each page
- `js_timeout` (integer): Maximum execution time for JavaScript operations, in seconds
- `retrieve_result_from_js_script` (boolean): Whether to capture JavaScript execution results
- `page_is_loaded_before_running_script` (boolean): Ensures DOM readiness before script execution
- `execute_js_async` (boolean): Controls synchronous vs asynchronous JavaScript execution
- `retrieve_html_from_url_after_loaded` (boolean): Captures final HTML after all processing
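
The configuration structure above can be assembled programmatically. The following is a minimal sketch, not part of the Actor itself: `build_input` is a hypothetical helper, its defaults mirror the example input, and the URL check is an added convenience rather than something the Actor is known to require.

```python
import json
from urllib.parse import urlparse


def build_input(urls, js_script="", country=None, max_retries=2, js_timeout=10):
    """Assemble an input dict using the field names documented above."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")

    # Residential proxies, as recommended in this README.
    proxy = {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}
    if country:
        proxy["apifyProxyCountry"] = country

    return {
        "urls": list(urls),
        "js_script": js_script,
        "js_timeout": js_timeout,
        # Only ask for a script result when a script is supplied.
        "retrieve_result_from_js_script": bool(js_script),
        "page_is_loaded_before_running_script": True,
        "execute_js_async": False,
        "retrieve_html_from_url_after_loaded": True,
        "max_retries_per_url": max_retries,
        "proxy": proxy,
    }


actor_input = build_input(
    ["https://gitlab.com"], js_script="return document.title", country="SG"
)
print(json.dumps(actor_input, indent=2))
```

The resulting dict can be passed as `run_input` to the Python client call shown in the API section.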

#### Output Format

The output of the Cloudflare Web Scraper is stored in the run's dataset and shown in a tab of the run. The following is an example of the fields collected after running the Actor.

```json
[ // List of results, one per processed URL
  {
    "url": "https://about.gitlab.com/",
    "result_from_js_script": 40,
    "html": "<!DOCTYPE html>...</html>" // HTML from web page
  } // ... more results
]
```

The scraper returns structured data containing three primary components:

**URL Field**: Contains the processed website address, confirming successful navigation and any redirects encountered. This field helps verify that the correct page was accessed and provides tracking for batch operations.

**HTML Field**: Delivers the complete page HTML after Cloudflare challenges are resolved and dynamic content is loaded. This includes all rendered elements, loaded JavaScript content, and any dynamically inserted data that wouldn't be visible in the initial page source.

**Result from JS Script**: Contains the return value from the custom JavaScript code execution. This field enables extraction of specific data points, computed values, or complex page interactions that require JavaScript processing. The result format depends on the script's return statement and can include strings, numbers, objects, or arrays.
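
To show how the three fields might be consumed, here is a small sketch that reduces each result item to a summary. The `summarize_item` helper and the sample HTML are illustrative assumptions; real items would come from the run's dataset, and a proper HTML parser would be more robust than the regular expression used here.

```python
import re


def summarize_item(item):
    """Reduce one result item to its URL, script result, and page title."""
    html = item.get("html") or ""
    # Naive title extraction; good enough for a quick sanity check.
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return {
        "url": item.get("url"),
        "js_result": item.get("result_from_js_script"),
        "title": match.group(1).strip() if match else None,
        "html_length": len(html),
    }


# Stand-in for items fetched from the run's dataset (HTML is made up).
sample_items = [
    {
        "url": "https://about.gitlab.com/",
        "result_from_js_script": 40,
        "html": "<!DOCTYPE html><html><head><title>Example page</title></head></html>",
    },
]

for item in sample_items:
    print(summarize_item(item))
```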

### Usage Instructions

**Step 1: Configuration Setup**
Configure your input parameters based on target website requirements. Choose appropriate proxy countries and set reasonable retry limits to balance success rates with execution time.

**Step 2: URL Preparation**
Ensure target URLs are accessible and specify the exact pages needed for data extraction. Test a small batch first to verify configuration effectiveness.

**Step 3: JavaScript Customization**
Write JavaScript code tailored to your data extraction needs. Common patterns include DOM element selection, data parsing, and API calls. Test scripts in browser console first.

**Step 4: Execution Monitoring**
Monitor scraper progress through logs and handle any errors appropriately. For persistent CAPTCHA challenges, consider integrating solver services for automated resolution.
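
As an illustration of Step 3, the sketch below embeds a DOM-selection script in the input. The script itself is hypothetical and follows the `return <value>` form used in the example input; whether multi-line scripts are accepted exactly like this is an assumption, so test on a single URL first.

```python
import json

# Hypothetical script: collect the text of every <h1> and <h2> on the page.
JS_SCRIPT = """
const headings = Array.from(document.querySelectorAll('h1, h2'));
return headings.map((el) => el.textContent.trim());
""".strip()

actor_input = {
    "urls": ["https://gitlab.com"],
    "js_script": JS_SCRIPT,
    "js_timeout": 10,
    "retrieve_result_from_js_script": True,
    # Run the script only after the page has loaded, so the DOM is available.
    "page_is_loaded_before_running_script": True,
    "execute_js_async": False,
}

# json.dumps escapes the newlines and quotes inside the script,
# so the payload stays valid JSON.
payload = json.dumps(actor_input)
print(payload)
```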

**Best Practices:**

- Use residential proxies for better success rates
- Implement reasonable delays between requests
- Handle dynamic content loading properly
- Monitor for changes in website protection mechanisms
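
The advice to test small batches and space out requests can be sketched as follows. Both helpers are hypothetical; `submit` stands in for whatever starts a run (for example, a call through the Python client shown in the API section), and the delay value is for you to choose.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(urls: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def run_in_batches(urls: Sequence[str], size: int,
                   submit: Callable[[List[str]], object],
                   delay_seconds: float = 0.0) -> list:
    """Call `submit` once per batch, pausing between batches."""
    batches = list(chunked(urls, size))
    results = []
    for index, batch in enumerate(batches):
        results.append(submit(batch))
        # Skip the pause after the final batch.
        if delay_seconds and index < len(batches) - 1:
            time.sleep(delay_seconds)
    return results


urls = [
    "https://gitlab.com",
    "https://www.manta.com/",
    "https://www.cardmarket.com/en",
]
# Here `submit` only counts the URLs in each batch.
print(run_in_batches(urls, 2, submit=len))
```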

### Benefits and Applications

**Time Efficiency**: Automates complex bypass procedures that would require significant manual effort, enabling 24/7 data collection operations without human intervention.

**Real-World Applications**: Market research, competitive analysis, price monitoring, content aggregation, and compliance monitoring. Businesses use this for tracking product availability, monitoring competitor strategies, and gathering industry intelligence.

**Business Value**: Provides access to previously unavailable data sources, enabling data-driven decision making and competitive advantages. Organizations can maintain current market awareness and respond quickly to industry changes.

**Scalability**: Handles multiple URLs simultaneously with built-in error handling and retry mechanisms, making it suitable for enterprise-level data collection requirements.

### Conclusion

The Cloudflare Web Scraper provides a robust solution for accessing protected web content efficiently. By combining advanced bypass techniques with customizable JavaScript execution, it enables reliable data extraction from challenging sources.

Ready to overcome Cloudflare protection barriers? Configure your scraper parameters and start collecting valuable web data today.

## Your feedback

We are always working to improve Actors' performance. So, if you have any technical feedback about Cloudflare Web Scraper or simply found a bug, please create an issue on the Actor's Issues tab in Apify Console.

# Actor input Schema

## `urls` (type: `array`):

Add the URLs of the specific web pages you want to scrape. You can paste URLs one by one, or use the Bulk edit section to add a prepared list.

## `js_script` (type: `string`):

Add the JS script you want to run and whose result you want returned.

## `retrieve_result_from_js_script` (type: `boolean`):

Retrieve result from JS script.

## `page_is_loaded_before_running_script` (type: `boolean`):

When enabled, the script runs after the page has loaded; when disabled, it runs before the page has loaded.

## `execute_js_async` (type: `boolean`):

Execute JS async.

## `retrieve_html_from_url_after_loaded` (type: `boolean`):

Retrieve the page HTML after the URL has loaded.

## `js_timeout` (type: `integer`):

Timeout for the JS script run, in seconds.

## `max_retries_per_url` (type: `integer`):

Limit the number of retries for each URL if an error occurs during the data scraping process.

## `proxy` (type: `object`):

Select proxies to be used by your scraper.

## Actor input object example

```json
{
  "urls": [
    "https://www.gitlab.com/"
  ],
  "js_script": "return 10 + 10 + 20",
  "retrieve_result_from_js_script": true,
  "page_is_loaded_before_running_script": true,
  "retrieve_html_from_url_after_loaded": true,
  "js_timeout": 10,
  "max_retries_per_url": 2,
  "proxy": {
    "useApifyProxy": false
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://www.gitlab.com/"
    ],
    "js_script": "return 10 + 10 + 20",
    "page_is_loaded_before_running_script": true,
    "execute_js_async": false,
    "retrieve_html_from_url_after_loaded": true,
    "js_timeout": 10,
    "max_retries_per_url": 2,
    "proxy": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("ecomscrape/cloudflare-web-scraper-ppe").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": ["https://www.gitlab.com/"],
    "js_script": "return 10 + 10 + 20",
    "page_is_loaded_before_running_script": True,
    "execute_js_async": False,
    "retrieve_html_from_url_after_loaded": True,
    "js_timeout": 10,
    "max_retries_per_url": 2,
    "proxy": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("ecomscrape/cloudflare-web-scraper-ppe").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://www.gitlab.com/"
  ],
  "js_script": "return 10 + 10 + 20",
  "page_is_loaded_before_running_script": true,
  "execute_js_async": false,
  "retrieve_html_from_url_after_loaded": true,
  "js_timeout": 10,
  "max_retries_per_url": 2,
  "proxy": {
    "useApifyProxy": false
  }
}' |
apify call ecomscrape/cloudflare-web-scraper-ppe --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=ecomscrape/cloudflare-web-scraper-ppe",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Cloudflare Web Scraper (Pay per event)",
        "description": "Advanced web scraper designed to extract data from Cloudflare-protected websites with CAPTCHA bypass, proxy rotation, and JavaScript execution capabilities.",
        "version": "0.0",
        "x-build-id": "br3a9Hxhx3xrZqhDe"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/ecomscrape~cloudflare-web-scraper-ppe/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-ecomscrape-cloudflare-web-scraper-ppe",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/ecomscrape~cloudflare-web-scraper-ppe/runs": {
            "post": {
                "operationId": "runs-sync-ecomscrape-cloudflare-web-scraper-ppe",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/ecomscrape~cloudflare-web-scraper-ppe/run-sync": {
            "post": {
                "operationId": "run-sync-ecomscrape-cloudflare-web-scraper-ppe",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "urls": {
                        "title": "URLs of the Specific web page to crawl",
                        "type": "array",
                        "description": "Add the URLs of the Specific web page urls you want to scrape. You can paste URLs one by one, or use the Bulk edit section to add a prepared list.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "js_script": {
                        "title": "JS script you want to run",
                        "type": "string",
                        "description": "Add the JS script you want to run and get result."
                    },
                    "retrieve_result_from_js_script": {
                        "title": "Retrieve result from JS script",
                        "type": "boolean",
                        "description": "Retrieve result from JS script.",
                        "default": true
                    },
                    "page_is_loaded_before_running_script": {
                        "title": "Page is loaded before running script",
                        "type": "boolean",
                        "description": "The option allows waiting until the page has loaded to run the script or running before the page is loaded."
                    },
                    "execute_js_async": {
                        "title": "Execute JS async",
                        "type": "boolean",
                        "description": "Execute JS async."
                    },
                    "retrieve_html_from_url_after_loaded": {
                        "title": "Retrieve HTML from url after loaded",
                        "type": "boolean",
                        "description": "Retrieve page HTML from url after loaded"
                    },
                    "js_timeout": {
                        "title": "Timeout on JS script run",
                        "type": "integer",
                        "description": "Timeout on JS script run in seconds"
                    },
                    "max_retries_per_url": {
                        "title": "Limit the number of retries for each URL",
                        "type": "integer",
                        "description": "Limit the number of retries for each URL, If an error occurs during the data scraping process."
                    },
                    "proxy": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Select proxies to be used by your scraper."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
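
For projects that call the REST API directly, the following sketch builds a request for the `run-sync-get-dataset-items` path listed in the specification above, using only the Python standard library. The `build_request` helper is hypothetical, and the request is constructed but not sent, so no token is needed to try it.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/ecomscrape~cloudflare-web-scraper-ppe/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    # The specification passes the token as a required query parameter.
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("<YOUR_API_TOKEN>", {"urls": ["https://www.gitlab.com/"]})
print(request.get_method(), request.full_url)

# To send it with a real token:
# with urllib.request.urlopen(request) as response:
#     items = json.loads(response.read())
```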
