# Reddit Scraper — Detect pain points, leads, emerging trends (`runtime/reddit-scraper`) Actor

Scrape Reddit posts, comments, communities, and user profiles via URLs or keyword searches. Supports proxy rotation, flexible filters, custom field names, and automatic retries. Ideal for monitoring discussions, trend analysis, research, and large-scale data collection.

- **URL**: https://apify.com/runtime/reddit-scraper.md
- **Developed by:** [scraping automation](https://apify.com/runtime) (community)
- **Categories:** Lead generation, Social media, E-commerce
- **Stats:** 26 total users, 1 monthly users, 100.0% runs succeeded, 3 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

$29.00/month + usage

To use this Actor, you pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for Apify platform usage, which gets cheaper the higher your Apify subscription plan is.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#rental-actors

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### Reddit Scraper

Reddit Scraper is a comprehensive Apify Actor that collects posts, comments, communities, user profiles, and leaderboards from Reddit. Each entity saved in the dataset uses differentiated output fields (`entityType`, `headline`, `mediaBundle`, `communityTag`, `subscriberTotal`, `karmaPost`, etc.) to facilitate integration and avoid conflicts with other data sources.

#### Main Features

##### 🎯 Complete Coverage
- **Posts**: Title, content, media, votes, comments, complete metadata
- **Comments**: Complete tree structure, votes, depth, replies
- **Communities/Subreddits**: Metadata, members, descriptions, icons
- **User Profiles**: Karma, post/comment history, metadata
- **Leaderboards**: Rankings of popular subreddits by category

##### 🔧 Flexibility and Control
- **URL-based Scraping**: Supports all Reddit formats (posts, users, communities, leaderboards, searches, **multireddits**)
- **Keyword-based Scraping**: Automatic search with configurable scope (Posts or Communities & users)
- **Advanced Sorting**: 5 options (relevance, hot, top, new, comments)
- **Temporal Filters**: By hour, day, week, month, year
- **Granular Limits**: 6 independent cap types (total, post, comment, community, profile, leaderboard)
- **Score Filters**: Automatically exclude posts/comments with low scores
- **NSFW Filters**: Option to exclude NSFW content
- **Absolute Date Filters**: Filter by precise date range (dateFrom, dateTo)
- **Multireddits**: Support for URLs combining multiple subreddits (e.g., `/r/pics+funny`)
- **Automatic Pagination**: Collect more than 100 items by automatically paginating
- **Deduplication**: Avoid duplicates within the same run

##### 🛡️ Robustness and Reliability
- **Automatic Retry**: Intelligent handling of 403/429 errors with proxy rotation
- **Flexible Proxy Configuration**: Apify Proxy (residential/datacenter) or custom proxies
- **Automatic Fallback**: Default URL if no input is provided (ideal for automated tests)
- **Debug Mode**: Detailed logging for quick diagnostics
- **Configurable Concurrency**: Adjust the number of parallel requests
- **Performance Metrics**: Detailed statistics at end of run (items/sec, duration, applied filters, duplicates)

##### 🎨 Customization
- **Light Extract Mode**: Extract only essential fields (`permalink`, `headline`, `textBody`) - perfect for AI processing and minimal data needs
- **Differentiated Output Fields**: Unique names to facilitate integration
- **Extend Result Function**: Custom enrichment of each item
- **Output Format**: JSON, CSV, XML, HTML, Excel via Apify interface

#### Main Input Parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `startLinks` | array | `[]` | Reddit URLs to crawl (posts, communities, users, leaderboards, searches). If empty and no `searchQueries`, automatically uses `/r/popular/` as fallback. |
| `searchQueries` | array\<string> | `[]` | Keywords to run a Reddit search. |
| `searchScope` | enum | `posts` | `posts` or `communities` to target the search tab. |
| `sortOrder` | enum | `relevance` | `relevance`, `hot`, `top`, `new`, `comments` (5 available options). |
| `timeWindow` | enum | `all` | `all`, `hour`, `day`, `week`, `month`, `year` (for posts). |
| `totalItemCap` | integer | `100` | Global limit of items in the dataset. |
| `postCap` | integer | `50` | Maximum posts per subreddit/feed/user. |
| `commentCap` | integer | `25` | Maximum comments per post. |
| `communityCap` | integer | `25` | Maximum communities from leaderboards/searches. |
| `profileCap` | integer | `25` | Maximum user profiles from searches. |
| `leaderboardCap` | integer | `25` | Number of entries from `/subreddits/leaderboard`. |
| `scrollWaitSeconds` | integer | `30` | Wait delay between retries on 403/429 errors. |
| `maxConcurrency` | integer | `10` | Maximum number of parallel HTTP requests. |
| `useApifyProxy` | boolean | `true` | Enable Apify Proxy (recommended to avoid 403 errors). |
| `proxyConfiguration` | object | `{}` | Detailed proxy configuration (Apify or custom). |
| `extendResultFunction` | string | - | JavaScript function to enrich each item. |
| `debugLog` | boolean | `false` | Enable detailed logging for diagnostics. |
| `minScore` | integer | `null` | Minimum score to filter posts and comments. Items with lower scores will be excluded. |
| `includeNSFW` | boolean | `true` | When `false`, excludes NSFW posts and communities from results. |
| `logMetrics` | boolean | `true` | Displays performance statistics at end of run (items/sec, duration, errors, filters). |
| `enablePagination` | boolean | `false` | Enables automatic pagination to collect more than 100 items per listing. |
| `dateFrom` | string | `null` | Start date to filter items (ISO 8601 format, e.g., `2024-01-01T00:00:00Z`). |
| `dateTo` | string | `null` | End date to filter items (ISO 8601 format, e.g., `2024-12-31T23:59:59Z`). |
| `enableDeduplication` | boolean | `false` | Enables deduplication to avoid duplicates (by `entityId`) within the same run. |
| `lightExtract` | boolean | `false` | When `true`, only extracts minimal fields: `permalink`, `headline`, and `textBody`. Applies to posts and comments only. Perfect for AI processing. |

#### Input Example
```json
{
  "startLinks": [
    { "url": "https://www.reddit.com/r/worldnews/" },
    { "url": "https://www.reddit.com/r/learnprogramming/comments/lp1hi4/is_webscraping_a_good_skill_to_learn_as_a_beginner/" },
    { "url": "https://www.reddit.com/subreddits/leaderboard/" },
    { "url": "https://www.reddit.com/r/pics+funny/" }
  ],
  "searchQueries": ["parrots"],
  "searchScope": "communities",
  "sortOrder": "new",
  "timeWindow": "all",
  "totalItemCap": 20,
  "postCap": 10,
  "commentCap": 5,
  "communityCap": 15,
  "leaderboardCap": 25,
  "maxConcurrency": 10,
  "scrollWaitSeconds": 30,
  "useApifyProxy": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "debugLog": false,
  "minScore": 10,
  "includeNSFW": false,
  "logMetrics": true,
  "enablePagination": true,
  "dateFrom": "2024-01-01T00:00:00Z",
  "dateTo": "2024-12-31T23:59:59Z",
  "enableDeduplication": true,
  "lightExtract": false
}
````
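Typos in enum values only surface once a run has started, so it can help to check the input locally first. The following is a minimal sketch that assumes nothing beyond the enum values listed in the parameter table; `build_input` and its keyword arguments are hypothetical helper names, not part of the Actor or of the Apify client.

```python
# Hypothetical helper: validates a few enum fields against the values
# documented in the parameter table before a run is started.
ALLOWED = {
    "searchScope": {"posts", "communities"},
    "sortOrder": {"relevance", "hot", "top", "new", "comments"},
    "timeWindow": {"all", "hour", "day", "week", "month", "year"},
}


def build_input(start_urls=(), search_queries=(), **options):
    """Return an input dict for the Actor, raising ValueError on bad enums."""
    for field, allowed in ALLOWED.items():
        value = options.get(field)
        if value is not None and value not in allowed:
            raise ValueError(f"{field}={value!r} is not one of {sorted(allowed)}")
    run_input = {
        # startLinks is documented as an array of {"url": ...} objects.
        "startLinks": [{"url": url} for url in start_urls],
        "searchQueries": list(search_queries),
    }
    run_input.update(options)
    return run_input


example = build_input(
    start_urls=["https://www.reddit.com/r/worldnews/"],
    search_queries=["parrots"],
    sortOrder="new",
    totalItemCap=20,
)
print(example["startLinks"])
```

The resulting dictionary can be passed as `run_input` to the Python client call shown in the API section.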

#### Use Cases

- **AI Processing**: Use `lightExtract: true` to get clean, minimal data perfect for AI models, sentiment analysis, and NLP tasks
- **Brand Monitoring**: Track discussions about your product or service
- **Trend Research**: Identify popular topics by community
- **Sentiment Analysis**: Collect comments for NLP analysis
- **Community Discovery**: Explore leaderboards by category
- **Competitive Intelligence**: Monitor competitor mentions
- **Academic Research**: Collect data for social studies
- **Content Curation**: Find relevant content by keywords

#### Example Output

##### Full Extract (default)

```json
{
  "entityType": "post",
  "entityId": "t3_144w7sn",
  "redditId": "144w7sn",
  "permalink": "https://www.reddit.com/r/HonkaiStarRail/comments/144w7sn/my_luckiest_10x_pull_yet/",
  "headline": "My Luckiest 10x Pull Yet",
  "textBody": "URL: https://i.redd.it/yod3okjkgx4b1.jpg",
  "mediaBundle": {
    "primaryUrl": "https://i.redd.it/yod3okjkgx4b1.jpg",
    "thumbnailUrl": "https://b.thumbs.redditmedia.com/lm9KxS4laQWgx4uOoioM3N7-tBK3GLPrxb9da2hGtjs.jpg",
    "isVideo": false
  },
  "authorHandle": "YourKingLives",
  "communityTag": "r/HonkaiStarRail",
  "voteScore": 1,
  "commentTotal": 0,
  "createdAt": "2023-06-09T05:23:15.000Z",
  "collectedAt": "2025-11-20T10:00:00.000Z"
}
```
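When consuming full-extract items downstream, mapping them onto a typed record makes missing fields fail early. The sketch below uses only field names that appear in the sample item above; the `RedditPost` class and `parse_post` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RedditPost:
    entity_id: str
    headline: str
    community: str
    score: int
    created_at: datetime


def parse_post(item: dict) -> RedditPost:
    """Map a full-extract post item onto a typed record."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    created = datetime.fromisoformat(item["createdAt"].replace("Z", "+00:00"))
    return RedditPost(
        entity_id=item["entityId"],
        headline=item["headline"],
        community=item["communityTag"],
        score=item["voteScore"],
        created_at=created,
    )


sample = {
    "entityType": "post",
    "entityId": "t3_144w7sn",
    "headline": "My Luckiest 10x Pull Yet",
    "communityTag": "r/HonkaiStarRail",
    "voteScore": 1,
    "createdAt": "2023-06-09T05:23:15.000Z",
}
post = parse_post(sample)
print(post.community, post.created_at.year)
```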

##### Light Extract (`lightExtract: true`)

Perfect for AI processing, data analysis, or when you only need essential content:

```json
{
  "permalink": "https://www.reddit.com/r/HonkaiStarRail/comments/144w7sn/my_luckiest_10x_pull_yet/",
  "headline": "My Luckiest 10x Pull Yet",
  "textBody": "URL: https://i.redd.it/yod3okjkgx4b1.jpg"
}
```

**Note:** Light extract mode only applies to posts and comments. Communities and profiles are not included when `lightExtract: true`.
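For AI or NLP pipelines, light-extract items can be flattened into plain text. This is a sketch under one assumption that the source does not confirm: that some items (for example comments) may arrive without a `headline`, so the hypothetical `to_document` helper tolerates missing or empty fields.

```python
def to_document(item: dict) -> str:
    """Join the light-extract fields into one text block for an NLP pipeline."""
    parts = [item.get("headline"), item.get("textBody")]
    # Skip fields that are missing or contain only whitespace.
    body = "\n\n".join(p.strip() for p in parts if p and p.strip())
    return f"{body}\n\nSource: {item['permalink']}"


items = [
    {
        "permalink": "https://www.reddit.com/r/HonkaiStarRail/comments/144w7sn/my_luckiest_10x_pull_yet/",
        "headline": "My Luckiest 10x Pull Yet",
        "textBody": "URL: https://i.redd.it/yod3okjkgx4b1.jpg",
    }
]
documents = [to_document(item) for item in items]
print(documents[0])
```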

#### Quick Start

1. Open the Actor in Apify Console
2. Configure input parameters (or use default values)
3. Click **Start** and wait for the run to complete
4. Download results from the **Dataset** tab (JSON, CSV, XML, HTML, Excel)

**Note:** If you don't provide `startLinks` or `searchQueries`, the Actor automatically uses `/r/popular/` as a starting point, ensuring a valid run even for automated tests.

#### Key Advantages

##### Differentiated Output Fields

Data is structured with unique field names (`entityType`, `headline`, `mediaBundle`, `communityTag`, `subscriberTotal`, `karmaPost`, etc.) to facilitate integration and avoid conflicts with other data sources.

##### Automatic Robustness

- Automatic retry with proxy rotation on 403/429 errors
- Intelligent rate limit handling
- Automatic fallback if no input is provided
- Debug mode for quick diagnostics

##### Advanced Configuration

- Granular control with 6 independent cap types
- Adjustable concurrency and delays
- Complete support for Reddit leaderboards
- 5 sorting options (including "comments")
- Automatic pagination to collect large volumes
- Absolute date filters for precise historical analysis
- Automatic deduplication to avoid duplicates
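The built-in deduplication is scoped to a single run. For recurring runs, one option is a client-side filter keyed on `entityId`. The sketch below is an illustration only: `drop_seen` is a hypothetical helper, and the in-memory `seen` set stands in for whatever storage you would persist between runs.

```python
def drop_seen(items, seen_ids):
    """Yield only items whose entityId has not been seen; update seen_ids."""
    for item in items:
        entity_id = item.get("entityId")
        if entity_id is None:
            # Light-extract items carry no entityId, so pass them through.
            yield item
            continue
        if entity_id in seen_ids:
            continue
        seen_ids.add(entity_id)
        yield item


seen = {"t3_144w7sn"}  # IDs stored from an earlier run
run_items = [
    {"entityId": "t3_144w7sn", "headline": "old"},
    {"entityId": "t3_abc123", "headline": "new"},  # made-up ID
    {"entityId": "t3_abc123", "headline": "new (duplicate)"},
]
fresh = list(drop_seen(run_items, seen))
print([item["headline"] for item in fresh])
```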

#### Technical Notes

- `extendResultFunction` receives `{ data, page }`; `page` is `null` because we use Reddit's JSON API.
- `extendResultFunction` is not supported in light extract mode (data structure is intentionally minimal).
- Always respect Reddit's usage rules and avoid unreasonable volumes.
- Using Apify Proxy (residential recommended) is strongly advised to avoid 403 blocks.
- When `lightExtract: true`, only posts and comments are extracted with minimal fields (`permalink`, `headline`, `textBody`). Communities and profiles are skipped.

#### Legal Disclaimer

**Important:** This Actor scrapes publicly available data from Reddit. By using this Actor, you acknowledge and agree to the following:

1. **Reddit Terms of Service**: You are responsible for complying with Reddit's Terms of Service and User Agreement. Reddit's ToS can be found at https://www.reddit.com/help/useragreement.

2. **Rate Limiting**: This Actor includes automatic retry logic and proxy rotation to handle rate limits. However, you must use reasonable request rates and avoid excessive scraping that could impact Reddit's servers.

3. **Data Usage**: The scraped data is for your personal or business use only. You must respect copyright, privacy rights, and any applicable data protection laws (such as GDPR, CCPA) when using the collected data.

4. **No Warranty**: This Actor is provided "as is" without any warranties. The developers are not responsible for any consequences arising from the use of this Actor, including but not limited to account bans, legal issues, or data inaccuracies.

5. **User Responsibility**: You are solely responsible for ensuring that your use of this Actor complies with all applicable laws and regulations in your jurisdiction. This includes respecting intellectual property rights, privacy laws, and terms of service of third-party platforms.

6. **Prohibited Uses**: Do not use this Actor to:
   - Scrape private or restricted content
   - Violate Reddit's API usage policies
   - Collect personal information without consent
   - Engage in any illegal activities

**Recommendation**: For production use, consider using Reddit's official API when possible, as it provides a more reliable and compliant way to access Reddit data.

# Actor input Schema

## `startLinks` (type: `array`):

Direct Reddit URLs (posts, users, communities, leaderboards, search results) to crawl.

## `searchQueries` (type: `array`):

Keywords to run through Reddit search when no start links are supplied.

## `searchScope` (type: `string`):

Which tab of Reddit search results to crawl when search queries are used.

## `lightExtract` (type: `boolean`):

When true, only extracts minimal fields: permalink, headline, and textBody. Applies to posts and comments only.

## `sortOrder` (type: `string`):

Sorting applied to searches or feed style URLs.

## `timeWindow` (type: `string`):

Temporal filter for searches (posts only).

## `totalItemCap` (type: `integer`):

Stops the run after this many dataset items have been stored.

## `postCap` (type: `integer`):

Maximum posts to take from a single subreddit/feed/user listing.

## `commentCap` (type: `integer`):

Maximum number of comments to fetch for each post.

## `communityCap` (type: `integer`):

Maximum number of communities to take from leaderboards or searches.

## `profileCap` (type: `integer`):

Maximum number of user profiles to take from searches.

## `leaderboardCap` (type: `integer`):

Number of leaderboard entries to collect when crawling /subreddits/leaderboard.

## `scrollWaitSeconds` (type: `integer`):

How long to wait between dynamic pagination requests on infinite feeds.

## `maxConcurrency` (type: `integer`):

Maximum number of parallel HTTP requests.

## `useApifyProxy` (type: `boolean`):

Whether to use Apify Proxy.

## `proxyConfiguration` (type: `object`):

Choose to use no proxy, Apify Proxy, or provide custom proxy URLs.

## `extendResultFunction` (type: `string`):

JavaScript function executed for each dataset item. Receives { data, page } where data is the item object and page is always null (no browser context). Return an object to merge with the item. Note: This function is not executed when lightExtract is enabled.

## `debugLog` (type: `boolean`):

When true, prints extra diagnostic information into the Actor log.

## `minScore` (type: `integer`):

Filter posts and comments by minimum score. Items with score below this value will be excluded. Leave empty to disable filtering.

## `includeNSFW` (type: `boolean`):

When false, excludes NSFW posts and communities from results.

## `logMetrics` (type: `boolean`):

When true, logs performance statistics at the end of the run (items/sec, duration, errors).

## `enablePagination` (type: `boolean`):

When true, automatically paginates through listings to collect more items beyond the initial 100 items per request.

## `dateFrom` (type: `string`):

Filter items created after this date (ISO 8601 format, e.g., 2024-01-01T00:00:00Z). Leave empty to disable.

## `dateTo` (type: `string`):

Filter items created before this date (ISO 8601 format, e.g., 2024-12-31T23:59:59Z). Leave empty to disable.
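For rolling windows such as "the last 7 days", the two date strings can be generated instead of typed by hand. This sketch assumes only the ISO 8601 shape shown in the examples above; `date_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def date_window(days, now=None):
    """Return dateFrom/dateTo strings covering the last `days` days in UTC."""
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    end = end.replace(microsecond=0)
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # same shape as the documented examples
    return {"dateFrom": start.strftime(fmt), "dateTo": end.strftime(fmt)}


# A fixed "now" keeps the example reproducible.
fixed_now = datetime(2024, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
window = date_window(7, now=fixed_now)
print(window)
```

The returned dictionary can be merged into the Actor input, for example with `run_input.update(window)`.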

## `enableDeduplication` (type: `boolean`):

When true, prevents duplicate items (by entityId) from being added to the dataset within the same run.

## `filterByFlair` (type: `array`):

Only include posts with these flair labels. Leave empty to disable filtering.

## `excludeFlairs` (type: `array`):

Exclude posts with these flair labels.

## `minComments` (type: `integer`):

Filter posts by minimum number of comments. Leave empty to disable filtering.

## `minUpvoteRatio` (type: `number`):

Filter posts by minimum upvote ratio (0-1). Leave empty to disable filtering.

## `authorWhitelist` (type: `array`):

Only include posts/comments from these authors. Leave empty to disable filtering.

## `authorBlacklist` (type: `array`):

Exclude posts/comments from these authors.

## `subredditWhitelist` (type: `array`):

Only include posts from these subreddits. Leave empty to disable filtering.

## `subredditBlacklist` (type: `array`):

Exclude posts from these subreddits.

## `maxRetries` (type: `integer`):

Maximum number of retry attempts for failed requests.

## `retryDelaySeconds` (type: `integer`):

Delay in seconds between retry attempts.

## `skipOnError` (type: `boolean`):

When true, continue processing other URLs if one fails.

## `continueOn403` (type: `boolean`):

When true, continue processing even after receiving 403 errors.

## `requestDelayMs` (type: `integer`):

Delay in milliseconds between requests to avoid rate limiting.

## `timeoutSeconds` (type: `integer`):

Timeout in seconds for each HTTP request.

## `batchSize` (type: `integer`):

Process URLs in batches of this size. Leave empty to disable batching.

## `includeAwards` (type: `boolean`):

When true, includes award information (total awards, awarders, etc.) in the output.

## `includeCrossposts` (type: `boolean`):

When true, detects and includes crosspost information.

## `extractMentions` (type: `boolean`):

When true, extracts mentioned usernames (/u/username) from post and comment text.

## `includeMediaMetadata` (type: `boolean`):

When true, includes detailed media metadata (dimensions, duration, bitrate, etc.).

## `outputFormat` (type: `string`):

Format for output data.

## `includeRawData` (type: `boolean`):

When true, includes the raw field with all original Reddit data.

## `customFields` (type: `array`):

Select only these specific fields to include in output. Leave empty to include all fields.

## `searchInSubreddit` (type: `string`):

Limit all search queries to this specific subreddit. Leave empty to search globally.

## `searchRestrictToSubreddit` (type: `boolean`):

When true, uses restrict\_sr=true for all searches, limiting results to the subreddit context.

## `searchType` (type: `string`):

Explicit search type override. Leave empty to use searchScope setting.

## `proxyRotationStrategy` (type: `string`):

Strategy for rotating between proxy servers.

## `proxySessionId` (type: `string`):

Session ID for sticky proxy sessions. Used with sticky rotation strategy.

## `logLevel` (type: `string`):

Minimum log level to display. Only messages at or above this level will be shown.

## `trackPerformance` (type: `boolean`):

When true, tracks and displays detailed performance metrics.

## Actor input object example

```json
{
  "startLinks": [
    {
      "url": "https://www.reddit.com/r/worldnews/"
    }
  ],
  "searchQueries": [
    "parrots"
  ],
  "searchScope": "posts",
  "lightExtract": false,
  "sortOrder": "relevance",
  "timeWindow": "all",
  "totalItemCap": 100,
  "postCap": 50,
  "commentCap": 25,
  "communityCap": 25,
  "profileCap": 25,
  "leaderboardCap": 25,
  "scrollWaitSeconds": 30,
  "maxConcurrency": 10,
  "useApifyProxy": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": []
  },
  "extendResultFunction": "async ({ data, page }) => {\n  return {\n    customField: data.headline ? data.headline.substring(0, 50) : null,\n    processedAt: new Date().toISOString()\n  };\n}",
  "debugLog": false,
  "includeNSFW": true,
  "logMetrics": true,
  "enablePagination": false,
  "enableDeduplication": false,
  "filterByFlair": [],
  "excludeFlairs": [],
  "authorWhitelist": [],
  "authorBlacklist": [],
  "subredditWhitelist": [],
  "subredditBlacklist": [],
  "maxRetries": 3,
  "retryDelaySeconds": 30,
  "skipOnError": true,
  "continueOn403": true,
  "requestDelayMs": 0,
  "timeoutSeconds": 30,
  "includeAwards": false,
  "includeCrossposts": false,
  "extractMentions": false,
  "includeMediaMetadata": false,
  "outputFormat": "json",
  "includeRawData": true,
  "searchRestrictToSubreddit": false,
  "proxyRotationStrategy": "round-robin",
  "logLevel": "info",
  "trackPerformance": true
}
```

# Actor output Schema

## `dataset` (type: `string`):

Dataset containing scraped Reddit entities. Each item can be a post, comment, community, or profile with fields like entityType, permalink, headline, textBody, and entity-specific metadata.
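Because a single dataset mixes several entity kinds, a common first step is to bucket items by `entityType`. The sketch below is an assumption-laden illustration: only the value `"post"` appears in the documented sample output, so the `"comment"` value and the sample rows are made up, and `group_by_type` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_type(items):
    """Bucket dataset items by their entityType field."""
    groups = defaultdict(list)
    for item in items:
        # Light-extract items have no entityType; file them under "unknown".
        groups[item.get("entityType", "unknown")].append(item)
    return dict(groups)


items = [
    {"entityType": "post", "headline": "My Luckiest 10x Pull Yet"},
    {"entityType": "comment", "textBody": "Nice pull!"},  # made-up sample
    {"entityType": "post", "headline": "Another post"},  # made-up sample
]
groups = group_by_type(items)
print({kind: len(rows) for kind, rows in groups.items()})
```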

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.
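If neither client library is available, the endpoint can also be called directly. The sketch below builds, but does not send, a request for the `run-sync-get-dataset-items` path listed in the OpenAPI specification in this document, using only the Python standard library. `build_run_request` is a hypothetical helper name, and the token is passed as a query parameter because that is how the specification declares it.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/runtime~reddit-scraper/run-sync-get-dataset-items"


def build_run_request(token, run_input):
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("<YOUR_API_TOKEN>", {"searchQueries": ["parrots"]})
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```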

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startLinks": [
        {
            "url": "https://www.reddit.com/r/worldnews/"
        }
    ],
    "searchQueries": [
        "parrots"
    ],
    "useApifyProxy": true,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": []
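    },
    // The input schema declares this field as a string, and a function
    // value would be dropped when the input is serialized to JSON.
    "extendResultFunction": `async ({ data, page }) => {
      return {
        customField: data.headline ? data.headline.substring(0, 50) : null,
        processedAt: new Date().toISOString()
      };
    }`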
};

// Run the Actor and wait for it to finish
const run = await client.actor("runtime/reddit-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startLinks": [{ "url": "https://www.reddit.com/r/worldnews/" }],
    "searchQueries": ["parrots"],
    "useApifyProxy": True,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": [],
    },
    "extendResultFunction": """async ({ data, page }) => {
  return {
    customField: data.headline ? data.headline.substring(0, 50) : null,
    processedAt: new Date().toISOString()
  };
}""",
}

# Run the Actor and wait for it to finish
run = client.actor("runtime/reddit-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
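For anything beyond a quick test, it is usually safer to read the token from the environment than to hardcode it. A minimal sketch, assuming the variable is named `APIFY_TOKEN` (the name the Apify CLI and SDK conventionally use; any other name works equally well). `get_apify_token` is a hypothetical helper.

```python
import os


def get_apify_token(env=os.environ):
    """Read the API token from APIFY_TOKEN instead of hardcoding it."""
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set the APIFY_TOKEN environment variable first.")
    return token


# Usage with the client from the example above:
#   client = ApifyClient(get_apify_token())
# A plain dict stands in for the environment here so the demo is reproducible.
print(get_apify_token({"APIFY_TOKEN": "example-token"}))
```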

## CLI example

```bash
echo '{
  "startLinks": [
    {
      "url": "https://www.reddit.com/r/worldnews/"
    }
  ],
  "searchQueries": [
    "parrots"
  ],
  "useApifyProxy": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": []
  },
  "extendResultFunction": "async ({ data, page }) => { return { customField: data.headline ? data.headline.substring(0, 50) : null, processedAt: new Date().toISOString() }; }"
}' |
apify call runtime/reddit-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=runtime/reddit-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Reddit Scraper — Detect pain points, leads, emerging trends",
        "description": "Scrape Reddit posts, comments, communities, and user profiles via URLs or keyword searches. Supports proxy rotation, flexible filters, custom field names, and automatic retries. Ideal for monitoring discussions, trend analysis, research, and large-scale data collection.",
        "version": "0.8",
        "x-build-id": "dYbwfzinEIAQEr1sa"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/runtime~reddit-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-runtime-reddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/runtime~reddit-scraper/runs": {
            "post": {
                "operationId": "runs-sync-runtime-reddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/runtime~reddit-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-runtime-reddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startLinks": {
                        "title": "Start links",
                        "type": "array",
                        "description": "Direct Reddit URLs (posts, users, communities, leaderboards, search results) to crawl.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "searchQueries": {
                        "title": "Search queries",
                        "type": "array",
                        "description": "Keywords to run through Reddit search when no start links are supplied.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "searchScope": {
                        "title": "Search scope",
                        "enum": [
                            "posts",
                            "communities"
                        ],
                        "type": "string",
                        "description": "Which tab of Reddit search results to crawl when search queries are used.",
                        "default": "posts"
                    },
                    "lightExtract": {
                        "title": "Light extract",
                        "type": "boolean",
                        "description": "When true, only extracts minimal fields: permalink, headline, and textBody. Applies to posts and comments only.",
                        "default": false
                    },
                    "sortOrder": {
                        "title": "Sort order",
                        "enum": [
                            "relevance",
                            "hot",
                            "top",
                            "new",
                            "comments"
                        ],
                        "type": "string",
                        "description": "Sorting applied to searches or feed style URLs.",
                        "default": "relevance"
                    },
                    "timeWindow": {
                        "title": "Time window",
                        "enum": [
                            "all",
                            "hour",
                            "day",
                            "week",
                            "month",
                            "year"
                        ],
                        "type": "string",
                        "description": "Temporal filter for searches (posts only).",
                        "default": "all"
                    },
                    "totalItemCap": {
                        "title": "Overall item limit",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Stops the run after this many dataset items have been stored.",
                        "default": 100
                    },
                    "postCap": {
                        "title": "Per feed post cap",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum posts to take from a single subreddit/feed/user listing.",
                        "default": 50
                    },
                    "commentCap": {
                        "title": "Per post comment cap",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of comments to fetch for each post.",
                        "default": 25
                    },
                    "communityCap": {
                        "title": "Community cap",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of communities to take from leaderboards or searches.",
                        "default": 25
                    },
                    "profileCap": {
                        "title": "Profile cap",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of user profiles to take from searches.",
                        "default": 25
                    },
                    "leaderboardCap": {
                        "title": "Leaderboard cap",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Number of leaderboard entries to collect when crawling /subreddits/leaderboard.",
                        "default": 25
                    },
                    "scrollWaitSeconds": {
                        "title": "Scroll wait (sec)",
                        "minimum": 1,
                        "type": "integer",
                        "description": "How long to wait between dynamic pagination requests on infinite feeds.",
                        "default": 30
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of parallel HTTP requests.",
                        "default": 10
                    },
                    "useApifyProxy": {
                        "title": "Use Apify Proxy",
                        "type": "boolean",
                        "description": "Whether to use Apify Proxy",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Choose to use no proxy, Apify Proxy, or provide custom proxy URLs.",
                        "default": {}
                    },
                    "extendResultFunction": {
                        "title": "Extend result function",
                        "type": "string",
                        "description": "JavaScript function executed for each dataset item. Receives { data, page } where data is the item object and page is always null (no browser context). Return an object to merge with the item. Note: This function is not executed when lightExtract is enabled."
                    },
                    "debugLog": {
                        "title": "Verbose logging",
                        "type": "boolean",
                        "description": "When true, prints extra diagnostic information into the Actor log.",
                        "default": false
                    },
                    "minScore": {
                        "title": "Minimum score",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Filter posts and comments by minimum score. Items with score below this value will be excluded. Leave empty to disable filtering."
                    },
                    "includeNSFW": {
                        "title": "Include NSFW content",
                        "type": "boolean",
                        "description": "When false, excludes NSFW posts and communities from results.",
                        "default": true
                    },
                    "logMetrics": {
                        "title": "Log performance metrics",
                        "type": "boolean",
                        "description": "When true, logs performance statistics at the end of the run (items/sec, duration, errors).",
                        "default": true
                    },
                    "enablePagination": {
                        "title": "Enable pagination",
                        "type": "boolean",
                        "description": "When true, automatically paginates through listings to collect more items beyond the initial 100 items per request.",
                        "default": false
                    },
                    "dateFrom": {
                        "title": "Date from",
                        "type": "string",
                        "description": "Filter items created after this date (ISO 8601 format, e.g., 2024-01-01T00:00:00Z). Leave empty to disable."
                    },
                    "dateTo": {
                        "title": "Date to",
                        "type": "string",
                        "description": "Filter items created before this date (ISO 8601 format, e.g., 2024-12-31T23:59:59Z). Leave empty to disable."
                    },
                    "enableDeduplication": {
                        "title": "Enable deduplication",
                        "type": "boolean",
                        "description": "When true, prevents duplicate items (by entityId) from being added to the dataset within the same run.",
                        "default": false
                    },
                    "filterByFlair": {
                        "title": "Filter by flair",
                        "type": "array",
                        "description": "Only include posts with these flair labels. Leave empty to disable filtering.",
                        "items": {
                            "type": "string"
                        },
                        "default": []
                    },
                    "excludeFlairs": {
                        "title": "Exclude flairs",
                        "type": "array",
                        "description": "Exclude posts with these flair labels.",
                        "items": {
                            "type": "string"
                        },
                        "default": []
                    },
                    "minComments": {
                        "title": "Minimum comments",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Filter posts by minimum number of comments. Leave empty to disable filtering."
                    },
                    "minUpvoteRatio": {
                        "title": "Minimum upvote ratio",
                        "minimum": 0,
                        "maximum": 1,
                        "type": "number",
                        "description": "Filter posts by minimum upvote ratio (0-1). Leave empty to disable filtering."
                    },
                    "authorWhitelist": {
                        "title": "Author whitelist",
                        "type": "array",
                        "description": "Only include posts/comments from these authors. Leave empty to disable filtering.",
                        "items": {
                            "type": "string"
                        },
                        "default": []
                    },
                    "authorBlacklist": {
                        "title": "Author blacklist",
                        "type": "array",
                        "description": "Exclude posts/comments from these authors.",
                        "items": {
                            "type": "string"
                        },
                        "default": []
                    },
                    "subredditWhitelist": {
                        "title": "Subreddit whitelist",
                        "type": "array",
                        "description": "Only include posts from these subreddits. Leave empty to disable filtering.",
                        "items": {
                            "type": "string"
                        },
                        "default": []
                    },
                    "subredditBlacklist": {
                        "title": "Subreddit blacklist",
                        "type": "array",
                        "description": "Exclude posts from these subreddits.",
                        "items": {
                            "type": "string"
                        },
                        "default": []
                    },
                    "maxRetries": {
                        "title": "Max retries",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of retry attempts for failed requests.",
                        "default": 3
                    },
                    "retryDelaySeconds": {
                        "title": "Retry delay (seconds)",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Delay in seconds between retry attempts.",
                        "default": 30
                    },
                    "skipOnError": {
                        "title": "Skip on error",
                        "type": "boolean",
                        "description": "When true, continue processing other URLs if one fails.",
                        "default": true
                    },
                    "continueOn403": {
                        "title": "Continue on 403",
                        "type": "boolean",
                        "description": "When true, continue processing even after receiving 403 errors.",
                        "default": true
                    },
                    "requestDelayMs": {
                        "title": "Request delay (ms)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Delay in milliseconds between requests to avoid rate limiting.",
                        "default": 0
                    },
                    "timeoutSeconds": {
                        "title": "Request timeout (seconds)",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Timeout in seconds for each HTTP request.",
                        "default": 30
                    },
                    "batchSize": {
                        "title": "Batch size",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Process URLs in batches of this size. Leave empty to disable batching."
                    },
                    "includeAwards": {
                        "title": "Include awards",
                        "type": "boolean",
                        "description": "When true, includes award information (total awards, awarders, etc.) in the output.",
                        "default": false
                    },
                    "includeCrossposts": {
                        "title": "Include crossposts",
                        "type": "boolean",
                        "description": "When true, detects and includes crosspost information.",
                        "default": false
                    },
                    "extractMentions": {
                        "title": "Extract user mentions",
                        "type": "boolean",
                        "description": "When true, extracts mentioned usernames (/u/username) from post and comment text.",
                        "default": false
                    },
                    "includeMediaMetadata": {
                        "title": "Include media metadata",
                        "type": "boolean",
                        "description": "When true, includes detailed media metadata (dimensions, duration, bitrate, etc.).",
                        "default": false
                    },
                    "outputFormat": {
                        "title": "Output format",
                        "enum": [
                            "json",
                            "csv",
                            "jsonl"
                        ],
                        "type": "string",
                        "description": "Format for output data.",
                        "default": "json"
                    },
                    "includeRawData": {
                        "title": "Include raw data",
                        "type": "boolean",
                        "description": "When true, includes the raw field with all original Reddit data.",
                        "default": true
                    },
                    "customFields": {
                        "title": "Custom fields",
                        "type": "array",
                        "description": "Select only these specific fields to include in output. Leave empty to include all fields.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "searchInSubreddit": {
                        "title": "Search in subreddit",
                        "type": "string",
                        "description": "Limit all search queries to this specific subreddit. Leave empty to search globally."
                    },
                    "searchRestrictToSubreddit": {
                        "title": "Restrict search to subreddit",
                        "type": "boolean",
                        "description": "When true, uses restrict_sr=true for all searches, limiting results to the subreddit context.",
                        "default": false
                    },
                    "searchType": {
                        "title": "Search type",
                        "enum": [
                            "link",
                            "user",
                            "sr"
                        ],
                        "type": "string",
                        "description": "Explicit search type override. Leave empty to use searchScope setting."
                    },
                    "proxyRotationStrategy": {
                        "title": "Proxy rotation strategy",
                        "enum": [
                            "round-robin",
                            "random",
                            "sticky"
                        ],
                        "type": "string",
                        "description": "Strategy for rotating between proxy servers.",
                        "default": "round-robin"
                    },
                    "proxySessionId": {
                        "title": "Proxy session ID",
                        "type": "string",
                        "description": "Session ID for sticky proxy sessions. Used with sticky rotation strategy."
                    },
                    "logLevel": {
                        "title": "Log level",
                        "enum": [
                            "error",
                            "warn",
                            "info",
                            "debug"
                        ],
                        "type": "string",
                        "description": "Minimum log level to display. Only messages at or above this level will be shown.",
                        "default": "info"
                    },
                    "trackPerformance": {
                        "title": "Track performance",
                        "type": "boolean",
                        "description": "When true, tracks and displays detailed performance metrics.",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
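
The `inputSchema` above declares enums, minimums, and one required key (`url` inside each `startLinks` item). As a rough illustration, the sketch below checks a candidate input against a small subset of those constraints before a run is started. The helper name `validate_input` and the example values are hypothetical and not part of the Actor; only the field names, enum values, and limits are copied from the schema. The platform validates input on its own, so this is an optional client-side convenience, not a required step.

```python
from typing import Any

# Subset of constraints copied from the inputSchema above.
ENUMS = {
    "searchScope": {"posts", "communities"},
    "sortOrder": {"relevance", "hot", "top", "new", "comments"},
    "timeWindow": {"all", "hour", "day", "week", "month", "year"},
}
MINIMUMS = {"totalItemCap": 1, "postCap": 1, "commentCap": 0, "maxRetries": 1}


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems, empty if all checks pass."""
    problems: list[str] = []

    # Enum fields must hold one of the values listed in the schema.
    for key, allowed in ENUMS.items():
        if key in run_input and run_input[key] not in allowed:
            problems.append(f"{key}: {run_input[key]!r} is not one of {sorted(allowed)}")

    # Integer fields must respect their declared minimum.
    for key, minimum in MINIMUMS.items():
        value = run_input.get(key)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{key}: expected an integer")
        elif value < minimum:
            problems.append(f"{key}: {value} is below the minimum of {minimum}")

    # minUpvoteRatio is a number between 0 and 1 inclusive.
    ratio = run_input.get("minUpvoteRatio")
    if ratio is not None:
        is_number = isinstance(ratio, (int, float)) and not isinstance(ratio, bool)
        if not (is_number and 0 <= ratio <= 1):
            problems.append("minUpvoteRatio: expected a number between 0 and 1")

    # Each startLinks item is an object with a required "url" key.
    for index, link in enumerate(run_input.get("startLinks", [])):
        if not isinstance(link, dict) or "url" not in link:
            problems.append(f"startLinks[{index}]: missing required 'url'")

    return problems


# Example values are made up for illustration.
example_input = {
    "searchQueries": ["invoice software frustrating"],
    "searchScope": "posts",
    "sortOrder": "new",
    "timeWindow": "month",
    "totalItemCap": 50,
    "commentCap": 10,
    "minUpvoteRatio": 0.7,
}
print(validate_input(example_input))  # prints []
```

Fields not covered here (for example `dateFrom`, `proxyConfiguration`, or the flair and author lists) are passed through unchecked; a full JSON Schema validator would be the more thorough choice if the whole schema needs enforcing.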