# Reddit Scraper (`janbruinier/jan-reddit-scraper`) Actor

Scrape posts and comments from Reddit

- **URL**: https://apify.com/janbruinier/jan-reddit-scraper.md
- **Developed by:** [Jan Bruinier](https://apify.com/janbruinier) (community)
- **Categories:** Social media
- **Stats:** 9 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is billed per platform usage. The Actor itself is free to use; you pay only for the Apify platform resources it consumes, and those rates are lower on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Reddit Scraper

Scrape posts and comments from any public subreddit. The Actor uses Reddit's public JSON API, so no API key or login is required.

### What it does

This Actor pulls posts from any public subreddit with full metadata: scores, authors, timestamps, flairs, and more. It can optionally fetch the entire comment tree for each post, down to a configurable depth. It supports sorting by hot/new/top/rising and searching within a subreddit.

### Use cases

- **Market research:** Track what people are saying about your product or industry on Reddit.
- **Content research:** Find trending topics and discussions in any niche.
- **Sentiment analysis:** Collect posts and comments for NLP analysis.
- **Competitor monitoring:** Set up scheduled runs to track mentions of competitor brands.
- **Academic research:** Gather discussion data for social science or linguistics studies.
- **Lead generation:** Find people asking for recommendations in your product category.

### Input options

| Parameter | Description | Default |
|-----------|-------------|---------|
| Subreddit | Subreddit name without "r/" prefix | Required |
| Sort Order | hot, new, top, or rising | hot |
| Timeframe | Time filter for "top" sort (hour/day/week/month/year/all) | week |
| Search Query | Search within the subreddit | Empty (no search) |
| Include Comments | Also scrape comments for each post | No |
| Max Comment Depth | How deep to go into reply chains (1-10) | 3 |
| Max Posts | Number of posts to scrape (up to 1000) | 25 |

### Output format

#### Posts:

```json
{
    "_type": "post",
    "post_id": "abc123",
    "title": "What's the best Python web framework in 2024?",
    "author": "pythondev42",
    "subreddit": "python",
    "score": 342,
    "upvote_ratio": 0.95,
    "num_comments": 89,
    "created_utc": "2024-01-15T10:30:00+00:00",
    "selftext": "I've been using Flask for years but...",
    "url": "https://reddit.com/r/python/comments/abc123/...",
    "permalink": "https://www.reddit.com/r/python/comments/abc123/...",
    "flair": "Discussion",
    "is_self": true,
    "is_video": false,
    "domain": "self.python",
    "awards_count": 2
}
````

#### Comments (when enabled):

```json
{
    "_type": "comment",
    "comment_id": "xyz789",
    "post_id": "abc123",
    "author": "django_fan",
    "body": "Django is still the best for larger projects...",
    "score": 156,
    "created_utc": "2024-01-15T11:45:00+00:00",
    "depth": 0,
    "parent_id": "t3_abc123",
    "is_submitter": false,
    "awards_count": 1
}
```
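
Both item shapes carry a `_type` field and comments reference their post through `post_id`, so results can be regrouped after a run. The following is a minimal sketch, assuming posts and comments arrive together in one dataset (the examples above suggest this but do not state it); `group_comments_by_post` is a hypothetical helper name, not part of the Actor.

```python
from collections import defaultdict


def group_comments_by_post(items):
    """Split mixed dataset items into posts and a post_id -> comments map."""
    posts = {}
    comments = defaultdict(list)
    for item in items:
        if item.get("_type") == "post":
            posts[item["post_id"]] = item
        elif item.get("_type") == "comment":
            # Comments point back to their post through post_id.
            comments[item["post_id"]].append(item)
    return posts, dict(comments)


# Tiny hand-written sample in the documented shape (not real scraped data).
items = [
    {"_type": "post", "post_id": "abc123", "title": "Example post"},
    {"_type": "comment", "comment_id": "xyz789", "post_id": "abc123", "depth": 0},
]
posts, comments = group_comments_by_post(items)
print(len(posts), len(comments["abc123"]))
```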

### How it works

The Actor appends `.json` to Reddit URLs to get structured data without needing the official API. It handles pagination automatically and respects Reddit's rate limits with built-in delays and retry logic.
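
To illustrate the `.json` technique, here is a rough sketch of how such listing URLs might be built. The Actor's actual requests are not documented here, so the function name `build_listing_url` and the query parameters (`limit`, `t`, `after`, `q`, `restrict_sr`) are assumptions based on Reddit's public listing endpoints.

```python
from urllib.parse import urlencode


def build_listing_url(subreddit, sort="hot", timeframe="week",
                      search_query="", limit=100, after=None):
    """Build a Reddit .json listing URL (assumed shape, not the Actor's code)."""
    base = f"https://www.reddit.com/r/{subreddit}"
    params = {"limit": limit}
    if search_query:
        # Search within the subreddit only.
        path = "/search.json"
        params["q"] = search_query
        params["restrict_sr"] = 1
    else:
        path = f"/{sort}.json"
        if sort == "top":
            # The time filter only applies to the "top" sort.
            params["t"] = timeframe
    if after:
        # Pagination cursor taken from the previous page's response.
        params["after"] = after
    return f"{base}{path}?{urlencode(params)}"


print(build_listing_url("python", sort="top", timeframe="all"))
```

Passing the `after` cursor from one page into the next request is how pagination would proceed under these assumptions.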

### Rate limits

Reddit rate-limits unauthenticated JSON requests. The Actor handles this automatically:

- 1-second delay between post page fetches
- 1.5-second delay between comment fetches
- Automatic retry with backoff on 429 responses
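
The retry behavior in the last bullet could look roughly like the sketch below. The retry count, the base delay, and the exponential schedule are assumptions (the README does not specify them), and `RateLimited` and `fetch_with_backoff` are hypothetical names used only for illustration.

```python
import time


class RateLimited(Exception):
    """Raised by the caller-supplied fetch function on an HTTP 429 response."""


def fetch_with_backoff(fetch, url, max_retries=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponentially growing delays on RateLimited."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except RateLimited:
            if attempt == max_retries:
                raise  # Out of retries: surface the error to the caller.
            sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ...


# Demo with a fake fetch that is rate-limited twice, then succeeds.
attempts = {"count": 0}


def fake_fetch(url):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return {"url": url}


waits = []  # Record the delays instead of actually sleeping.
print(fetch_with_backoff(fake_fetch, "https://example.com/data.json", sleep=waits.append))
print(waits)
```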

For large scrapes (100+ posts with comments), runs may take several minutes.
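
As a back-of-the-envelope check, the delays listed above give a lower bound on run time. This sketch assumes one comment fetch per post and 100 posts per page, and it ignores network time and retries, so real runs will take longer; `estimate_min_delay_seconds` is a hypothetical helper.

```python
import math

POST_PAGE_DELAY_S = 1.0      # delay between post page fetches
COMMENT_FETCH_DELAY_S = 1.5  # delay between comment fetches
POSTS_PER_PAGE = 100         # Reddit returns up to 100 posts per page


def estimate_min_delay_seconds(max_posts, include_comments):
    """Sum only the built-in delays; network time and retries are ignored."""
    pages = math.ceil(max_posts / POSTS_PER_PAGE)
    total = pages * POST_PAGE_DELAY_S
    if include_comments:
        # Assumes one comment request per post.
        total += max_posts * COMMENT_FETCH_DELAY_S
    return total


print(estimate_min_delay_seconds(100, include_comments=True))   # 151.0
print(estimate_min_delay_seconds(1000, include_comments=True))  # 1510.0
```

Under these assumptions, 100 posts with comments spend about 2.5 minutes in delays alone, and 1000 posts with comments about 25 minutes.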

### Tips

- Start without comments to quickly scan posts, then re-run with comments for the ones you care about.
- Use the search feature to find niche discussions within large subreddits.
- Sort by "top" with timeframe "all" to get the most popular posts of all time.
- Schedule hourly runs with "new" sort to catch new posts as they appear. If the subreddit receives more posts per hour than Max Posts, raise Max Posts or run more often.
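
Scheduled runs with "new" sort will usually overlap, so the same post can appear in several runs. A minimal sketch of deduplicating by `post_id` follows; `merge_new_posts` is a hypothetical helper, and where the set of seen IDs is persisted between runs is left to you.

```python
def merge_new_posts(seen_ids, items):
    """Return posts not seen in earlier runs and record their IDs in seen_ids."""
    fresh = []
    for item in items:
        if item.get("_type") != "post":
            continue  # Skip comments; only posts are deduplicated here.
        post_id = item["post_id"]
        if post_id not in seen_ids:
            seen_ids.add(post_id)
            fresh.append(item)
    return fresh


seen = set()
first_run = [{"_type": "post", "post_id": "a1"}, {"_type": "post", "post_id": "a2"}]
second_run = [{"_type": "post", "post_id": "a2"}, {"_type": "post", "post_id": "a3"}]
print([p["post_id"] for p in merge_new_posts(seen, first_run)])
print([p["post_id"] for p in merge_new_posts(seen, second_run)])
```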

# Actor input Schema

## `subreddit` (type: `string`):

Subreddit name without the 'r/' prefix (e.g., 'python', 'datascience', 'webdev').

## `sort` (type: `string`):

How to sort posts.

## `timeframe` (type: `string`):

Time period filter (only applies when sort is 'top').

## `searchQuery` (type: `string`):

Search for posts containing specific keywords within the subreddit. Leave empty to get posts by sort order.

## `includeComments` (type: `boolean`):

Also scrape comments for each post. This makes the run slower but gives you full discussion data.

## `maxCommentDepth` (type: `integer`):

Maximum depth of comment replies to fetch (1 = top-level only). Only used if Include Comments is enabled.

## `maxPosts` (type: `integer`):

Maximum number of posts to scrape. Reddit returns up to 100 per page.

## Actor input object example

```json
{
  "subreddit": "python",
  "sort": "hot",
  "timeframe": "week",
  "searchQuery": "",
  "includeComments": false,
  "maxCommentDepth": 3,
  "maxPosts": 25
}
```
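
If you assemble the input in code, it can help to check values before starting a run. The sketch below mirrors the documented defaults, allowed values, and ranges; `build_input` is a hypothetical helper, and the platform may validate the input differently.

```python
SORTS = ("hot", "new", "top", "rising")
TIMEFRAMES = ("hour", "day", "week", "month", "year", "all")


def build_input(subreddit, sort="hot", timeframe="week", search_query="",
                include_comments=False, max_comment_depth=3, max_posts=25):
    """Validate values against the documented constraints and return the input object."""
    if not subreddit:
        raise ValueError("subreddit is required")
    if subreddit.lower().startswith("r/"):
        raise ValueError("pass the subreddit name without the 'r/' prefix")
    if sort not in SORTS:
        raise ValueError(f"sort must be one of {SORTS}")
    if timeframe not in TIMEFRAMES:
        raise ValueError(f"timeframe must be one of {TIMEFRAMES}")
    if not 1 <= max_comment_depth <= 10:
        raise ValueError("maxCommentDepth must be between 1 and 10")
    if not 1 <= max_posts <= 1000:
        raise ValueError("maxPosts must be between 1 and 1000")
    return {
        "subreddit": subreddit,
        "sort": sort,
        "timeframe": timeframe,
        "searchQuery": search_query,
        "includeComments": include_comments,
        "maxCommentDepth": max_comment_depth,
        "maxPosts": max_posts,
    }


print(build_input("python", sort="top", timeframe="all", max_posts=100))
```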

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "subreddit": "python"
};

// Run the Actor and wait for it to finish
const run = await client.actor("janbruinier/jan-reddit-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "subreddit": "python" }

# Run the Actor and wait for it to finish
run = client.actor("janbruinier/jan-reddit-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "subreddit": "python"
}' |
apify call janbruinier/jan-reddit-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=janbruinier/jan-reddit-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Reddit Scraper",
        "description": "Scrape posts and comments from Reddit",
        "version": "1.0",
        "x-build-id": "J4FzPnkHxmBu90yQd"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/janbruinier~jan-reddit-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-janbruinier-jan-reddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/janbruinier~jan-reddit-scraper/runs": {
            "post": {
                "operationId": "runs-sync-janbruinier-jan-reddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/janbruinier~jan-reddit-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-janbruinier-jan-reddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "subreddit"
                ],
                "properties": {
                    "subreddit": {
                        "title": "Subreddit",
                        "type": "string",
                        "description": "Subreddit name without the 'r/' prefix (e.g., 'python', 'datascience', 'webdev')."
                    },
                    "sort": {
                        "title": "Sort Order",
                        "enum": [
                            "hot",
                            "new",
                            "top",
                            "rising"
                        ],
                        "type": "string",
                        "description": "How to sort posts.",
                        "default": "hot"
                    },
                    "timeframe": {
                        "title": "Timeframe",
                        "enum": [
                            "hour",
                            "day",
                            "week",
                            "month",
                            "year",
                            "all"
                        ],
                        "type": "string",
                        "description": "Time period filter (only applies when sort is 'top').",
                        "default": "week"
                    },
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Search for posts containing specific keywords within the subreddit. Leave empty to get posts by sort order.",
                        "default": ""
                    },
                    "includeComments": {
                        "title": "Include Comments",
                        "type": "boolean",
                        "description": "Also scrape comments for each post. This makes the run slower but gives you full discussion data.",
                        "default": false
                    },
                    "maxCommentDepth": {
                        "title": "Max Comment Depth",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Maximum depth of comment replies to fetch (1 = top-level only). Only used if Include Comments is enabled.",
                        "default": 3
                    },
                    "maxPosts": {
                        "title": "Maximum Posts",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of posts to scrape. Reddit returns up to 100 per page.",
                        "default": 25
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
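
The `run-sync-get-dataset-items` endpoint from the specification above can also be called without a client library. The following is a minimal sketch using only the Python standard library; `build_run_sync_request` is a hypothetical helper, and the request is built but deliberately not sent here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "janbruinier~jan-reddit-scraper"


def build_run_sync_request(token, actor_input):
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urlencode({"token": token})  # The spec passes the token as a query parameter.
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_sync_request("<YOUR_API_TOKEN>", {"subreddit": "python"})
print(request.get_method(), request.full_url.split("?")[0])
# To send it, pass the request to urllib.request.urlopen() and parse the JSON body.
```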
