# Substack Scraper – Posts, Comments & Notes (`sourabhbgp/substack-scraper`) Actor

Scrape every Substack surface in one actor — full posts (50+ fields, complete article HTML), nested comment threads, emoji reaction breakdown, Substack Notes, restacker identity, multi-byline authors, custom domains. Direct JSON API + RSS, no browser, no Cloudflare. From $0.30 per 1,000 posts.

- **URL**: https://apify.com/sourabhbgp/substack-scraper.md
- **Developed by:** [Sourabh Kumar](https://apify.com/sourabhbgp) (community)
- **Categories:** News, Lead generation, Social media
- **Stats:** 33 total users, 14 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $0.30 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

Scrape any Substack newsletter for posts, full article content, comment threads, Substack Notes, and reactor identity — talking to Substack's own JSON endpoints, not fighting a browser. Custom domains and `.substack.com` subdomains both work. No login, no proxy required (but [optional Apify Proxy](#input) is supported for very large runs).

**$0.30 per 1,000 results.** Each scraped post, comment, or Note counts as one result. Reactor and restacker identity (opt-in via `includeFacepile`) is $0.20 per 1,000 entries.

Designed for the global Substack community: most large publications are in English, but the actor handles UTF-8 cleanly, so non-English newsletters (German, Spanish, Arabic, Japanese, Hebrew) come through with original characters intact.

### Why this scraper

Substack doesn't publish a content API. Most actors on the Apify Store fight Cloudflare with a headless browser and split the surface across 3 to 6 separate scrapers — one for posts, one for comments, one for Notes, one for the leaderboard. This actor talks directly to Substack's internal JSON endpoints with plain HTTP. One actor, one price, every surface.

| Concern | Browser-based scrapers | Internal-API scrapers (this one) |
|---|---|---|
| Setup time | Minutes (proxy + fingerprint config) | Seconds — paste a URL |
| Price per 1,000 posts | $2 to $20 | **$0.30** |
| Reliability | Cloudflare blocks happen | No anti-bot to fight |
| Comment threads | Often a separate paid actor | Included, full nested tree |
| Reactor + restacker identity | Nobody exposes this | Optional opt-in, included |
| Notes scraping | Often a separate actor | Built in, same actor |
| Fields per post | 5 to 25 | 50+ |

Plain HTTP isn't risk-free though, so the actor adds a `Retry-After`-aware exponential backoff for 429 responses and an opt-in `proxyConfiguration` field for very large runs. With Apify Proxy on, each request rotates through a fresh IP, so Substack's per-IP rate limit is far less likely to slow a run.
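To illustrate what `Retry-After`-aware exponential backoff means in practice, here is a minimal sketch of how a wait time could be computed after a 429. It is not the actor's actual code: the function name `backoff_delay`, the 1-second base, and the 60-second cap are assumptions chosen for the example.

```python
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Return the seconds to wait before retry number `attempt` (0-based).

    A numeric Retry-After header wins; otherwise the delay doubles
    with each attempt until it reaches `cap`.
    """
    if retry_after is not None:
        try:
            # Retry-After given as a number of seconds.
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            # HTTP-date form is not parsed in this sketch; use backoff instead.
            pass
    return min(base * (2 ** attempt), cap)
```

Production clients usually add random jitter on top of this so that parallel requests do not retry in lockstep.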

### What data can you extract?

<table>
<tr>
<td>📝 Article HTML + plain text + ProseMirror <code>bodyJson</code></td>
<td>💬 Full nested comment threads (opt-in)</td>
<td>❤️ Reactions object keyed by emoji</td>
<td>👥 Reactor + restacker identity (opt-in)</td>
</tr>
<tr>
<td>📊 Reaction count, comment count, restack count</td>
<td>👤 Multi-byline guest authors with their pubs</td>
<td>🔓 Paywall preview text + meter taxonomy</td>
<td>📅 Publish date, update date, archive nav slugs</td>
</tr>
<tr>
<td>🎧 Voiceover + auto-TTS audio + podcast URL</td>
<td>🔍 SEO title + description + social title</td>
<td>📝 Substack Notes feed by author handle</td>
<td>⏱️ Reading time, word count, tags</td>
</tr>
</table>

#### URL mode (post scraping)

Paste any Substack URL — custom domain or `.substack.com` subdomain — and get every post with full HTML, plain text, structured `bodyJson`, reading time, multi-byline guest authors, full paywall taxonomy, SEO fields, audio renditions, and archive navigation slugs.

Two opt-in flags add depth:

- `includeComments: true` fetches the full nested comment tree per post in one HTTP call. Each comment carries its own emoji-keyed reactions, ProseMirror `bodyJson`, link-preview attachments, `depth` + `parentId` for tree reconstruction, and pinned/edited/deleted flags.
- `includeFacepile: true` returns the names, handles, and primary publication of every user who reacted to or restacked the post. Useful for influence analysis. No other Substack scraper exposes this.

#### Notes mode (Substack Notes scraping)

Pass `notesHandles` instead of (or alongside) `urls` to scrape any author's Notes feed. Each Note record carries body text, structured `bodyJson`, attachments (link previews, image embeds), emoji-keyed reactions, restack count, and reply count (`childrenCount`).

### How to scrape Substack: step by step

1. [Create a free Apify account](https://console.apify.com/sign-up). Takes 30 seconds, no card.
2. Open [Substack Newsletter Scraper](https://console.apify.com/actors/YtOb83sq5rupKexLA?addFromActorId=YtOb83sq5rupKexLA) in the Apify Console.
3. Paste newsletter URLs into `urls` (URL mode), or pass author handles in `notesHandles` for Notes mode. Tick `includeComments` if you want comment threads.
4. Click **Start**. A typical 50-post run finishes in under 10 seconds.
5. Export results as JSON, CSV, or Excel — or fetch via the [Apify API](https://docs.apify.com/api/v2).

### How much does Substack Newsletter Scraper cost?

- Per 1,000 posts, comments, or Notes: **$0.30**
- Per 1,000 reactor/restacker entries (opt-in `includeFacepile`): **$0.20**
- Free-plan yield: roughly **16,000 results** per month on the $5 free credit
- Starter-plan yield: about **96,000 results** per month on the $29 Starter plan

A 50-post run with `includeComments=true` averaging ~10 comments per post = 50 + 500 = 550 billable items ≈ **$0.17**. Adding `includeFacepile=true` at about 100 reactors per post adds 5,000 facepile entries, or roughly another **$1.00**. The `Actor Start` event is $0.00005 per gigabyte at startup; for any real workload that's rounding error.
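The same arithmetic can be scripted to budget a run before starting it. This is a minimal sketch using the two per-event prices quoted above. The helper name `estimate_cost` is hypothetical, and the `Actor Start` fee is ignored.

```python
from decimal import Decimal

# Per-event prices quoted in this README (USD).
PRICE_PER_RESULT = Decimal("0.0003")    # one post, comment, or Note
PRICE_PER_FACEPILE = Decimal("0.0002")  # one reactor or restacker entry


def estimate_cost(posts: int = 0, comments: int = 0, notes: int = 0,
                  facepile_entries: int = 0) -> Decimal:
    """Estimate the pay-per-event cost of a run in USD."""
    results = posts + comments + notes
    return results * PRICE_PER_RESULT + facepile_entries * PRICE_PER_FACEPILE


# 50 posts with about 10 comments each.
print(estimate_cost(posts=50, comments=500))                         # 0.1650
# The same run with about 100 reactors per post.
print(estimate_cost(posts=50, comments=500, facepile_entries=5000))  # 1.1650
```

`Decimal` is used instead of `float` so that small per-item prices add up without rounding noise.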

Pause whenever. There's no subscription lock-in.

### Input

```json
{
    "urls": [
        "https://newsletter.pragmaticengineer.com",
        "https://www.lennysnewsletter.com"
    ],
    "maxPosts": 50,
    "includeContent": true,
    "contentFormat": "both",
    "includeComments": false,
    "includeFacepile": false,
    "searchKeyword": null,
    "audienceFilter": "all",
    "typeFilter": "all",
    "dateFrom": null,
    "dateTo": null,
    "sortBy": "newest"
}
````

| Field | Type | Default | Notes |
|---|---|---|---|
| `urls` | array | — | Newsletter URLs. Custom domains or `.substack.com` subdomains. Required for URL mode. |
| `maxPosts` | number | `50` | Max posts per newsletter. `0` means every post in the archive. |
| `includeContent` | boolean | `true` | Include `contentHtml` + `contentText`. Disable for fast metadata-only runs. |
| `contentFormat` | enum | `both` | `html`, `text`, or `both`. |
| `searchKeyword` | string | — | Filter the archive by keyword via Substack's `archive?search=` endpoint. Server-side filter, max 100 chars. |
| `audienceFilter` | enum | `all` | `all`, `free`, or `paid`. |
| `typeFilter` | enum | `all` | `all`, `newsletter`, `podcast`, `thread`, or `video`. |
| `includeComments` | boolean | `false` | Fetch full nested comment thread per post (one extra HTTP call per post). |
| `includeFacepile` | boolean | `false` | Fetch reactor + restacker identity per post (one extra HTTP call per post). |
| `dateFrom` / `dateTo` | string | — | YYYY-MM-DD bounds. |
| `sortBy` | enum | `newest` | `newest` or `oldest`. |
| `notesHandles` | array | — | Substack handles for Notes mode (e.g. `["lenny", "paulgraham"]`). Provide either `urls` or `notesHandles`. |
| `maxNotesPerHandle` | number | `50` | Max Notes per handle (1-1000). |
| `proxyConfiguration` | object | — | Optional Apify Proxy. Recommended for runs over 100 posts with `includeFacepile`, where Substack's per-IP rate limit otherwise eats wall time. Apify Proxy bandwidth is billed by Apify separately. |
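
The constraints in the table can be checked client-side before a run is started. The sketch below covers only what the table states. `validate_input` is a hypothetical helper, and the actor's own validation may be stricter or worded differently.

```python
import re
from datetime import date
from typing import List

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate_input(run_input: dict) -> List[str]:
    """Return a list of problems found in an input object (empty if none)."""
    problems = []
    if not run_input.get("urls") and not run_input.get("notesHandles"):
        problems.append("provide urls or notesHandles")
    if run_input.get("maxPosts", 50) < 0:
        problems.append("maxPosts must be 0 (all posts) or greater")
    if not 1 <= run_input.get("maxNotesPerHandle", 50) <= 1000:
        problems.append("maxNotesPerHandle must be between 1 and 1000")
    keyword = run_input.get("searchKeyword")
    if keyword is not None and len(keyword) > 100:
        problems.append("searchKeyword is limited to 100 characters")
    for key in ("dateFrom", "dateTo"):
        value = run_input.get(key)
        if value is None:
            continue
        try:
            # The pattern enforces the layout, fromisoformat the calendar.
            if not DATE_PATTERN.match(value):
                raise ValueError(value)
            date.fromisoformat(value)
        except ValueError:
            problems.append(f"{key} must be a valid YYYY-MM-DD date")
    return problems
```

An empty list means the input is consistent with the table. Anything else can be shown to the user before any billable event occurs.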

#### Notes mode input

```json
{
    "notesHandles": ["lenny"],
    "maxNotesPerHandle": 50
}
```

### Recipes

Ready-to-paste inputs for common jobs.

#### Pull the last 25 posts of a newsletter with full content

```json
{
    "urls": ["https://newsletter.pragmaticengineer.com"],
    "maxPosts": 25,
    "includeContent": true
}
```

#### Scrape posts plus their full comment threads

```json
{
    "urls": ["https://www.lennysnewsletter.com"],
    "maxPosts": 25,
    "includeComments": true,
    "audienceFilter": "free"
}
```

#### Find every post about a topic in a newsletter

```json
{
    "urls": ["https://newsletter.pragmaticengineer.com"],
    "searchKeyword": "AI",
    "maxPosts": 50,
    "includeContent": true
}
```

#### Get reactor + restacker identity for influence analysis

```json
{
    "urls": ["https://www.lennysnewsletter.com"],
    "maxPosts": 10,
    "includeFacepile": true,
    "includeContent": false
}
```

#### Scrape Substack Notes for a list of authors

```json
{
    "notesHandles": ["lenny", "paulgraham", "sahilbloom"],
    "maxNotesPerHandle": 50
}
```

#### Bulk archive crawl with proxy (avoid rate limits)

```json
{
    "urls": ["https://newsletter.pragmaticengineer.com"],
    "maxPosts": 0,
    "includeFacepile": true,
    "proxyConfiguration": { "useApifyProxy": true }
}
```

#### Free posts only, sorted oldest first

```json
{
    "urls": ["https://www.lennysnewsletter.com"],
    "audienceFilter": "free",
    "sortBy": "oldest",
    "maxPosts": 100
}
```

### Output

Each post is one JSON record. Fields populated only by their corresponding flag (`comments`, `facepile`) are `null` when the flag is off.

```json
{
    "id": 165204731,
    "title": "New: A free year of Cursor, Google AI Pro, Notion",
    "subtitle": "Subscriber perks for paid members",
    "slug": "new-a-free-year-of-cursor-google",
    "url": "https://www.lennysnewsletter.com/p/new-a-free-year-of-cursor-google",
    "canonicalUrl": "https://www.lennysnewsletter.com/p/new-a-free-year-of-cursor-google",
    "author": "Lenny Rachitsky",
    "authorHandle": "lenny",
    "authorImageUrl": "https://substackcdn.com/image/...",
    "authorBio": "Writing • Angel investing • Advising",
    "authorTwitter": "lennysan",
    "bylines": [
        {
            "id": 1849774,
            "name": "Lenny Rachitsky",
            "handle": "lenny",
            "photoUrl": "https://substackcdn.com/...",
            "bio": "Writing • Angel investing • Advising",
            "twitterHandle": "lennysan",
            "isGuest": false,
            "primaryPublicationName": "Lenny's Newsletter",
            "primaryPublicationUrl": "https://www.lennysnewsletter.com"
        }
    ],
    "publishedAt": "2026-04-21T15:30:00.000Z",
    "updatedAt": null,
    "contentHtml": "<p>Today I'm thrilled to share...</p>",
    "contentText": "Today I'm thrilled to share...",
    "bodyJson": { "type": "doc", "content": [/* ProseMirror tree */] },
    "wordCount": 1602,
    "readingTimeMinutes": 7,
    "description": "Subscriber perks for paid members",
    "socialTitle": null,
    "searchEngineTitle": "A free year of Cursor + Google AI Pro for subscribers",
    "searchEngineDescription": "Lenny's Newsletter subscriber perks include...",
    "coverImageUrl": "https://substackcdn.com/image/...",
    "coverImageIsExplicit": false,
    "audienceType": "everyone",
    "isPaywalled": false,
    "truncatedBodyText": null,
    "meterType": null,
    "freeUnlockRequired": false,
    "teaserPostEligible": false,
    "isGeoblocked": false,
    "hasCashtag": false,
    "reactionCount": 293,
    "reactions": { "❤": 293 },
    "commentCount": 33,
    "childCommentCount": 20,
    "restackCount": 9,
    "tags": ["AI", "Tools"],
    "type": "newsletter",
    "hasAudio": false,
    "audioUrl": null,
    "audioItems": [],
    "podcastUrl": null,
    "podcastDuration": null,
    "previousPostSlug": "your-couch-to-5k-for-ai",
    "nextPostSlug": "a-visual-guide-to-getting-out-of",
    "newsletter": {
        "name": "Lenny's Newsletter",
        "description": "Deeply researched product, growth, and career advice",
        "url": "https://www.lennysnewsletter.com"
    },
    "comments": null,
    "facepile": null
}
```

When `includeComments` is on, `comments` is a flat array (with `parentId` + `depth` for tree reconstruction):

```json
{
    "id": 246915982,
    "parentId": null,
    "depth": 0,
    "bodyText": "Our infra is getting slammed, please bear with us...",
    "bodyJson": { "type": "doc", "content": [/* … */] },
    "authorId": 1849774,
    "authorName": "Lenny Rachitsky",
    "authorHandle": "lenny",
    "authorPhotoUrl": "https://substackcdn.com/...",
    "authorBestsellerTier": 10000,
    "publishedAt": "2026-04-21T15:56:31.307Z",
    "editedAt": "2026-04-21T16:00:11.789Z",
    "isPinned": true,
    "isDeleted": false,
    "reactionCount": 8,
    "reactions": { "❤": 8 },
    "restackCount": 0
}
```
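
Since the array is flat, rebuilding the thread is a matter of linking each comment to its parent through `parentId`. The following is a minimal sketch, assuming every record has the `id` and `parentId` fields shown above. The `replies` key and the `build_comment_tree` name are introduced here for illustration and are not part of the actor's output.

```python
from typing import Any, Dict, List


def build_comment_tree(comments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Nest a flat comment list into a tree and return the top-level comments."""
    # Copy each comment and give it an empty list for its children.
    nodes = {c["id"]: {**c, "replies": []} for c in comments}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node.get("parentId"))
        if parent is None:
            # Top-level comment, or its parent is missing from the list.
            roots.append(node)
        else:
            parent["replies"].append(node)
    return roots
```

The input records are copied rather than mutated, so the original dataset items stay usable for flat exports such as CSV.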

When `includeFacepile` is on, `facepile.reactors[]` and `facepile.restackers[]` look like:

```json
{
    "id": 73273682,
    "name": "Miles Kohl",
    "handle": "mileskohl504716",
    "photoUrl": "https://substackcdn.com/...",
    "bio": null,
    "primaryPublicationName": "Miles' Substack",
    "primaryPublicationUrl": "https://mileskohl504716.substack.com",
    "bestsellerTier": null
}
```
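
For influence analysis, one simple starting point is to count which publications the reactors and restackers belong to. This is a minimal sketch, assuming `posts` is a list of post records in which `facepile` is either `null` or an object with `reactors` and `restackers` arrays as described above. `top_publications` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def top_publications(posts: List[Dict[str, Any]],
                     limit: int = 10) -> List[Tuple[str, int]]:
    """Rank primary publications by how often their owners engaged with the posts.

    A user who both reacted to and restacked a post is counted twice.
    """
    counts: Counter = Counter()
    for post in posts:
        facepile = post.get("facepile") or {}  # null when the flag is off
        for group in ("reactors", "restackers"):
            for user in facepile.get(group) or []:
                name = user.get("primaryPublicationName")
                if name:
                    counts[name] += 1
    return counts.most_common(limit)
```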

In Notes mode each record has a different shape:

```json
{
    "id": 216329331,
    "handle": "lenny",
    "authorName": "Lenny Rachitsky",
    "authorHandle": "lenny",
    "authorPhotoUrl": "https://substackcdn.com/...",
    "authorBio": "Writing • Angel investing • Advising",
    "bestsellerTier": 10000,
    "publishedAt": "2026-02-18T18:18:50.293Z",
    "bodyText": "I'm thrilled to welcome The Skip with @Nikhyl Singhal to Lenny's Podcast Network",
    "bodyJson": { "type": "doc", "content": [/* … */] },
    "reactionCount": 142,
    "reactions": { "❤": 130, "🔥": 12 },
    "restackCount": 5,
    "childrenCount": 8,
    "attachments": [/* link previews, embedded images */],
    "publicationName": "Lenny's Newsletter",
    "publicationUrl": "https://www.lennysnewsletter.com"
}
```
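
Because `reactions` is a plain emoji-to-count map, totals across many Notes can be summed directly. This is a minimal sketch, assuming `notes` is a list of Note records shaped like the example above. `total_reactions` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List


def total_reactions(notes: List[Dict[str, Any]]) -> Counter:
    """Sum the emoji-keyed `reactions` maps of several Note records."""
    totals: Counter = Counter()
    for note in notes:
        # Counter.update adds counts instead of replacing them.
        totals.update(note.get("reactions") or {})
    return totals
```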

### Field availability by mode

| Field group | URL mode (default) | URL + `includeComments` | URL + `includeFacepile` | Notes mode |
|---|:---:|:---:|:---:|:---:|
| Post identity (id, title, slug, url) | ✅ | ✅ | ✅ | — |
| Author (name, handle, photo, bylines) | ✅ | ✅ | ✅ | ✅ (single author) |
| Content (HTML, text, bodyJson, wordCount) | ✅ | ✅ | ✅ | bodyText + bodyJson only |
| Engagement (reactionCount, reactions emoji map, commentCount, restackCount) | ✅ | ✅ | ✅ | reactionCount + restackCount + childrenCount |
| Paywall taxonomy | ✅ | ✅ | ✅ | — |
| SEO + navigation (search engine fields, prev/next slug) | ✅ | ✅ | ✅ | — |
| Audio + podcast | ✅ | ✅ | ✅ | — |
| `comments` array | `null` | ✅ full nested tree | `null` | — |
| `facepile.reactors` + `facepile.restackers` | `null` | `null` | ✅ | — |
| Note `attachments` (link previews, embeds) | — | — | — | ✅ |

### FAQ

#### How much does Substack Newsletter Scraper cost?

Substack Newsletter Scraper uses pay-per-result pricing. Posts, comments, and Notes are **$0.30 per 1,000 items**. Optional `includeFacepile` reactor/restacker entries are **$0.20 per 1,000**. The Apify Free plan gives you $5 in usage credits a month, enough for around 16,000 results. If you run regularly, the $29/month Starter plan covers about 96,000 results.

No subscription lock-in. Pause whenever.

#### Is it legal to scrape Substack?

Scraping public data is generally allowed in the US and most of the EU, as long as you don't collect personal data covered by GDPR or CCPA without a lawful basis. This actor only touches publicly accessible Substack pages (no login, no paywall bypass). How you use the output is on you.

Apify's full breakdown: [Is web scraping legal?](https://blog.apify.com/is-web-scraping-legal/).

#### Can I integrate Substack Newsletter Scraper with other tools?

Push results into **Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive**, and more. Apify treats every actor as a webhook source, so anything that consumes webhooks or pulls from an API works.

Full list: [Apify integrations](https://docs.apify.com/platform/integrations).

#### Can I use Substack Newsletter Scraper with the Apify API?

Yes. Every run is available via the Apify REST API:

```bash
curl -X POST "https://api.apify.com/v2/acts/sourabhbgp~substack-scraper/runs?token=APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls":["https://newsletter.pragmaticengineer.com"],"maxPosts":25,"includeComments":true}'
```

Docs: [Apify API reference](https://docs.apify.com/api/v2).
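
For projects that avoid extra dependencies, the same call can be assembled with Python's standard library. The sketch below only builds the request for the `/runs` endpoint used in the curl example. `build_run_request` is a hypothetical helper, and sending the request is left to the caller.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR = "sourabhbgp~substack-scraper"


def build_run_request(run_input: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a run of the actor."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR}/runs?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. Keeping construction separate from sending makes the request easy to inspect or log (with the token redacted) before any billable run starts.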

#### Can I use Substack Newsletter Scraper through an MCP Server?

Yes. Apify ships an MCP server that exposes every actor as a tool, so Claude Desktop, Cursor, and any other MCP-capable client can call Substack Newsletter Scraper directly. Setup: [Apify MCP docs](https://docs.apify.com/platform/integrations/mcp).

### Your feedback

Bug, missing field, or odd behavior? Drop a note in the [Issues tab](https://console.apify.com/actors/YtOb83sq5rupKexLA/issues). Reports go to a human and fixes usually ship the same week.

# Actor input Schema

## `urls` (type: `array`):

List of Substack newsletter URLs (URL mode). Provide either urls or notesHandles.

## `maxPosts` (type: `integer`):

Maximum number of posts to scrape per newsletter. Set to 0 for all posts.

## `includeContent` (type: `boolean`):

Extract full article HTML and plain text. Disable for faster metadata-only scraping.

## `includeComments` (type: `boolean`):

Fetch the full nested comment thread for each post. Adds one HTTP call per post and is billed at $0.0003 per comment.

## `includeFacepile` (type: `boolean`):

Fetch the list of users who reacted to and restacked each post. One HTTP call per post; billed at $0.0002 per reactor/restacker.

## `contentFormat` (type: `string`):

Format for article content output.

## `dateFrom` (type: `string`):

Only include posts published on or after this date (YYYY-MM-DD).

## `dateTo` (type: `string`):

Only include posts published on or before this date (YYYY-MM-DD).

## `sortBy` (type: `string`):

Sort posts by date.

## `searchKeyword` (type: `string`):

If set, scrape only posts whose title or body matches this keyword. Uses Substack's archive search endpoint.

## `audienceFilter` (type: `string`):

Filter posts by audience: all, free (audience=everyone), or paid (audience=only\_paid).

## `typeFilter` (type: `string`):

Filter by post type.

## `notesHandles` (type: `array`):

Substack handles to scrape Notes for (Notes mode). Provide either urls or notesHandles.

## `maxNotesPerHandle` (type: `integer`):

Max Notes to return per handle.

## `proxyConfiguration` (type: `object`):

Optional proxy for outbound requests. Recommended for large runs (>100 posts with includeComments or includeFacepile) to avoid rate-limits. Apify Proxy bandwidth is billed separately by Apify (datacenter ~$0.6/GB, residential ~$8/GB).

## Actor input object example

```json
{
  "urls": [
    "https://newsletter.pragmaticengineer.com"
  ],
  "maxPosts": 50,
  "includeContent": true,
  "includeComments": false,
  "includeFacepile": false,
  "contentFormat": "both",
  "sortBy": "newest",
  "audienceFilter": "all",
  "typeFilter": "all",
  "maxNotesPerHandle": 50
}
```

# Actor output Schema

## `posts` (type: `string`):

Posts extracted from Substack newsletters including title, content, author, engagement metrics, and more.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://newsletter.pragmaticengineer.com"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("sourabhbgp/substack-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "urls": ["https://newsletter.pragmaticengineer.com"] }

# Run the Actor and wait for it to finish
run = client.actor("sourabhbgp/substack-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://newsletter.pragmaticengineer.com"
  ]
}' |
apify call sourabhbgp/substack-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=sourabhbgp/substack-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Substack Scraper – Posts, Comments & Notes",
        "description": "Scrape every Substack surface in one actor — full posts (50+ fields, complete article HTML), nested comment threads, emoji reaction breakdown, Substack Notes, restacker identity, multi-byline authors, custom domains. Direct JSON API + RSS, no browser, no Cloudflare. From $0.30 per 1,000 posts.",
        "version": "0.4",
        "x-build-id": "hRh0CvbEXqZpKsCzT"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/sourabhbgp~substack-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-sourabhbgp-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/sourabhbgp~substack-scraper/runs": {
            "post": {
                "operationId": "runs-sync-sourabhbgp-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/sourabhbgp~substack-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-sourabhbgp-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "urls": {
                        "title": "Newsletter URLs",
                        "type": "array",
                        "description": "List of Substack newsletter URLs (URL mode). Provide either urls or notesHandles.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxPosts": {
                        "title": "Max Posts per Newsletter",
                        "minimum": 0,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Maximum number of posts to scrape per newsletter. Set to 0 for all posts.",
                        "default": 50
                    },
                    "includeContent": {
                        "title": "Include Full Content",
                        "type": "boolean",
                        "description": "Extract full article HTML and plain text. Disable for faster metadata-only scraping.",
                        "default": true
                    },
                    "includeComments": {
                        "title": "Include Comments",
                        "type": "boolean",
                        "description": "Fetch the full nested comment thread for each post. Adds one HTTP call per post and is billed at $0.0003 per comment.",
                        "default": false
                    },
                    "includeFacepile": {
                        "title": "Include Facepile (Reactor + Restacker Identity)",
                        "type": "boolean",
                        "description": "Fetch the list of users who reacted to and restacked each post. One HTTP call per post; billed at $0.0002 per reactor/restacker.",
                        "default": false
                    },
                    "contentFormat": {
                        "title": "Content Format",
                        "enum": [
                            "html",
                            "text",
                            "both"
                        ],
                        "type": "string",
                        "description": "Format for article content output.",
                        "default": "both"
                    },
                    "dateFrom": {
                        "title": "Date From",
                        "type": "string",
                        "description": "Only include posts published on or after this date (YYYY-MM-DD)."
                    },
                    "dateTo": {
                        "title": "Date To",
                        "type": "string",
                        "description": "Only include posts published on or before this date (YYYY-MM-DD)."
                    },
                    "sortBy": {
                        "title": "Sort Order",
                        "enum": [
                            "newest",
                            "oldest"
                        ],
                        "type": "string",
                        "description": "Sort posts by date.",
                        "default": "newest"
                    },
                    "searchKeyword": {
                        "title": "Search Keyword (optional)",
                        "pattern": "^.{1,100}$",
                        "type": "string",
                        "description": "If set, scrape only posts whose title or body matches this keyword. Uses Substack's archive search endpoint."
                    },
                    "audienceFilter": {
                        "title": "Audience Filter",
                        "enum": [
                            "all",
                            "free",
                            "paid"
                        ],
                        "type": "string",
                        "description": "Filter posts by audience: all, free (audience=everyone), or paid (audience=only_paid).",
                        "default": "all"
                    },
                    "typeFilter": {
                        "title": "Post Type Filter",
                        "enum": [
                            "all",
                            "newsletter",
                            "podcast",
                            "thread",
                            "video"
                        ],
                        "type": "string",
                        "description": "Filter by post type.",
                        "default": "all"
                    },
                    "notesHandles": {
                        "title": "Notes Handles",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Substack handles to scrape Notes for (Notes mode). Provide either urls or notesHandles.",
                        "items": {
                            "type": "string",
                            "pattern": "^[a-zA-Z0-9_-]{1,40}$"
                        }
                    },
                    "maxNotesPerHandle": {
                        "title": "Max Notes per Handle",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Max Notes to return per handle.",
                        "default": 50
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Optional proxy for outbound requests. Recommended for large runs (>100 posts with includeComments or includeFacepile) to avoid rate-limits. Apify Proxy bandwidth is billed separately by Apify (datacenter ~$0.6/GB, residential ~$8/GB)."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
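
The input schema above carries several constraints (enums, a `YYYY-MM-DD` date format, a handle pattern, and a 1 to 1000 range for `maxNotesPerHandle`) that can be checked locally before starting a run. The following is a minimal sketch of such a pre-flight check. The helper `validate_input` and its messages are hypothetical and not part of the Actor or the Apify client; the check that `dateFrom` is not later than `dateTo` is an added sanity check that the schema does not state.

```python
import re
from datetime import date

# Allowed values copied from the input schema above.
ENUMS = {
    "contentFormat": ("html", "text", "both"),
    "sortBy": ("newest", "oldest"),
    "audienceFilter": ("all", "free", "paid"),
    "typeFilter": ("all", "newsletter", "podcast", "thread", "video"),
}
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")       # YYYY-MM-DD shape
HANDLE_RE = re.compile(r"[a-zA-Z0-9_-]{1,40}")   # notesHandles item pattern
KEYWORD_RE = re.compile(r".{1,100}")             # searchKeyword pattern


def validate_input(run_input: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems: list[str] = []

    for field, allowed in ENUMS.items():
        if field in run_input and run_input[field] not in allowed:
            problems.append(f"{field}: expected one of {list(allowed)}")

    parsed = {}
    for field in ("dateFrom", "dateTo"):
        value = run_input.get(field)
        if value is None:
            continue
        if isinstance(value, str) and DATE_RE.fullmatch(value):
            try:
                parsed[field] = date.fromisoformat(value)
                continue
            except ValueError:
                pass  # right shape, but not a real calendar date
        problems.append(f"{field}: expected a real date as YYYY-MM-DD")
    # Assumption: an inverted range is a mistake (not stated in the schema).
    if len(parsed) == 2 and parsed["dateFrom"] > parsed["dateTo"]:
        problems.append("dateFrom is later than dateTo")

    handles = run_input.get("notesHandles")
    if handles is not None:
        if not isinstance(handles, list):
            problems.append("notesHandles: expected an array of strings")
        else:
            strings = [h for h in handles if isinstance(h, str)]
            if len(set(strings)) != len(strings):
                problems.append("notesHandles: entries must be unique")
            for h in handles:
                if not (isinstance(h, str) and HANDLE_RE.fullmatch(h)):
                    problems.append(f"notesHandles: invalid handle {h!r}")

    limit = run_input.get("maxNotesPerHandle")
    if limit is not None and not (
        isinstance(limit, int) and not isinstance(limit, bool) and 1 <= limit <= 1000
    ):
        problems.append("maxNotesPerHandle: expected an integer from 1 to 1000")

    keyword = run_input.get("searchKeyword")
    if keyword is not None and not (
        isinstance(keyword, str) and KEYWORD_RE.fullmatch(keyword)
    ):
        problems.append("searchKeyword: expected 1 to 100 characters on one line")

    return problems
```

Calling `validate_input({"sortBy": "newest", "dateFrom": "2025-01-01"})` returns an empty list, while an input with an unknown enum value or a malformed date returns one message per problem. The platform validates input against the schema as well, so a check like this only serves to fail earlier and closer to the calling code.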

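The `includeComments` and `includeFacepile` descriptions in the schema quote per-event prices ($0.0003 per comment, $0.0002 per reactor or restacker) and one extra HTTP call per post for each option. A rough budget for a run can be sketched as below. The function names are hypothetical, and the per-post rate treats the listed "from $0.30 / 1,000 results" as a flat price, which is an assumption; actual billing may differ, and proxy bandwidth is billed separately.

```python
from decimal import Decimal

# Per-event prices quoted in the input schema descriptions.
PRICE_PER_COMMENT = Decimal("0.0003")
PRICE_PER_FACEPILE_USER = Decimal("0.0002")
# Assumption: "from $0.30 / 1,000 results" applied as a flat per-post rate.
PRICE_PER_POST = Decimal("0.30") / 1000


def estimate_event_cost(posts: int, comments: int = 0, facepile_users: int = 0) -> Decimal:
    """Hypothetical helper: estimated pay-per-event charge in USD."""
    return (
        posts * PRICE_PER_POST
        + comments * PRICE_PER_COMMENT
        + facepile_users * PRICE_PER_FACEPILE_USER
    )


def extra_http_calls(posts: int, include_comments: bool, include_facepile: bool) -> int:
    """Each enabled option adds one HTTP call per post, per the schema descriptions."""
    return posts * (int(include_comments) + int(include_facepile))


# Example: 1,000 posts with 5,000 comments and 2,000 reactors/restackers in total.
cost = estimate_event_cost(posts=1000, comments=5000, facepile_users=2000)
calls = extra_http_calls(posts=1000, include_comments=True, include_facepile=True)
```

`Decimal` is used instead of floats so that small per-event prices add up without rounding drift. The proxy description recommends a proxy for runs of more than 100 posts with either option enabled, so the extra call count is a useful signal for when to configure one.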