# Substack Scraper (`automation-lab/substack-scraper`) Actor

Scrape Substack newsletters — posts, comments, publication metadata. Full archive depth with no caps. Export to JSON, CSV, Excel, or connect via API.

- **URL**: https://apify.com/automation-lab/substack-scraper.md
- **Developed by:** [Stas Persiianenko](https://apify.com/automation-lab) (community)
- **Categories:** Social media, Lead generation, News
- **Stats:** 187 total users, 61 monthly users, 96.5% runs succeeded, 7 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the price gets lower the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are a software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

````bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```bash

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Substack Scraper


### Next step: publication and partnership prospecting

Use Substack Scraper for newsletter/publication research, partnership prospecting, sponsorship research, and advertising outreach planning — not spam harvesting. When a public publisher domain is present, use [Website Contact Finder](https://apify.com/automation-lab/website-contact-finder) or [Social Media Profile Finder](https://apify.com/automation-lab/social-media-profile-finder) to find public contact channels, then verify any email addresses with [Bulk Email MX Verifier](https://apify.com/automation-lab/email-mx-verifier) or [Bulk SMTP Email Verifier](https://apify.com/automation-lab/smtp-email-verifier).

### Build a cold-outreach lead workflow on Apify

Use this actor as one step in a scrape → enrich → verify → export workflow:

1. **Source B2B accounts or contacts** with [LinkedIn Company Employees Scraper](https://apify.com/automation-lab/linkedin-company-employees-scraper), [LinkedIn Company Scraper](https://apify.com/automation-lab/linkedin-company-scraper), [Clutch.co Scraper](https://apify.com/automation-lab/clutch-scraper), [Yellow Pages Scraper](https://apify.com/automation-lab/yellowpages-scraper), [YellowPages Canada Business Scraper](https://apify.com/automation-lab/yellowpages-canada-scraper), or [Substack Scraper](https://apify.com/automation-lab/substack-scraper).
2. **Find public contact signals** with [Website Contact Finder](https://apify.com/automation-lab/website-contact-finder) or [Social Media Profile Finder](https://apify.com/automation-lab/social-media-profile-finder) when you have company domains/websites.
3. **Clean the list before outreach** with [Bulk Email MX Verifier](https://apify.com/automation-lab/email-mx-verifier) for fast syntax/MX/disposable/role-risk screening, then [Bulk SMTP Email Verifier](https://apify.com/automation-lab/smtp-email-verifier) when you need deeper mailbox and catch-all checks.
4. **Export to your outreach stack** as CSV/Excel/JSON for Instantly, HubSpot, Salesforce, Pipedrive, Airtable, Make, Zapier, or n8n.

Use only public/business contact data and comply with CAN-SPAM, GDPR, and your outreach platform's rules. Prefer company emails over personal addresses, filter role-based/risky emails when needed, and do not send to invalid or unverified addresses.

### What does Substack Scraper do?

**Substack Scraper** extracts data from any Substack newsletter -- posts with full HTML content, comments with nested replies, and publication metadata including subscriber counts. It supports **unlimited archive depth** (no 12-post cap), works with both `*.substack.com` and custom domain newsletters, and exports to JSON, CSV, Excel, or connects via API.

Unlike other scrapers, this actor uses Substack's public JSON API directly -- no browser, no proxy, **100% success rate**.

Key capabilities:

- Scrape full post archives from any Substack newsletter (no post limit)
- Extract complete HTML content, metadata, and engagement metrics for each post
- Fetch full comment threads with nested replies and author info
- Pull publication-level data: subscriber counts, pricing, author profiles
- Filter by date range, content type (newsletter/podcast/thread), or paywall status
- Works with both `*.substack.com` and custom domain newsletters

### Who is it for?

- **Content strategists** analyzing newsletter trends, posting cadence, and topic coverage across the Substack ecosystem
- **Market researchers** benchmarking competitor newsletters by subscriber growth, engagement, and pricing models
- **Sales and marketing teams** building lead lists from newsletter author profiles and publication metadata
- **Data journalists** investigating media trends with structured datasets from newsletter archives
- **Academic researchers** studying online discourse through newsletter content and comment analysis
- **AI/ML engineers** building training datasets from high-quality long-form writing on Substack
- **Newsletter creators** auditing their own archive performance and comparing with peers

### Why use Substack Scraper?

- **Unlimited archive depth** -- Scrape the complete archive of any newsletter. No 12-post cap like the market leader
- **100% success rate** -- Uses Substack's public JSON API. No anti-bot, no proxy needed, no failures
- **Full comment threads** -- Extract comments with nested replies, reaction counts, and author metadata
- **Publication metadata** -- Subscriber counts, pricing plans, author info, and 100+ publication fields
- **No proxy cost** -- Direct API access means zero proxy fees. Runs on minimal 256MB memory
- **Clean pay-per-event pricing** -- No hidden start fees or completion charges. Pay only for results
- **66+ fields per post** -- The richest output of any Substack scraper on Apify Store
- **Custom domain support** -- Works with both `newsletter.substack.com` and custom domains like `www.lennysnewsletter.com`

### What data can you extract from Substack?

**Per post (30+ fields):**

| Field | Description |
|-------|-------------|
| `title`, `subtitle`, `slug` | Post title, subtitle, and URL slug |
| `url` | Full canonical URL |
| `publishedAt`, `updatedAt` | Publication and update timestamps |
| `postType` | `newsletter`, `podcast`, or `thread` |
| `audience`, `isPaid` | Paywall status (`everyone` or `only_paid`) |
| `bodyHtml` | Full HTML content (free posts) |
| `truncatedBodyText` | Preview text for all posts including paywalled |
| `wordcount` | Total word count (even for paid posts) |
| `coverImage` | Cover image URL |
| `description` | Post description/excerpt |
| `tags` | Post tags/categories |
| `reactionCount`, `commentCount`, `restacks` | Engagement metrics |
| `authorName`, `authorHandle`, `authorBio` | Author information |
| `authorPhotoUrl`, `authorId` | Author photo and ID |
| `publicationName`, `publicationId`, `publicationUrl` | Newsletter metadata |
| `subscriberCount` | Newsletter subscriber count |
| `podcastUrl`, `podcastDuration` | Podcast episode data (if applicable) |
| `hasVoiceover` | Whether the post has audio narration |
| `scrapedAt` | Timestamp when the data was collected |

**Per comment (12 fields):** `id`, `body`, `date`, `editedAt`, `name`, `handle`, `photoUrl`, `reactionCount`, `restacks`, `isAuthor`, `isPinned`, nested `replies`

**Per publication (14 fields):** `id`, `name`, `subdomain`, `customDomain`, `baseUrl`, `authorName`, `authorHandle`, `authorBio`, `authorPhotoUrl`, `subscriberCount`, `logoUrl`, `heroText`, `language`, `paymentsEnabled`

### How much does it cost to scrape Substack?

This actor uses **pay-per-event** pricing -- you pay only for what you scrape. No monthly subscription required. All platform compute costs are included in the per-event price.

| Event | Free plan | Starter ($49/mo) | Scale ($499/mo) |
|---|---|---|---|
| **Start** | $0.005 | $0.004 | $0.003 |
| **Per post (metadata)** | $0.001 | $0.0008 | $0.0006 |
| **Per post (with content)** | $0.002 | $0.0017 | $0.0014 |
| **Per comment** | $0.0005 | $0.0004 | $0.0003 |

**Real-world cost examples:**

| Scenario | Results | Duration | Cost (Free tier) |
|---|---|---|---|
| 1 newsletter, 50 posts (metadata) | 50 posts | ~3s | ~$0.06 |
| 1 newsletter, 50 posts (with content) | 50 posts | ~5s | ~$0.11 |
| 1 newsletter, 50 posts + comments | 50 posts + ~200 comments | ~15s | ~$0.21 |
| 1 newsletter, full archive (500 posts) | 500 posts | ~30s | ~$1.01 |
| 5 newsletters, 100 posts each | 500 posts | ~60s | ~$1.03 |

Substack Scraper is one of the cheapest content scrapers on Apify. Because it uses direct API calls instead of a browser, there are no proxy or rendering costs -- you only pay for the data you receive.

### How to scrape Substack newsletters step by step

1. Go to the [Substack Scraper](https://apify.com/automation-lab/substack-scraper) page on Apify Store
2. Click **Start for free** to open the actor configuration screen
3. Enter one or more newsletter URLs in the **Newsletter URLs** field (e.g., `https://www.lennysnewsletter.com`)
4. Choose your output options:
   - **Include Post Content** -- toggle on for full HTML body, off for metadata-only (faster and cheaper)
   - **Include Comments** -- toggle on to fetch comment threads with nested replies
   - **Include Publication Info** -- toggle on for subscriber counts and newsletter metadata
5. Optionally set filters:
   - **Content Type** -- limit to newsletters, podcasts, or threads
   - **Start/End Date** -- restrict to a date range (YYYY-MM-DD format)
   - **Free Posts Only** -- skip paywalled content
6. Set **Max Posts per Newsletter** (default 100, set to 0 for unlimited full archive)
7. Click **Start** and wait for results
8. Download your data in JSON, CSV, or Excel from the **Dataset** tab, or connect via API

### How do I scrape Substack newsletters and posts?

| Parameter | Type | Default | Description |
|---|---|---|---|
| `urls` | array | *required* | Substack newsletter URLs. Accepts homepage, custom domain, post URLs, or /archive URLs |
| `maxPostsPerNewsletter` | integer | `100` | Max posts per newsletter. `0` = unlimited (full archive) |
| `includeContent` | boolean | `true` | Include full HTML body. Disable for metadata-only (faster, cheaper) |
| `includeComments` | boolean | `false` | Fetch comments for each post. Adds one API call per post |
| `includePublicationInfo` | boolean | `true` | Include newsletter metadata (subscriber count, pricing, author) |
| `contentType` | string | `all` | Filter: `all`, `newsletter`, `podcast`, or `thread` |
| `startDate` | string | -- | Only posts after this date (YYYY-MM-DD) |
| `endDate` | string | -- | Only posts before this date (YYYY-MM-DD) |
| `onlyFree` | boolean | `false` | Only include free posts. Skip paywalled content |

### What data can I extract from Substack?

```json
{
    "postId": 186226252,
    "title": "How to build AI product sense",
    "subtitle": "The secret is using Cursor for non-technical work",
    "slug": "how-to-build-ai-product-sense",
    "url": "https://www.lennysnewsletter.com/p/how-to-build-ai-product-sense",
    "publishedAt": "2026-02-03T13:45:58.303Z",
    "updatedAt": "2026-02-04T17:29:56.949Z",
    "postType": "newsletter",
    "audience": "everyone",
    "isPaid": false,
    "wordcount": 5867,
    "coverImage": "https://substackcdn.com/image/fetch/...",
    "tags": ["AI"],
    "reactionCount": 298,
    "commentCount": 31,
    "childCommentCount": 15,
    "restacks": 20,
    "hasVoiceover": false,
    "bodyHtml": "<div class=\"body markup\">...</div>",
    "authorName": "Tal Raviv",
    "authorHandle": "talsraviv",
    "publicationName": "Lenny's Newsletter",
    "subscriberCount": "1,100,000",
    "comments": [
        {
            "id": 209331673,
            "body": "This article creates a whole new paradigm for learning...",
            "date": "2026-02-03T15:34:25.318Z",
            "name": "Jack Cohen",
            "handle": "jackcohen10",
            "reactionCount": 9,
            "isAuthor": false,
            "replies": [
                {
                    "id": 209340123,
                    "body": "Thanks Jack!",
                    "name": "Tal Raviv",
                    "isAuthor": true,
                    "replies": []
                }
            ]
        }
    ],
    "scrapedAt": "2026-02-06T02:07:09.750Z"
}
````

### Tips for best results

- **Start with metadata-only** (`includeContent: false`) to quickly survey a newsletter's archive before doing a full content scrape
- **Use date filters** to scrape only recent posts instead of full archives -- saves time and money
- **Comments are optional** -- each post with comments requires an extra API call, so only enable when needed
- **Paid posts** return all metadata (title, wordcount, reactions) but `bodyHtml` will be empty
- **Custom domains** work the same as `*.substack.com` URLs -- just paste the full URL
- **Use `maxPostsPerNewsletter: 0`** for unlimited archive depth -- scrapes every post ever published
- **Batch multiple newsletters** in a single run by adding multiple URLs -- more efficient than running separate jobs
- **Filter by content type** to target only newsletter posts, podcast episodes, or threads separately

### Use with Claude AI (MCP)

This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.

#### Setup for Claude Code

```bash
claude mcp add --transport http apify "https://mcp.apify.com"
```

#### Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

```json
{
    "mcpServers": {
        "apify": {
            "url": "https://mcp.apify.com?tools=automation-lab/substack-scraper"
        }
    }
}
```

#### Example prompts

- "Scrape the last 30 posts from Lenny's Newsletter and summarize the main topics covered."
- "Get all free posts from this Substack newsletter published in 2025 with their engagement metrics."
- "Fetch the comments on this Substack post and tell me what readers think about the article."

Learn more in the [Apify MCP documentation](https://docs.apify.com/platform/integrations/mcp).

### Integrations

Connect Substack Scraper with your existing tools and automate newsletter data workflows:

- [Make](https://apify.com/integrations/make) -- Automate workflows triggered by new newsletter data
- [Zapier](https://apify.com/integrations/zapier) -- Connect to 5,000+ apps
- [Google Sheets](https://apify.com/integrations/google-sheets) -- Export directly to spreadsheets
- [Slack](https://apify.com/integrations/slack) -- Get notifications for new posts
- [GitHub](https://apify.com/integrations/github) -- Trigger workflows on new data
- [Webhooks](https://apify.com/integrations/webhooks) -- Send data to any endpoint

You can also schedule the scraper to run automatically at regular intervals. Set up a [schedule](https://docs.apify.com/platform/schedules) to monitor newsletters for new posts daily or weekly.

### API usage

**Node.js:**

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/substack-scraper').call({
    urls: ['https://www.lennysnewsletter.com'],
    maxPostsPerNewsletter: 50,
    includeContent: true,
    includeComments: false,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

**Python:**

```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('automation-lab/substack-scraper').call(run_input={
    'urls': ['https://www.lennysnewsletter.com'],
    'maxPostsPerNewsletter': 50,
    'includeContent': True,
    'includeComments': False,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```

**cURL:**

```bash
curl "https://api.apify.com/v2/acts/automation-lab~substack-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -X POST -H "Content-Type: application/json" \
  -d '{"urls": ["https://www.lennysnewsletter.com"], "maxPostsPerNewsletter": 50, "includeContent": true}'
```

### Is it legal to scrape Substack?

Substack Scraper accesses only publicly available data through Substack's public API endpoints. It does not bypass any authentication, paywalls, or access controls. Paywalled content metadata (title, wordcount, engagement metrics) is returned, but the full body text of paid posts is not extracted.

Key points:

- The scraper uses the same public API that powers Substack's own website
- No login credentials are required or used
- Paywalled post content is not accessed or extracted
- The scraper respects rate limits with polite delays between requests
- Extracted data should be used in compliance with applicable laws and Substack's Terms of Service

Users are responsible for ensuring their specific use case complies with local data protection regulations (GDPR, CCPA, etc.) and Substack's Terms of Service. For more information, see the Apify blog on [web scraping legality](https://blog.apify.com/is-web-scraping-legal/).

### FAQ

**How fast is the scraper?**
Very fast. 50 posts (metadata only) complete in ~3 seconds. 50 posts with full content in ~5 seconds. Full archives of 500+ posts finish in under 30 seconds. No browser or proxy overhead.

**Can I scrape paid/paywalled posts?**
You get all metadata for paid posts (title, subtitle, wordcount, reactions, comments count) but `bodyHtml` will be empty since content access requires an active subscription.

**Does it work with custom domains?**
Yes. Enter the full URL (e.g., `https://www.lennysnewsletter.com`) and the scraper auto-detects it as a Substack newsletter.

**How many posts can I scrape?**
There is no limit. Set `maxPostsPerNewsletter: 0` to scrape the complete archive. This is the only Substack scraper on Apify with unlimited archive depth.

**Does it extract comments?**
Yes. Set `includeComments: true` to get full comment threads with nested replies, author info, and reaction counts. Each post with comments requires one extra API call.

**What about rate limits?**
Substack's public API has no detected rate limits. The scraper adds a polite delay between requests to be respectful.

**The `bodyHtml` field is empty for some posts.**
This is expected for paid/paywalled posts. The scraper returns all metadata (title, wordcount, reactions) but cannot access content behind a paywall without an active subscription.

**The scraper fails with "not a valid Substack" for a custom domain.**
Make sure you are using the full URL including `https://`. The scraper auto-detects Substack newsletters on custom domains, but the URL must be reachable and point to a valid Substack publication.

**Can I scrape multiple newsletters at once?**
Yes. Add multiple URLs to the `urls` input field. The scraper processes them sequentially and outputs all posts to a single dataset.

**What URL formats are accepted?**
The scraper accepts homepage URLs (`https://newsletter.substack.com`), custom domains (`https://www.lennysnewsletter.com`), individual post URLs, and `/archive` URLs. All formats are auto-detected.

### Related scrapers

- [Dev.to Scraper](https://apify.com/automation-lab/devto-scraper) -- Scrape articles and profiles from Dev.to
- [TechCrunch Scraper](https://apify.com/automation-lab/techcrunch-scraper) -- Extract articles and news from TechCrunch
- [Hacker News Scraper](https://apify.com/automation-lab/hackernews-scraper) -- Extract posts and comments from Hacker News
- [Reddit Scraper](https://apify.com/automation-lab/reddit-scraper) -- Scrape Reddit posts, comments, and subreddit data

# Actor input Schema

## `urls` (type: `array`):

Substack newsletter URLs to scrape. Accepts homepage (e.g. https://newsletter.substack.com), custom domain (e.g. https://www.lennysnewsletter.com), post URLs, or /archive URLs.

## `maxPostsPerNewsletter` (type: `integer`):

Maximum number of posts to scrape per newsletter. Set to 0 for unlimited (full archive).

## `includeContent` (type: `boolean`):

Include full HTML body of each post. Disable for metadata-only scraping (faster, cheaper).

## `includeComments` (type: `boolean`):

Fetch comments for each post. Adds one extra API request per post.

## `maxCommentsPerPost` (type: `integer`):

Maximum number of comments to return per post (including replies). Set to 0 for unlimited. Only applies when 'Include Comments' is enabled.

## `includePublicationInfo` (type: `boolean`):

Include newsletter metadata (subscriber count, pricing, author info). One extra request per newsletter.

## `contentType` (type: `string`):

Filter by post type.

## `startDate` (type: `string`):

Only include posts published after this date (YYYY-MM-DD format).

## `endDate` (type: `string`):

Only include posts published before this date (YYYY-MM-DD format).

## `onlyFree` (type: `boolean`):

Only include free posts. Skip paywalled content.

## Actor input object example

```json
{
  "urls": [
    "https://www.lennysnewsletter.com"
  ],
  "maxPostsPerNewsletter": 10,
  "includeContent": true,
  "includeComments": false,
  "maxCommentsPerPost": 20,
  "includePublicationInfo": true,
  "contentType": "all",
  "startDate": "2024-01-01",
  "endDate": "2026-12-31",
  "onlyFree": false
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://www.lennysnewsletter.com"
    ],
    "maxPostsPerNewsletter": 10,
    "maxCommentsPerPost": 20,
    "startDate": "2024-01-01",
    "endDate": "2026-12-31"
};

// Run the Actor and wait for it to finish
const run = await client.actor("automation-lab/substack-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": ["https://www.lennysnewsletter.com"],
    "maxPostsPerNewsletter": 10,
    "maxCommentsPerPost": 20,
    "startDate": "2024-01-01",
    "endDate": "2026-12-31",
}

# Run the Actor and wait for it to finish
run = client.actor("automation-lab/substack-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://www.lennysnewsletter.com"
  ],
  "maxPostsPerNewsletter": 10,
  "maxCommentsPerPost": 20,
  "startDate": "2024-01-01",
  "endDate": "2026-12-31"
}' |
apify call automation-lab/substack-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=automation-lab/substack-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Substack Scraper",
        "description": "Scrape Substack newsletters — posts, comments, publication metadata. Full archive depth with no caps. Export to JSON, CSV, Excel, or connect via API.",
        "version": "0.1",
        "x-build-id": "LpjvCFx2seNRnMOdb"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/automation-lab~substack-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-automation-lab-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/automation-lab~substack-scraper/runs": {
            "post": {
                "operationId": "runs-sync-automation-lab-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/automation-lab~substack-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-automation-lab-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "Newsletter URLs",
                        "type": "array",
                        "description": "Substack newsletter URLs to scrape. Accepts homepage (e.g. https://newsletter.substack.com), custom domain (e.g. https://www.lennysnewsletter.com), post URLs, or /archive URLs.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxPostsPerNewsletter": {
                        "title": "Max Posts per Newsletter",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of posts to scrape per newsletter. Set to 0 for unlimited (full archive).",
                        "default": 100
                    },
                    "includeContent": {
                        "title": "Include Post Content",
                        "type": "boolean",
                        "description": "Include full HTML body of each post. Disable for metadata-only scraping (faster, cheaper).",
                        "default": true
                    },
                    "includeComments": {
                        "title": "Include Comments",
                        "type": "boolean",
                        "description": "Fetch comments for each post. Adds one extra API request per post.",
                        "default": false
                    },
                    "maxCommentsPerPost": {
                        "title": "Max Comments per Post",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of comments to return per post (including replies). Set to 0 for unlimited. Only applies when 'Include Comments' is enabled.",
                        "default": 20
                    },
                    "includePublicationInfo": {
                        "title": "Include Publication Info",
                        "type": "boolean",
                        "description": "Include newsletter metadata (subscriber count, pricing, author info). One extra request per newsletter.",
                        "default": true
                    },
                    "contentType": {
                        "title": "Content Type Filter",
                        "enum": [
                            "all",
                            "newsletter",
                            "podcast",
                            "thread"
                        ],
                        "type": "string",
                        "description": "Filter by post type.",
                        "default": "all"
                    },
                    "startDate": {
                        "title": "Start Date",
                        "pattern": "^\\d{4}-\\d{2}-\\d{2}$",
                        "type": "string",
                        "description": "Only include posts published after this date (YYYY-MM-DD format)."
                    },
                    "endDate": {
                        "title": "End Date",
                        "pattern": "^\\d{4}-\\d{2}-\\d{2}$",
                        "type": "string",
                        "description": "Only include posts published before this date (YYYY-MM-DD format)."
                    },
                    "onlyFree": {
                        "title": "Free Posts Only",
                        "type": "boolean",
                        "description": "Only include free posts. Skip paywalled content.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
