# Substack Posts Scraper (`fetch_cat/substack-posts-scraper`) Actor

📰 Scrape public Substack posts, archive metadata, URLs, dates, previews, reactions, and comments for newsletter research.

- **URL**: https://apify.com/fetch_cat/substack-posts-scraper.md
- **Developed by:** [Hanna Nosova](https://apify.com/fetch_cat) (community)
- **Categories:** Social media
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

From $0.03 / 1,000 extracted posts

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, built for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Substack Posts Scraper

Collect public Substack newsletter posts, archive metadata, and article previews from one or more publications.

### What does Substack Posts Scraper do?

Substack Posts Scraper helps you monitor newsletters and collect public post data at scale.

It accepts Substack publication URLs, custom domains, and bare domains.

It returns clean dataset rows for public posts.

You can use the data for content research, media monitoring, competitive intelligence, and creator discovery.

The Actor does not require your Substack login.

It only collects public data that is available from the publication.

Paid-only or limited preview posts are marked clearly in the output.

### Who is it for?

Marketing teams use it to monitor newsletters in their niche.

PR teams use it to track creator and journalist coverage.

Content teams use it to research headlines, topics, and publishing cadence.

Investors use it to follow operators and analysts.

Sales teams use it to discover creators and potential leads.

Researchers use it to build datasets of public newsletters.

Agencies use it to report on content trends for clients.

### Why use this Actor?

📰 Collect posts from multiple publications in one run.

🔎 Search archives by keyword.

📅 Filter by publication date.

📊 Export structured data to JSON, CSV, Excel, or API.

⚙️ Control per-publication limits.

🧾 Keep paid/limited previews visible without bypassing access rules.

### What data can I extract?

| Field | Description |
| --- | --- |
| publicationDomain | Publication domain |
| title | Post title |
| subtitle | Post subtitle when available |
| canonicalUrl | Public post URL |
| postDate | Published date |
| audience | Audience/visibility value |
| isPaidOnly | Whether the post appears paid-only |
| isLimited | Whether only a limited preview is available |
| description | Public description or preview |
| bodyText | Public body text or preview text |
| wordCount | Word count when available |
| reactionCount | Reactions when available |
| commentCount | Comments when available |
| tags | Tags/categories when available |
| sourceEndpoint | Source used for the record |

### How much does it cost to scrape Substack posts?

This Actor uses pay-per-event pricing.

There is a small run start charge.

There is a per-post charge for each saved dataset item.

Small tests with the default input are inexpensive.

Final Store pricing is shown on the Actor's Apify page before you run it.
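
As a rough illustration of how per-post pricing scales, the sketch below estimates the cost of a run from the number of publications and the per-publication limit. The default rate is the "from $0.03 per 1,000 posts" figure listed in the Pricing section; the run start charge is not stated in this document, so `start_charge_usd` is a hypothetical parameter that defaults to zero. Treat the result as a ballpark figure, not a quote.

```python
def estimate_run_cost_usd(
    publications: int,
    max_posts_per_publication: int,
    price_per_1000_posts_usd: float = 0.03,  # "from" price listed in the Pricing section
    start_charge_usd: float = 0.0,           # assumption: the real start charge is not documented here
) -> float:
    """Estimate the cost if every publication returns the maximum number of posts."""
    max_posts = publications * max_posts_per_publication
    return start_charge_usd + max_posts * price_per_1000_posts_usd / 1000


# Two publications with a limit of 20 posts each (40 posts at most).
print(f"${estimate_run_cost_usd(2, 20):.4f}")
```

Because the charge applies per saved dataset item, a run that returns fewer posts than the limit would cost less than this estimate.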

### How to use Substack Posts Scraper

1. Open the Actor on Apify.

2. Add one or more Substack publication URLs or domains.

3. Set the maximum posts per publication.

4. Optionally add a search term or date filters.

5. Choose whether to include body HTML.

6. Click Start.

7. Download the dataset when the run finishes.

### Input

#### Publication URLs or domains

Add URLs such as `https://www.lennysnewsletter.com`.

Add Substack domains such as `https://example.substack.com`.

Bare domains are also accepted.
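
If your publication list lives in a text file or spreadsheet, a small client-side helper can turn it into the `[{"url": ...}]` shape that `publicationUrls` expects. This is a minimal sketch: `to_publication_urls` is a hypothetical helper, and prefixing bare domains with `https://` is an assumption made for illustration (the Actor itself accepts bare domains).

```python
from urllib.parse import urlparse


def to_publication_urls(entries: list) -> list:
    """Convert URLs or bare domains into the [{"url": ...}] shape of `publicationUrls`."""
    result = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # skip blank entries
        # Assumption: bare domains are reachable over HTTPS.
        url = entry if "://" in entry else f"https://{entry}"
        if not urlparse(url).netloc:
            raise ValueError(f"Not a valid publication URL or domain: {entry!r}")
        result.append({"url": url})
    return result


print(to_publication_urls(["https://www.lennysnewsletter.com", "example.substack.com"]))
```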

#### Maximum posts per publication

Controls how many posts are saved from each publication.

Use a small number for testing.

Increase it for production monitoring.

#### Search term

Use a keyword to search publication archives.

Leave it empty to collect the newest posts.

#### Date filters

Use `dateFrom` and `dateTo` to limit results by publication date.

Dates should use `YYYY-MM-DD` format.
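
To catch malformed dates before starting a run, you could validate them on the client side. The sketch below is one way to do that; `build_date_filters` is a hypothetical helper, and the check that `dateFrom` is not later than `dateTo` is an assumption about sensible input rather than documented Actor behavior.

```python
from datetime import date
from typing import Optional


def build_date_filters(date_from: Optional[str] = None, date_to: Optional[str] = None) -> dict:
    """Return the `dateFrom`/`dateTo` input fields after checking format and order."""
    filters = {}
    for key, value in (("dateFrom", date_from), ("dateTo", date_to)):
        if value is None:
            continue
        # Round-tripping through isoformat() rejects anything that is not exactly YYYY-MM-DD.
        if date.fromisoformat(value).isoformat() != value:
            raise ValueError(f"{key} must use YYYY-MM-DD format: {value!r}")
        filters[key] = value
    # ISO dates sort correctly as plain strings.
    if len(filters) == 2 and filters["dateFrom"] > filters["dateTo"]:
        raise ValueError("dateFrom must not be later than dateTo")
    return filters


print(build_date_filters("2026-01-01", "2026-01-31"))
```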

#### Include body HTML

Enable this if your workflow needs public HTML.

Disable it for smaller exports.

#### Use feed fallback

Keep this enabled for broader publication coverage.

If one source is unavailable, the Actor can still collect public feed data.

#### Concurrency

Controls how many publications are processed in parallel.

The default is conservative and reliable.

### Output

Each result is a single Substack post record.

The dataset is ready for spreadsheets, BI tools, automation, and APIs.

Example item:

```json
{
  "publicationDomain": "lennysnewsletter.com",
  "title": "Example newsletter post",
  "canonicalUrl": "https://www.lennysnewsletter.com/p/example",
  "postDate": "2026-01-01T12:00:00.000Z",
  "isPaidOnly": false,
  "isLimited": false,
  "wordCount": 1200,
  "reactionCount": 42,
  "commentCount": 7
}
````
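
Since every record carries the `isPaidOnly` and `isLimited` flags, downstream code can separate fully public posts from previews. The following is a minimal sketch using hypothetical sample items shaped like the example above; `split_by_access` is an illustrative helper, not part of the Actor.

```python
def split_by_access(items: list) -> tuple:
    """Split dataset items into (fully public posts, paid-only or limited previews)."""
    public, previews = [], []
    for item in items:
        if item.get("isPaidOnly") or item.get("isLimited"):
            previews.append(item)
        else:
            public.append(item)
    return public, previews


# Hypothetical sample items for illustration only.
sample_items = [
    {"title": "Free post", "isPaidOnly": False, "isLimited": False, "wordCount": 1200},
    {"title": "Subscriber post", "isPaidOnly": True, "isLimited": True, "wordCount": 150},
]

public, previews = split_by_access(sample_items)
print(len(public), "public,", len(previews), "preview")
```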

### Tips for best results

Start with one publication and a low post limit.

Check the output fields before running a large batch.

Use date filters for recurring monitoring.

Use search terms for topical research.

Leave feed fallback enabled unless you need only archive-rich records.

Disable body HTML when you only need metadata.

### Integrations

Send results to Google Sheets through Apify integrations.

Trigger runs from Zapier or Make.

Use the dataset API in dashboards.

Feed new posts into a CRM or lead database.

Schedule recurring runs for weekly media monitoring.

Connect results to a vector database for content analysis.
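
When feeding posts into a CRM or database from recurring runs, you will likely want to skip posts you have already stored. A minimal sketch of that deduplication step, keyed on `canonicalUrl`, might look like this; `select_new_posts` and the in-memory `seen_urls` set are assumptions for illustration (a real pipeline would persist the set somewhere).

```python
def select_new_posts(items: list, seen_urls: set) -> list:
    """Return items whose canonicalUrl has not been seen, and record them as seen."""
    fresh = []
    for item in items:
        url = item.get("canonicalUrl")
        if url and url not in seen_urls:
            seen_urls.add(url)
            fresh.append(item)
    return fresh


# Hypothetical data: one post already stored, one new post.
seen = {"https://www.lennysnewsletter.com/p/example"}
run_items = [
    {"canonicalUrl": "https://www.lennysnewsletter.com/p/example", "title": "Old"},
    {"canonicalUrl": "https://www.lennysnewsletter.com/p/another", "title": "New"},
]
print([item["title"] for item in select_new_posts(run_items, seen)])
```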

### API usage

#### Node.js

```js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.actor('fetch_cat/substack-posts-scraper').call({
  publicationUrls: [{ url: 'https://www.lennysnewsletter.com' }],
  maxPostsPerPublication: 10
});
console.log(run.defaultDatasetId);
```

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')
run = client.actor('fetch_cat/substack-posts-scraper').call(run_input={
    'publicationUrls': [{'url': 'https://www.lennysnewsletter.com'}],
    'maxPostsPerPublication': 10,
})
print(run['defaultDatasetId'])
```

#### cURL

```bash
curl -X POST 'https://api.apify.com/v2/acts/fetch_cat~substack-posts-scraper/runs?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"publicationUrls":[{"url":"https://www.lennysnewsletter.com"}],"maxPostsPerPublication":10}'
```
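
If you prefer a single blocking call without the client library, the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification in this document runs the Actor and returns the dataset items in the response. The sketch below only builds the request with the Python standard library and does not send it; `build_sync_request` is a hypothetical helper, and the commented-out call assumes the response body is a JSON array of items.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "fetch_cat~substack-posts-scraper"


def build_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request(
    "YOUR_APIFY_TOKEN",
    {
        "publicationUrls": [{"url": "https://www.lennysnewsletter.com"}],
        "maxPostsPerPublication": 10,
    },
)
print(request.get_method(), request.full_url)

# To actually send it (assumes a valid token and a JSON array response):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```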

### MCP usage

Use Apify MCP to run this Actor from Claude tools.

MCP URL format:

```text
https://mcp.apify.com/?tools=fetch_cat/substack-posts-scraper
```

Claude Code setup:

```bash
claude mcp add apify-substack --transport http 'https://mcp.apify.com/?tools=fetch_cat/substack-posts-scraper'
```

Claude Desktop JSON config:

```json
{
  "mcpServers": {
    "apify-substack": {
      "url": "https://mcp.apify.com/?tools=fetch_cat/substack-posts-scraper"
    }
  }
}
```

Example prompts:

- Run Substack Posts Scraper for Lenny's Newsletter and return the newest 10 post titles.

- Collect public posts mentioning pricing from these three newsletters.

- Export the latest creator newsletter posts to a CSV dataset.

### Scheduling

Use Apify schedules to run this Actor daily, weekly, or monthly.

Recurring runs are useful for media monitoring.

Combine date filters with schedules to collect fresh posts.
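
One way to combine date filters with recurring runs is to compute a rolling window each time a run is started. The sketch below assumes the run is triggered from your own script (for example through the API) so the dates can be recalculated per run; `rolling_window_input` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Optional


def rolling_window_input(base_input: dict, days: int, today: Optional[date] = None) -> dict:
    """Copy the base input and set dateFrom/dateTo to cover the last `days` days."""
    today = today or date.today()
    run_input = dict(base_input)  # shallow copy so the base input stays unchanged
    run_input["dateFrom"] = (today - timedelta(days=days)).isoformat()
    run_input["dateTo"] = today.isoformat()
    return run_input


base = {
    "publicationUrls": [{"url": "https://www.lennysnewsletter.com"}],
    "maxPostsPerPublication": 20,
}
# Weekly monitoring: collect posts published during the last 7 days.
print(rolling_window_input(base, days=7))
```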

### Data freshness

The Actor collects data available at run time.

Publication owners can edit, delete, or restrict posts.

Run the Actor regularly if freshness matters.

### Limitations

The Actor does not bypass paywalls.

Paid-only posts may contain only public previews.

Some publications use custom domains or settings that expose fewer fields.

Very old archives may have missing metadata.

Feeds may contain fewer fields than publication archives.

### Troubleshooting

### FAQ

#### Why did I get fewer posts than requested?

The publication may have fewer public posts, date filters may exclude posts, or paid posts may expose only limited preview data.

#### Why is body text missing?

The publication may not expose full public body text for that post. Enable body HTML only when you need it.

#### Why did a custom domain fail?

Check that the domain is a public Substack publication homepage and can be opened in a browser without login.

### Legality

### Legal and ethical use

Only collect public data.

Respect Substack authors and publication terms.

Do not use the Actor to bypass subscriptions or access controls.

Use reasonable limits and schedules.

If you process personal data, make sure your use complies with applicable laws.

### Related scrapers

Explore other Apify Actors from `fetch_cat` for content research, social monitoring, and lead generation.

Use related Actors together to enrich creator, publication, and company datasets.

### Support

If a publication does not work, provide the run URL and input used.

Include whether the publication is a custom domain or a `substack.com` subdomain.

Share a small reproducible input when asking for help.

### Changelog

Initial version collects public Substack publication posts and archive metadata.

# Actor input Schema

## `publicationUrls` (type: `array`):

Substack publication homepages, custom domains, or bare domains to scrape (for example, https://www.lennysnewsletter.com or https://example.substack.com).

## `maxPostsPerPublication` (type: `integer`):

Maximum number of posts to save from each publication.

## `searchTerm` (type: `string`):

Optional archive search term. Leave empty to collect newest posts.

## `dateFrom` (type: `string`):

Optional ISO date (YYYY-MM-DD). Posts before this date are skipped.

## `dateTo` (type: `string`):

Optional ISO date (YYYY-MM-DD). Posts after this date are skipped.

## `includeBodyHtml` (type: `boolean`):

Include public body HTML returned by Substack. Disable for smaller datasets.

## `includeRssFallback` (type: `boolean`):

If the archive endpoint is unavailable for a publication, try the public feed and save available post fields.

## `concurrency` (type: `integer`):

Number of publications processed in parallel. Keep low for reliable public archive access.

## Actor input object example

```json
{
  "publicationUrls": [
    {
      "url": "https://www.lennysnewsletter.com"
    },
    {
      "url": "https://stratechery.com"
    }
  ],
  "maxPostsPerPublication": 20,
  "searchTerm": "",
  "includeBodyHtml": false,
  "includeRssFallback": true,
  "concurrency": 3
}
```
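
Before sending an input object like the one above, you could check it locally against the limits given in the OpenAPI specification in this document (`maxPostsPerPublication` from 1 to 1000, `concurrency` from 1 to 10, `publicationUrls` required). This is a minimal sketch with a hypothetical `validate_input` helper; the platform performs its own validation, so this only provides earlier feedback.

```python
# Integer bounds taken from the input schema in the OpenAPI specification.
BOUNDS = {"maxPostsPerPublication": (1, 1000), "concurrency": (1, 10)}


def validate_input(run_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problems found."""
    errors = []
    urls = run_input.get("publicationUrls")
    if not isinstance(urls, list):
        errors.append("publicationUrls is required and must be a list")
    else:
        for index, entry in enumerate(urls):
            if not isinstance(entry, dict) or not isinstance(entry.get("url"), str):
                errors.append(f"publicationUrls[{index}] must be an object with a string 'url'")
    for name, (low, high) in BOUNDS.items():
        if name in run_input:
            value = run_input[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
                errors.append(f"{name} must be an integer between {low} and {high}")
    return errors


print(validate_input({"publicationUrls": [{"url": "https://stratechery.com"}], "concurrency": 11}))
```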

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "publicationUrls": [
        {
            "url": "https://www.lennysnewsletter.com"
        },
        {
            "url": "https://stratechery.com"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("fetch_cat/substack-posts-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "publicationUrls": [
        { "url": "https://www.lennysnewsletter.com" },
        { "url": "https://stratechery.com" },
    ] }

# Run the Actor and wait for it to finish
run = client.actor("fetch_cat/substack-posts-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "publicationUrls": [
    {
      "url": "https://www.lennysnewsletter.com"
    },
    {
      "url": "https://stratechery.com"
    }
  ]
}' |
apify call fetch_cat/substack-posts-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=fetch_cat/substack-posts-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Substack Posts Scraper",
        "description": "📰 Scrape public Substack posts, archive metadata, URLs, dates, previews, reactions, and comments for newsletter research.",
        "version": "0.1",
        "x-build-id": "zowf7xmiqpbcBm0fG"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/fetch_cat~substack-posts-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-fetch_cat-substack-posts-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/fetch_cat~substack-posts-scraper/runs": {
            "post": {
                "operationId": "runs-sync-fetch_cat-substack-posts-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/fetch_cat~substack-posts-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-fetch_cat-substack-posts-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "publicationUrls"
                ],
                "properties": {
                    "publicationUrls": {
                        "title": "Publication URLs or domains",
                        "type": "array",
                        "description": "Substack publication homepages, custom domains, or bare domains to scrape (for example, https://www.lennysnewsletter.com or https://example.substack.com).",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxPostsPerPublication": {
                        "title": "Maximum posts per publication",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of posts to save from each publication.",
                        "default": 20
                    },
                    "searchTerm": {
                        "title": "Search term",
                        "type": "string",
                        "description": "Optional archive search term. Leave empty to collect newest posts.",
                        "default": ""
                    },
                    "dateFrom": {
                        "title": "Published after",
                        "type": "string",
                        "description": "Optional ISO date (YYYY-MM-DD). Posts before this date are skipped."
                    },
                    "dateTo": {
                        "title": "Published before",
                        "type": "string",
                        "description": "Optional ISO date (YYYY-MM-DD). Posts after this date are skipped."
                    },
                    "includeBodyHtml": {
                        "title": "Include body HTML",
                        "type": "boolean",
                        "description": "Include public body HTML returned by Substack. Disable for smaller datasets.",
                        "default": false
                    },
                    "includeRssFallback": {
                        "title": "Use feed fallback",
                        "type": "boolean",
                        "description": "If the archive endpoint is unavailable for a publication, try the public feed and save available post fields.",
                        "default": true
                    },
                    "concurrency": {
                        "title": "Publication concurrency",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Number of publications processed in parallel. Keep low for reliable public archive access.",
                        "default": 3
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
