# Substack Scraper — Posts, Authors & Newsletters (`cryptosignals/substack-scraper`) Actor

Extract Substack newsletter content. Get post titles, authors, publish dates, paywall status, subscriber counts, and full article text. Ideal for newsletter research and content monitoring. PPE pricing — pay only for results.

- **URL**: https://apify.com/cryptosignals/substack-scraper.md
- **Developed by:** [Web Data Labs](https://apify.com/cryptosignals) (community)
- **Categories:** Social media, News
- **Stats:** 27 total users, 7 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$5.00 / 1,000 results scraped

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
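
As a quick worked example of the pay-per-event arithmetic, the sketch below estimates the charge for a run from the $5.00 per 1,000 results price listed above. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes that every returned dataset item is billed as exactly one result event.

```python
from decimal import Decimal

# Price taken from the Pricing section: $5.00 per 1,000 results.
PRICE_PER_1000_RESULTS_USD = Decimal("5.00")


def estimate_cost_usd(result_count: int) -> Decimal:
    """Estimate the pay-per-event charge for a number of results.

    Assumption: each dataset item is billed as one result event.
    """
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_RESULTS_USD * result_count / 1000
    return cost.quantize(Decimal("0.01"))


for count in (50, 200, 500):
    print(f"{count} results: ${estimate_cost_usd(count)}")
```

Under these assumptions, 50 results cost $0.25 and the 500-item maximum costs $2.50.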

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Substack Scraper — Posts, Comments & Publication Data

Extract structured data from any Substack newsletter at scale. Scrape posts with full article text, reader comments, and publication metadata — no login required. Export to JSON, CSV, or Excel with a single click.

### Why Use This Scraper?

Substack has grown into one of the most important platforms for independent journalism, thought leadership, and niche expertise. With over 35 million active subscriptions and 17,000+ paid writers, it's a goldmine for researchers, marketers, and analysts — but Substack offers no bulk export or public API.

This Actor solves that. It programmatically extracts posts, comments, and publication info from any Substack newsletter, giving you clean, structured data ready for analysis.

### Key Features

- **Three scrape modes**: Posts, comments, and publication info
- **Search across Substack**: Find posts by keyword across the entire platform
- **Publication-specific scraping**: Target one or more newsletters by subdomain
- **Full article text**: Optionally include the complete body text of each post
- **Flexible sorting**: Sort by newest or top-performing posts
- **Scale control**: Scrape from 1 to 500 items per run
- **No authentication needed**: Works without any Substack account
- **Multiple export formats**: JSON, CSV, Excel, XML, HTML

### Use Cases

#### 1. Content Research & Competitive Analysis
Track what topics are trending across newsletters in your industry. Monitor competitors' publishing frequency, engagement, and content strategy.

#### 2. Media Monitoring & PR Intelligence
Set up regular scrapes to track mentions of your brand, product, or industry across Substack newsletters. Stay ahead of narratives before they hit mainstream media.

#### 3. Academic & Market Research
Collect large datasets of expert opinion pieces, industry analysis, and commentary for qualitative research. Study how narratives form and spread through independent media.

#### 4. Newsletter Discovery & Curation
Search for newsletters covering specific topics, then scrape their publication info to evaluate subscriber counts, posting cadence, and content quality.

#### 5. Sentiment & Trend Analysis
Extract posts about specific topics or companies, then run NLP or sentiment analysis on the text. Detect shifts in expert opinion over time.

#### 6. Lead Generation for B2B
Find Substack authors writing about your domain and extract their publication details. These are high-value contacts who are actively engaged in your space.

#### 7. Content Repurposing & Summarization
Pull posts from newsletters you subscribe to and feed them into LLMs for summarization, translation, or content repurposing workflows.

### Input Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `publications` | Array of strings | No | — | Substack subdomains to scrape (e.g., `platformer` for platformer.substack.com) |
| `searchQuery` | String | No | — | Search keyword to find posts across all of Substack |
| `scrapeType` | String | No | `posts` | What to scrape: `posts`, `comments`, or `info` |
| `maxItems` | Integer | No | `50` | Maximum items to return (1–500) |
| `sortBy` | String | No | `new` | Sort order: `new` (newest first) or `top` (most popular) |
| `includeBodyText` | Boolean | No | `false` | Include the full body text of each post |

> **Tip**: Use `publications` to target specific newsletters, or `searchQuery` to search across the entire platform. You can combine both.
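
The constraints in the table above can be checked locally before a run is started. The following is a minimal sketch, not part of the Actor itself: the helper name `build_run_input` and its keyword arguments are hypothetical, while the output keys, allowed values, and the 1 to 500 range come from the table.

```python
from typing import Any, Dict, List, Optional

ALLOWED_SCRAPE_TYPES = {"posts", "comments", "info"}
ALLOWED_SORT_ORDERS = {"new", "top"}


def build_run_input(
    publications: Optional[List[str]] = None,
    search_query: Optional[str] = None,
    scrape_type: str = "posts",
    max_items: int = 50,
    sort_by: str = "new",
    include_body_text: bool = False,
) -> Dict[str, Any]:
    """Build an input object and reject values outside the documented ranges."""
    if scrape_type not in ALLOWED_SCRAPE_TYPES:
        raise ValueError(f"scrapeType must be one of {sorted(ALLOWED_SCRAPE_TYPES)}")
    if sort_by not in ALLOWED_SORT_ORDERS:
        raise ValueError(f"sortBy must be one of {sorted(ALLOWED_SORT_ORDERS)}")
    if not 1 <= max_items <= 500:
        raise ValueError("maxItems must be between 1 and 500")

    run_input: Dict[str, Any] = {
        "scrapeType": scrape_type,
        "maxItems": max_items,
        "sortBy": sort_by,
        "includeBodyText": include_body_text,
    }
    # Optional fields are included only when they were provided.
    if publications:
        run_input["publications"] = list(publications)
    if search_query:
        run_input["searchQuery"] = search_query
    return run_input


print(build_run_input(publications=["platformer"], max_items=5))
```

The resulting dictionary can be passed as `run_input` to the Python client call shown in the integration examples.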

### Sample Output

#### Posts Output
```json
{
  "title": "The AI Trust Crisis",
  "subtitle": "Why users are losing faith in AI-generated content",
  "slug": "the-ai-trust-crisis",
  "publishedAt": "2026-03-01T10:30:00.000Z",
  "canonicalUrl": "https://platformer.substack.com/p/the-ai-trust-crisis",
  "author": "Casey Newton",
  "publicationName": "Platformer",
  "publicationSubdomain": "platformer",
  "likes": 847,
  "comments": 132,
  "wordCount": 2450,
  "isPaywalled": false,
  "previewText": "The past month has brought a reckoning for AI companies...",
  "coverImage": "https://substackcdn.com/image/fetch/...",
  "tags": ["AI", "trust", "technology"]
}
````
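
To illustrate how the post fields might be used downstream, here is a small sketch that keeps only non-paywalled posts and ranks them by likes. The helper `top_free_posts` is hypothetical, and the sketch assumes that `isPaywalled` and `likes` appear as in the sample above. The second and third sample items are made up for the demonstration.

```python
from typing import Any, Dict, List


def top_free_posts(items: List[Dict[str, Any]], limit: int = 10) -> List[Dict[str, Any]]:
    """Return up to `limit` non-paywalled posts, most-liked first."""
    free = [item for item in items if not item.get("isPaywalled", False)]
    # Treat a missing or null like count as zero when sorting.
    return sorted(free, key=lambda item: item.get("likes") or 0, reverse=True)[:limit]


sample = [
    {"title": "The AI Trust Crisis", "likes": 847, "isPaywalled": False},
    {"title": "Hypothetical paid post", "likes": 900, "isPaywalled": True},
    {"title": "Hypothetical free post", "likes": 12, "isPaywalled": False},
]
for post in top_free_posts(sample):
    print(post["title"], post["likes"])
```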

#### Comments Output

```json
{
  "body": "This is exactly what I've been seeing in my industry...",
  "author": "John Reader",
  "date": "2026-03-01T14:22:00.000Z",
  "likes": 23,
  "postTitle": "The AI Trust Crisis",
  "publicationSubdomain": "platformer"
}
```
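
Comment items can be rolled up per post for engagement analysis. The sketch below counts comments and sums their likes by `postTitle`. The helper name `summarize_comments` is hypothetical, and it assumes the field names shown in the sample above. The second sample comment is invented for the demonstration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def summarize_comments(items: List[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Count comments and sum their likes for each post title."""
    summary: Dict[str, Dict[str, int]] = defaultdict(lambda: {"comments": 0, "likes": 0})
    for item in items:
        stats = summary[item.get("postTitle", "(unknown post)")]
        stats["comments"] += 1
        stats["likes"] += item.get("likes") or 0
    return dict(summary)


sample_comments = [
    {"author": "John Reader", "likes": 23, "postTitle": "The AI Trust Crisis"},
    {"author": "Hypothetical Reader", "likes": 5, "postTitle": "The AI Trust Crisis"},
]
print(summarize_comments(sample_comments))
```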

#### Publication Info Output

```json
{
  "name": "Platformer",
  "subdomain": "platformer",
  "description": "Tech and democracy coverage",
  "authorName": "Casey Newton",
  "heroImage": "https://substackcdn.com/image/fetch/...",
  "logoUrl": "https://substackcdn.com/image/fetch/...",
  "themeColor": "#FF6719",
  "subscriberCount": 250000,
  "postCount": 1200
}
```

### Integration Examples

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run_input = {
    "publications": ["platformer", "thebrowser"],
    "scrapeType": "posts",
    "maxItems": 50,
    "sortBy": "new",
    "includeBodyText": True,
}

run = client.actor("cryptosignals/substack-scraper").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} — {item.get('likes', 0)} likes")
```

#### Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const input = {
    publications: ["platformer", "thebrowser"],
    scrapeType: "posts",
    maxItems: 50,
    sortBy: "new",
    includeBodyText: true,
};

const run = await client.actor("cryptosignals/substack-scraper").call(input);

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(item => {
    console.log(`${item.title} — ${item.likes || 0} likes`);
});
```

#### Using the Apify API Directly

```bash
curl -X POST "https://api.apify.com/v2/acts/cryptosignals~substack-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "publications": ["platformer"],
    "scrapeType": "posts",
    "maxItems": 20
  }'
```
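
The `/runs` call above starts a run and returns information about it rather than the scraped items. The OpenAPI specification later in this document also lists a `run-sync-get-dataset-items` endpoint that waits for the run and returns the dataset items in the response. The sketch below builds such a request using only the Python standard library. The helper name `build_sync_request` is hypothetical, and the request is not sent here. Long runs may exceed what a synchronous HTTP call can wait for, so treat this as suitable for small jobs only.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "cryptosignals~substack-scraper"


def build_sync_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request(
    "YOUR_API_TOKEN",
    {"publications": ["platformer"], "scrapeType": "posts", "maxItems": 20},
)
print(request.get_method(), request.full_url)

# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```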

### Pricing & Costs

This Actor uses pay-per-event pricing: $5.00 per 1,000 results, with no separate charge for Apify platform usage.

| Scenario | Estimated Cost |
|----------|---------------|
| 50 posts from one publication | ~$0.25 |
| 200 posts from multiple publications | ~$1.00 |
| 500 posts with full body text | ~$2.50 |

Costs scale with the number of results returned. Free plan users get $5/month in platform credits, which corresponds to roughly 1,000 results at this rate.

### Tips for Best Results

1. **Start small**: Set `maxItems` to 5–10 for your first run to verify the output format meets your needs.
2. **Use publication subdomains**: For `platformer.substack.com`, enter just `platformer` in the publications list.
3. **Enable body text selectively**: Full article text significantly increases output size. Only enable it when you need the content for analysis.
4. **Combine with Apify integrations**: Send results directly to Google Sheets, Slack, Zapier, Make, or webhooks for automated workflows.
5. **Schedule regular runs**: Set up recurring scrapes to build longitudinal datasets or monitor newsletters over time.

### Frequently Asked Questions

#### Can I scrape paywalled/subscriber-only posts?

The scraper extracts publicly available data. For paywalled posts, you'll get the title, preview text, metadata, and publication info, but not the full subscriber-only content.

#### How do I find a publication's subdomain?

Look at the newsletter URL. For `https://platformer.substack.com`, the subdomain is `platformer`. For custom domains, check the Substack about page.

#### Can I scrape custom domain Substack newsletters?

Yes. Use the publication's original Substack subdomain (before they switched to a custom domain). You can usually find it referenced on their about page or through a web search.
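
For newsletters hosted on `substack.com`, the subdomain can be pulled out of the URL automatically. The following is a minimal sketch with a hypothetical helper name, `substack_subdomain`. It returns `None` for custom domains, where the original subdomain still has to be looked up by hand as described above.

```python
from typing import Optional
from urllib.parse import urlparse


def substack_subdomain(url: str) -> Optional[str]:
    """Return the subdomain of a *.substack.com URL, or None for other hosts."""
    # Accept inputs without a scheme, such as "platformer.substack.com".
    parsed = urlparse(url if "://" in url else f"https://{url}")
    host = (parsed.hostname or "").lower()
    suffix = ".substack.com"
    if host.endswith(suffix):
        subdomain = host[: -len(suffix)]
        # Reject empty or nested labels such as "a.b.substack.com".
        if subdomain and "." not in subdomain:
            return subdomain
    return None


print(substack_subdomain("https://platformer.substack.com"))
print(substack_subdomain("https://www.example.com"))
```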

#### How often is the data updated?

Every run fetches live data directly from Substack. You always get the latest posts, comments, and metrics.

#### Is there a rate limit?

The scraper handles rate limiting automatically with built-in delays and retries. You don't need to configure anything.

#### Can I search for posts about a specific topic?

Yes! Use the `searchQuery` parameter to search across all of Substack, or combine it with `publications` to search within specific newsletters.

#### What export formats are available?

Apify supports JSON, CSV, Excel (XLSX), XML, HTML, and RSS. You can download in any format from the dataset tab after a run completes.

#### How do I integrate this with my existing workflow?

Use Apify's built-in integrations (Zapier, Make, Google Sheets, webhooks) or call the API directly from any programming language. See the code examples above.

#### Can I run this on a schedule?

Yes. Apify supports cron-like scheduling. Set up daily, weekly, or custom schedules from the Actor's Schedules tab. Each run stores results in a new dataset.
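
Because each scheduled run writes to its own dataset, consecutive runs will usually contain overlapping posts. One way to combine them is sketched below, assuming `canonicalUrl` uniquely identifies a post. The helper `merge_unique_posts` is hypothetical, and the sample URLs are placeholders.

```python
from typing import Any, Dict, Iterable, List


def merge_unique_posts(*runs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge items from several runs, keeping the latest copy of each post."""
    by_url: Dict[str, Dict[str, Any]] = {}
    for run_items in runs:  # pass runs oldest first
        for item in run_items:
            url = item.get("canonicalUrl")
            if url:
                by_url[url] = item  # later runs overwrite earlier copies
    return list(by_url.values())


monday = [{"canonicalUrl": "https://example.substack.com/p/a", "likes": 1}]
tuesday = [
    {"canonicalUrl": "https://example.substack.com/p/a", "likes": 5},
    {"canonicalUrl": "https://example.substack.com/p/b", "likes": 2},
]
print(merge_unique_posts(monday, tuesday))
```

Passing runs in chronological order means metrics such as `likes` reflect the most recent scrape.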

#### What happens if a publication doesn't exist?

The scraper will log a warning for invalid subdomains and continue processing the remaining publications. Your run won't fail because of one bad input.

# Actor input Schema

## `publications` (type: `array`):

List of Substack publication subdomains to scrape (e.g. 'platformer' for platformer.substack.com)

## `searchQuery` (type: `string`):

Search query to find posts across Substack

## `scrapeType` (type: `string`):

What to scrape: posts, comments, or publication info

## `maxItems` (type: `integer`):

Maximum number of items to return

## `sortBy` (type: `string`):

Sort order for results

## `includeBodyText` (type: `boolean`):

Include the full body text of posts

## Actor input object example

```json
{
  "publications": [
    "platformer"
  ],
  "scrapeType": "posts",
  "maxItems": 5,
  "sortBy": "new",
  "includeBodyText": false
}
```

# Actor output Schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "publications": [
        "platformer"
    ],
    "searchQuery": "",
    "scrapeType": "posts",
    "maxItems": 5,
    "sortBy": "new",
    "includeBodyText": false
};

// Run the Actor and wait for it to finish
const run = await client.actor("cryptosignals/substack-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "publications": ["platformer"],
    "searchQuery": "",
    "scrapeType": "posts",
    "maxItems": 5,
    "sortBy": "new",
    "includeBodyText": False,
}

# Run the Actor and wait for it to finish
run = client.actor("cryptosignals/substack-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "publications": [
    "platformer"
  ],
  "searchQuery": "",
  "scrapeType": "posts",
  "maxItems": 5,
  "sortBy": "new",
  "includeBodyText": false
}' |
apify call cryptosignals/substack-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=cryptosignals/substack-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Substack Scraper — Posts, Authors & Newsletters",
        "description": "Extract Substack newsletter content. Get post titles, authors, publish dates, paywall status, subscriber counts, and full article text. Ideal for newsletter research and content monitoring. PPE pricing — pay only for results.",
        "version": "1.0",
        "x-build-id": "lm6qOl8CG5fSkWTmv"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/cryptosignals~substack-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-cryptosignals-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/cryptosignals~substack-scraper/runs": {
            "post": {
                "operationId": "runs-sync-cryptosignals-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/cryptosignals~substack-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-cryptosignals-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "publications": {
                        "title": "Publications",
                        "type": "array",
                        "description": "List of Substack publication subdomains to scrape (e.g. 'platformer' for platformer.substack.com)",
                        "items": {
                            "type": "string"
                        }
                    },
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Search query to find posts across Substack"
                    },
                    "scrapeType": {
                        "title": "Scrape Type",
                        "enum": [
                            "posts",
                            "comments",
                            "info"
                        ],
                        "type": "string",
                        "description": "What to scrape: posts, comments, or publication info",
                        "default": "posts"
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of items to return",
                        "default": 50
                    },
                    "sortBy": {
                        "title": "Sort By",
                        "enum": [
                            "new",
                            "top"
                        ],
                        "type": "string",
                        "description": "Sort order for results",
                        "default": "new"
                    },
                    "includeBodyText": {
                        "title": "Include Body Text",
                        "type": "boolean",
                        "description": "Include the full body text of posts",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
