Substack Posts Scraper
Pricing
from $0.03 / 1,000 posts extracted
📰 Scrape public Substack posts, archive metadata, URLs, dates, previews, reactions, and comments for newsletter research.
Developer
Hanna Nosova
Maintained by Community
Collect public Substack newsletter posts, archive metadata, and article previews from one or more publications.
What does Substack Posts Scraper do?
Substack Posts Scraper helps you monitor newsletters and collect public post data at scale.
It accepts Substack publication URLs, custom domains, and bare domains.
It returns clean dataset rows for public posts.
You can use the data for content research, media monitoring, competitive intelligence, and creator discovery.
The actor does not require your Substack login.
It only collects public data that is available from the publication.
Paid-only or limited preview posts are marked clearly in the output.
Who is it for?
Marketing teams use it to monitor newsletters in their niche.
PR teams use it to track creator and journalist coverage.
Content teams use it to research headlines, topics, and publishing cadence.
Investors use it to follow operators and analysts.
Sales teams use it to discover creators and potential leads.
Researchers use it to build datasets of public newsletters.
Agencies use it to report on content trends for clients.
Why use this actor?
📰 Collect posts from multiple publications in one run.
🔎 Search archives by keyword.
📅 Filter by publication date.
📊 Export structured data to JSON, CSV, Excel, or API.
⚙️ Control per-publication limits.
🧾 Keep paid/limited previews visible without bypassing access rules.
What data can I extract?
| Field | Description |
|---|---|
| publicationDomain | Publication domain |
| title | Post title |
| subtitle | Post subtitle when available |
| canonicalUrl | Public post URL |
| postDate | Published date |
| audience | Audience/visibility value |
| isPaidOnly | Whether the post appears paid-only |
| isLimited | Whether only a limited preview is available |
| description | Public description or preview |
| bodyText | Public body text or preview text |
| wordCount | Word count when available |
| reactionCount | Reactions when available |
| commentCount | Comments when available |
| tags | Tags/categories when available |
| sourceEndpoint | Source used for the record |
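Because `isPaidOnly` and `isLimited` are separate fields, downstream code can keep full public posts apart from previews. The following is a minimal sketch, not part of the actor: `split_by_access` is a hypothetical helper name, and treating a missing flag as false is an assumption.

```python
from typing import Iterable


def split_by_access(items: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split dataset rows into (full_public, preview_only) lists."""
    full, preview = [], []
    for item in items:
        # Assumption: a missing or null flag is treated as False.
        restricted = bool(item.get("isPaidOnly")) or bool(item.get("isLimited"))
        (preview if restricted else full).append(item)
    return full, preview


# Illustrative rows only; real rows carry the other fields from the table.
rows = [
    {"title": "Open post", "isPaidOnly": False, "isLimited": False},
    {"title": "Subscriber post", "isPaidOnly": True, "isLimited": True},
    {"title": "Teaser", "isPaidOnly": False, "isLimited": True},
]
full_rows, preview_rows = split_by_access(rows)
print([row["title"] for row in full_rows])
print([row["title"] for row in preview_rows])
```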
How much does it cost to scrape Substack posts?
This actor uses pay-per-event pricing.
There is a small run start charge.
There is a per-post charge for each saved dataset item.
Small tests with the default input are inexpensive.
Final store pricing is shown on the Apify actor page before you run it.
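For rough budgeting, the two charges can be combined in a small calculation. This is a sketch under stated assumptions: the per-post rate defaults to the listed starting price of $0.03 per 1,000 posts, and the run start charge is not stated here, so it is left as a parameter that defaults to zero. `estimate_run_cost` is a hypothetical helper; check the actor page for the actual rates.

```python
def estimate_run_cost(
    posts_saved: int,
    price_per_1000_posts: float = 0.03,  # listed "from" price; your rate may differ
    run_start_fee: float = 0.0,          # assumption: the real start charge is not given here
) -> float:
    """Estimate the cost of one run in USD: start charge plus per-post charges."""
    if posts_saved < 0:
        raise ValueError("posts_saved must be non-negative")
    return run_start_fee + posts_saved / 1000 * price_per_1000_posts


# Example: 5,000 saved posts at the listed starting rate, no start fee assumed.
print(round(estimate_run_cost(5000), 4))
```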
How to use Substack Posts Scraper
- Open the actor on Apify.
- Add one or more Substack publication URLs or domains.
- Set the maximum posts per publication.
- Optionally add a search term or date filters.
- Choose whether to include body HTML.
- Click Start.
- Download the dataset when the run finishes.
Input
Publication URLs or domains
Add URLs such as https://www.lennysnewsletter.com.
Add Substack domains such as https://example.substack.com.
Bare domains are also accepted.
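The actor accepts all three forms directly, so client-side normalization is optional. It can still help when you merge lists from several sources and want to avoid submitting the same publication twice. A minimal sketch, assuming the `publicationUrls` shape shown in the API usage examples; the helper names are hypothetical, and defaulting bare domains to HTTPS is an assumption.

```python
from urllib.parse import urlparse


def to_publication_entry(value: str) -> dict:
    """Turn a URL or bare domain into a {'url': ...} entry."""
    value = value.strip()
    if not value:
        raise ValueError("empty publication value")
    # Bare domains have no scheme; assume HTTPS for them.
    if "://" not in value:
        value = "https://" + value
    parsed = urlparse(value)
    if not parsed.netloc:
        raise ValueError(f"not a valid publication URL: {value!r}")
    # Keep only scheme and host so paths like /archive do not create duplicates.
    return {"url": f"{parsed.scheme}://{parsed.netloc.lower()}"}


def build_publication_urls(values: list[str]) -> list[dict]:
    """Normalize values and drop duplicates, preserving the original order."""
    seen, entries = set(), []
    for value in values:
        entry = to_publication_entry(value)
        if entry["url"] not in seen:
            seen.add(entry["url"])
            entries.append(entry)
    return entries


publication_urls = build_publication_urls([
    "https://www.lennysnewsletter.com",
    "example.substack.com",
    " Example.substack.com/archive ",
])
print(publication_urls)
```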
Maximum posts per publication
Controls how many posts are saved from each publication.
Use a small number for testing.
Increase it for production monitoring.
Search term
Use a keyword to search publication archives.
Leave it empty to collect newest posts.
Date filters
Use dateFrom and dateTo to limit results by publication date.
Dates should use YYYY-MM-DD format.
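Validating the dates before starting a run catches formatting mistakes early. The sketch below uses the `dateFrom` and `dateTo` field names from this section; `date_filters` is a hypothetical helper, and rejecting a start date after the end date is an assumption about what a sensible input looks like, not documented actor behavior.

```python
from datetime import datetime


def date_filters(date_from: str, date_to: str) -> dict:
    """Validate two date strings and return them as dateFrom/dateTo input fields."""
    # strptime raises ValueError for anything that is not year-month-day.
    start = datetime.strptime(date_from, "%Y-%m-%d").date()
    end = datetime.strptime(date_to, "%Y-%m-%d").date()
    if start > end:
        raise ValueError("dateFrom must not be later than dateTo")
    # isoformat() always emits zero-padded YYYY-MM-DD.
    return {"dateFrom": start.isoformat(), "dateTo": end.isoformat()}


filters = date_filters("2026-01-01", "2026-01-31")
print(filters)
```

The returned dictionary can be merged into the run input alongside `publicationUrls` and `maxPostsPerPublication`.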
Include body HTML
Enable this if your workflow needs public HTML.
Disable it for smaller exports.
Use feed fallback
Keep this enabled for broader publication coverage.
If one source is unavailable, the actor can still collect public feed data.
Concurrency
Controls how many publications are processed in parallel.
The default is conservative and reliable.
Output
Each result is a single Substack post record.
The dataset is ready for spreadsheets, BI tools, automation, and APIs.
Example item:
{
  "publicationDomain": "lennysnewsletter.com",
  "title": "Example newsletter post",
  "canonicalUrl": "https://www.lennysnewsletter.com/p/example",
  "postDate": "2026-01-01T12:00:00.000Z",
  "isPaidOnly": false,
  "isLimited": false,
  "wordCount": 1200,
  "reactionCount": 42,
  "commentCount": 7
}
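For publishing-cadence research, the `postDate` values can be parsed and compared. This is a sketch that assumes `postDate` always uses the ISO 8601 form shown in the example item; both helper names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def parse_post_date(value: str) -> datetime:
    """Parse a postDate string such as '2026-01-01T12:00:00.000Z'."""
    # Swap the trailing Z for an explicit offset so this also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def average_days_between_posts(items: list[dict]) -> Optional[float]:
    """Return the mean gap in days between consecutive posts, or None if fewer than two."""
    dates = sorted(parse_post_date(i["postDate"]) for i in items if i.get("postDate"))
    if len(dates) < 2:
        return None
    span_days = (dates[-1] - dates[0]).total_seconds() / 86400
    return span_days / (len(dates) - 1)


# Illustrative rows only; a real dataset would be passed in the same way.
sample = [
    {"postDate": "2026-01-15T12:00:00.000Z"},
    {"postDate": "2026-01-01T12:00:00.000Z"},
    {"postDate": "2026-01-08T12:00:00.000Z"},
]
print(average_days_between_posts(sample))
```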
Tips for best results
Start with one publication and a low post limit.
Check the output fields before running a large batch.
Use date filters for recurring monitoring.
Use search terms for topical research.
Leave feed fallback enabled unless you need only archive-rich records.
Disable body HTML when you only need metadata.
Integrations
Send results to Google Sheets through Apify integrations.
Trigger runs from Zapier or Make.
Use the dataset API in dashboards.
Feed new posts into a CRM or lead database.
Schedule recurring runs for weekly media monitoring.
Connect results to a vector database for content analysis.
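When recurring runs feed a CRM or lead database, the same post can appear in more than one run. One way to forward only unseen posts is to key on `canonicalUrl`. This is a sketch: `select_new_posts` is a hypothetical helper, and the in-memory set stands in for whatever persistent store your pipeline uses.

```python
def select_new_posts(items: list[dict], seen_urls: set) -> list[dict]:
    """Return items whose canonicalUrl is not in seen_urls, and record them as seen."""
    fresh = []
    for item in items:
        url = item.get("canonicalUrl")
        # Skip rows without a URL and rows already forwarded by an earlier run.
        if not url or url in seen_urls:
            continue
        seen_urls.add(url)
        fresh.append(item)
    return fresh


seen: set = set()
first_run = [{"canonicalUrl": "https://www.lennysnewsletter.com/p/example", "title": "A"}]
second_run = first_run + [{"canonicalUrl": "https://www.lennysnewsletter.com/p/other", "title": "B"}]

print([p["title"] for p in select_new_posts(first_run, seen)])
print([p["title"] for p in select_new_posts(second_run, seen)])
```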
API usage
Node.js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('fetch_cat/substack-posts-scraper').call({
  publicationUrls: [{ url: 'https://www.lennysnewsletter.com' }],
  maxPostsPerPublication: 10
});

console.log(run.defaultDatasetId);
Python
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')

run = client.actor('fetch_cat/substack-posts-scraper').call(run_input={
    'publicationUrls': [{'url': 'https://www.lennysnewsletter.com'}],
    'maxPostsPerPublication': 10,
})

print(run['defaultDatasetId'])
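The run returns a dataset ID, so a typical next step is reading the saved items. The sketch below takes the client as a parameter and relies on `client.dataset(...).iterate_items()` from the Python `apify-client` package. `newest_titles` is a hypothetical helper, and sorting `postDate` as a string assumes every row uses the same UTC ISO 8601 format as the example item.

```python
def newest_titles(client, dataset_id: str, limit: int = 10) -> list[str]:
    """Return up to `limit` post titles from a dataset, newest first."""
    items = client.dataset(dataset_id).iterate_items()
    # Keep only rows that have both fields needed for sorting and display.
    dated = [i for i in items if i.get("title") and i.get("postDate")]
    # Assumption: uniform ISO 8601 UTC strings sort chronologically as text.
    dated.sort(key=lambda i: i["postDate"], reverse=True)
    return [i["title"] for i in dated[:limit]]


# Usage with the objects from the Python example:
#   titles = newest_titles(client, run['defaultDatasetId'], limit=10)
```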
cURL
curl -X POST 'https://api.apify.com/v2/acts/fetch_cat~substack-posts-scraper/runs?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"publicationUrls":[{"url":"https://www.lennysnewsletter.com"}],"maxPostsPerPublication":10}'
MCP usage
Use the Apify MCP server to run this actor from Claude clients such as Claude Code and Claude Desktop.
MCP URL format:
https://mcp.apify.com/?tools=fetch_cat/substack-posts-scraper
Claude Code setup:
claude mcp add --transport http apify-substack 'https://mcp.apify.com/?tools=fetch_cat/substack-posts-scraper'
Claude Desktop JSON config:
{
  "mcpServers": {
    "apify-substack": {
      "url": "https://mcp.apify.com/?tools=fetch_cat/substack-posts-scraper"
    }
  }
}
Example prompts:
- Run Substack Posts Scraper for Lenny's Newsletter and return the newest 10 post titles.
- Collect public posts mentioning pricing from these three newsletters.
- Export the latest creator newsletter posts to a CSV dataset.
Scheduling
Use Apify schedules to run this actor daily, weekly, or monthly.
Recurring runs are useful for media monitoring.
Combine date filters with schedules to collect fresh posts.
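For a scheduled run, the date window can be computed relative to the run date rather than hard-coded. This is a sketch: `rolling_window` is a hypothetical helper, and whether the actor treats `dateTo` as inclusive is not stated here, so overlap the windows slightly if missing a post matters.

```python
from datetime import date, timedelta
from typing import Optional


def rolling_window(days_back: int, today: Optional[date] = None) -> dict:
    """Build dateFrom/dateTo covering the last `days_back` days up to `today`."""
    if days_back < 0:
        raise ValueError("days_back must be non-negative")
    end = today or date.today()
    start = end - timedelta(days=days_back)
    return {"dateFrom": start.isoformat(), "dateTo": end.isoformat()}


# Example for a weekly schedule, with a fixed date so the result is reproducible.
print(rolling_window(7, today=date(2026, 1, 8)))
```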
Data freshness
The actor collects data available at run time.
Publication owners can edit, delete, or restrict posts.
Run the actor regularly if freshness matters.
Limitations
The actor does not bypass paywalls.
Paid-only posts may contain only public previews.
Some publications use custom domains or settings that expose fewer fields.
Very old archives may have missing metadata.
Feeds may contain fewer fields than publication archives.
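Since some records carry fewer fields, it can help to measure the gaps before relying on a column. The sketch below counts missing values for the fields the data table marks as "when available". `missing_field_counts` is a hypothetical helper, and treating an absent key and a null value alike is an assumption about how gaps appear in the dataset.

```python
from collections import Counter

# Fields described as "when available" in the data table above.
OPTIONAL_FIELDS = ("subtitle", "wordCount", "reactionCount", "commentCount", "tags")


def missing_field_counts(items: list[dict]) -> dict:
    """Count, per optional field, how many rows lack a value."""
    counts = Counter()
    for item in items:
        for field in OPTIONAL_FIELDS:
            # .get() covers both an absent key and an explicit null.
            if item.get(field) is None:
                counts[field] += 1
    return dict(counts)


# Illustrative rows only.
rows = [
    {"title": "Full record", "subtitle": "S", "wordCount": 1200,
     "reactionCount": 42, "commentCount": 7, "tags": ["growth"]},
    {"title": "Sparse record", "wordCount": None},
]
print(missing_field_counts(rows))
```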
Troubleshooting
FAQ
Why did I get fewer posts than requested?
The publication may have fewer public posts, date filters may exclude posts, or paid posts may expose only limited preview data.
Why is body text missing?
The publication may not expose full public body text for that post. Enable body HTML only when you need it.
Why did a custom domain fail?
Check that the domain is a public Substack publication homepage and can be opened in a browser without login.
Legality
Legal and ethical use
Only collect public data.
Respect Substack authors and publication terms.
Do not use the actor to bypass subscriptions or access controls.
Use reasonable limits and schedules.
If you process personal data, make sure your use complies with applicable laws.
Related scrapers
Explore other Apify actors from fetch_cat for content research, social monitoring, and lead generation.
Use related actors together to enrich creator, publication, and company datasets.
Support
If a publication does not work, provide the run URL and input used.
Include whether the publication is a custom domain or a substack.com subdomain.
Share a small reproducible input when asking for help.
Changelog
Initial version collects public Substack publication posts and archive metadata.