Substack Leaderboard Scraper
Pricing
Pay per event
📊 Scrape public Substack leaderboards for ranked newsletters, author details, subscriber labels, and publication URLs.
Developer
Stas Persiianenko
Maintained by Community
Find ranked Substack publications from public category leaderboards. Export bestseller and rising newsletters with publication URLs, subscriber labels, author details, descriptions, and ranking context.
Use this actor when you need a clean dataset for newsletter sponsorship research, creator discovery, competitive intelligence, media lists, or partnership prospecting.
What does Substack Leaderboard Scraper do?
Substack Leaderboard Scraper collects public rows from Substack category leaderboards such as Technology, Business, Culture, Finance, Food & Drink, News, and more.
It uses public Substack leaderboard data and saves one dataset row per ranked publication.
Typical results include:
- 🏆 leaderboard rank
- 🗂️ category name and slug
- 📈 ranking tab: Top Bestsellers or Rising
- 📰 publication name and URL
- 👤 author name and profile URL
- 👥 subscriber labels such as "Thousands of paid subscribers"
- 🔗 Substack hostname and subdomain
- 🧭 source leaderboard URL
Who is it for?
Sponsorship and growth teams
Use the dataset to discover newsletters that already have audience traction in a niche.
Creator partnership teams
Find creators by category and collect publication metadata before outreach.
Newsletter operators
Monitor adjacent categories to understand who is rising and how top publications position themselves.
Market researchers
Build a structured view of the Substack creator market by category.
Agencies and media buyers
Export publication URLs, authors, subscriber labels, and descriptions for campaign planning.
Why use this actor?
Substack leaderboards are useful, but they are built for browsing, not analysis. This actor turns those public pages into structured rows that can be filtered, joined, deduplicated, and exported.
Benefits:
- ⚡ HTTP-only scraping for fast low-cost runs
- 🎯 category slug input instead of internal category IDs
- 📊 bestseller and rising ranking tabs
- 🧾 dataset rows ready for CSV, JSON, Excel, Airtable, or CRM imports
- 🔁 repeatable monitoring of the same categories over time
What data can you extract?
| Field | Description |
|---|---|
| categoryName | Human-readable leaderboard category |
| rankingLabel | Top Bestsellers or Rising |
| rank | Rank within that category/ranking page |
| publicationName | Substack publication name |
| publicationUrl | Public publication URL |
| description | Public publication description or hero text |
| authorName | Public author name when available |
| authorUrl | Public Substack profile URL |
| paidSubscriberLabel | Paid subscriber range label from Substack |
| subscriberLabel | Broader subscriber label when available |
| freeSubscriberCount | Free subscriber count text when exposed |
| hasPodcast | Whether the publication has podcast support |
| twitterScreenName | Twitter/X screen name when exposed |
| sourceUrl | Leaderboard URL that produced the row |
How much does it cost to scrape Substack leaderboard rows?
Pricing is pay per event:
- Start event: $0.005 per run
- Leaderboard row event: starts at about $0.00018 per saved row on the BRONZE tier, with lower per-row prices on higher Apify tiers
That means 1,000 saved leaderboard rows cost about $0.18 on the BRONZE tier, plus the $0.005 start fee, for roughly $0.185 per run. Apify platform charges and plan-specific pricing are not included in that figure.
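The arithmetic above can be sketched as a small estimator. This is only an illustration: the two constants are copied from the BRONZE-tier prices quoted in this section, the function name is hypothetical, and actual billing depends on your Apify plan.

```python
# Prices copied from the pricing section above (BRONZE tier); adjust for your plan.
START_FEE_USD = 0.005
PER_ROW_USD_BRONZE = 0.00018


def estimate_run_cost(rows, per_row=PER_ROW_USD_BRONZE, start_fee=START_FEE_USD):
    """Return the estimated pay-per-event cost in USD for a single run."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    # One start event per run plus one row event per saved row.
    return round(start_fee + rows * per_row, 6)


if __name__ == "__main__":
    for rows in (100, 1_000, 10_000):
        print(f"{rows:>6} rows: ${estimate_run_cost(rows):.3f}")
```

Because maxItems caps the number of saved rows, passing your maxItems value gives an upper bound on the event charges for a run.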
Quick start
- Open the actor on Apify.
- Enter one or more category slugs, for example technology and business.
- Choose paid, rising, or both ranking tabs.
- Set a small maxItems for your first run.
- Start the actor.
- Export the dataset as CSV, JSON, or Excel.
Input options
categorySlugs
List of Substack leaderboard category slugs.
Examples:
- technology
- business
- culture
- finance
- news
- food
startUrls
Optional direct leaderboard URLs.
Examples:
- https://substack.com/leaderboard/technology
- https://substack.com/leaderboard/technology/rising
- https://substack.com/leaderboard/business/paid
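If you prefer to pass startUrls rather than slugs, the URLs can be generated. The sketch below assumes the pattern visible in the examples above (leaderboard, then category slug, then ranking tab); the helper name is hypothetical and Substack may change its URL layout.

```python
from itertools import product

BASE_URL = "https://substack.com/leaderboard"
VALID_RANKINGS = ("paid", "rising")


def leaderboard_urls(category_slugs, rankings=VALID_RANKINGS):
    """Build one leaderboard URL per (category slug, ranking tab) pair."""
    for ranking in rankings:
        if ranking not in VALID_RANKINGS:
            raise ValueError(f"unknown ranking tab: {ranking!r}")
    # URL pattern assumed from the documented examples, not from a Substack spec.
    return [f"{BASE_URL}/{slug}/{ranking}"
            for slug, ranking in product(category_slugs, rankings)]


if __name__ == "__main__":
    for url in leaderboard_urls(["technology", "business"]):
        print(url)
```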
rankings
Choose one or both:
- paid for Top Bestsellers
- rising for Rising publications
maxItems
Maximum rows saved across all selected categories and ranking tabs.
includeAllCategories
Set this to true to scrape every public category returned by Substack's leaderboard category API. Keep maxItems modest for the first run.
Example input
    {
      "categorySlugs": ["technology", "business"],
      "rankings": ["paid", "rising"],
      "maxItems": 100,
      "includeAllCategories": false
    }
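When inputs are generated by a script, it can help to check them before starting a paid run. The helper below is a hypothetical client-side sketch: the field names come from the input options above, while the checks (allowed ranking values, a maxItems floor of 1) are assumptions rather than the actor's own validation. It does not cover startUrls.

```python
import json

ALLOWED_RANKINGS = {"paid", "rising"}


def build_input(category_slugs, rankings=("paid",), max_items=25,
                include_all_categories=False):
    """Assemble an input dict using the documented field names."""
    unknown = set(rankings) - ALLOWED_RANKINGS
    if unknown:
        raise ValueError(f"unsupported rankings: {sorted(unknown)}")
    if max_items < 1:
        # Assumed lower bound; the actor may enforce different limits.
        raise ValueError("maxItems must be at least 1")
    return {
        "categorySlugs": [slug.strip() for slug in category_slugs],
        "rankings": list(rankings),
        "maxItems": max_items,
        "includeAllCategories": include_all_categories,
    }


if __name__ == "__main__":
    print(json.dumps(build_input(["technology", "business"],
                                 ["paid", "rising"], 100), indent=2))
```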
Example output
    {
      "category": "technology",
      "categoryName": "Technology",
      "categoryId": 4,
      "rankingType": "paid",
      "rankingLabel": "Top Bestsellers",
      "rank": 1,
      "publicationId": 6349492,
      "publicationName": "SemiAnalysis",
      "publicationUrl": "https://newsletter.semianalysis.com",
      "description": "Bridging the gap between the world's most important industry, semiconductors, and business.",
      "authorName": "Dylan Patel",
      "authorHandle": "semianalysis",
      "authorUrl": "https://substack.com/@semianalysis",
      "paidSubscriberLabel": "Thousands of paid subscribers",
      "subscriberLabel": "Hundreds of thousands of subscribers",
      "freeSubscriberCount": "287,000",
      "hasPodcast": false,
      "sourceUrl": "https://substack.com/leaderboard/technology/paid"
    }
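Note that freeSubscriberCount arrives as text ("287,000" in the example), so it needs parsing before numeric filtering or sorting. The following is a minimal sketch with hypothetical helper names; it assumes comma-separated digits as in the example and returns None for anything else instead of guessing.

```python
def parse_count(text):
    """Convert count text such as "287,000" to an int; None if not plain digits."""
    if text is None:
        return None
    digits = text.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def top_by_free_subscribers(rows, minimum):
    """Keep rows with at least `minimum` free subscribers, largest first."""
    kept = []
    for row in rows:
        count = parse_count(row.get("freeSubscriberCount"))
        if count is not None and count >= minimum:
            # Add a numeric copy under a new, hypothetical key.
            kept.append({**row, "freeSubscriberCountInt": count})
    return sorted(kept, key=lambda r: r["freeSubscriberCountInt"], reverse=True)
```

Rows without a parsable count are dropped by the filter rather than treated as zero, which keeps missing data from looking like a small audience.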
Tips for better results
- Start with one or two categories.
- Use both paid and rising when you want mature and emerging publications.
- Use maxItems to control cost and dataset size.
- Run the same input weekly to monitor ranking changes.
- Combine with your CRM or spreadsheet to track outreach status.
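Weekly monitoring, as suggested in the tips above, comes down to comparing two exported datasets. The sketch below is one possible approach and not part of the actor: it matches rows on category, rankingType, and publicationUrl (field names taken from the example output) and reports how far each publication moved.

```python
def rank_changes(previous, current):
    """Compare two runs and report rank movement per publication.

    Rows are matched on (category, rankingType, publicationUrl).
    A positive delta means the publication moved up the leaderboard.
    """
    def key(row):
        return (row["category"], row["rankingType"], row["publicationUrl"])

    old_ranks = {key(row): row["rank"] for row in previous}
    changes = []
    for row in current:
        old = old_ranks.get(key(row))
        changes.append({
            "publicationName": row["publicationName"],
            "rank": row["rank"],
            "previousRank": old,  # None means the row is new this run
            "delta": None if old is None else old - row["rank"],
        })
    return changes
```

Matching on publicationUrl rather than publicationName is a design choice: names can be edited by authors, while the URL is more likely to stay stable between runs.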
Integrations
Google Sheets
Export the dataset as CSV and import it into Google Sheets for review and tagging.
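If you only want a few columns in the sheet, you can also build the CSV yourself from the JSON rows. This is a sketch using Python's standard csv module; the column selection is an arbitrary example and the function name is hypothetical.

```python
import csv
import io

COLUMNS = ["categoryName", "rankingLabel", "rank", "publicationName",
           "publicationUrl", "authorName", "paidSubscriberLabel"]


def rows_to_csv(rows, columns=COLUMNS):
    """Serialize dataset rows to CSV text, keeping only the chosen columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing or null fields become empty cells instead of the text "None".
        writer.writerow({c: "" if row.get(c) is None else row[c] for c in columns})
    return buffer.getvalue()
```

Write the returned text to a .csv file and use the Google Sheets import dialog as described above.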
Airtable
Use the Apify integration to sync publication rows into an Airtable base.
CRM systems
Use publication URLs, author names, and profile URLs as enrichment inputs for sponsorship outreach.
BI dashboards
Track category rank, subscriber labels, and rising publications over time.
API usage
Node.js
    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

    const run = await client.actor('automation-lab/substack-leaderboard-scraper').call({
        categorySlugs: ['technology'],
        rankings: ['paid', 'rising'],
        maxItems: 50
    });

    console.log(run.defaultDatasetId);
Python
    from apify_client import ApifyClient
    import os

    client = ApifyClient(os.environ['APIFY_TOKEN'])

    run = client.actor('automation-lab/substack-leaderboard-scraper').call(run_input={
        'categorySlugs': ['technology'],
        'rankings': ['paid', 'rising'],
        'maxItems': 50,
    })

    print(run['defaultDatasetId'])
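The Python example above stops at the dataset ID. To read the rows, one option is to iterate over the dataset with the client's `dataset(...).iterate_items()` call. The helper below is a hypothetical sketch: it takes the client as a parameter, so it assumes you have already created an `ApifyClient` as shown above.

```python
def fetch_rows(client, dataset_id, limit=None):
    """Collect dataset items into a list.

    `client` is expected to be an apify_client.ApifyClient instance.
    `limit` optionally stops early after that many rows.
    """
    rows = []
    for item in client.dataset(dataset_id).iterate_items():
        rows.append(item)
        if limit is not None and len(rows) >= limit:
            break
    return rows
```

Typical usage after a run would be `rows = fetch_rows(client, run['defaultDatasetId'])`, followed by whatever filtering or export you need.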
cURL
    curl -X POST "https://api.apify.com/v2/acts/automation-lab~substack-leaderboard-scraper/runs?token=$APIFY_TOKEN" \
      -H 'Content-Type: application/json' \
      -d '{"categorySlugs":["technology"],"rankings":["paid"],"maxItems":25}'
MCP usage
You can use this actor through Apify MCP tools in Claude Desktop or Claude Code.
MCP URL:
https://mcp.apify.com/?tools=automation-lab/substack-leaderboard-scraper
Claude Code quick add:
    $ claude mcp add apify-substack-leaderboard https://mcp.apify.com/?tools=automation-lab/substack-leaderboard-scraper
Claude Desktop / JSON MCP config:
    {
      "mcpServers": {
        "apify-substack-leaderboard": {
          "url": "https://mcp.apify.com/?tools=automation-lab/substack-leaderboard-scraper"
        }
      }
    }
Example prompts:
- "Scrape the Technology and Business Substack leaderboards and summarize top sponsorship targets."
- "Find rising Substack newsletters in Finance and return publication URLs with subscriber labels."
- "Export top Culture newsletters and group them by author details."
Data quality notes
Substack exposes subscriber counts as labels and rounded text, not always exact numbers. The actor preserves those public labels and adds magnitude fields when Substack provides them.
Some publications may not expose a Twitter/X handle, author bio, or podcast flag. Those fields are returned as null when unavailable.
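For sorting or bucketing on your side, a label can be mapped to a rough lower bound. The sketch below is an assumption-heavy illustration: the phrase list is inferred from the two labels in the example output and may not cover every label Substack uses, so unknown or null labels return None rather than a guess.

```python
# Assumed label prefixes, ordered so that longer phrases match first.
_MAGNITUDES = [
    ("hundreds of thousands", 100_000),
    ("tens of thousands", 10_000),
    ("thousands", 1_000),
    ("hundreds", 100),
]


def label_lower_bound(label):
    """Return a rough lower bound for a subscriber label, or None if unknown."""
    if not label:
        return None  # null or empty labels stay unknown
    text = label.strip().lower()
    for phrase, floor in _MAGNITUDES:
        if text.startswith(phrase):
            return floor
    return None
```

The result is a floor, not an estimate: "Thousands of paid subscribers" only tells you the publication has at least 1,000.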
FAQ
Troubleshooting
Why did I get fewer rows than maxItems?
The selected category/ranking combination may have fewer public rows than your limit, or the actor reached the end of available leaderboard pages.
Why are subscriber counts rounded?
Substack leaderboards typically show public ranges or rounded counts. The actor does not infer private exact subscriber totals.
Why was a category skipped?
Use the category slug from the public leaderboard URL. If Substack does not return that slug in its leaderboard category API, the actor skips it and logs a warning.
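To avoid typos, the slug can be read straight from a leaderboard URL. This is a minimal sketch with a hypothetical function name, assuming the URL layout shown in the startUrls examples (leaderboard, slug, optional ranking tab).

```python
from urllib.parse import urlparse


def parse_leaderboard_url(url):
    """Return (category_slug, ranking) from a leaderboard URL.

    ranking is None when the URL has no ranking segment.
    Raises ValueError for URLs that are not leaderboard pages.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "leaderboard":
        raise ValueError(f"not a leaderboard URL: {url}")
    ranking = parts[2] if len(parts) > 2 else None
    return parts[1], ranking
```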
Legality
This actor collects publicly available information from public Substack leaderboard endpoints. You are responsible for using the data lawfully, respecting applicable terms, privacy rules, and outreach regulations.
Is scraping Substack leaderboards legal?
Yes, the actor is designed for public leaderboard data only. It does not access private dashboards, subscriber lists, paid posts, or account-only content.
Related scrapers
Other automation-lab actors that may fit the same workflow:
- https://apify.com/automation-lab/substack-scraper
- https://apify.com/automation-lab/newsletter-scraper
- https://apify.com/automation-lab/website-content-crawler
- https://apify.com/automation-lab/rss-feed-scraper
Changelog
0.1
Initial version with public Substack category leaderboards, bestseller and rising ranking tabs, subscriber labels, author details, and publication URLs.
Limitations
The actor focuses on leaderboard rows. It does not scrape individual posts, paid content, private subscriber lists, or account-only dashboards.
Support
If a public Substack leaderboard category stops working, include the category slug, input JSON, run ID, and expected output when reporting the issue.