Reddit Comments Search Scraper
Pricing
from $4.99 / 1,000 results
Developer
API Empire
Maintained by Community
🔍 Reddit Search Scraper
Scrape Reddit search results and subreddit listings at scale — paste any Reddit URL (search, subreddit, or subreddit search) and the actor pulls clean structured records from public Reddit data archives (no Reddit login or API key required) and live-saves each post to the dataset.
ℹ️ How it works: Reddit shut down unauthenticated access to its public `.json` endpoints. This actor instead reads from two public Reddit data archives, PullPush (primary; full-text and subreddit search) and Arctic Shift (fallback for subreddit and author queries), so it keeps working without you registering a Reddit OAuth app.
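As a rough illustration of the archive lookup described above, the sketch below builds a PullPush request URL for a keyword or subreddit query, and an Arctic Shift URL for the subreddit/author fallback. The endpoint URLs and parameter names (`q`, `subreddit`, `author`, `size`, `limit`) are assumptions based on the archives' public APIs, and `build_archive_url` is a hypothetical helper, not part of the actor. Check each archive's own documentation before relying on them.

```python
from urllib.parse import urlencode

# ASSUMED base URLs; verify against each archive's documentation.
PULLPUSH_SEARCH = "https://api.pullpush.io/reddit/search/submission/"
ARCTIC_SHIFT_SEARCH = "https://arctic-shift.photon-reddit.com/api/posts/search"


def build_archive_url(query=None, subreddit=None, author=None, size=100, fallback=False):
    """Return a request URL for PullPush (primary) or, when fallback=True,
    for Arctic Shift. The fallback is only meaningful for subreddit- or
    author-scoped queries. All parameter names here are assumptions."""
    if fallback:
        if not (subreddit or author):
            raise ValueError("Arctic Shift fallback needs a subreddit or author")
        params = {"limit": size}
        if subreddit:
            params["subreddit"] = subreddit
        if author:
            params["author"] = author
        return f"{ARCTIC_SHIFT_SEARCH}?{urlencode(params)}"

    # Primary archive: full-text and subreddit search.
    params = {"size": size}
    if query:
        params["q"] = query
    if subreddit:
        params["subreddit"] = subreddit
    if author:
        params["author"] = author
    return f"{PULLPUSH_SEARCH}?{urlencode(params)}"
```

`urlencode` escapes spaces as `+`, so `build_archive_url(query="artificial intelligence")` yields a `...?size=100&q=artificial+intelligence` query string.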
💡 Built for marketers, researchers, AI/LLM data pipelines, and competitive-intelligence teams who need clean, structured Reddit data without scraping headaches.
✨ Why choose this Actor?
- 🚀 Fast — pure async HTTP, no headless browser overhead.
- 🔓 No credentials needed — reads public Reddit archives, so there's no OAuth app, client ID, or rate-limited Reddit key to manage.
- 🛡️ Smart proxy ladder — starts direct, auto-falls-back to datacenter → residential if an archive rate-limits the request IP, and stays on residential once it kicks in.
- 🔁 Resilient — per-request retries with jittered backoff, and 3 retries on the residential tier before giving up.
- 💾 Live saving — every post is pushed to the dataset as it's scraped, so a mid-run crash never loses work.
- 🧱 Bulk URLs — feed it any number of Reddit URLs in one run.
- 📊 Pre-built dataset views — Overview, Post, Subreddit, Author, Content, and Full Record tabs in the Apify Console.
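The retry behaviour listed above can be sketched with "full jitter" exponential backoff: each retry sleeps for a random time between zero and a capped, doubling ceiling, which spreads retries out instead of hammering a rate-limited archive in lockstep. This is a minimal sketch assuming a generic zero-argument `fetch()` callable; `backoff_delays`, `fetch_with_retries`, and the `base`/`cap` values are hypothetical and not taken from the actor's code.

```python
import random
import time


def backoff_delays(retries=3, base=1.0, cap=30.0, rng=None):
    """Return one sleep duration (seconds) per retry using full jitter:
    delay i is drawn uniformly from [0, min(cap, base * 2**i)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** attempt)) for attempt in range(retries)]


def fetch_with_retries(fetch, retries=3, sleep=None, rng=None):
    """Call fetch() until it succeeds or the retry budget is spent.
    `fetch` is any zero-argument callable that raises on failure."""
    sleep = sleep or time.sleep
    delays = backoff_delays(retries, rng=rng)
    last_error = None
    for attempt in range(retries + 1):  # one initial try plus `retries` retries
        try:
            return fetch()
        except Exception as exc:  # a real scraper would catch narrower errors
            last_error = exc
            if attempt < retries:
                sleep(delays[attempt])
    raise last_error
```

Injecting `sleep` and `rng` keeps the helper testable without real waiting.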
🎯 Key features
- 🌐 Bulk URL input (search URLs, subreddit URLs, subreddit search URLs)
- 🔎 Optional keyword fallback when no URLs are supplied
- 📊 Sort by Relevance / Hot / Top / New / Most Comments
- 🔞 Safe-search toggle
- 📦 Hard cap on total items via `maxItems`
- 🛡️ Default no-proxy, auto-escalating fallback ladder
- 📝 Detailed real-time logs so you can watch progress live
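To show how the three supported URL shapes differ, here is a minimal classifier built on Python's `urllib.parse`. The function name `classify_reddit_url` and the returned dictionary layout are illustrative assumptions, as is treating a listing path such as `/r/python/top/` as a sort hint; the actor's own URL parsing may work differently.

```python
from urllib.parse import urlparse, parse_qs


def classify_reddit_url(url):
    """Classify a Reddit URL as 'search', 'subreddit', or 'subreddit_search'
    and extract the subreddit name, query (q=), and sort (sort=) if present."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    qs = parse_qs(parsed.query)
    query = qs.get("q", [None])[0]
    sort = qs.get("sort", [None])[0]

    # Global search: https://www.reddit.com/search/?q=...
    if parts[:1] == ["search"]:
        return {"kind": "search", "subreddit": None, "query": query, "sort": sort}

    if len(parts) >= 2 and parts[0].lower() == "r":
        subreddit = parts[1]
        # Subreddit search: https://www.reddit.com/r/<name>/search/?q=...
        if len(parts) >= 3 and parts[2] == "search":
            return {"kind": "subreddit_search", "subreddit": subreddit, "query": query, "sort": sort}
        # Subreddit listing: /r/<name>/ or /r/<name>/<listing>/ (e.g. /r/python/top/)
        listing = parts[2] if len(parts) >= 3 else None
        return {"kind": "subreddit", "subreddit": subreddit, "query": None, "sort": sort or listing}

    raise ValueError(f"Unsupported Reddit URL: {url}")
```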
📥 Input
```json
{
  "urls": [
    { "url": "https://www.reddit.com/search/?q=ai&sort=new" },
    { "url": "https://www.reddit.com/r/python/" }
  ],
  "query": "artificial intelligence",
  "sort": "relevance",
  "safeSearch": "off",
  "maxItems": 300,
  "maxRetries": 3,
  "proxyConfiguration": { "useApifyProxy": false }
}
```
| Field | Type | Description |
|---|---|---|
| `urls` | array | Reddit URLs to scrape (search, subreddit, or subreddit search), each as `{ "url": "..." }`. |
| `query` | string | Keyword fallback, used only when `urls` is empty. |
| `sort` | enum | `relevance` / `hot` / `top` / `new` / `comments`. |
| `safeSearch` | enum | `off` (include NSFW) or `on` (hide NSFW). |
| `maxItems` | integer | Hard cap on total posts across all URLs. |
| `maxRetries` | integer | Per-request retries before escalating to the next proxy tier. |
| `proxyConfiguration` | object | Standard Apify proxy input. Defaults to no proxy. |
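A sketch of how the fields in the table could be validated and defaulted before a run. The defaults for `sort`, `safeSearch`, and `maxItems` are assumptions (the table does not state them), and `normalize_input` is a hypothetical helper; only the no-proxy default comes from the table.

```python
ALLOWED_SORTS = {"relevance", "hot", "top", "new", "comments"}
ALLOWED_SAFE_SEARCH = {"off", "on"}


def normalize_input(raw):
    """Validate an input dict against the field table and fill defaults.
    Defaults other than proxyConfiguration are ASSUMED for illustration."""
    urls = [item["url"] for item in raw.get("urls", [])]
    query = raw.get("query")
    if not urls and not query:
        raise ValueError("Provide at least one URL or a query")

    sort = raw.get("sort", "relevance")  # assumed default
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    safe_search = raw.get("safeSearch", "off")  # assumed default
    if safe_search not in ALLOWED_SAFE_SEARCH:
        raise ValueError("safeSearch must be 'off' or 'on'")

    max_items = int(raw.get("maxItems", 100))  # assumed default
    max_retries = int(raw.get("maxRetries", 3))
    if max_items < 1 or max_retries < 0:
        raise ValueError("maxItems must be >= 1 and maxRetries >= 0")

    return {
        "urls": urls,
        # The keyword is only a fallback: drop it when URLs were supplied.
        "query": query if not urls else None,
        "sort": sort,
        "safeSearch": safe_search,
        "maxItems": max_items,
        "maxRetries": max_retries,
        # Table: "Defaults to no proxy."
        "proxyConfiguration": raw.get("proxyConfiguration", {"useApifyProxy": False}),
    }
```

With the example input above, `query` is dropped because two URLs are present.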
📤 Output
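Each dataset record keeps the original reference output shape exactly (nested `post`, `subreddit`, and `author` objects) and adds a few top-level mirror fields (`title`, `subreddit_name`, `author_name`, `score`, `comment_count`, `url`) so the table views work without nested-path lookups: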
```json
{
  "post": {
    "title": "The more young people use AI, the more they hate it",
    "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/",
    "score": 22036,
    "comment_count": 1612
  },
  "subreddit": { "name": "technology" },
  "author": { "name": "spherocytes" },
  "contentText": "",
  "content_type": "link",
  "created_timestamp": "2026-04-30T12:34:21.000000+0000",
  "title": "The more young people use AI, the more they hate it",
  "subreddit_name": "technology",
  "author_name": "spherocytes",
  "score": 22036,
  "comment_count": 1612,
  "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/"
}
```
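The mirror fields are plain copies of values that already exist in the nested objects. A minimal sketch, assuming the nested shape shown above (`add_mirror_fields` is a hypothetical name, not the actor's code):

```python
def add_mirror_fields(record):
    """Return a shallow copy of a nested record with top-level mirror fields
    copied from its post / subreddit / author objects. The input is not mutated."""
    post = record.get("post", {})
    flat = dict(record)  # shallow copy: nested objects are shared, not duplicated
    flat.update({
        "title": post.get("title"),
        "subreddit_name": record.get("subreddit", {}).get("name"),
        "author_name": record.get("author", {}).get("name"),
        "score": post.get("score"),
        "comment_count": post.get("comment_count"),
        "url": post.get("url"),
    })
    return flat
```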
🚀 How to use the Actor (via Apify Console)
- 🔐 Log in at console.apify.com → Actors.
- 🔎 Find Reddit Search Scraper and open it.
- 📝 Paste one or more Reddit URLs (or type a keyword in the `query` field).
- ⚙️ Pick a `sort` (Relevance / Hot / Top / New / Most Comments) and set `maxItems`.
- 🛡️ Leave Proxy on the default (no proxy); the scraper auto-escalates if an archive rate-limits the request.
- ▶️ Click Start.
- 📊 Watch logs in real time; open the Output tab as records stream in.
- 📁 Export to JSON / CSV / Excel.
🛡️ Proxy strategy
The scraper uses a three-tier ladder (the archives can rate-limit a busy IP):
| Tier | When it's used |
|---|---|
| 🌐 Direct | Default — the archives usually serve fine without a proxy. |
| 🏢 Datacenter | Auto-engaged if direct requests return 403 / 429 or are otherwise rate-limited. |
| 🏠 Residential | Auto-engaged if datacenter requests still fail. Once engaged, it stays on for the rest of the run (3 retries on this tier before giving up). |
You can also start higher up the ladder by selecting a proxy group in the input.
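The ladder can be modelled as a small state machine that only ever moves upward. This sketch assumes that a 403 or 429 status is the trigger, that each tier gets `retries_per_tier` retries (mirroring `maxRetries`) before escalating, and that the run gives up once residential retries are exhausted. `ProxyLadder` and `should_retry` are hypothetical names, not the actor's API.

```python
TIERS = ["direct", "datacenter", "residential"]
RETRYABLE_STATUSES = {403, 429}


class ProxyLadder:
    """Track the current proxy tier. Tiers are only ever escalated, never
    lowered, so once residential is reached it sticks for the whole run."""

    def __init__(self, start="direct", retries_per_tier=3):
        self.index = TIERS.index(start)  # lets the user start higher up the ladder
        self.retries_per_tier = retries_per_tier
        self.used = 0  # retries spent on the current tier

    @property
    def tier(self):
        return TIERS[self.index]

    def should_retry(self, status):
        """Record a failed response; return True if the request should be retried."""
        if status not in RETRYABLE_STATUSES:
            return False
        if self.used < self.retries_per_tier:
            self.used += 1  # retry on the same tier first
            return True
        if self.index < len(TIERS) - 1:
            self.index += 1  # escalate: direct -> datacenter -> residential
            self.used = 0
            return True
        return False  # residential retries exhausted: give up
```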
📊 Sort & data-source notes
- Source: PullPush handles global keyword search and subreddit/author search; Arctic Shift serves subreddit- and author-scoped queries as a fast fallback. Both are public Reddit archives.
- Sort mapping: Reddit's sort intents map onto the archives' sort fields:
  - 🎯 Relevance / ⭐ Top / 🔥 Hot → highest score first
  - 🆕 New → newest created first
  - 💬 Most Comments → highest comment count first
- Coverage: the archives index publicly posted content; very recent posts (from the last few minutes) and removed content may not appear. Pagination walks backward in time, so runs with a large `maxItems` are ordered newest-to-oldest within each time window.
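The sort mapping and the backward pagination can be sketched together. The archive field names `score`, `created_utc`, and `num_comments` are assumptions (they follow common Pushshift-style naming), and `fetch_page` stands in for a real archive request; `SORT_FIELD`, `sort_posts`, and `paginate_backward` are hypothetical helpers.

```python
# ASSUMED archive field names for each Reddit sort intent (see the list above).
SORT_FIELD = {
    "relevance": "score",
    "top": "score",
    "hot": "score",
    "new": "created_utc",
    "comments": "num_comments",
}


def sort_posts(posts, sort):
    """Order posts by the field the sort intent maps to, highest first."""
    field = SORT_FIELD[sort]
    return sorted(posts, key=lambda p: p[field], reverse=True)


def paginate_backward(fetch_page, max_items):
    """Walk an archive backward in time until max_items posts are collected.

    fetch_page(before) is a caller-supplied function returning one page of
    posts created strictly before `before` (None means "now"). The oldest
    created_utc in each page becomes the cursor for the next request.
    """
    collected, before = [], None
    while len(collected) < max_items:
        page = fetch_page(before)
        if not page:
            break  # archive exhausted
        collected.extend(page)
        before = min(p["created_utc"] for p in page)
    return collected[:max_items]
```

Because each page's cursor is the oldest timestamp seen so far, results come back newest-to-oldest across pages, which matches the coverage note above.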
💼 Best use cases
- 🤖 Building AI / LLM training datasets from Reddit discussion
- 📊 Brand monitoring & sentiment analysis
- 🧠 Market research and competitive intelligence
- 📝 Content trend discovery
- 🔬 Academic research on online communities
❓ Frequently asked questions
Q: Does it scrape comments?
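A: This actor returns post-level metadata (title, score, comment count, body text), not comment threads. For per-post comments, use an additional actor or extend this one; note that Reddit's unauthenticated `<permalink>.json` endpoints are no longer available, so such an extension would need authenticated access or another data source.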
Q: Does it support private subreddits?
A: No. Only publicly accessible subreddits and search results are supported.
Q: Do I need a Reddit account or API key?
A: No. The actor reads public Reddit data archives, so there's nothing to register or authenticate.
Q: What happens if an archive rate-limits me?
A: The scraper auto-escalates the proxy tier (direct → datacenter → residential) and retries. If every tier still fails, the run ends with a clear status message.
📨 Support and feedback
For issues, custom features, or feedback: dev.scraperengine@gmail.com
⚠️ Legal & ethical use
- Only collect data from publicly accessible Reddit pages.
- Respect Reddit's terms of service and applicable privacy laws (GDPR / CCPA).
- The end user is responsible for downstream use of the data.