Reddit Comments Search Scraper

Pricing

from $2.99 / 1,000 results

Rating: 0.0 (0)

Developer: SimpleAPI (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified a day ago

🔍 Reddit Search Scraper

Scrape Reddit search results and subreddit listings at scale. Paste any Reddit URL (search, subreddit, or subreddit search) and the actor pulls clean, structured records from public Reddit data archives, with no Reddit login or API key required, and saves each post to the dataset as it is scraped.

ℹ️ How it works: Reddit shut down unauthenticated access to its public .json endpoints. This actor instead reads from two public Reddit data archives — PullPush (primary, full-text + subreddit search) and Arctic Shift (fallback for subreddit/author queries) — so it keeps working without you registering a Reddit OAuth app.

💡 Built for marketers, researchers, AI/LLM data pipelines, and competitive-intelligence teams who need clean, structured Reddit data without scraping headaches.


✨ Why choose this Actor?

  • 🚀 Fast — pure async HTTP, no headless browser overhead.
  • 🔓 No credentials needed — reads public Reddit archives, so there's no OAuth app, client ID, or rate-limited Reddit key to manage.
  • 🛡️ Smart proxy ladder — starts direct, auto-falls-back to datacenter → residential if an archive rate-limits the request IP, and stays on residential once it kicks in.
  • 🔁 Resilient — per-request retries with jittered backoff, and 3 retries on the residential tier before giving up.
  • 💾 Live saving — every post is pushed to the dataset as it's scraped, so a mid-run crash never loses work.
  • 🧱 Bulk URLs — feed it any number of Reddit URLs in one run.
  • 📊 Pre-built dataset views — Overview, Post, Subreddit, Author, Content, and Full Record tabs in the Apify Console.
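
The retry behaviour described in the "Resilient" bullet can be sketched roughly as follows. This is a minimal illustration of retries with full-jitter exponential backoff, not the actor's actual code; the names `backoff_delay` and `retry`, and the `base` and `cap` values, are assumptions made for the example.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def retry(func, max_retries=3, sleep=time.sleep):
    """Call func(); on an exception, wait a jittered delay and try again.

    Makes one initial attempt plus up to max_retries retries, then re-raises
    the last exception. (Hypothetical helper, not part of the actor.)
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(backoff_delay(attempt))
```

Randomizing the delay spreads retries from concurrent requests over time, so they are less likely to hit a rate-limited archive in the same instant.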

🎯 Key features

  • 🌐 Bulk URL input (search URLs, subreddit URLs, subreddit search URLs)
  • 🔎 Optional keyword fallback when no URLs are supplied
  • 📊 Sort by Relevance / Hot / Top / New / Most Comments
  • 🔞 Safe-search toggle
  • 📦 Hard cap on total items via maxItems
  • 🛡️ Default no-proxy, auto-escalating fallback ladder
  • 📝 Detailed real-time logs so you can watch progress live
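
To illustrate the three URL kinds named in the first bullet, here is a small sketch that classifies a Reddit URL using only the standard library. The function name `classify_reddit_url` and the returned keys are hypothetical, and the actor's own parsing rules may differ.

```python
from urllib.parse import parse_qs, urlparse


def classify_reddit_url(url):
    """Return a dict describing a Reddit URL: kind, subreddit, query, sort."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    params = parse_qs(parsed.query)
    query = params.get("q", [None])[0]
    sort = params.get("sort", [None])[0]

    # /r/<name>/... identifies a subreddit-scoped URL.
    subreddit = None
    if len(parts) >= 2 and parts[0].lower() == "r":
        subreddit = parts[1]

    if subreddit and len(parts) >= 3 and parts[2] == "search":
        kind = "subreddit_search"
    elif subreddit:
        kind = "subreddit"
    elif parts[:1] == ["search"]:
        kind = "search"
    else:
        kind = "unknown"

    return {"kind": kind, "subreddit": subreddit, "query": query, "sort": sort}
```

For example, `classify_reddit_url("https://www.reddit.com/r/python/")` yields kind `"subreddit"` with subreddit `"python"` and no query.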

📥 Input

{
  "urls": [
    { "url": "https://www.reddit.com/search/?q=ai&sort=new" },
    { "url": "https://www.reddit.com/r/python/" }
  ],
  "query": "artificial intelligence",
  "sort": "relevance",
  "safeSearch": "off",
  "maxItems": 300,
  "maxRetries": 3,
  "proxyConfiguration": { "useApifyProxy": false }
}

| Field | Type | Description |
|---|---|---|
| urls | array | Reddit URLs to scrape (search, subreddit, or subreddit search). |
| query | string | Keyword fallback used only when urls is empty. |
| sort | enum | relevance / hot / top / new / comments. |
| safeSearch | enum | off (include NSFW) or on (hide NSFW). |
| maxItems | integer | Hard cap on total posts across all URLs. |
| maxRetries | integer | Per-request retries before escalating proxy tier. |
| proxyConfiguration | object | Standard Apify proxy input. Defaults to no proxy. |
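
If you build the input programmatically, a helper along these lines can catch typos before a run starts. It is only a sketch: `build_run_input` is a hypothetical name, and the validation rules (for example, rejecting an input with neither URLs nor a query) are assumptions inferred from the field descriptions above, not documented actor behaviour.

```python
ALLOWED_SORTS = {"relevance", "hot", "top", "new", "comments"}
ALLOWED_SAFE_SEARCH = {"off", "on"}


def build_run_input(urls=(), query="", sort="relevance", safe_search="off",
                    max_items=300, max_retries=3):
    """Build an input object in the shape shown above, with basic checks."""
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sort: {sort!r}")
    if safe_search not in ALLOWED_SAFE_SEARCH:
        raise ValueError(f"unsupported safeSearch: {safe_search!r}")
    if not urls and not query:
        # Assumption: with no URLs and no keyword there is nothing to scrape.
        raise ValueError("provide at least one URL or a query")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {
        "urls": [{"url": u} for u in urls],
        "query": query,
        "sort": sort,
        "safeSearch": safe_search,
        "maxItems": max_items,
        "maxRetries": max_retries,
        # Default from the example above: start without a proxy.
        "proxyConfiguration": {"useApifyProxy": False},
    }
```

With the `apify-client` Python package, an object like this is typically passed as `run_input` when calling an actor. The actor's ID is not stated in this README, so it is left out here.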

📤 Output

Each dataset record keeps the nested reference shape (post, subreddit, author) and adds a few top-level mirror fields so the table views work without nested-path lookups:

{
  "post": {
    "title": "The more young people use AI, the more they hate it",
    "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/",
    "score": 22036,
    "comment_count": 1612
  },
  "subreddit": { "name": "technology" },
  "author": { "name": "spherocytes" },
  "contentText": "",
  "content_type": "link",
  "created_timestamp": "2026-04-30T12:34:21.000000+0000",
  "title": "The more young people use AI, the more they hate it",
  "subreddit_name": "technology",
  "author_name": "spherocytes",
  "score": 22036,
  "comment_count": 1612,
  "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/"
}
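
When consuming exported records, two small checks are often useful: parsing `created_timestamp` into a timezone-aware value, and confirming that the mirror fields agree with their nested sources. The sketch below assumes the timestamp format and field names shown in the sample record; `parse_created` and `mirrors_match` are hypothetical helper names.

```python
from datetime import datetime, timezone


def parse_created(ts):
    """Parse a created_timestamp such as '2026-04-30T12:34:21.000000+0000'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def mirrors_match(record):
    """Check that each top-level mirror field equals its nested source."""
    pairs = [
        ("title", record["post"]["title"]),
        ("url", record["post"]["url"]),
        ("score", record["post"]["score"]),
        ("comment_count", record["post"]["comment_count"]),
        ("subreddit_name", record["subreddit"]["name"]),
        ("author_name", record["author"]["name"]),
    ]
    return all(record.get(key) == nested for key, nested in pairs)
```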

🚀 How to use the Actor (via Apify Console)

  1. 🔐 Log in at console.apify.com and open Actors.
  2. 🔎 Find Reddit Search Scraper and open it.
  3. 📝 Paste one or more Reddit URLs (or type a keyword in the query field).
  4. ⚙️ Pick a sort (Relevance / Hot / Top / New / Most Comments) and set maxItems.
  5. 🛡️ Leave Proxy on default (no proxy); the scraper auto-escalates if an archive rate-limits the request.
  6. ▶️ Click Start.
  7. 📊 Watch logs in real time; open the Output tab as records stream in.
  8. 📁 Export to JSON / CSV / Excel.

🛡️ Proxy strategy

The scraper uses a three-tier ladder (the archives can rate-limit a busy IP):

| Tier | When it's used |
|---|---|
| 🌐 Direct | Default. The archives usually serve fine without a proxy. |
| 🏢 Datacenter | Auto-engaged if direct requests get 403 / 429 / rate-limited. |
| 🏠 Residential | Auto-engaged if datacenter still fails. Retries, then sticks for the rest of the run. |

You can also start higher up the ladder by selecting a proxy group in the input.
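
The ladder above can be modelled as a small state machine that only ever moves up. The sketch below is an assumption-laden illustration, not the actor's implementation: `ProxyLadder`, `attempts_per_tier`, and the `request(tier)` callable (which returns an HTTP status code) are all hypothetical names, and giving every tier the same number of attempts is a simplification.

```python
TIERS = ["direct", "datacenter", "residential"]
BLOCK_STATUSES = {403, 429}


class ProxyLadder:
    """Tracks the current proxy tier; escalates on blocks, never steps down."""

    def __init__(self, start="direct", attempts_per_tier=3):
        self.index = TIERS.index(start)
        self.attempts_per_tier = attempts_per_tier

    @property
    def tier(self):
        return TIERS[self.index]

    def fetch(self, request):
        """Try request(tier) until it is not blocked; return the tier used."""
        while True:
            for _ in range(self.attempts_per_tier):
                if request(self.tier) not in BLOCK_STATUSES:
                    return self.tier
            if self.index == len(TIERS) - 1:
                raise RuntimeError("all proxy tiers failed")
            # Escalate and stay on the higher tier for later requests too.
            self.index += 1
```

Passing `start="datacenter"` or `start="residential"` mirrors the option of starting higher up the ladder.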


📊 Sort & data-source notes

  • Source: PullPush handles global keyword search and subreddit/author search; Arctic Shift serves subreddit- and author-scoped queries as a fast fallback. Both are public Reddit archives.
  • Sort mapping — Reddit's sort intents map onto the archives' sort fields:
    • 🎯 Relevance / ⭐ Top / 🔥 Hot → highest score first
    • 🆕 New → newest created first
    • 💬 Most Comments → highest comment count first
  • Coverage: archives index publicly posted content; very recent posts (last few minutes) or removed content may not appear. Pagination walks backward in time, so large maxItems runs are ordered newest-to-oldest within each time window.
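
The sort mapping and the backward pagination described above can be sketched as follows. This is an illustration under stated assumptions: the field names come from the sample output record rather than from the archives' own query parameters, and `fetch_page(before)` is a hypothetical callable that returns posts created before the given timestamp.

```python
# Reddit sort intent -> record field used for ordering (highest first).
SORT_KEYS = {
    "relevance": "score",
    "top": "score",
    "hot": "score",
    "new": "created_timestamp",
    "comments": "comment_count",
}


def order_records(records, sort="relevance"):
    """Return records ordered highest-first by the field the sort maps to.

    Timestamps in the sample format share a fixed-width layout and UTC
    offset, so comparing them as strings matches chronological order.
    """
    key = SORT_KEYS[sort]
    return sorted(records, key=lambda r: r[key], reverse=True)


def paginate_backward(fetch_page, max_items):
    """Collect up to max_items, each time asking for posts older than the
    oldest one seen so far."""
    items, before = [], None
    while len(items) < max_items:
        page = fetch_page(before)
        if not page:
            break
        items.extend(page)
        before = min(r["created_timestamp"] for r in page)
    return items[:max_items]
```

Because each page is requested relative to the oldest timestamp already collected, a large run moves steadily further back in time until it reaches `max_items` or the archive has nothing older to return.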

💼 Best use cases

  • 🤖 Building AI / LLM training datasets from Reddit discussion
  • 📊 Brand monitoring & sentiment analysis
  • 🧠 Market research and competitive intelligence
  • 📝 Content trend discovery
  • 🔬 Academic research on online communities

❓ Frequently asked questions

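Q: Does it scrape comments? A: No. This actor returns post-level metadata (title, score, comment count, body text). For per-post comment threads, use an additional actor or extend this one to fetch <permalink>.json. Note that Reddit no longer serves its .json endpoints to unauthenticated clients, so that route would need authenticated access.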

Q: Does it support private subreddits? A: No — only publicly accessible subreddits and search results.

Q: Do I need a Reddit account or API key? A: No. The actor reads public Reddit data archives, so there's nothing to register or authenticate.

Q: What happens if an archive rate-limits me? A: The scraper auto-escalates the proxy tier (direct → datacenter → residential) and retries. If every tier still fails, the run ends with a clear status message.


📨 Support and feedback

For issues, custom features, or feedback: dev.scraperengine@gmail.com


  • Only collect data from publicly accessible Reddit pages.
  • Respect Reddit's terms of service and applicable privacy laws (GDPR / CCPA).
  • The end user is responsible for downstream use of the data.