Reddit Comments Search Scraper

Pricing

from $4.99 / 1,000 results

Scrape Reddit comments by URL or keyword. Returns structured records with subreddit, author, score, comment count, content, and timestamps. Auto-falls-back through direct → datacenter → residential proxies if Reddit rate-limits the request.

Developer

Scrapier

Maintained by Community

🔍 Reddit Search Scraper

Scrape Reddit search results and subreddit listings at scale. Paste any Reddit URL (search, subreddit, or subreddit search) and the actor pulls clean, structured records from public Reddit data archives, with no Reddit login or API key required, and saves each post to the dataset as it is scraped.

ℹ️ How it works: Reddit shut down unauthenticated access to its public .json endpoints. This actor instead reads from two public Reddit data archives — PullPush (primary, full-text + subreddit search) and Arctic Shift (fallback for subreddit/author queries) — so it keeps working without you registering a Reddit OAuth app.

💡 Built for marketers, researchers, AI/LLM data pipelines, and competitive-intelligence teams who need clean, structured Reddit data without scraping headaches.


✨ Why choose this Actor?

  • 🚀 Fast — pure async HTTP, no headless browser overhead.
  • 🔓 No credentials needed — reads public Reddit archives, so there's no OAuth app, client ID, or rate-limited Reddit key to manage.
  • 🛡️ Smart proxy ladder — starts direct, auto-falls-back to datacenter → residential if an archive rate-limits the request IP, and stays on residential once it kicks in.
  • 🔁 Resilient — per-request retries with jittered backoff, and 3 retries on the residential tier before giving up.
  • 💾 Live saving — every post is pushed to the dataset as it's scraped, so a mid-run crash never loses work.
  • 🧱 Bulk URLs — feed it any number of Reddit URLs in one run.
  • 📊 Pre-built dataset views — Overview, Post, Subreddit, Author, Content, and Full Record tabs in the Apify Console.
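
The "jittered backoff" mentioned in the list above can be pictured with a small sketch. This is not the actor's code: the function name `backoff_delay`, the base delay, the cap, and the choice of the "full jitter" variant are all assumptions made for illustration.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return a jittered delay in seconds for a zero-based retry attempt.

    Assumed scheme ("full jitter"): draw uniformly between 0 and the
    capped exponential ceiling base * 2**attempt. The base and cap
    values are illustrative, not taken from the actor.
    """
    rng = rng if rng is not None else random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Print the delay a client would sleep before each of three retries.
for attempt in range(3):
    print(f"retry {attempt + 1}: sleep {backoff_delay(attempt):.2f}s")
```

Randomising the delay keeps many concurrent requests from retrying at the same instant, which is the usual reason to add jitter on top of exponential backoff.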

🎯 Key features

  • 🌐 Bulk URL input (search URLs, subreddit URLs, subreddit search URLs)
  • 🔎 Optional keyword fallback when no URLs are supplied
  • 📊 Sort by Relevance / Hot / Top / New / Most Comments
  • 🔞 Safe-search toggle
  • 📦 Hard cap on total items via maxItems
  • 🛡️ Default no-proxy, auto-escalating fallback ladder
  • 📝 Detailed real-time logs so you can watch progress live
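
The three accepted URL kinds (search, subreddit, subreddit search) can be told apart from the URL path alone. The sketch below shows one way to do that with the standard library; `classify_reddit_url` and the returned dictionary keys are hypothetical names, and the actor's real parsing rules may differ.

```python
from urllib.parse import parse_qs, urlparse


def classify_reddit_url(url: str) -> dict:
    """Classify a Reddit URL as 'search', 'subreddit', or 'subreddit_search'.

    Assumption: the kind is decided by the path segments only, and the
    keyword and sort come from the 'q' and 'sort' query parameters.
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    params = parse_qs(parsed.query)
    query = params.get("q", [None])[0]
    sort = params.get("sort", [None])[0]

    if len(parts) >= 2 and parts[0].lower() == "r":
        is_search = len(parts) >= 3 and parts[2].lower() == "search"
        return {
            "kind": "subreddit_search" if is_search else "subreddit",
            "subreddit": parts[1],
            "query": query if is_search else None,
            "sort": sort,
        }
    if parts and parts[0].lower() == "search":
        return {"kind": "search", "subreddit": None, "query": query, "sort": sort}
    raise ValueError(f"Unsupported Reddit URL: {url}")


for example in (
    "https://www.reddit.com/search/?q=ai&sort=new",
    "https://www.reddit.com/r/python/",
    "https://www.reddit.com/r/python/search/?q=asyncio",
):
    print(classify_reddit_url(example))
```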

📥 Input

{
  "urls": [
    { "url": "https://www.reddit.com/search/?q=ai&sort=new" },
    { "url": "https://www.reddit.com/r/python/" }
  ],
  "query": "artificial intelligence",
  "sort": "relevance",
  "safeSearch": "off",
  "maxItems": 300,
  "maxRetries": 3,
  "proxyConfiguration": { "useApifyProxy": false }
}

| Field | Type | Description |
| --- | --- | --- |
| urls | array | Reddit URLs to scrape (search, subreddit, or subreddit search). |
| query | string | Keyword fallback used only when urls is empty. |
| sort | enum | relevance / hot / top / new / comments. |
| safeSearch | enum | off (include NSFW) or on (hide NSFW). |
| maxItems | integer | Hard cap on total posts across all URLs. |
| maxRetries | integer | Per-request retries before escalating proxy tier. |
| proxyConfiguration | object | Standard Apify proxy input. Defaults to no proxy. |
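
If the input is assembled programmatically, a small helper can catch invalid enum values before a run starts. The sketch below is only an illustration: `build_run_input` is a hypothetical helper, and the checks on `maxItems` and on "at least one URL or a query" are assumptions rather than documented actor behaviour.

```python
import json

VALID_SORTS = {"relevance", "hot", "top", "new", "comments"}
VALID_SAFE_SEARCH = {"off", "on"}


def build_run_input(urls=(), query="", sort="relevance", safe_search="off",
                    max_items=300, max_retries=3) -> dict:
    """Build an input object with the field names from the table above."""
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
    if safe_search not in VALID_SAFE_SEARCH:
        raise ValueError("safeSearch must be 'off' or 'on'")
    if max_items < 1:  # assumed lower bound
        raise ValueError("maxItems must be a positive integer")
    if not urls and not query:  # assumed: one of the two is required
        raise ValueError("provide at least one URL or a query")
    return {
        "urls": [{"url": u} for u in urls],
        "query": query,
        "sort": sort,
        "safeSearch": safe_search,
        "maxItems": max_items,
        "maxRetries": max_retries,
        "proxyConfiguration": {"useApifyProxy": False},
    }


run_input = build_run_input(
    urls=["https://www.reddit.com/r/python/"], sort="new", max_items=50
)
print(json.dumps(run_input, indent=2))
```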

📤 Output

Each dataset record keeps the nested reference shape (post, subreddit, author) and adds a few top-level mirror fields (title, subreddit_name, author_name, score, comment_count, url) so the table views work without nested-path lookups:

{
  "post": {
    "title": "The more young people use AI, the more they hate it",
    "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/",
    "score": 22036,
    "comment_count": 1612
  },
  "subreddit": { "name": "technology" },
  "author": { "name": "spherocytes" },
  "contentText": "",
  "content_type": "link",
  "created_timestamp": "2026-04-30T12:34:21.000000+0000",
  "title": "The more young people use AI, the more they hate it",
  "subreddit_name": "technology",
  "author_name": "spherocytes",
  "score": 22036,
  "comment_count": 1612,
  "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/"
}
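
To make the relationship between the nested objects and the mirror fields concrete, here is a minimal sketch. It assumes the mirror fields are straight copies of the nested values, as the sample record suggests; `add_mirror_fields` and `parse_created` are hypothetical helper names, not part of the actor.

```python
from datetime import datetime


def add_mirror_fields(record: dict) -> dict:
    """Return a copy of record with top-level mirrors of the nested values."""
    post = record.get("post", {})
    flat = dict(record)
    flat.update({
        "title": post.get("title"),
        "subreddit_name": record.get("subreddit", {}).get("name"),
        "author_name": record.get("author", {}).get("name"),
        "score": post.get("score"),
        "comment_count": post.get("comment_count"),
        "url": post.get("url"),
    })
    return flat


def parse_created(ts: str) -> datetime:
    """Parse the created_timestamp format shown in the sample record."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


nested = {
    "post": {"title": "Example", "url": "https://www.reddit.com/r/technology/",
             "score": 10, "comment_count": 2},
    "subreddit": {"name": "technology"},
    "author": {"name": "spherocytes"},
    "created_timestamp": "2026-04-30T12:34:21.000000+0000",
}
flat = add_mirror_fields(nested)
print(flat["subreddit_name"], flat["score"])
print(parse_created(flat["created_timestamp"]).isoformat())
```

Parsing the timestamp into an aware `datetime` makes it easier to filter or bucket records by time after export.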

🚀 How to use the Actor (via Apify Console)

  1. 🔐 Log in at console.apify.com and open Actors.
  2. 🔎 Find Reddit Search Scraper and open it.
  3. 📝 Paste one or more Reddit URLs (or type a keyword in the query field).
  4. ⚙️ Pick a sort (Relevance / Hot / Top / New / Most Comments) and set maxItems.
  5. 🛡️ Leave Proxy on default (no proxy); the scraper auto-escalates if an archive rate-limits the request.
  6. ▶️ Click Start.
  7. 📊 Watch logs in real time; open the Output tab as records stream in.
  8. 📁 Export to JSON / CSV / Excel.

🛡️ Proxy strategy

The scraper uses a three-tier ladder (the archives can rate-limit a busy IP):

| Tier | When it's used |
| --- | --- |
| 🌐 Direct | Default. The archives usually serve fine without a proxy. |
| 🏢 Datacenter | Auto-engaged if direct requests get 403 / 429 / rate-limited. |
| 🏠 Residential | Auto-engaged if datacenter still fails. Retries, then sticks for the rest of the run. |

You can also start higher up the ladder by selecting a proxy group in the input.
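
The ladder described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the actor's implementation: `ProxyLadder` and the `fetch` callback are hypothetical, only 403 and 429 are treated as rate-limit signals, and each tier gets `max_retries` attempts before escalating.

```python
from typing import Callable

TIERS = ["direct", "datacenter", "residential"]
RATE_LIMIT_STATUSES = {403, 429}


class ProxyLadder:
    """Escalates direct -> datacenter -> residential and never steps back down."""

    def __init__(self, max_retries: int = 3, start_tier: str = "direct"):
        self.max_retries = max_retries
        self.index = TIERS.index(start_tier)

    @property
    def tier(self) -> str:
        return TIERS[self.index]

    def request(self, fetch: Callable[[str], int]) -> str:
        """Call fetch(tier) until it returns a status that is not rate-limited.

        Returns the tier that succeeded. Raises RuntimeError once the
        residential tier has also used up its attempts.
        """
        while True:
            for _ in range(self.max_retries):
                if fetch(self.tier) not in RATE_LIMIT_STATUSES:
                    return self.tier
            if self.index == len(TIERS) - 1:
                raise RuntimeError("all proxy tiers exhausted")
            self.index += 1  # escalate; the new tier is kept for later requests


# Simulated archive that only answers requests from the residential tier.
def fake_fetch(tier: str) -> int:
    return 200 if tier == "residential" else 429


ladder = ProxyLadder(max_retries=3)
print(ladder.request(fake_fetch))  # tier used after escalation
print(ladder.request(fake_fetch))  # later requests reuse the same tier
```

Because the tier index lives on the ladder object rather than on each request, a run that has escalated keeps using the higher tier, which matches the "sticks for the rest of the run" behaviour in the table.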


📊 Sort & data-source notes

  • Source: PullPush handles global keyword search and subreddit/author search; Arctic Shift serves subreddit- and author-scoped queries as a fast fallback. Both are public Reddit archives.
  • Sort mapping — Reddit's sort intents map onto the archives' sort fields:
    • 🎯 Relevance / ⭐ Top / 🔥 Hot → highest score first
    • 🆕 New → newest created first
    • 💬 Most Comments → highest comment count first
  • Coverage: archives index publicly posted content; very recent posts (last few minutes) or removed content may not appear. Pagination walks backward in time, so large maxItems runs are ordered newest-to-oldest within each time window.
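
The sort mapping above amounts to a lookup from Reddit's sort name to a field, always in descending order. The sketch below applies that mapping to already-fetched records using the top-level output fields; `SORT_FIELDS` and `sort_records` are hypothetical names, and in the actor itself the sorting presumably happens on the archive side.

```python
SORT_FIELDS = {
    "relevance": "score",
    "top": "score",
    "hot": "score",
    "new": "created_timestamp",
    "comments": "comment_count",
}


def sort_records(records: list, sort: str = "relevance") -> list:
    """Return records ordered by the mapped field, highest or newest first.

    Assumption: created_timestamp strings all use the same format and
    UTC offset, so plain string comparison orders them chronologically.
    """
    field = SORT_FIELDS[sort]
    return sorted(records, key=lambda r: r[field], reverse=True)


sample = [
    {"title": "a", "score": 5, "comment_count": 40,
     "created_timestamp": "2026-04-28T09:00:00.000000+0000"},
    {"title": "b", "score": 90, "comment_count": 3,
     "created_timestamp": "2026-04-29T09:00:00.000000+0000"},
    {"title": "c", "score": 20, "comment_count": 12,
     "created_timestamp": "2026-04-30T09:00:00.000000+0000"},
]
for mode in ("top", "new", "comments"):
    print(mode, [r["title"] for r in sort_records(sample, mode)])
```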

💼 Best use cases

  • 🤖 Building AI / LLM training datasets from Reddit discussion
  • 📊 Brand monitoring & sentiment analysis
  • 🧠 Market research and competitive intelligence
  • 📝 Content trend discovery
  • 🔬 Academic research on online communities

❓ Frequently asked questions

Q: Does it scrape comments? A: This actor returns post-level metadata (title, score, comment count, body text). For per-post comment threads, use an additional actor or extend this one to fetch <permalink>.json.

Q: Does it support private subreddits? A: No — only publicly accessible subreddits and search results.

Q: Do I need a Reddit account or API key? A: No. The actor reads public Reddit data archives, so there's nothing to register or authenticate.

Q: What happens if an archive rate-limits me? A: The scraper auto-escalates the proxy tier (direct → datacenter → residential) and retries. If every tier still fails, the run ends with a clear status message.


📨 Support and feedback

For issues, custom features, or feedback: dev.scraperengine@gmail.com


  • Only collect data from publicly accessible Reddit pages.
  • Respect Reddit's terms of service and applicable privacy laws (GDPR / CCPA).
  • The end user is responsible for downstream use of the data.