🔍 Google Scholar Scraper

Pricing

from $3.99 / 1,000 results

Rating: 0.0 (0)

Developer: ScraperX (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user

Last modified: 7 days ago

📚 Google Scholar Scraper

A blazing-fast, production-grade Apify Actor that pulls academic papers from the global Scholar knowledge graph (OpenAlex + Semantic Scholar) and delivers clean, structured JSON ready for analysis, citation review, or literature dashboards.

Bulk in. Citations out. Throw a list of keywords or Google Scholar URLs and walk away — the Actor does the heavy lifting.


🚀 Why Choose This Actor?

  • 🧠 Multi-source intelligence — combines OpenAlex (250 M+ works) and Semantic Scholar so you never miss a paper.
  • 🌐 Smart auto-escalating proxy — starts direct, falls back to Datacenter → Residential only when needed. You don't have to think about it.
  • Live streaming results — each paper hits the dataset the moment it's scraped. A crash mid-run still leaves you with rows.
  • 🧹 Built-in deduplication, filters, and sort — citations, recency, open-access, article-type filters out of the box.
  • 🪶 Light & fast — no headless browser, no Playwright overhead — just well-engineered HTTP calls.
  • 💸 Pay only for what you use — no hidden compute time waste.
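
The auto-escalating proxy behaviour in the list above can be pictured roughly as follows. This README does not show the Actor's internals, so treat this as a sketch under assumptions: the `fetch_with_escalation` helper, the tier labels, and the choice of HTTP 403/429 as the escalation trigger are all hypothetical.

```python
from typing import Callable, Optional, Sequence

# Tier order as described in the README: direct first, then Datacenter,
# then Residential. None stands for "no proxy". The labels are illustrative.
TIERS: Sequence[Optional[str]] = (None, "DATACENTER", "RESIDENTIAL")

# Assumption: these status codes mean "the host pushed back".
BLOCK_STATUSES = {403, 429}


def fetch_with_escalation(
    url: str,
    send: Callable[[str, Optional[str]], int],
    start_tier: int = 0,
) -> tuple[int, int]:
    """Try each proxy tier in order until one is not blocked.

    `send(url, tier)` is a caller-supplied function that performs the
    request through the given tier and returns the HTTP status code.
    Returns (status, tier_index) so the caller can keep using the tier
    that worked on later requests.
    """
    status = 0
    for index in range(start_tier, len(TIERS)):
        status = send(url, TIERS[index])
        if status not in BLOCK_STATUSES:
            return status, index
    # Every tier was blocked: report the last status and the last tier.
    return status, len(TIERS) - 1
```

Passing the returned tier index back in as `start_tier` on the next call keeps later requests on the tier that succeeded, which is one way to approximate the sticky mode mentioned under Key Features.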

✨ Key Features

  • 🔎 Bulk search — submit dozens of queries / Scholar URLs at once.
  • 📥 Up to 5 000 papers per query with cursor-based pagination.
  • 🏷️ Rich metadata — title, authors, year, citations, source, PDF link, abstract snippet, etc.
  • 🛡️ Auto-rotating proxies with sticky residential mode after escalation.
  • 📊 Two pre-configured dataset views — Overview (essentials) + Full Details (everything).
  • 📝 Per-query sectioning — every record carries a query field so you can split results by topic in seconds.
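
As a sketch of the per-query sectioning mentioned above, the snippet below buckets exported rows by their `query` field. The `group_by_query` helper is hypothetical and only assumes the `query` and `resultIndex` fields documented in the Output section.

```python
from collections import defaultdict
from typing import Any


def group_by_query(rows: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Bucket dataset rows by their `query` field."""
    sections: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        sections[row.get("query", "")].append(row)
    # Order each section by resultIndex so rows follow the original ranking.
    for items in sections.values():
        items.sort(key=lambda r: r.get("resultIndex", 0))
    return dict(sections)
```

Calling `group_by_query(rows)` on a downloaded dataset yields one list per search query, in the order the queries first appear.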

⚙️ Input

| Field | Type | Description |
| --- | --- | --- |
| searchQueries | array of strings | Search keywords or Scholar URLs (e.g. https://scholar.google.com/scholar?q=...). Required. |
| maxItems | integer (1 – 5000) | Max papers per query. Default 100. |
| sortBy | enum | relevance (default) or cited_by_count |
| filter | enum | all (default), has_pdf, open_access, or recent_5_years |
| articleType | enum | any (default), journal, conference, book, or preprint |
| proxyConfiguration | object | Optional. Defaults to no proxy; the Actor auto-escalates to Datacenter/Residential on rate limits. |

Example input

{
  "searchQueries": [
    "Tomato Shelf Life Prediction using IoT and Machine Learning",
    "Federated learning healthcare"
  ],
  "maxItems": 100,
  "sortBy": "cited_by_count",
  "filter": "open_access",
  "articleType": "journal",
  "proxyConfiguration": { "useApifyProxy": false }
}
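
If you build the input programmatically, a small validator can catch out-of-range values before a run is started. This is a minimal sketch based only on the field names, ranges, and enum values listed in the Input section; the `build_input` helper is hypothetical, and the Actor may apply its own, stricter validation.

```python
from typing import Any

# Allowed values copied from the Input section of this README.
SORT_BY = {"relevance", "cited_by_count"}
FILTERS = {"all", "has_pdf", "open_access", "recent_5_years"}
ARTICLE_TYPES = {"any", "journal", "conference", "book", "preprint"}


def build_input(
    search_queries: list[str],
    max_items: int = 100,
    sort_by: str = "relevance",
    filter_: str = "all",
    article_type: str = "any",
) -> dict[str, Any]:
    """Return an Actor input dict, raising ValueError on invalid values."""
    if not search_queries:
        raise ValueError("searchQueries is required and must not be empty")
    if not 1 <= max_items <= 5000:
        raise ValueError("maxItems must be between 1 and 5000")
    if sort_by not in SORT_BY:
        raise ValueError(f"sortBy must be one of {sorted(SORT_BY)}")
    if filter_ not in FILTERS:
        raise ValueError(f"filter must be one of {sorted(FILTERS)}")
    if article_type not in ARTICLE_TYPES:
        raise ValueError(f"articleType must be one of {sorted(ARTICLE_TYPES)}")
    return {
        "searchQueries": list(search_queries),
        "maxItems": max_items,
        "sortBy": sort_by,
        "filter": filter_,
        "articleType": article_type,
    }
```

`proxyConfiguration` is left out on purpose, since the README describes it as optional with a no-proxy default.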

📦 Output

Each dataset row matches the well-known Scholar / SerpAPI-style shape:

{
  "query": "Tomato Shelf Life Prediction using IoT and Machine Learning",
  "cidCode": "W4409060190",
  "didCode": "W4409060190",
  "lidCode": "",
  "aidCode": "W4409060190",
  "resultIndex": 0,
  "type": "ARTICLE",
  "title": "Tomato Shelf Life Prediction using IoT and Machine Learning",
  "link": "https://doi.org/10.1109/iciset62123.2024.10939467",
  "documentLink": "",
  "documentType": "",
  "fullAttribution": "Nazmul Arafin Naim, Raisul Islam, Mohammed Saifuddin, ... - , 2024",
  "authors": "Nazmul Arafin Naim, Raisul Islam, Mohammed Saifuddin, ...",
  "publication": "",
  "year": 2024,
  "source": "",
  "searchMatch": "Predicting tomato shelf life is crucial for ...",
  "citations": 1,
  "citationsLink": "https://openalex.org/W4409060190",
  "relatedArticlesLink": "https://openalex.org/W4409060190",
  "versions": 1,
  "versionsLink": "https://openalex.org/W4409060190"
}

| Field | Meaning |
| --- | --- |
| query | Original query that produced this row (lets you group sections). |
| cidCode / didCode / aidCode | Stable record identifiers (OpenAlex ID or hash). |
| resultIndex | Position within that query's result set. |
| title | Paper title. |
| authors | Up to five lead authors. |
| publication / source | Journal / venue name. |
| year | Publication year. |
| citations | Total citation count. |
| documentLink / documentType | Direct PDF/OA URL when available. |
| searchMatch | Abstract snippet (first ~300 chars). |
| citationsLink / relatedArticlesLink / versionsLink | Apify-friendly clickable links. |
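
When rows from several runs are merged, the documented fields are enough for simple post-processing. The sketch below drops repeated records and ranks the rest by citation count. The `dedupe_and_rank` helper is hypothetical, and falling back to the lower-cased title when `cidCode` is empty is an assumption, not the Actor's own deduplication rule.

```python
from typing import Any


def dedupe_and_rank(rows: list[dict[str, Any]], limit: int = 10) -> list[dict[str, Any]]:
    """Drop repeated records, then return the most-cited rows first."""
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for row in rows:
        # Prefer the stable identifier; fall back to the title if it is empty.
        key = row.get("cidCode") or (row.get("title") or "").lower()
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    # Treat a missing or null citation count as zero.
    unique.sort(key=lambda r: r.get("citations") or 0, reverse=True)
    return unique[:limit]
```

The sort is stable, so rows with equal citation counts keep the order in which they were first seen.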

🚀 How to Use (Apify Console)

  1. Log in at https://console.apify.com → Actors.
  2. Open Google Scholar Scraper.
  3. Paste your queries (or Scholar URLs) into Search Queries.
  4. Tune maxItems, sortBy, filter, articleType to taste.
  5. Leave Proxy on its default (no proxy) — the Actor auto-escalates on rate-limits.
  6. Click ▶ Start.
  7. Watch the live log — every section reports progress in real time.
  8. Open the Output tab and export to JSON / CSV / XLSX.

🤖 Use via API

curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "searchQueries": ["Federated learning healthcare"],
    "maxItems": 50,
    "sortBy": "cited_by_count"
  }'
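
The same call can be made from Python with only the standard library. This sketch mirrors the curl command above (same endpoint, method, header, and JSON body). The helper names `build_run_request` and `run_actor` are hypothetical, and you still need to supply your own Actor ID and token.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/acts"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    path = urllib.parse.quote(actor_id, safe="")
    url = f"{API_BASE}/{path}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(actor_id: str, token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and return the dataset items as parsed JSON."""
    request = build_run_request(actor_id, token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical call would read the token from the environment, for example `run_actor(actor_id, os.environ["APIFY_TOKEN"], {"searchQueries": ["Federated learning healthcare"], "maxItems": 50})` after `import os`. The 300-second timeout is an arbitrary illustrative default.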

🎯 Best Use Cases

  • 🔬 Literature reviews — pull a full corpus on a research topic in minutes.
  • 📈 Citation tracking — monitor how a paper or author cluster grows over time.
  • 🧪 Trend detection — slice by recent_5_years to spot emerging directions.
  • 📚 Library / EdTech tools — feed clean, normalised records into your platform.
  • 🤖 AI agents — give RAG/LLM pipelines high-quality academic context.

💸 Pricing

This Actor is best deployed under the Pay-per-event (PPE) model:

  • One event = one paper pushed to the dataset (apify-default-dataset-item).
  • No surprise compute charges, no rental — you pay for results, not waiting.
  • Free 5-second startup included by Apify on every run.
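
For budgeting, the per-result model reduces to simple arithmetic. The sketch below uses the "from $3.99 / 1,000 results" figure shown on the Store listing. The `estimate_cost` helper is hypothetical, and the price actually charged depends on how the events are configured, so treat the result as a lower-bound estimate.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed "from" price on the Store page; the real event price may differ.
PRICE_PER_1000 = Decimal("3.99")


def estimate_cost(papers: int, price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Estimate the charge in USD for `papers` rows pushed to the dataset."""
    if papers < 0:
        raise ValueError("papers must not be negative")
    cost = Decimal(papers) * price_per_1000 / Decimal(1000)
    # Round to whole cents.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(5000))  # 19.95 (one query at the 5,000-paper maximum)
print(estimate_cost(100))   # 0.40 (one query at the default maxItems)
```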

Configure the exact event prices in the Apify Console → Publication → Monetization tab.


❓ Frequently Asked Questions

Q: Do I need a Google Scholar account?
No. We connect to OpenAlex and Semantic Scholar, both of which are open scholarly knowledge graphs.

Q: How fresh is the data?
OpenAlex syncs daily with Crossref, DOAJ, PubMed and others. Most papers appear within 24 – 48 h of publication.

Q: Will I get blocked?
Unlikely. The Actor uses official, rate-limit-friendly APIs and auto-escalates through Datacenter → Residential proxies if a host ever pushes back.

Q: Can I pass full Scholar URLs instead of keywords?
Yes. URLs like https://scholar.google.com/scholar?q=... are auto-parsed for the q= term.

Q: Why two views in the output?
The Overview view is great for quick scanning. The Full Details view is the complete record: same data, more columns.
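
To illustrate the Scholar URL handling mentioned in the FAQ, here is a minimal sketch of pulling the `q=` term out of a URL with the standard library. The `normalise_query` helper is hypothetical. The Actor's own parser is not shown in this README and may treat edge cases differently, for example URLs without a `q` parameter, which this sketch maps to an empty string.

```python
from urllib.parse import parse_qs, urlparse


def normalise_query(entry: str) -> str:
    """Return the search term for a keyword string or a Scholar-style URL."""
    text = entry.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # parse_qs decodes percent-escapes and turns "+" into spaces.
        values = parse_qs(parsed.query).get("q", [])
        return values[0].strip() if values else ""
    # Anything that is not an http(s) URL is treated as a plain keyword query.
    return text
```

Plain keywords pass through unchanged, so the same function can be applied to every entry in `searchQueries`.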


🛟 Support & Feedback

Found a bug or have a feature request? Open an issue or message us through the Apify Store page. We respond fast.


  • Data is collected only from publicly available sources (OpenAlex, Semantic Scholar).
  • You are responsible for ensuring that your downstream use complies with GDPR/CCPA, the target sites' ToS, and copyright.
  • Respect rate-limits and robots.txt — being a good citizen reduces blocks too.