🔍 Google Scholar Scraper
Pricing
from $3.99 / 1,000 results
Rating: 0.0 (0)
Developer: ScraperForge
Maintained by Community
Actor stats
- Bookmarked: 0
- Total users: 2
- Monthly active users: 1
- Last modified: 5 days ago
📚 Google Scholar Scraper
A fast, HTTP-only Apify Actor that pulls academic paper metadata from open scholarly indexes (OpenAlex + Semantic Scholar) and delivers clean, structured JSON ready for analysis, citation review, or literature dashboards.
Bulk in. Citations out. Throw a list of keywords or Google Scholar URLs and walk away — the Actor does the heavy lifting.
🚀 Why Choose This Actor?
- 🧠 Multi-source intelligence — combines OpenAlex (250 M+ works) and Semantic Scholar so you never miss a paper.
- 🌐 Smart auto-escalating proxy — starts direct, falls back to Datacenter → Residential only when needed. You don't have to think about it.
- ⚡ Live streaming results — each paper hits the dataset the moment it's scraped. A crash mid-run still leaves you with rows.
- 🧹 Built-in deduplication, filters, and sort — citations, recency, open-access, article-type filters out of the box.
- 🪶 Light & fast — no headless browser, no Playwright overhead — just well-engineered HTTP calls.
- 💸 Pay only for what you use — no hidden compute time waste.
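The auto-escalating proxy behaviour described in the list above could look roughly like the sketch below. It is not the Actor's actual code: the tier names, the `fetch_with_escalation` helper, and the choice of HTTP 403 and 429 as the "blocked" signal are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical tier names, ordered from cheapest to most expensive.
PROXY_TIERS: Tuple[str, ...] = ("direct", "datacenter", "residential")

# Assumption: these HTTP statuses indicate blocking or rate limiting.
BLOCKED_STATUSES = frozenset({403, 429})


def fetch_with_escalation(
    fetch: Callable[[str, str], Tuple[int, Optional[str]]],
    url: str,
    tiers: Sequence[str] = PROXY_TIERS,
) -> Tuple[str, Optional[str]]:
    """Try each tier in order and return (tier_used, body).

    `fetch(url, tier)` is a caller-supplied function returning (status, body).
    The next tier is tried only when the status looks like a block.
    """
    last_status = None
    for tier in tiers:
        status, body = fetch(url, tier)
        if status not in BLOCKED_STATUSES:
            return tier, body
        last_status = status
    raise RuntimeError(f"all proxy tiers blocked (last status {last_status})")
```

Passing the transport in as `fetch` keeps the escalation policy separate from the HTTP client, so the policy can be exercised with a stub.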
✨ Key Features
- 🔎 Bulk search — submit dozens of queries / Scholar URLs at once.
- 📥 Up to 5 000 papers per query with cursor-based pagination.
- 🏷️ Rich metadata — title, authors, year, citations, source, PDF link, abstract snippet, etc.
- 🛡️ Auto-rotating proxies with sticky residential mode after escalation.
- 📊 Two pre-configured dataset views — Overview (essentials) + Full Details (everything).
- 📝 Per-query sectioning: every record carries a `query` field so you can split results by topic in seconds.
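As a minimal sketch of per-query sectioning, the rows of an exported dataset can be bucketed by their `query` field. The helper name `group_by_query` and the sample titles are made up for this example; only the `query` field comes from the Actor's output.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_query(rows: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket dataset rows by their `query` field, preserving row order."""
    sections: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # Rows without a `query` value are collected under an empty-string key.
        sections[row.get("query", "")].append(row)
    return dict(sections)


# Placeholder rows; a real export carries the full set of output fields.
rows = [
    {"query": "Federated learning healthcare", "title": "Paper A"},
    {"query": "Tomato Shelf Life Prediction using IoT and Machine Learning", "title": "Paper B"},
    {"query": "Federated learning healthcare", "title": "Paper C"},
]
sections = group_by_query(rows)
```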
⚙️ Input
| Field | Type | Description |
|---|---|---|
| `searchQueries` ✱ | array of strings | Search keywords or Scholar URLs (e.g. `https://scholar.google.com/scholar?q=...`). Required. |
| `maxItems` | integer (1 to 5000) | Max papers per query. Default `100`. |
| `sortBy` | enum | `relevance` (default) or `cited_by_count`. |
| `filter` | enum | `all` (default), `has_pdf`, `open_access`, or `recent_5_years`. |
| `articleType` | enum | `any` (default), `journal`, `conference`, `book`, or `preprint`. |
| `proxyConfiguration` | object | Optional. Defaults to no proxy; the Actor auto-escalates to Datacenter/Residential on rate limits. |
Example input

    {
      "searchQueries": [
        "Tomato Shelf Life Prediction using IoT and Machine Learning",
        "Federated learning healthcare"
      ],
      "maxItems": 100,
      "sortBy": "cited_by_count",
      "filter": "open_access",
      "articleType": "journal",
      "proxyConfiguration": { "useApifyProxy": false }
    }

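Before starting a run, it can help to check an input against the constraints listed in the input table. The sketch below is one way to do that client-side. The `build_input` helper is hypothetical, the Actor may validate differently, and rejecting an empty query list is an assumption based on the field being marked required.

```python
from typing import Any, Dict, List

# Allowed values copied from the input table.
ENUMS = {
    "sortBy": ("relevance", "cited_by_count"),
    "filter": ("all", "has_pdf", "open_access", "recent_5_years"),
    "articleType": ("any", "journal", "conference", "book", "preprint"),
}


def build_input(search_queries: List[str], max_items: int = 100, **options: str) -> Dict[str, Any]:
    """Return an input dict, raising ValueError on values the table does not allow."""
    if not search_queries:
        raise ValueError("searchQueries is required and must not be empty")
    if not 1 <= max_items <= 5000:
        raise ValueError("maxItems must be between 1 and 5000")
    actor_input: Dict[str, Any] = {
        "searchQueries": list(search_queries),
        "maxItems": max_items,
    }
    for key, value in options.items():
        if key not in ENUMS:
            raise ValueError(f"unknown option: {key}")
        if value not in ENUMS[key]:
            raise ValueError(f"{key} must be one of {ENUMS[key]}")
        actor_input[key] = value
    return actor_input


example = build_input(
    ["Federated learning healthcare"],
    sortBy="cited_by_count",
    filter="open_access",
)
```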
📦 Output
Each dataset row matches the well-known Scholar / SerpAPI-style shape:

    {
      "query": "Tomato Shelf Life Prediction using IoT and Machine Learning",
      "cidCode": "W4409060190",
      "didCode": "W4409060190",
      "lidCode": "",
      "aidCode": "W4409060190",
      "resultIndex": 0,
      "type": "ARTICLE",
      "title": "Tomato Shelf Life Prediction using IoT and Machine Learning",
      "link": "https://doi.org/10.1109/iciset62123.2024.10939467",
      "documentLink": "",
      "documentType": "",
      "fullAttribution": "Nazmul Arafin Naim, Raisul Islam, Mohammed Saifuddin, ... - , 2024",
      "authors": "Nazmul Arafin Naim, Raisul Islam, Mohammed Saifuddin, ...",
      "publication": "",
      "year": 2024,
      "source": "",
      "searchMatch": "Predicting tomato shelf life is crucial for ...",
      "citations": 1,
      "citationsLink": "https://openalex.org/W4409060190",
      "relatedArticlesLink": "https://openalex.org/W4409060190",
      "versions": 1,
      "versionsLink": "https://openalex.org/W4409060190"
    }

| Field | Meaning |
|---|---|
| `query` | Original query that produced this row (lets you group sections). |
| `cidCode` / `didCode` / `aidCode` | Stable record identifiers (OpenAlex ID or hash). |
| `resultIndex` | Position within that query's result set. |
| `title` | Paper title. |
| `authors` | Up to five lead authors. |
| `publication` / `source` | Journal / venue name. |
| `year` | Publication year. |
| `citations` | Total citation count. |
| `documentLink` / `documentType` | Direct PDF/OA URL when available. |
| `searchMatch` | Abstract snippet (first ~300 chars). |
| `citationsLink` / `relatedArticlesLink` / `versionsLink` | Apify-friendly clickable links. |
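When several exports are merged, the same paper can appear more than once. A minimal post-processing sketch, assuming `cidCode` is a suitable identity key and that a missing `citations` value should count as 0, removes repeats and ranks by citation count. The `dedupe_and_rank` helper is illustrative and not part of the Actor.

```python
from typing import Any, Dict, Iterable, List


def dedupe_and_rank(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first row per `cidCode`, then sort by `citations`, highest first."""
    seen = set()
    unique: List[Dict[str, Any]] = []
    for row in rows:
        key = row.get("cidCode")
        if key:
            if key in seen:
                continue
            seen.add(key)
        # Rows without an identifier cannot be compared, so they are always kept.
        unique.append(row)
    # Assumption: a missing or null citation count is treated as 0.
    return sorted(unique, key=lambda r: r.get("citations") or 0, reverse=True)
```

Because Python's sort is stable, rows with equal citation counts keep their original relative order.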
🚀 How to Use (Apify Console)
- Log in at https://console.apify.com → Actors.
- Open Google Scholar Scraper.
- Paste your queries (or Scholar URLs) into Search Queries.
- Tune `maxItems`, `sortBy`, `filter`, and `articleType` to taste.
- Leave Proxy on its default (no proxy); the Actor auto-escalates on rate limits.
- Click ▶ Start.
- Watch the live log — every section reports progress in real time.
- Open the Output tab and export to JSON / CSV / XLSX.
🤖 Use via API

    curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"searchQueries": ["Federated learning healthcare"], "maxItems": 50, "sortBy": "cited_by_count"}'

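The same call can be made from Python with only the standard library. This is a sketch equivalent to the curl command, not an official client: the `build_run_request` helper name is made up, and `ACTOR_ID` and `TOKEN` are placeholders you must replace.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.apify.com/v2/acts"


def build_run_request(actor_id: str, token: str, actor_input: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command issues."""
    url = (
        f"{API_BASE}/{urllib.parse.quote(actor_id, safe='')}"
        f"/run-sync-get-dataset-items?{urllib.parse.urlencode({'token': token})}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "ACTOR_ID",  # placeholder
    "TOKEN",     # placeholder
    {"searchQueries": ["Federated learning healthcare"], "maxItems": 50, "sortBy": "cited_by_count"},
)

# To actually run the Actor and read the rows (network call, left commented out):
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)
```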
🎯 Best Use Cases
- 🔬 Literature reviews — pull a full corpus on a research topic in minutes.
- 📈 Citation tracking — monitor how a paper or author cluster grows over time.
- 🧪 Trend detection: slice by `recent_5_years` to spot emerging directions.
- 📚 Library / EdTech tools: feed clean, normalised records into your platform.
- 🤖 AI agents — give RAG/LLM pipelines high-quality academic context.
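For the citation-tracking use case, one possible approach is to export the dataset from two runs made at different times and compare the `citations` values per `cidCode`. The `citation_growth` helper below is hypothetical, and counting papers that are new in the later run from 0 is an assumption.

```python
from typing import Any, Dict, Iterable


def citation_growth(
    old_rows: Iterable[Dict[str, Any]],
    new_rows: Iterable[Dict[str, Any]],
) -> Dict[str, int]:
    """Map each `cidCode` in the newer run to its change in `citations`."""
    before = {row["cidCode"]: row.get("citations") or 0 for row in old_rows}
    growth: Dict[str, int] = {}
    for row in new_rows:
        cid = row["cidCode"]
        # Papers missing from the earlier run are counted from 0 (assumption).
        growth[cid] = (row.get("citations") or 0) - before.get(cid, 0)
    return growth
```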
💸 Pricing
This Actor is best deployed under the Pay-per-event (PPE) model:
- One event = one paper pushed to the dataset (`apify-default-dataset-item`).
- No surprise compute charges, no rental: you pay for results, not waiting.
- Free 5-second startup included by Apify on every run.
Configure the exact event prices in the Apify Console → Publication → Monetization tab.
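As a rough budgeting aid, the cost of a run can be estimated from the number of papers pushed. The sketch below uses the listed "from $3.99 / 1,000 results" figure as its default. The actual event price may differ, and `estimate_cost` is an illustrative helper, not part of Apify's tooling.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed starting price; the configured event price may be different.
PRICE_PER_1000 = Decimal("3.99")


def estimate_cost(results: int, price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Estimate the charge in USD for `results` dataset items, rounded to cents."""
    if results < 0:
        raise ValueError("results must be non-negative")
    cost = Decimal(results) * price_per_1000 / 1000
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, two queries at the 5,000-paper cap would come to `estimate_cost(10_000)`, which is 39.90 at the listed price.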
❓ Frequently Asked Questions
Q: Do I need a Google Scholar account?
No. We connect to OpenAlex + Semantic Scholar, both of which are open scholarly knowledge graphs.
Q: How fresh is the data?
OpenAlex syncs daily with Crossref, DOAJ, PubMed and others. Most papers appear within 24 – 48 h of publication.
Q: Will I get blocked?
Unlikely. The Actor uses official, rate-limit-friendly APIs and auto-escalates through Datacenter → Residential proxies if a host ever pushes back.
Q: Can I pass full Scholar URLs instead of keywords?
Yes. URLs like https://scholar.google.com/scholar?q=... are auto-parsed for the q= term.
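The URL handling described in this answer can be approximated as follows. This is a sketch of the idea and not the Actor's parser: the `query_from_input` helper name is made up, and returning the input unchanged when a Scholar URL has no `q=` parameter is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def query_from_input(value: str) -> str:
    """Return the search term for a plain keyword string or a Scholar URL."""
    text = value.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc.endswith("scholar.google.com"):
        terms = parse_qs(parsed.query).get("q", [])
        if terms:
            # parse_qs already decodes "+" and percent-escapes.
            return terms[0].strip()
    # Not a Scholar URL, or no q= parameter: treat the input as a keyword.
    return text
```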
Q: Why two views in the output?
The Overview view is great for quick scanning. The Full Details view is the complete record: same data, more columns.
🛟 Support & Feedback
Found a bug or have a feature request? Open an issue or message us through the Apify Store page. We respond fast.
⚖️ Cautions / Legal
- Data is collected only from publicly available sources (OpenAlex, Semantic Scholar).
- You are responsible for downstream use that complies with GDPR/CCPA, target ToS, and copyright.
- Respect rate limits and `robots.txt`; being a good citizen reduces blocks too.