Pricing

from $2.00 / 1,000 papers returned

arXiv Scraper

Search arXiv via the official API and get clean, structured paper metadata: title, abstract, authors, categories, DOI, dates, and links to the abstract page and PDF. No API key or login is needed, and the API has no anti-bot protection to work around. Uses arXiv search syntax (all:, cat:, ti:, au:).

Rating

5.0

(1)

Developer

Dami's Studio

Last modified

2 days ago

What it does

Given an arXiv search query, the actor fetches the Atom feed from the arXiv API, parses every <entry>, dedupes by arXiv ID, and returns one clean record per paper.
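The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual source: the function names `parse_entry` and `parse_feed` are hypothetical, and the field mapping is inferred from the Output section together with the element names used in arXiv's Atom feed. Deduplication is left out here.

```python
import re
import xml.etree.ElementTree as ET

# XML namespaces used by the arXiv Atom feed.
ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV = "{http://arxiv.org/schemas/atom}"


def _clean(text):
    """Collapse runs of whitespace; return None for missing or empty text."""
    if text is None:
        return None
    return " ".join(text.split()) or None


def parse_entry(entry):
    """Map one Atom <entry> element to a record shaped like the documented output."""
    abs_url = _clean(entry.findtext(f"{ATOM}id"))
    arxiv_id = None
    if abs_url and "/abs/" in abs_url:
        # Drop the trailing version suffix (v1, v2, ...) to get the bare ID.
        arxiv_id = re.sub(r"v\d+$", "", abs_url.split("/abs/", 1)[1])
    primary = entry.find(f"{ARXIV}primary_category")
    pdf_url = None
    for link in entry.findall(f"{ATOM}link"):
        if link.get("title") == "pdf":
            pdf_url = link.get("href")
    return {
        "arxivId": arxiv_id,
        "title": _clean(entry.findtext(f"{ATOM}title")),
        "abstract": _clean(entry.findtext(f"{ATOM}summary")),
        "authors": [_clean(a.findtext(f"{ATOM}name"))
                    for a in entry.findall(f"{ATOM}author")],
        "primaryCategory": primary.get("term") if primary is not None else None,
        "categories": [c.get("term") for c in entry.findall(f"{ATOM}category")],
        "publishedAt": _clean(entry.findtext(f"{ATOM}published")),
        "updatedAt": _clean(entry.findtext(f"{ATOM}updated")),
        "doi": _clean(entry.findtext(f"{ARXIV}doi")),
        "absUrl": abs_url,
        "pdfUrl": pdf_url,
    }


def parse_feed(xml_text):
    """Parse a whole Atom feed into a list of records, one per <entry>."""
    root = ET.fromstring(xml_text)
    return [parse_entry(e) for e in root.findall(f"{ATOM}entry")]
```

Missing elements simply come back as `None`, which lines up with the nullable fields listed in the Output section.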

Input

Field	Type	Default	Description
`query`	string	`all:large language models`	arXiv search query (see syntax below). Required.
`sortBy`	select	`relevance`	`relevance`, `submittedDate`, or `lastUpdatedDate`.
`maxItems`	integer	`50`	Max papers to return (1–30000). Paginates 100/page.
`proxyConfiguration`	proxy	none	Optional. The public arXiv API has no anti-bot protection, so no proxy is used by default. Enable one only if you hit IP rate limits.
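As an illustration of the table above, here is a small sketch that assembles an input object and checks it against the documented defaults and ranges. The helper `build_input` is hypothetical and not part of the actor; only the field names, defaults, and limits are taken from the table.

```python
# Allowed values for sortBy, as listed in the input table.
ALLOWED_SORT = ("relevance", "submittedDate", "lastUpdatedDate")


def build_input(query="all:large language models", sort_by="relevance", max_items=50):
    """Return an input dict for the actor, validating it against the documented limits."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required")
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"sortBy must be one of {ALLOWED_SORT}")
    if not isinstance(max_items, int) or not 1 <= max_items <= 30000:
        raise ValueError("maxItems must be an integer between 1 and 30000")
    # proxyConfiguration is omitted, so the documented default (no proxy) applies.
    return {"query": query.strip(), "sortBy": sort_by, "maxItems": max_items}
```

The returned dict can be serialized with `json.dumps` and pasted into the actor's JSON input editor.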

Search query syntax

arXiv uses field prefixes you can combine with AND / OR / ANDNOT:

all:transformer — search all fields
cat:cs.AI — papers in a category (e.g. cs.CL, cs.LG, stat.ML)
ti:attention — title
au:hinton — author
abs:diffusion — abstract
ti:attention AND au:vaswani — combine

Examples: all:large language models, cat:cs.CL, ti:"retrieval augmented generation".
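To make the syntax concrete, the sketch below builds query strings from field prefixes and turns them into a request URL for the public arXiv API endpoint. The helpers `field`, `combine`, and `build_url` are hypothetical names chosen for this example. The endpoint and the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters come from the arXiv API itself; whether the actor builds its requests in exactly this way is an assumption.

```python
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"


def field(prefix, value):
    """Build one prefixed term; multi-word values are quoted as a phrase."""
    if " " in value:
        value = f'"{value}"'
    return f"{prefix}:{value}"


def combine(operator, *terms):
    """Join terms with AND, OR, or ANDNOT."""
    if operator not in ("AND", "OR", "ANDNOT"):
        raise ValueError("operator must be AND, OR, or ANDNOT")
    return f" {operator} ".join(terms)


def build_url(query, start=0, max_results=100, sort_by="relevance"):
    """Return the arXiv API URL for one page of results."""
    params = {
        "search_query": query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",
    }
    # urlencode percent-encodes the colons, quotes, and spaces in the query.
    return f"{API_URL}?{urlencode(params)}"
```

For example, `combine("AND", field("ti", "attention"), field("au", "vaswani"))` yields `ti:attention AND au:vaswani`, which can be used directly as the `query` input.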

Output

One record per paper:

{
  "ok": true,
  "arxivId": "2401.12345",
  "title": "Paper title",
  "abstract": "Whitespace-collapsed abstract…",
  "authors": ["First Author", "Second Author"],
  "primaryCategory": "cs.CL",
  "categories": ["cs.CL", "cs.AI"],
  "publishedAt": "2024-01-23T18:00:00Z",
  "updatedAt": "2024-02-01T12:00:00Z",
  "doi": "10.1000/xyz123",
  "absUrl": "http://arxiv.org/abs/2401.12345v1",
  "pdfUrl": "http://arxiv.org/pdf/2401.12345v1"
}

Nullable fields: doi is null when the paper has no registered DOI. primaryCategory, publishedAt, updatedAt, and absUrl can also be null if the arXiv feed omits them for an entry. pdfUrl is derived from the arXiv ID when the feed doesn't include a PDF link, so it is null only when the ID itself is missing.

Results are deduplicated by arxivId, falling back to absUrl and then title when a record has no arXiv ID.
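The fallback order for deduplication might look like the sketch below. The function names are hypothetical, and keeping records that have none of the three fields is an assumption made for this example, not documented behavior.

```python
def dedupe_key(record):
    """Return the first available identity: arxivId, then absUrl, then title."""
    for name in ("arxivId", "absUrl", "title"):
        value = record.get(name)
        if value:
            # Tag the key with its field so an ID never collides with a title.
            return (name, value)
    return None


def dedupe(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedupe_key(record)
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        # Assumption: records with no usable key are kept rather than dropped.
        unique.append(record)
    return unique
```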

Notes

Be polite: the actor pauses ~3 seconds between pages, as arXiv requests.
arXiv hard-limits the total reachable results per query to about 30000.
On an error, empty result, or missing query, the actor writes a single diagnostic row (ok: false, with an errorCode such as NO_RESULTS, BAD_INPUT, RATE_LIMITED, or NETWORK) and does not charge. The run still finishes cleanly so you can inspect the reason in the dataset.
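The paging and pacing behavior in these notes can be sketched as below. This is an illustration under stated assumptions: `page_plan` and `fetch_all` are hypothetical names, `fetch_page` stands in for whatever function performs the HTTP request, and the page size of 100, the 3-second delay, and the 30000 cap are taken from this page.

```python
import time


def page_plan(max_items, page_size=100, hard_limit=30000):
    """Return (start, size) pairs covering max_items, capped at arXiv's limit."""
    total = min(max_items, hard_limit)
    return [(start, min(page_size, total - start))
            for start in range(0, total, page_size)]


def fetch_all(fetch_page, max_items, delay_seconds=3.0, sleep=time.sleep):
    """Fetch pages in order, pausing between requests.

    fetch_page(start, size) is a caller-supplied function returning a list of
    records. Paging stops early when a page comes back short.
    """
    results = []
    for index, (start, size) in enumerate(page_plan(max_items)):
        if index > 0:
            sleep(delay_seconds)  # be polite: pause before every page after the first
        page = fetch_page(start, size)
        results.extend(page)
        if len(page) < size:
            break  # no more results available for this query
    return results
```

For `max_items=250`, `page_plan` gives `[(0, 100), (100, 100), (200, 50)]`, so a full run makes three requests with two pauses.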

Charging

Charges one paper unit per successfully returned paper. Diagnostic / empty rows are never charged.
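On the consuming side, a dataset can be split into paper rows and diagnostic rows using the `ok` flag, which also gives a rough cost estimate. The helper below is a hypothetical sketch. It assumes the listed starting price of $2.00 per 1,000 papers; the actual rate may differ.

```python
def summarize_run(rows, price_per_1000=2.00):
    """Count chargeable papers, collect error codes, and estimate the cost."""
    papers = [row for row in rows if row.get("ok")]
    diagnostics = [row for row in rows if not row.get("ok")]
    return {
        "papers": len(papers),
        "errorCodes": [row.get("errorCode") for row in diagnostics],
        # Diagnostic rows are excluded because they are never charged.
        "estimatedCost": len(papers) * price_per_1000 / 1000,
    }
```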

arXiv Research Paper Scraper

crawlerbros/arxiv-research-paper-scraper

Scrape research papers from arXiv.org - search by query, category, or author; lookup by arXiv ID. Returns title, authors, abstract, PDF URL, DOI, categories, and more. Uses the public arXiv Atom API. No login or proxy required.

Crawler Bros

arXiv Search & Paper Scraper

scrapeworks/arxiv-search

Search arXiv and get clean structured JSON for each paper: title, authors, abstract, categories, DOI, PDF link, and dates. Built for research, datasets, and AI pipelines.

Nicolas van Arkens

arXiv Papers Scraper

crawlerbros/arxiv-papers-scraper

Scrape academic preprints from arXiv.org by keyword, author, or category. Returns clean records with title, authors, abstract, categories, PDF URL, DOI. HTTP-only via the public arXiv API. No login, no proxy.

Crawler Bros

arXiv Paper Scraper

plantane/arxiv-scraper

Scrape research papers from arXiv by search query or category. Get titles, abstracts, authors, categories, and PDF links via the public arXiv API.

Daniel

arXiv Metadata Collector— Metadata, PDF, Authors & Abstract

scrapepilot/arxiv-metadata-collector---metadata-pdf-authors-abstract

Scrape arXiv research papers with metadata including title, authors, abstract, PDF links, DOI, and categories. Supports keyword search, proxy integration, and structured dataset output for AI, ML, and academic research use.

Scrape Pilot

ArXiv Paper Search

gentle_cloud/arxiv-paper-search

Search and extract academic papers from ArXiv. Find papers by keyword, author, or category with full metadata including title, authors, abstract, categories, and PDF links.

Monkey Coder

arXiv Scraper: Papers, Authors, Categories & Search

perconey/arxiv-scraper

Scrape arxiv.org via the official Atom API. Full-text search, by author / title / category, paper detail by id, latest in any category. Returns title, abstract, authors, DOI, PDF link. No auth, no proxies. Pay only per result item.

Perconey

ArXiv Paper Scraper

sheshinmcfly/arxiv-paper-scraper

Search and extract scientific papers from ArXiv.org across any field. Returns title, authors, full abstract, PDF link, arXiv ID, categories, and submission date. Ideal for AI research monitoring, RAG pipelines, literature reviews, and academic trend analysis. No API key needed.

Sheshinmcfly

arXiv Scraper — Search & Export Paper Metadata

devilscrapes/arxiv-papers-scraper

Search arXiv by query, category, or author and export structured paper metadata — title, authors, abstract, primary category, DOI, PDF URL, submitted and updated timestamps — to JSON or CSV. An arXiv API wrapper that handles pagination, retries, and rate-limit pacing for your pipeline.