# PubMed Search Scraper (`easyapi/pubmed-search-scraper`) Actor

Scrape research papers and academic articles from PubMed based on search terms. Extract comprehensive article metadata including titles, authors, citations, abstracts, and more. Perfect for medical research and literature reviews.

- **URL**: https://apify.com/easyapi/pubmed-search-scraper.md
- **Developed by:** [EasyApi](https://apify.com/easyapi) (community)
- **Categories:** Integrations
- **Stats:** 70 total users, 9 monthly users, 92.5% runs succeeded, 4 bookmarks
- **User rating**: No ratings yet

## Pricing

from $2.99 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
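
As a rough illustration of the per-result part of the bill, the sketch below multiplies the listed starting price by a result count. It assumes the $2.99 per 1,000 results rate applies uniformly and it leaves out Apify platform usage, which is charged on top. The name `estimate_result_cost` is hypothetical and not part of any Apify library.

```python
from decimal import Decimal

# Starting price listed above: $2.99 per 1,000 results.
# Assumption: the rate applies uniformly; platform usage is billed separately.
PRICE_PER_1000_RESULTS = Decimal("2.99")


def estimate_result_cost(result_count: int) -> Decimal:
    """Return the estimated per-result charge in USD (platform usage excluded)."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return PRICE_PER_1000_RESULTS * result_count / 1000


# Estimated per-result charge for a 30-item run.
print(estimate_result_cost(30))
```

Treat the result as a lower bound, since the listed price is a "from" price.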

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## PubMed Search Scraper 🔬

### 📋 Overview

Extract academic articles and research papers from PubMed, the world's leading database of biomedical literature. This Actor scrapes detailed information from search results based on your keywords.

### ✨ Features

- 🔎 Scrape articles based on custom search queries
- 📑 Extract comprehensive article metadata:
  - Title and article ID
  - Full & short author lists
  - Complete & abbreviated journal citations
  - PMID (PubMed Identifier)
  - Article tags and types
  - Full & truncated abstracts
  - Social sharing links
- ⚡ High-performance scrolling pagination
- 🛡️ Built-in anti-blocking measures
- 🎯 Configurable maximum items limit

### 💡 Use Cases

- Medical research and literature reviews
- Academic meta-analyses
- Tracking research trends
- Building research databases
- Bibliometric analysis
- Scientific data mining

### 📤 Output

The Actor outputs detailed article information in JSON format, including:

- Article title and unique identifier
- Author information (full and short formats)
- Journal citation details
- PMID reference
- Article type tags
- Abstract content
- Social sharing links


### 💪 Tips for Optimal Usage

1. Use specific search terms for more targeted results
2. Consider breaking large searches into smaller queries
3. Allow sufficient run time for larger result sets
4. Monitor your usage to stay within PubMed's guidelines
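
Tips 1 and 2 can be sketched as a small URL builder that turns a query into the search URL format used in the input examples and splits one broad search into one URL per publication year. This is a minimal sketch: the helper names are hypothetical, and it assumes the Actor accepts any valid PubMed search URL, including ones that use PubMed's `[dp]` (date of publication) search tag.

```python
from urllib.parse import quote

PUBMED_SEARCH_BASE = "https://pubmed.ncbi.nlm.nih.gov/?term="


def build_search_url(query: str) -> str:
    """Percent-encode a PubMed query and append it to the search base URL."""
    return PUBMED_SEARCH_BASE + quote(query)


def split_by_year(query: str, first_year: int, last_year: int) -> list[str]:
    """Build one search URL per publication year using the [dp] tag."""
    return [
        build_search_url(f"({query}) AND {year}[dp]")
        for year in range(first_year, last_year + 1)
    ]


# Three narrower searches instead of one broad one.
search_urls = split_by_year("rheumatoid arthritis", 2020, 2022)
for url in search_urls:
    print(url)
```

The resulting list can be passed as `searchUrls` in the Actor input.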

### 🔗 Links

- [PubMed Home](https://pubmed.ncbi.nlm.nih.gov/)
- [Search Syntax Guide](https://pubmed.ncbi.nlm.nih.gov/help/#search-tags)


#### Input Example

An example Actor input in JSON:

```json
{
  "searchUrls": ["https://pubmed.ncbi.nlm.nih.gov/?term=rheumatoid%20arthritis"],
  "maxItems": 30
}
```

#### Output sample

The results are stored in a dataset, which you can always find in the **Storage** tab. Here's an excerpt, in JSON, from the data you'd get with the input parameters above. You can choose in which format to download your data: JSON, JSONL, Excel spreadsheet, HTML table, CSV, or XML.

```json
[
  {
    "title": "Rheumatoid arthritis.",
    "articleId": "27156434",
    "articleUrl": "https://pubmed.ncbi.nlm.nih.gov/27156434/",
    "authors": {
      "full": "Smolen JS, Aletaha D, McInnes IB.",
      "short": "Smolen JS, et al."
    },
    "citation": {
      "full": "Lancet. 2016 Oct 22;388(10055):2023-2038. doi: 10.1016/S0140-6736(16)30173-8. Epub 2016 May 3.",
      "short": "Lancet. 2016."
    },
    "pmid": "27156434",
    "tags": [
      "Free article.",
      "Review."
    ],
    "abstract": {
      "full": "Rheumatoid arthritis is a chronic inflammatory joint disease, which can cause cartilage and bone damage as well as disability. ...In this Seminar, we describe current insights into genetics and aetiology, pathophysiology, epidemiology, assessment, therapeutic agents …",
      "short": "Rheumatoid arthritis is a chronic inflammatory joint disease, which can cause cartilage and bone damage as well as disability. …"
    },
    "shareLinks": {
      "twitter": "http://twitter.com/intent/tweet?text=Rheumatoid%20arthritis.%20https%3A//pubmed.ncbi.nlm.nih.gov/27156434/",
      "facebook": "http://www.facebook.com/sharer/sharer.php?u=https%3A//pubmed.ncbi.nlm.nih.gov/27156434/",
      "permalink": "https://pubmed.ncbi.nlm.nih.gov/27156434/"
    }
  },
  {
    "title": "Management of Rheumatoid Arthritis: An Overview.",
    "articleId": "34831081",
    "articleUrl": "https://pubmed.ncbi.nlm.nih.gov/34831081/",
    "authors": {
      "full": "Radu AF, Bungau SG.",
      "short": "Radu AF, et al."
    },
    "citation": {
      "full": "Cells. 2021 Oct 23;10(11):2857. doi: 10.3390/cells10112857.",
      "short": "Cells. 2021."
    },
    "pmid": "34831081",
    "tags": [
      "Free PMC article.",
      "Review."
    ],
    "abstract": {
      "full": "Rheumatoid arthritis (RA) is a multifactorial autoimmune disease of unknown etiology, primarily affecting the joints, then extra-articular manifestations can occur. ...",
      "short": "Rheumatoid arthritis (RA) is a multifactorial autoimmune disease of unknown etiology, primarily affecting the joints, then ext …"
    },
    "shareLinks": {
      "twitter": "http://twitter.com/intent/tweet?text=Management%20of%20Rheumatoid%20Arthritis%3A%20An%20Overview.%20https%3A//pubmed.ncbi.nlm.nih.gov/34831081/",
      "facebook": "http://www.facebook.com/sharer/sharer.php?u=https%3A//pubmed.ncbi.nlm.nih.gov/34831081/",
      "permalink": "https://pubmed.ncbi.nlm.nih.gov/34831081/"
    }
  },
  ...
]
```
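
The nested fields in the output sample above can be flattened into one row per article, for example before loading them into a spreadsheet or database. The following is a minimal sketch that pulls the DOI and publication year out of the citation strings. The helper names are hypothetical, and the parsing rules are assumptions inferred from the two sample records, so other citations may need different handling.

```python
import re
from typing import Optional

# One record copied from the output sample (abstract and share links omitted).
item = {
    "title": "Rheumatoid arthritis.",
    "pmid": "27156434",
    "authors": {"full": "Smolen JS, Aletaha D, McInnes IB.", "short": "Smolen JS, et al."},
    "citation": {
        "full": "Lancet. 2016 Oct 22;388(10055):2023-2038. "
                "doi: 10.1016/S0140-6736(16)30173-8. Epub 2016 May 3.",
        "short": "Lancet. 2016.",
    },
}


def extract_doi(citation_full: str) -> Optional[str]:
    """Return the DOI that follows 'doi:' in a citation, or None if absent."""
    match = re.search(r"doi:\s*(\S+)", citation_full)
    # The citation closes the DOI with a sentence period, so strip it.
    return match.group(1).rstrip(".") if match else None


def extract_year(citation_short: str) -> Optional[int]:
    """Return the first four-digit number in the short citation as the year."""
    match = re.search(r"\b(\d{4})\b", citation_short)
    return int(match.group(1)) if match else None


# Flat row with the fields most often needed for bibliometric work.
row = {
    "pmid": item["pmid"],
    "title": item["title"],
    "first_author": item["authors"]["full"].split(",")[0],
    "year": extract_year(item["citation"]["short"]),
    "doi": extract_doi(item["citation"]["full"]),
}
print(row)
```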

# Actor input Schema

## `searchUrls` (type: `array`):

Array of PubMed search URLs to scrape. Required; must contain at least one URL.

## `maxItems` (type: `integer`):

Maximum number of articles to scrape. Must be at least 1; defaults to 20.

## Actor input object example

```json
{
  "searchUrls": [
    "https://pubmed.ncbi.nlm.nih.gov/?term=cancer"
  ],
  "maxItems": 20
}
````
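
Before starting a run, it can help to check an input object locally against the constraints listed in the OpenAPI specification for this Actor (`searchUrls` is required and non-empty, `maxItems` is an integer of at least 1 with a default of 20). The sketch below is one way to do that. `validate_input` is a hypothetical helper, not part of the Apify client, and the platform performs its own validation regardless.

```python
from typing import Any


def validate_input(actor_input: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []

    urls = actor_input.get("searchUrls")
    if not isinstance(urls, list) or len(urls) < 1:
        problems.append("searchUrls must be a non-empty array")
    elif not all(isinstance(url, str) for url in urls):
        problems.append("every searchUrls entry must be a string")

    max_items = actor_input.get("maxItems", 20)  # schema default
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        problems.append("maxItems must be an integer >= 1")

    return problems


example = {
    "searchUrls": ["https://pubmed.ncbi.nlm.nih.gov/?term=cancer"],
    "maxItems": 20,
}
print(validate_input(example))
```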

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

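// Prepare Actor input (searchUrls is required by the input schema)
const input = {
    "searchUrls": ["https://pubmed.ncbi.nlm.nih.gov/?term=cancer"],
    "maxItems": 20
};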

// Run the Actor and wait for it to finish
const run = await client.actor("easyapi/pubmed-search-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

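# Prepare the Actor input (searchUrls is required by the input schema)
run_input = {
    "searchUrls": ["https://pubmed.ncbi.nlm.nih.gov/?term=cancer"],
    "maxItems": 20,
}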

# Run the Actor and wait for it to finish
run = client.actor("easyapi/pubmed-search-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"searchUrls": ["https://pubmed.ncbi.nlm.nih.gov/?term=cancer"], "maxItems": 20}' |
apify call easyapi/pubmed-search-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=easyapi/pubmed-search-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "PubMed Search Scraper",
        "description": "Scrape research papers and academic articles from PubMed based on search terms. Extract comprehensive article metadata including titles, authors, citations, abstracts, and more. Perfect for medical research and literature reviews.",
        "version": "0.0",
        "x-build-id": "basylC9YeQDQZzJZ2"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/easyapi~pubmed-search-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-easyapi-pubmed-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/easyapi~pubmed-search-scraper/runs": {
            "post": {
                "operationId": "runs-sync-easyapi-pubmed-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/easyapi~pubmed-search-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-easyapi-pubmed-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "searchUrls"
                ],
                "properties": {
                    "searchUrls": {
                        "title": "PubMed Search URLs",
                        "minItems": 1,
                        "type": "array",
                        "description": "Array of PubMed search URLs to scrape",
                        "default": [
                            "https://pubmed.ncbi.nlm.nih.gov/?term=cancer"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Maximum Items",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of articles to scrape",
                        "default": 20
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
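
For projects that do not use a client library, the `run-sync-get-dataset-items` path in the specification above can be called directly over HTTP. The sketch below builds that POST request with Python's standard library only. The function names are hypothetical, the 300-second timeout is an arbitrary choice, and decoding the response as JSON is an assumption, since the specification only documents a `200 OK` response for this path.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/easyapi~pubmed-search-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described by the specification."""
    query = urllib.parse.urlencode({"token": token})  # token is a query parameter
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch_items(token: str, actor_input: dict, timeout: float = 300.0):
    """Send the request and decode the response body (assumed to be JSON)."""
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network; sending it does.
request = build_run_request(
    "<YOUR_API_TOKEN>",
    {"searchUrls": ["https://pubmed.ncbi.nlm.nih.gov/?term=cancer"], "maxItems": 20},
)
print(request.get_method(), request.full_url)
```

Calling `run_and_fetch_items` with a real token starts a billable run, so it is left uncalled here.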
