# DataCite Metadata Scraper (`parseforge/datacite-metadata-scraper`) Actor

Comprehensive DataCite metadata scraper for extracting DOI metadata from the DataCite API. Perfect for researchers, librarians, and data scientists who need automated access to scholarly publication metadata, research datasets, and digital object identifiers.

- **URL**: https://apify.com/parseforge/datacite-metadata-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Automation, Developer tools, Other
- **Stats:** 2 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 📚 DataCite Metadata Scraper

> 🚀 **Collect scholarly DOI metadata and research dataset records in seconds.** Filter by keyword, repository, publisher, resource type, and year. No coding, no DataCite account required.

> 🕒 **Last updated:** 2026-04-16 · **📊 10 fields** · **📖 Millions of DOI records** · **🔬 Academic and research data**


<table><tr>
<td style="border-left:4px solid #0F766E;padding:12px 16px;font-weight:600">Pull structured records from DataCite Metadata — clean fields ready as CSV, JSON, JSONL, Excel, or XML for downstream pipelines.</td>
</tr></table>

##### Copy to your AI assistant

Copy this block into ChatGPT, Claude, Cursor, or any LLM to start using this Actor.

````

parseforge/datacite-metadata-scraper on Apify. Call: ApifyClient("TOKEN").actor("parseforge/datacite-metadata-scraper").call(run_input={...}), then client.dataset(run["defaultDatasetId"]).list_items().items for results. Key inputs: maxItems (integer, default 10), query (string, default "climate"), doi (string), repositoryId (string), publisher (string), resourceType (string). Full Actor spec: fetch build via GET https://api.apify.com/v2/acts/parseforge~datacite-metadata-scraper (Bearer TOKEN). Get token: https://console.apify.com/account/integrations

````

The DataCite Metadata Scraper retrieves Digital Object Identifier (DOI) metadata from the DataCite registry, which indexes over **45 million DOIs** across academic publications, research datasets, software, and other scholarly outputs. Each record includes the DOI, title, publisher, publication year, resource type, creation date, update date, and a resolvable URL. You can filter by **keyword**, **specific DOI**, **repository** (Zenodo, Dryad, Figshare, Dataverse), **publisher**, **resource type**, and **publication year**. Free users can collect up to **10 records** per run, while paid users can retrieve up to **1,000,000**.

Whether you are building a literature database for a systematic review, analyzing publication trends across institutions, tracking open data availability in your research field, or monitoring repository output over time, this tool replaces hours of manual DOI lookups with a single automated query. Results export to **JSON, CSV, or Excel** for immediate use in citation managers, bibliometric tools, or data analysis pipelines. The scraper handles pagination and rate limiting automatically, letting you focus on research instead of data collection.

| Target Audience | Use Cases |
|---|---|
| Academic Researchers | Build literature databases and track publications in specific fields |
| Research Librarians | Catalog DOI records and monitor repository output |
| Data Scientists | Analyze publication trends and research metadata at scale |
| Institutional Analysts | Track publication volume and output across departments |
| Science Policy Analysts | Study open data availability and repository growth |
| Bibliometric Researchers | Collect DOI metadata for citation and impact analysis |

---

### 📋 What the DataCite Metadata Scraper does

- 📚 **DOI records** - retrieve the full Digital Object Identifier for each scholarly output, ready for citation or resolution
- 🏷️ **Titles** - extract publication or dataset titles for cataloging and search
- 📰 **Publishers** - capture the organization or institution that registered the DOI
- 📅 **Publication years** - filter and sort by year to focus on recent research or historical trends
- 🗂️ **Resource types** - classify records as datasets, articles, software, images, or other scholarly object types
- 🔗 **Resolvable URLs** - get working DOI links that resolve to the full publication or dataset landing page

The scraper queries the DataCite REST API and iterates through paginated results using your specified filters. Each record is normalized with consistent field names and pushed to an Apify dataset in real time. You can look up a single DOI or search across the entire DataCite registry with keyword and faceted filters.
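
The normalization step described above can be sketched roughly as follows. This is an illustration only, not the Actor's actual code: the function name `normalize_record` is hypothetical, and the input shape assumes DataCite's JSON:API response, where each resource carries an `attributes` object with `titles`, `types`, `created`, and `updated`.

```python
from typing import Any, Dict, Optional


def normalize_record(raw: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Map one DataCite resource onto the flat output fields listed in this README."""
    attrs = raw.get("attributes", {})
    doi = attrs.get("doi") or raw.get("id")

    # Titles arrive as a list of objects; take the first title if there is one.
    titles = attrs.get("titles") or []
    title = titles[0].get("title") if titles else None

    # Assumption: the publisher may be a plain string or an object with a "name" key.
    publisher = attrs.get("publisher")
    if isinstance(publisher, dict):
        publisher = publisher.get("name")

    types = attrs.get("types") or {}
    return {
        "doi": doi,
        "doiUrl": f"https://doi.org/{doi}" if doi else None,
        "title": title,
        "publisher": publisher,
        "publicationYear": attrs.get("publicationYear"),
        "resourceType": types.get("resourceType"),
        "resourceTypeGeneral": types.get("resourceTypeGeneral"),
        "createdDate": attrs.get("created"),
        "updatedDate": attrs.get("updated"),
    }


# Hand-written sample resource, shaped like the first sample record in the Output section.
sample = {
    "id": "10.5281/zenodo.7654321",
    "attributes": {
        "doi": "10.5281/zenodo.7654321",
        "titles": [{"title": "Global Surface Temperature Anomalies 1880-2023"}],
        "publisher": "Zenodo",
        "publicationYear": 2023,
        "types": {"resourceType": "Dataset", "resourceTypeGeneral": "Dataset"},
        "created": "2023-06-15T10:30:00.000Z",
        "updated": "2023-08-20T14:15:00.000Z",
    },
}
record = normalize_record(sample)
print(record["doiUrl"])
```

Missing attributes become `None` rather than raising, so a sparse registry record still yields a row with the same set of keys.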

> 💡 **Why it matters:** DataCite indexes DOIs from over 2,000 data centers worldwide. Manually searching and downloading metadata is tedious. This scraper gives you structured, filterable access to the registry in minutes.

---

### 🎬 Full Demo

_🚧 Coming soon..._

---

### ⚙️ Input

<table>
<tr><th>Field</th><th>Type</th><th>Required</th><th>Description</th></tr>
<tr><td>maxItems</td><td>integer</td><td>No</td><td>Maximum records to collect. Free: 10. Paid: up to 1,000,000.</td></tr>
<tr><td>query</td><td>string</td><td>No</td><td>Search term to find DOIs (e.g., "climate change", "machine learning").</td></tr>
<tr><td>doi</td><td>string</td><td>No</td><td>Specific DOI to retrieve (e.g., 10.5281/zenodo.1234567). Returns only this record.</td></tr>
<tr><td>repositoryId</td><td>string</td><td>No</td><td>Filter by repository identifier (e.g., zenodo, dryad, figshare).</td></tr>
<tr><td>publisher</td><td>string</td><td>No</td><td>Filter by publisher name.</td></tr>
<tr><td>resourceType</td><td>string</td><td>No</td><td>Filter by type: Dataset, Article, Software, Image, etc.</td></tr>
<tr><td>year</td><td>integer</td><td>No</td><td>Filter by publication year (4-digit, e.g., 2023).</td></tr>
<tr><td>sort</td><td>string</td><td>No</td><td>Sort order by creation date, update date, or publication year (e.g., -created, updated, publicationYear).</td></tr>
</table>

**Example 1: Climate research datasets**
```json
{
  "query": "climate",
  "maxItems": 50,
  "resourceType": "Dataset",
  "year": 2023,
  "sort": "-created"
}
````

**Example 2: Look up a specific DOI**

```json
{
  "doi": "10.5281/zenodo.1234567",
  "maxItems": 1
}
```

> ⚠️ **Good to Know:** Free users are automatically limited to 10 items per run. When a specific DOI is provided, only that single record is returned. Leave the query field empty to browse all records with other filters applied.
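
If you assemble inputs in code, it can help to check them before starting a run. The sketch below is one possible approach, not part of the Actor: `build_input` is a hypothetical helper, and the allowed values and numeric ranges are copied from the input schema in the OpenAPI specification at the end of this page.

```python
from typing import Any, Dict

# Allowed values and ranges, copied from the Actor's input schema in this README.
SORT_VALUES = {
    "-created", "created", "-updated", "updated",
    "-publicationYear", "publicationYear",
}
RESOURCE_TYPES = {
    "Dataset", "Software", "Article", "Text", "Image", "Video",
    "Audio", "Collection", "Event", "PhysicalObject", "Service", "Other",
}
REPOSITORY_IDS = {
    "zenodo", "dryad", "figshare", "dataverse",
    "pangaea", "osf", "zenodo.org", "datacite",
}


def build_input(**fields: Any) -> Dict[str, Any]:
    """Drop unset fields and reject values the input schema would not accept."""
    run_input = {key: value for key, value in fields.items() if value is not None}

    max_items = run_input.get("maxItems")
    if max_items is not None and not 1 <= max_items <= 1_000_000:
        raise ValueError("maxItems must be between 1 and 1,000,000")

    year = run_input.get("year")
    if year is not None and not 1900 <= year <= 2100:
        raise ValueError("year must be between 1900 and 2100")

    # Enumerated fields: each value must be one of the schema's listed options.
    for key, allowed in (
        ("sort", SORT_VALUES),
        ("resourceType", RESOURCE_TYPES),
        ("repositoryId", REPOSITORY_IDS),
    ):
        if key in run_input and run_input[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")

    return run_input


example_input = build_input(
    query="climate",
    maxItems=50,
    resourceType="Dataset",
    year=2023,
    sort="-created",
    doi=None,  # unset fields are dropped
)
print(example_input)
```

The returned dictionary can be passed as the run input in the client examples further down.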

***

### 📊 Output

#### 🧾 Schema

| Emoji | Field | Type | Description |
|---|---|---|---|
| 📚 | doi | string | Digital Object Identifier for the record |
| 🔗 | doiUrl | string | Resolvable URL (https://doi.org/...) |
| 🏷️ | title | string | Title of the publication or dataset |
| 📰 | publisher | string | Organization that registered the DOI |
| 📅 | publicationYear | integer | Year of publication |
| 🗂️ | resourceType | string | Specific resource type (e.g., "Dataset") |
| 📊 | resourceTypeGeneral | string | General resource category |
| 🕐 | createdDate | string | Date the DOI was created in the registry |
| 🔄 | updatedDate | string | Date the record was last updated |
| ⚠️ | error | string | Error message if processing failed |

#### 📦 Sample records

<details>
<summary>📝 Research dataset from Zenodo</summary>

```json
{
  "doi": "10.5281/zenodo.7654321",
  "doiUrl": "https://doi.org/10.5281/zenodo.7654321",
  "title": "Global Surface Temperature Anomalies 1880-2023",
  "publisher": "Zenodo",
  "publicationYear": 2023,
  "resourceType": "Dataset",
  "resourceTypeGeneral": "Dataset",
  "createdDate": "2023-06-15T10:30:00.000Z",
  "updatedDate": "2023-08-20T14:15:00.000Z"
}
```

</details>

<details>
<summary>📝 Software package from Dryad</summary>

```json
{
  "doi": "10.5061/dryad.abc123",
  "doiUrl": "https://doi.org/10.5061/dryad.abc123",
  "title": "Statistical Analysis Toolkit for Environmental Monitoring",
  "publisher": "Dryad",
  "publicationYear": 2024,
  "resourceType": "Software",
  "resourceTypeGeneral": "Software",
  "createdDate": "2024-01-10T08:00:00.000Z",
  "updatedDate": "2024-03-05T12:45:00.000Z"
}
```

</details>

<details>
<summary>📝 Academic article with DOI</summary>

```json
{
  "doi": "10.1234/journal.pone.9876543",
  "doiUrl": "https://doi.org/10.1234/journal.pone.9876543",
  "title": "Machine Learning Applications in Genomic Variant Detection",
  "publisher": "Public Library of Science (PLoS)",
  "publicationYear": 2023,
  "resourceType": "JournalArticle",
  "resourceTypeGeneral": "Text",
  "createdDate": "2023-09-01T09:00:00.000Z",
  "updatedDate": "2023-09-01T09:00:00.000Z"
}
```

</details>
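
Because the schema includes an `error` field, downstream code will probably want to set failed rows aside before analysis. The following is a minimal sketch under that assumption; `split_records` and `count_by_type` are hypothetical helpers, and the three items are made-up rows in the shape shown above.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split_records(items: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate usable records from rows that carry a non-empty `error` field."""
    ok: List[Record] = []
    failed: List[Record] = []
    for item in items:
        (failed if item.get("error") else ok).append(item)
    return ok, failed


def count_by_type(records: Iterable[Record]) -> Counter:
    """Count records per general resource category, labeling gaps as 'Unknown'."""
    return Counter(r.get("resourceTypeGeneral") or "Unknown" for r in records)


# Made-up rows for illustration; real rows come from the run's dataset.
items = [
    {"doi": "10.5281/zenodo.7654321", "resourceTypeGeneral": "Dataset"},
    {"doi": "10.5061/dryad.abc123", "resourceTypeGeneral": "Software"},
    {"error": "Request failed"},
]
ok, failed = split_records(items)
summary = count_by_type(ok)
print(len(ok), len(failed), dict(summary))
```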

***

### ✨ Why choose this Actor

| Feature | This Actor | Alternatives |
|---|---|---|
| Repository-specific filtering (Zenodo, Dryad, Figshare) | Yes | No |
| Resource type filtering (dataset, article, software) | Yes | Limited |
| Publication year filtering | Yes | Yes |
| Publisher filtering | Yes | Rarely available |
| Single DOI lookup mode | Yes | Yes |
| Up to 1,000,000 records per run | Yes | Capped lower |
| Export to JSON, CSV, and Excel | Yes | JSON only |

> 📊 **DataCite indexes over 45 million DOIs from 2,000+ data centers. This scraper lets you query the entire registry with keyword and faceted filters in a single run.**

***

### 📈 How it compares to alternatives

| Capability | This Actor | Manual DOI Lookups | Generic API Scripts |
|---|---|---|---|
| Bulk metadata retrieval | Yes | One at a time | Requires coding |
| Faceted filtering (type, year, publisher, repo) | Yes | Limited | Manual implementation |
| Automatic pagination and rate limiting | Yes | N/A | Manual implementation |
| Scheduled recurring runs | Yes | No | Requires infrastructure |
| No coding required | Yes | Yes | No |
| Export to CSV, Excel, JSON | Yes | No | JSON only |

This scraper wraps the DataCite API with a user-friendly interface, automatic pagination, and built-in export options.

***

### 🚀 How to use

1. **Sign up** - [Create a free Apify account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp)
2. **Find the Actor** - Search for "DataCite Metadata Scraper" in the Apify Store
3. **Set your search criteria** - Enter keywords, resource type, year, or a specific DOI
4. **Start the run** - Click "Start" and watch results appear in real time
5. **Export your data** - Download as JSON, CSV, or Excel from the dataset tab

> 🕒 **Typical run time:** 15 to 60 seconds for up to 100 records. Larger runs with 1,000+ records may take a few minutes depending on the query scope.

***

### 💼 Business use cases

<table>
<tr>
<td>

**Academic Research**

- Build literature databases for systematic reviews
- Track publication output from specific repositories
- Monitor new datasets in your research field
- Collect DOI metadata for bibliometric analysis

</td>
<td>

**Library and Information Science**

- Catalog DOI records across institutional repositories
- Monitor open data availability by subject area
- Track publisher output and growth over time
- Build metadata indexes for discovery systems

</td>
</tr>
<tr>
<td>

**Institutional Analytics**

- Track departmental publication and dataset output
- Monitor which repositories your institution uses most
- Analyze trends in resource types over time
- Build reports on open data contributions by year

</td>
<td>

**Science Policy**

- Study open data mandates and compliance rates
- Track growth of data sharing across disciplines
- Monitor repository adoption trends globally
- Analyze the distribution of resource types by field

</td>
</tr>
</table>

***

### 🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Empirical datasets for papers, thesis work, and coursework
- Longitudinal studies tracking changes across snapshots
- Reproducible research with cited, versioned data pulls
- Classroom exercises on data analysis and ethical scraping

</td>
<td width="50%">

#### 🎨 Personal and creative

- Side projects, portfolio demos, and indie app launches
- Data visualizations, dashboards, and infographics
- Content research for bloggers, YouTubers, and podcasters
- Hobbyist collections and personal trackers

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Transparency reporting and accountability projects
- Advocacy campaigns backed by public-interest data
- Community-run databases for local issues
- Investigative journalism on public records

</td>
<td width="50%">

#### 🧪 Experimentation

- Prototype AI and machine-learning pipelines with real data
- Validate product-market hypotheses before engineering spend
- Train small domain-specific models on niche corpora
- Test dashboard concepts with live input

</td>
</tr>
</table>

### 🤖 Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge Actor in the AI of your choice:

- 💬 [**ChatGPT**](https://chat.openai.com/?q=How%20do%20I%20use%20the%20DataCite%20Metadata%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🧠 [**Claude**](https://claude.ai/new?q=How%20do%20I%20use%20the%20DataCite%20Metadata%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🔍 [**Perplexity**](https://perplexity.ai/search?q=How%20do%20I%20use%20the%20DataCite%20Metadata%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🅒 [**Copilot**](https://copilot.microsoft.com/?q=How%20do%20I%20use%20the%20DataCite%20Metadata%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)

### ❓ Frequently Asked Questions

<details>
<summary><b>💳 Do I need a paid Apify plan to run this Actor?</b></summary>

No. You can start right now on the free Apify plan, which includes **$5 in free monthly credit**. That is enough to run this Actor several times and explore the output before committing to anything. Paid plans unlock higher limits, more concurrent runs, and larger datasets. [Create a free Apify account here](https://console.apify.com/sign-up?fpr=vmoqkp) to get started.

</details>

<details>
<summary><b>🚨 What happens if my run fails or returns no results?</b></summary>

Failed runs are not charged. If the source site changes, proxies get rate-limited, or a specific input matches nothing, re-run the Actor or open our [contact form](https://tally.so/r/BzdKgA) and we will investigate. You can also check the run log in the Apify console to see why the run stopped.

</details>

<details>
<summary><b>📏 How many items can I scrape per run?</b></summary>

Free users are limited to **10 items per run** so you can preview the output and confirm the Actor works for your use case. Paid users can raise `maxItems` up to **1,000,000** per run. [Upgrade here](https://console.apify.com/sign-up?fpr=vmoqkp) if you need full scale.

</details>

<details>
<summary><b>🕒 How fresh is the data?</b></summary>

Every run fetches live data at the moment of execution. There is no cache or delay: the records you get reflect what the source returned at that moment. Schedule the Actor to maintain a rolling snapshot of the data you need.

</details>

<details>
<summary><b>🧑‍💻 Can I call this Actor from my own code?</b></summary>

Yes. Apify exposes every Actor as a REST endpoint and ships first-class SDKs for [Node.js](https://docs.apify.com/sdk/js) and [Python](https://docs.apify.com/sdk/python). You can start a run, read the dataset, and handle webhooks from your own app in a few lines. All you need is your Apify API token.

</details>

<details>
<summary><b>📤 How do I export the data?</b></summary>

Every Apify dataset can be downloaded in one click from the console as CSV, JSON, JSONL, Excel, HTML, XML, or RSS. You can also pull results programmatically via the [Apify API](https://docs.apify.com/api/v2) or stream them into BigQuery, S3, and other destinations through built-in integrations.

</details>

<details>
<summary><b>📅 Can I schedule the Actor to run automatically?</b></summary>

Yes. Use the Apify scheduler to run the Actor on any cadence, from hourly to monthly. Results are saved to your dataset and can be delivered to webhooks, email, Slack, cloud storage, or automation tools such as Zapier and Make.

</details>

***

### 🔌 Automating DataCite Metadata Scraper

**Node.js example:**

```javascript
import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish
const run = await client.actor('parseforge/datacite-metadata-scraper').call({
    query: 'climate change',
    maxItems: 100,
    resourceType: 'Dataset',
});

// Read the results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

**Python example:**

```python
from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient('YOUR_API_TOKEN')

# Start the Actor and wait for the run to finish
run = client.actor('parseforge/datacite-metadata-scraper').call(run_input={
    'query': 'climate change',
    'maxItems': 100,
    'resourceType': 'Dataset',
})

# Read the results from the run's default dataset
items = list(client.dataset(run['defaultDatasetId']).iterate_items())
print(items)
```

- [Apify API documentation](https://docs.apify.com/api/v2)
- [Node.js client docs](https://docs.apify.com/api/client/js/)
- [Python client docs](https://docs.apify.com/api/client/python/)

**Schedules:** Set up weekly or monthly runs to track new DOI registrations in your field. Combine with Google Sheets or Slack integrations to get notified when new records match your query.
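
For scheduled runs, one way to spot new registrations is to compare each run's DOIs against those you have already seen. The sketch below assumes you keep that set yourself (in a file or database, for example); `find_new_records` is a hypothetical helper and the rows are made up. DOIs are compared in lowercase because DOI names are case-insensitive.

```python
from typing import Any, Dict, Iterable, List, Set


def find_new_records(seen_dois: Set[str], items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return items whose DOI is not in `seen_dois`, and remember them for next time.

    `seen_dois` is expected to hold lowercase DOIs and is updated in place.
    """
    fresh: List[Dict[str, Any]] = []
    for item in items:
        doi = (item.get("doi") or "").lower()
        if doi and doi not in seen_dois:
            seen_dois.add(doi)
            fresh.append(item)
    return fresh


# DOIs collected by earlier runs (stored in lowercase).
seen = {"10.5281/zenodo.7654321"}

# Made-up results from the latest run.
latest_items = [
    {"doi": "10.5281/ZENODO.7654321", "title": "Already known, different casing"},
    {"doi": "10.5061/dryad.abc123", "title": "New record"},
    {"error": "Request failed"},  # rows without a DOI are skipped
]
new_items = find_new_records(seen, latest_items)
print([item["doi"] for item in new_items])
```

Only the records returned by this function need to be forwarded to a Google Sheets or Slack step.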

### 🔌 Integrate with any app

- [Make](https://docs.apify.com/platform/integrations/make) - Automate DOI metadata workflows and sync with research databases
- [Zapier](https://docs.apify.com/platform/integrations/zapier) - Connect to 5,000+ apps and trigger actions on new DOI records
- [Slack](https://docs.apify.com/platform/integrations/slack) - Get notifications when new publications match your query
- [Airbyte](https://docs.apify.com/platform/integrations/airbyte) - Stream DOI metadata into your data warehouse
- [GitHub](https://docs.apify.com/platform/integrations/github) - Version control your scraper configurations
- [Google Drive](https://docs.apify.com/platform/integrations/drive) - Export results directly to Google Sheets

***

### 🔗 Recommended Actors

| Actor | Description |
|---|---|
| [Hugging Face Model Scraper](https://apify.com/parseforge/hugging-face-model-scraper) | Collect model metadata and download stats from Hugging Face |
| [PR Newswire Scraper](https://apify.com/parseforge/pr-newswire-scraper) | Collect press releases and research announcements |
| [GSA eLibrary Scraper](https://apify.com/parseforge/gsa-elibrary-scraper) | Collect government contractor and vendor data |
| [Greatschools Scraper](https://apify.com/parseforge/greatschools-scraper) | Extract school ratings and performance data |
| [Smart Apify Actor Scraper](https://apify.com/parseforge/smart-apify-actor-scraper) | Scrape Apify actor metadata with 70+ fields |

> 💡 **Pro Tip:** Combine the DataCite Metadata Scraper with the Hugging Face Model Scraper to cross-reference published datasets with ML models trained on them.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new scraper, propose a custom data project, or report an issue. We typically respond within 24 hours.

***

> **Disclaimer:** This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by DataCite, Zenodo, Dryad, Figshare, or any data center. All trademarks mentioned are the property of their respective owners.

# Actor input Schema

## `maxItems` (type: `integer`):

Free users: Limited to 100. Paid users: Optional, max 1,000,000. Leave empty for unlimited (paid users only).

## `query` (type: `string`):

Search term to find DOIs.

## `doi` (type: `string`):

Specific Digital Object Identifier to retrieve (e.g., 10.5281/zenodo.1234567). If provided, only this DOI will be fetched.

## `repositoryId` (type: `string`):

Filter by repository identifier.

## `publisher` (type: `string`):

Filter by publisher name.

## `resourceType` (type: `string`):

Filter by resource type.

## `year` (type: `integer`):

Filter by publication year (4-digit year, e.g., 2023).

## `sort` (type: `string`):

Sort order for results.

## Actor input object example

```json
{
  "maxItems": 10,
  "query": "climate",
  "sort": "-created"
}
```

# Actor output Schema

## `dois` (type: `string`):

Complete dataset with all scraped DOI metadata including titles, creators, publishers, publication years, resource types, and comprehensive metadata

## `overview` (type: `string`):

Overview view of DOIs with key fields displayed in a table format

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "maxItems": 10,
    "query": "climate"
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/datacite-metadata-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "maxItems": 10,
    "query": "climate",
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/datacite-metadata-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "maxItems": 10,
  "query": "climate"
}' |
apify call parseforge/datacite-metadata-scraper --silent --output-dataset

```
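
If neither a client library nor the CLI is available, the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification on this page can be called directly. The sketch below only builds the request with Python's standard library; `build_sync_request` is a hypothetical helper. It sends the token in an `Authorization: Bearer` header, whereas the specification lists `token` as a query parameter, so treat the header as an assumption to verify against the Apify API documentation.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "parseforge~datacite-metadata-scraper"


def build_sync_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_sync_request("<YOUR_API_TOKEN>", {"maxItems": 10, "query": "climate"})
print(request.get_method(), request.full_url)

# Sending the request starts a real run on your account, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     items = json.loads(response.read())
```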

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/datacite-metadata-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "DataCite Metadata Scraper",
        "description": "Comprehensive DataCite metadata scraper for extracting DOI metadata from DataCite API. Perfect for researchers, librarians, and data scientists who need automated access to scholarly publication metadata, research datasets, and digital object identifiers.",
        "version": "0.1",
        "x-build-id": "QDrgVX6jJUaTolYPA"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~datacite-metadata-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-datacite-metadata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~datacite-metadata-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-datacite-metadata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~datacite-metadata-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-datacite-metadata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 100. Paid users: Optional, max 1,000,000. Leave empty for unlimited (paid users only)."
                    },
                    "query": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Search term to find DOIs."
                    },
                    "doi": {
                        "title": "DOI",
                        "type": "string",
                        "description": "Specific Digital Object Identifier to retrieve (e.g., 10.5281/zenodo.1234567). If provided, only this DOI will be fetched."
                    },
                    "repositoryId": {
                        "title": "Repository ID",
                        "enum": [
                            "zenodo",
                            "dryad",
                            "figshare",
                            "dataverse",
                            "pangaea",
                            "osf",
                            "zenodo.org",
                            "datacite"
                        ],
                        "type": "string",
                        "description": "Filter by repository identifier."
                    },
                    "publisher": {
                        "title": "Publisher",
                        "type": "string",
                        "description": "Filter by publisher name."
                    },
                    "resourceType": {
                        "title": "Resource Type",
                        "enum": [
                            "Dataset",
                            "Software",
                            "Article",
                            "Text",
                            "Image",
                            "Video",
                            "Audio",
                            "Collection",
                            "Event",
                            "PhysicalObject",
                            "Service",
                            "Other"
                        ],
                        "type": "string",
                        "description": "Filter by resource type."
                    },
                    "year": {
                        "title": "Year",
                        "minimum": 1900,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Filter by publication year (4-digit year, e.g., 2023)."
                    },
                    "sort": {
                        "title": "Sort",
                        "enum": [
                            "-created",
                            "created",
                            "-updated",
                            "updated",
                            "-publicationYear",
                            "publicationYear"
                        ],
                        "type": "string",
                        "description": "Sort order for results.",
                        "default": "-created"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
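
As a rough illustration of how the `runsResponseSchema` above might be consumed, the sketch below pulls a few fields out of a run response. The field names (`id`, `status`, `defaultDatasetId`, `usageTotalUsd`) come from the schema; the helper name `summarize_run` and the placeholder IDs in the sample are assumptions made only for this example, and the sample is not real API output.

```python
from typing import Any


def summarize_run(response: dict) -> dict:
    """Hypothetical helper: pick the most commonly needed fields from a run response."""
    data: dict[str, Any] = response.get("data", {})
    return {
        "run_id": data.get("id"),
        "status": data.get("status"),
        "dataset_id": data.get("defaultDatasetId"),  # where scraped records are stored
        "cost_usd": data.get("usageTotalUsd"),
    }


# Hand-written sample using the schema's example values; the IDs are placeholders.
sample_response = {
    "data": {
        "id": "RUN_ID_PLACEHOLDER",
        "status": "READY",
        "defaultDatasetId": "DATASET_ID_PLACEHOLDER",
        "usageTotalUsd": 0.00005,
    }
}

print(summarize_run(sample_response))
```

Using `.get()` throughout means a partial or unexpected response yields `None` values instead of raising, which may or may not be what a production integration wants.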

| Field | Type | Required | Description |
|---|---|---|---|
| maxItems | integer | No | Maximum records to collect. Free: 10. Paid: up to 1,000,000. |
| query | string | No | Search term to find DOIs (e.g., "climate change", "machine learning"). |
| doi | string | No | Specific DOI to retrieve (e.g., 10.5281/zenodo.1234567). Returns only this record. |
| repositoryId | string | No | Filter by repository identifier (e.g., Zenodo, Dryad, Figshare). |
| publisher | string | No | Filter by publisher name. |
| resourceType | string | No | Filter by type: Dataset, Article, Software, Image, etc. |
| year | integer | No | Filter by publication year (4-digit, e.g., 2023). |
| sort | string | No | Sort order: by creation date, update date, or publication year. |
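
The input fields listed above can be checked locally before a run is started. The following is a minimal sketch, not part of the Actor itself: the helper name `build_run_input` is hypothetical, and the allowed values and ranges are copied from the `inputSchema` in the OpenAPI definition.

```python
from typing import Any

# Allowed values and ranges, copied from the inputSchema in the OpenAPI definition.
ENUMS = {
    "repositoryId": {"zenodo", "dryad", "figshare", "dataverse",
                     "pangaea", "osf", "zenodo.org", "datacite"},
    "resourceType": {"Dataset", "Software", "Article", "Text", "Image", "Video",
                     "Audio", "Collection", "Event", "PhysicalObject", "Service", "Other"},
    "sort": {"-created", "created", "-updated", "updated",
             "-publicationYear", "publicationYear"},
}
INT_RANGES = {"maxItems": (1, 1_000_000), "year": (1900, 2100)}
STRING_FIELDS = {"query", "doi", "publisher"}


def build_run_input(**fields: Any) -> dict:
    """Hypothetical helper: validate fields locally and drop unset (None) ones."""
    run_input = {}
    for name, value in fields.items():
        if value is None:
            continue  # every field is optional, so unset values are simply omitted
        if name in ENUMS:
            if value not in ENUMS[name]:
                raise ValueError(f"{name}={value!r} is not an allowed value")
        elif name in INT_RANGES:
            low, high = INT_RANGES[name]
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not low <= value <= high:
                raise ValueError(f"{name} must be an integer in [{low}, {high}]")
        elif name in STRING_FIELDS:
            if not isinstance(value, str):
                raise ValueError(f"{name} must be a string")
        else:
            raise ValueError(f"unknown input field: {name}")
        run_input[name] = value
    return run_input


example = build_run_input(query="climate change", resourceType="Dataset",
                          year=2023, maxItems=50)
print(example)
```

Assuming the official Python client is installed, the resulting dictionary could then be passed as `run_input` to `ApifyClient(token).actor("parseforge/datacite-metadata-scraper").call(...)`. Validating up front is only a convenience; the platform still applies the Actor's own input schema.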

**Academic Research**

- Build literature databases for systematic reviews
- Track publication output from specific repositories
- Monitor new datasets in your research field
- Collect DOI metadata for bibliometric analysis

**Library and Information Science**

- Catalog DOI records across institutional repositories
- Monitor open data availability by subject area
- Track publisher output and growth over time
- Build metadata indexes for discovery systems

**Institutional Analytics**

- Track departmental publication and dataset output
- Monitor which repositories your institution uses most
- Analyze trends in resource types over time
- Build reports on open data contributions by year

**Science Policy**

- Study open data mandates and compliance rates
- Track growth of data sharing across disciplines
- Monitor repository adoption trends globally
- Analyze the distribution of resource types by field

#### 🎓 Research and academia

- Empirical datasets for papers, thesis work, and coursework
- Longitudinal studies tracking changes across snapshots
- Reproducible research with cited, versioned data pulls
- Classroom exercises on data analysis and ethical scraping

#### 🎨 Personal and creative

- Side projects, portfolio demos, and indie app launches
- Data visualizations, dashboards, and infographics
- Content research for bloggers, YouTubers, and podcasters
- Hobbyist collections and personal trackers

#### 🤝 Non-profit and civic

- Transparency reporting and accountability projects
- Advocacy campaigns backed by public-interest data
- Community-run databases for local issues
- Investigative journalism on public records

#### 🧪 Experimentation

- Prototype AI and machine-learning pipelines with real data
- Validate product-market hypotheses before engineering spend
- Train small domain-specific models on niche corpora
- Test dashboard concepts with live input