# ORCID Scraper — Researcher Profiles, Works & Affiliations (`openclawmara/orcid-scraper`) Actor

Scrape ORCID researcher registry. Modes: search profiles, researcher details by ORCID iD, works/publications, employment and education history. Extracts names, affiliations, DOIs, funding, peer reviews. Official Public API. For academic network analysis & research mapping.

- **URL**: https://apify.com/openclawmara/orcid-scraper.md
- **Developed by:** [OpenClaw Mara](https://apify.com/openclawmara) (community)
- **Categories:** AI, Developer tools, Other
- **Stats:** 3 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$5.00 / 1,000 profiles scraped

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## ORCID Scraper — Researcher Profiles, Works & Affiliations

**$0.005 per record** · Extract researcher profiles, publications, employment, and education history from **[ORCID](https://orcid.org)** — the global scholar identifier database with **18M+ registered researchers**. **No API key needed** (uses ORCID Public API).

Built for **academic intelligence**, **hiring & recruiting**, **collaboration discovery**, **grant tracking**, and **RAG/LLM corpora on research personas and careers**.

---

### What You Get

- **Search researchers** — by keyword, affiliation, or name (e.g. "CRISPR MIT")
- **Direct profile lookup** — by ORCID iD (e.g. `0000-0002-1825-0097`)
- **Full publication list** — works registered on ORCID with titles, DOIs, and years
- **Employment history** — past and current affiliations with dates
- **Education history** — degrees, institutions, graduation years
- **ORCID-canonical iDs** — resolvable permanent identifiers
- **Structured JSON** — ready for downstream pipelines
- **Public API** — free, no authentication required

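The direct lookup listed above takes ORCID iDs, and malformed iDs can be rejected locally before a run. ORCID iDs end in a check character computed with ISO 7064 MOD 11-2; the sketch below verifies it. The function names are illustrative and not part of this Actor.

```python
import re

# Four hyphen-separated groups of four; the last character may be X.
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid_id(orcid_id: str) -> bool:
    """Return True if the iD is well-formed and its check character matches."""
    if not ORCID_PATTERN.match(orcid_id):
        return False
    digits = orcid_id.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(is_valid_orcid_id("0000-0002-1825-0097"))  # True
print(is_valid_orcid_id("0000-0002-1825-0098"))  # False
```
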
---

### 4 Use Cases (ready-to-run JSON inputs)

#### 1. Academic recruiting — find talent by field + affiliation

```json
{
  "searchQueries": ["machine learning MIT", "computational biology Stanford"],
  "maxResults": 30,
  "includeWorks": true,
  "includeEmployment": true,
  "maxWorks": 20
}
````

30 researchers per query with recent publications and affiliation history — perfect for technical recruiting or lab profiling.

#### 2. Profile enrichment — full detail for specific researchers

```json
{
  "orcidIds": ["0000-0002-1825-0097", "0000-0003-1453-0929"],
  "includeWorks": true,
  "maxWorks": 100,
  "includeEducation": true,
  "includeEmployment": true
}
```

Complete ORCID records for specific researchers — useful for building researcher cards, author pages, or CV-extraction pipelines.

#### 3. Topic-based collaboration discovery

```json
{
  "searchQueries": ["CRISPR gene editing", "mRNA vaccine"],
  "maxResults": 50,
  "includeWorks": true,
  "maxWorks": 10
}
```

Top 50 researchers per topic with recent works — seed data for mapping research communities or finding potential collaborators.

#### 4. Lightweight directory build (no publications)

```json
{
  "searchQueries": ["quantum computing IBM"],
  "maxResults": 100,
  "includeWorks": false,
  "includeEducation": false,
  "includeEmployment": true
}
```

Fast, cheap build of a researcher directory — only names + current affiliations, no publication fetch. Great for high-volume profiling.

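When several `searchQueries` overlap, the same researcher may plausibly appear under more than one query. Whether the Actor deduplicates across queries is not stated here, so this sketch assumes it does not and keeps the first record per `orcidId`. The helper name `dedupe_profiles` is hypothetical.

```python
from typing import Iterable


def dedupe_profiles(items: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each orcidId; pass through records without one."""
    seen: set[str] = set()
    unique: list[dict] = []
    for item in items:
        orcid_id = item.get("orcidId")
        if orcid_id is None:
            unique.append(item)  # nothing to key on, so keep it
            continue
        if orcid_id in seen:
            continue
        seen.add(orcid_id)
        unique.append(item)
    return unique
```
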
***

### Input Schema

| Field | Type | Default | Description |
|---|---|---|---|
| `searchQueries` | string\[] | `[]` | Keyword/affiliation/name searches |
| `orcidIds` | string\[] | `[]` | Specific ORCID iDs to fetch |
| `maxResults` | integer | `20` | Max profiles per search query (1 to 200) |
| `includeWorks` | boolean | `true` | Fetch publication list |
| `maxWorks` | integer | `50` | Max publications per researcher (1 to 500) |
| `includeEmployment` | boolean | `true` | Fetch employment history |
| `includeEducation` | boolean | `true` | Fetch education history |
| `includeFunding` | boolean | `false` | Fetch funding/grants information |

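Inputs can be checked locally before starting a paid run. The sketch below uses the field types from this table and the numeric bounds given in the OpenAPI specification later in this document (`maxResults` 1 to 200, `maxWorks` 1 to 500). `check_input` is a hypothetical helper, not part of the Actor or the Apify client.

```python
BOUNDS = {"maxResults": (1, 200), "maxWorks": (1, 500)}
LIST_FIELDS = ("searchQueries", "orcidIds")
BOOL_FIELDS = ("includeWorks", "includeEmployment", "includeEducation", "includeFunding")


def check_input(run_input: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the input looks fine."""
    problems = []
    for name, (low, high) in BOUNDS.items():
        if name in run_input:
            value = run_input[name]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{name} must be an integer")
            elif not low <= value <= high:
                problems.append(f"{name} must be between {low} and {high}")
    for name in LIST_FIELDS:
        value = run_input.get(name, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{name} must be a list of strings")
    for name in BOOL_FIELDS:
        if name in run_input and not isinstance(run_input[name], bool):
            problems.append(f"{name} must be a boolean")
    return problems
```
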
### Output (sample — profile)

```json
{
  "orcidId": "0000-0002-1825-0097",
  "givenName": "Josiah",
  "familyName": "Carberry",
  "creditName": "Josiah S. Carberry",
  "biography": "Professor of psychoceramics, Brown University.",
  "country": "US",
  "keywords": ["psychoceramics", "ceramics", "psychology"],
  "works": [
    {
      "title": "Toward a Theory of Psychoceramics",
      "year": 2008,
      "type": "journal-article",
      "journalTitle": "J. Psychoceramics",
      "doi": "10.5555/12345678",
      "url": "https://doi.org/10.5555/12345678"
    }
  ],
  "employment": [
    {
      "organization": "Brown University",
      "department": "Psychoceramics",
      "role": "Professor",
      "startYear": 2001,
      "endYear": null
    }
  ],
  "education": [
    {"organization": "Brown University", "degree": "Ph.D. Psychology", "endYear": 1929}
  ],
  "orcidUrl": "https://orcid.org/0000-0002-1825-0097"
}
```
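A short sketch of reading a record shaped like the sample above. It treats an employment entry whose `endYear` is `null` as current, which is an assumption suggested by the sample rather than a documented convention, and it collects the DOIs of the listed works. The function names are illustrative.

```python
def current_affiliations(profile: dict) -> list[str]:
    """Organizations from employment entries that have no end year."""
    return [
        job["organization"]
        for job in profile.get("employment") or []
        if job.get("endYear") is None and job.get("organization")
    ]


def work_dois(profile: dict) -> list[str]:
    """DOIs of all works that carry one."""
    return [work["doi"] for work in profile.get("works") or [] if work.get("doi")]


# Trimmed copy of the sample record shown above.
sample = {
    "orcidId": "0000-0002-1825-0097",
    "works": [{"title": "Toward a Theory of Psychoceramics", "doi": "10.5555/12345678"}],
    "employment": [{"organization": "Brown University", "startYear": 2001, "endYear": None}],
}

print(current_affiliations(sample))  # ['Brown University']
print(work_dois(sample))             # ['10.5555/12345678']
```

Using `profile.get(...) or []` keeps both helpers safe when a section was switched off in the input and the key is missing or empty.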

***

### Pricing & Performance

- **Pay-per-event:** $0.005 per researcher profile
- **Typical cost:** $0.05 for 10 profiles, $0.50 for 100, $5 for 1,000
- **Speed:** ~10 profiles/second (rate-limit-safe against ORCID Public API)
- **Free Apify tier:** $5/month credit = ~1,000 profiles/month

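The per-profile price above gives a simple upper bound on what a run can cost. The sketch assumes one charge per returned profile and no separate charge for works, employment, or education, which is what "per researcher profile" suggests but is not spelled out here. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_PROFILE = Decimal("0.005")  # USD, from the pricing list above


def estimate_cost(num_queries: int = 0, max_results: int = 20, num_orcid_ids: int = 0) -> Decimal:
    """Upper-bound cost in USD: every query returns max_results profiles, plus direct lookups."""
    profiles = num_queries * max_results + num_orcid_ids
    return profiles * PRICE_PER_PROFILE


print(estimate_cost(num_queries=2, max_results=30))  # 0.300
print(estimate_cost(num_orcid_ids=1000))             # 5.000
```
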
Compare to commercial researcher databases (Scopus Author ID, Web of Science ResearcherID): ORCID is **free, open, and researcher-maintained** — you pay only for structured extraction.

***

### Integrations

- **Zapier / Make / n8n** — new researchers matching a query → Notion / Airtable / Slack
- **Applicant Tracking Systems (Greenhouse / Lever)** — enrich candidate profiles with ORCID records
- **LangChain / LlamaIndex** — RAG over researcher bios and publications
- **Vector DBs (Pinecone / Weaviate / Qdrant)** — embed bios + works for "similar researchers"
- **Neo4j / Graphiti** — researcher → affiliation → publication → DOI graph
- **CRM (HubSpot / Salesforce)** — enrich contact records with research output
- **Python SDK**
  ```python
  from apify_client import ApifyClient
  client = ApifyClient("<APIFY_TOKEN>")
  run = client.actor("openclawmara/orcid-scraper").call(
      run_input={"searchQueries": ["protein folding DeepMind"], "maxResults": 50, "includeWorks": True, "maxWorks": 15}
  )
  for r in client.dataset(run["defaultDatasetId"]).iterate_items():
      print(r["orcidId"], r.get("creditName"), len(r.get("works", [])))
  ```
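For the graph-style integration listed above (researcher → affiliation → publication → DOI), one possible first step is to turn each record into plain edge triples that any graph store can ingest. The relation names and the `profile_to_edges` helper are hypothetical, and no graph database client is involved.

```python
def profile_to_edges(profile: dict) -> list[tuple[str, str, str]]:
    """Turn one profile record into (source, relation, target) triples."""
    researcher = profile["orcidId"]
    edges = []
    for job in profile.get("employment") or []:
        if job.get("organization"):
            edges.append((researcher, "AFFILIATED_WITH", job["organization"]))
    for work in profile.get("works") or []:
        title = work.get("title")
        if not title:
            continue  # skip works that cannot be named as a node
        edges.append((researcher, "AUTHORED", title))
        if work.get("doi"):
            edges.append((title, "HAS_DOI", work["doi"]))
    return edges
```

Keying publication nodes by title is a simplification; when a DOI is present it is usually the safer node identifier.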

***

### FAQ

**Do I need an ORCID API key?** No. This Actor uses the ORCID Public API (unauthenticated). If you need member-only fields, get an ORCID Member API token and fork the Actor.

**How current is the data?** Live — every request hits `pub.orcid.org/v3.0`. ORCID data is researcher-maintained, so freshness depends on how often each scholar updates their profile.

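For orientation, the ORCID Public API at `pub.orcid.org/v3.0` serves record sections under per-iD paths and a search endpoint that takes `q`, `start`, and `rows`. The sketch below only builds URLs. Which endpoints this Actor actually calls is not documented here, so treat the mapping as an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://pub.orcid.org/v3.0"
SECTIONS = {"record", "person", "works", "employments", "educations", "fundings"}


def record_url(orcid_id: str, section: str = "record") -> str:
    """URL of one section of a public ORCID record."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section}")
    return f"{BASE_URL}/{orcid_id}/{section}"


def search_url(query: str, rows: int = 20, start: int = 0) -> str:
    """URL of a search request; send it with an 'Accept: application/json' header."""
    return f"{BASE_URL}/search/?" + urlencode({"q": query, "start": start, "rows": rows})


print(record_url("0000-0002-1825-0097", "works"))
# https://pub.orcid.org/v3.0/0000-0002-1825-0097/works
```
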
**What if a researcher has no ORCID?** They simply won't appear. ORCID is opt-in — coverage is strongest in STEM and biomedical fields.

**Can I get email / phone from ORCID?** Only if the researcher marked those fields public, which is uncommon. Use this Actor for academic affiliation and work history, not private contact details.

**What's the difference vs `semantic-scholar-scraper`?** Semantic Scholar is paper-centric with citations. ORCID is researcher-centric with career history (employment, education, opt-in works). Complementary pair.

**Rate limits?** The ORCID Public API is generous but not unlimited. The Actor paces requests conservatively (200 ms between calls).

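The pacing described above can be pictured as a minimum-interval throttle. This is an illustration of the idea, not the Actor's code; the `Pacer` class and its injectable `clock` and `sleep` parameters are hypothetical.

```python
import time


class Pacer:
    """Ensure at least min_interval seconds pass between consecutive calls to wait()."""

    def __init__(self, min_interval=0.2, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was released

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


pacer = Pacer(min_interval=0.2)
for _ in range(3):
    pacer.wait()  # an HTTP request would follow each wait
```
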
***

### Keywords

orcid scraper, orcid api, researcher profiles, academic profiles, scholar id, researcher search, academic recruiting, research cv, researcher directory, affiliation data, employment history, publication list, scholar database, academic intelligence, research personas, researcher discovery, collaboration mapping, science talent, researcher enrichment, orcid public api

***

### Companions (cross-promo)

- **[dblp-scraper](https://apify.com/Helpermara/dblp-scraper)** — CS bibliography
- **[semantic-scholar-scraper](https://apify.com/Helpermara/semantic-scholar-scraper)** — papers + citations
- **[crossref-scraper](https://apify.com/Helpermara/crossref-scraper)** — DOI metadata
- **[zenodo-scraper](https://apify.com/Helpermara/zenodo-scraper)** — research datasets

***

### Changelog

- **2026-04-24** — Extended README with use cases, integrations, and FAQ
- **2026-03** — Initial release: search by keyword + direct ORCID iD lookup, works/employment/education

# Actor input Schema

## `searchQueries` (type: `array`):

Search researchers by keyword, affiliation, or name (e.g. 'machine learning MIT', 'CRISPR')

## `orcidIds` (type: `array`):

Fetch specific profiles by ORCID ID (e.g. '0000-0002-1825-0097')

## `maxResults` (type: `integer`):

Maximum researcher profiles to return per search query

## `includeWorks` (type: `boolean`):

Fetch publication list for each researcher

## `maxWorks` (type: `integer`):

Maximum publications to include per researcher

## `includeEmployment` (type: `boolean`):

Fetch employment history

## `includeEducation` (type: `boolean`):

Fetch education history

## `includeFunding` (type: `boolean`):

Fetch funding/grants information

## Actor input object example

```json
{
  "maxResults": 20,
  "includeWorks": true,
  "maxWorks": 50,
  "includeEmployment": true,
  "includeEducation": true,
  "includeFunding": false
}
```
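Fields left out of the input presumably fall back to the defaults shown above. A minimal sketch of layering caller overrides on top of those defaults, and catching misspelled field names, before a run; `DEFAULT_INPUT` mirrors the example object and `build_input` is a hypothetical helper.

```python
DEFAULT_INPUT = {
    "maxResults": 20,
    "includeWorks": True,
    "maxWorks": 50,
    "includeEmployment": True,
    "includeEducation": True,
    "includeFunding": False,
}
# Fields that have no default in the example object but are valid inputs.
EXTRA_FIELDS = {"searchQueries", "orcidIds"}


def build_input(**overrides) -> dict:
    """Return the defaults with overrides applied; reject unknown field names."""
    unknown = set(overrides) - set(DEFAULT_INPUT) - EXTRA_FIELDS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    return {**DEFAULT_INPUT, **overrides}


run_input = build_input(searchQueries=["CRISPR gene editing"], maxWorks=10)
print(run_input["maxResults"], run_input["maxWorks"])  # 20 10
```

The resulting dictionary can be passed as `run_input` to the Python client call shown in the API section.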

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    searchQueries: ['CRISPR gene editing'],
    maxResults: 20,
    includeWorks: true,
    maxWorks: 50,
};

// Run the Actor and wait for it to finish
const run = await client.actor("openclawmara/orcid-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchQueries": ["CRISPR gene editing"],
    "maxResults": 20,
    "includeWorks": True,
    "maxWorks": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("openclawmara/orcid-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"searchQueries": ["CRISPR gene editing"], "maxResults": 20}' |
apify call openclawmara/orcid-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=openclawmara/orcid-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "ORCID Scraper — Researcher Profiles, Works & Affiliations",
        "description": "Scrape ORCID researcher registry. Modes: search profiles, researcher details by ORCID iD, works/publications, employment and education history. Extracts names, affiliations, DOIs, funding, peer reviews. Official Public API. For academic network analysis & research mapping.",
        "version": "1.0",
        "x-build-id": "MZuM35K19MYFcr1Qc"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/openclawmara~orcid-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-openclawmara-orcid-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/openclawmara~orcid-scraper/runs": {
            "post": {
                "operationId": "runs-sync-openclawmara-orcid-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/openclawmara~orcid-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-openclawmara-orcid-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchQueries": {
                        "title": "Search Queries",
                        "type": "array",
                        "description": "Search researchers by keyword, affiliation, or name (e.g. 'machine learning MIT', 'CRISPR')",
                        "items": {
                            "type": "string"
                        }
                    },
                    "orcidIds": {
                        "title": "ORCID IDs",
                        "type": "array",
                        "description": "Fetch specific profiles by ORCID ID (e.g. '0000-0002-1825-0097')",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxResults": {
                        "title": "Max Results per Query",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Maximum researcher profiles to return per search query",
                        "default": 20
                    },
                    "includeWorks": {
                        "title": "Include Publications",
                        "type": "boolean",
                        "description": "Fetch publication list for each researcher",
                        "default": true
                    },
                    "maxWorks": {
                        "title": "Max Publications per Profile",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum publications to include per researcher",
                        "default": 50
                    },
                    "includeEmployment": {
                        "title": "Include Employment",
                        "type": "boolean",
                        "description": "Fetch employment history",
                        "default": true
                    },
                    "includeEducation": {
                        "title": "Include Education",
                        "type": "boolean",
                        "description": "Fetch education history",
                        "default": true
                    },
                    "includeFunding": {
                        "title": "Include Funding",
                        "type": "boolean",
                        "description": "Fetch funding/grants information",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
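For projects that avoid the client libraries, the `run-sync-get-dataset-items` path in the specification above can be called with the Python standard library alone. The sketch below only builds the request object; the helper name `build_sync_request` is hypothetical, and the token is passed as the `token` query parameter as the specification describes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/openclawmara~orcid-scraper/run-sync-get-dataset-items"


def build_sync_request(token: str, run_input: dict) -> Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


req = build_sync_request("<YOUR_API_TOKEN>", {"orcidIds": ["0000-0002-1825-0097"]})
print(req.get_method(), req.full_url.split("?")[0])
```

To send it, pass `req` to `urllib.request.urlopen` and decode the response body with `json.load`. Long runs may exceed the synchronous endpoint's time limit, in which case the asynchronous `/runs` path is the safer choice.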
