# Book Metadata Scraper (`datapilot/book-metadata-scraper`) Actor

Book Metadata Scraper uses the Open Library API to collect detailed book data by query. It extracts title, author, ISBN, publisher, publish year, pages, categories, ratings, description, cover image, and preview link. Outputs structured JSON for catalogs, apps, and research use.

- **URL**: https://apify.com/datapilot/book-metadata-scraper.md
- **Developed by:** [Data Pilot](https://apify.com/datapilot) (community)
- **Categories:** News, Social media, SEO tools
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$8.00/month + usage

To use this Actor, you pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for the Apify platform usage, which gets cheaper the higher Apify subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#rental-actors

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Book Metadata Scraper

### Overview

The **Book Metadata Scraper** is an Apify Actor that extracts book metadata from the [Open Library](https://openlibrary.org) database. It accepts book titles or search queries and returns the author, ISBN, publisher, description, ratings, cover image, and other fields for each matching book. Whether you're building a book database, conducting publishing research, or powering a recommendation engine, the Actor delivers the results as structured JSON records.

Optional proxy support and randomized delays between requests help keep access to Open Library's public API reliable.

---

### Features

- **Multi-query Search** – Search for multiple book titles or keywords in a single run.
- **Deep Metadata Fetching** – Retrieves extended details (description, work data) from the Open Library Works API when search results are incomplete.
- **Cover Image Links** – Returns direct URLs to book cover images via Open Library's cover service.
- **ISBN Extraction** – Prioritizes 13-digit ISBNs for each book result.
- **Rating & Review Data** – Includes average ratings and total ratings count where available.
- **Proxy Support** – Optionally uses Apify residential proxies to avoid IP blocking.
- **Rate-Limit Friendly** – Adds random delays between requests to respect API limits.
- **Dataset Integration** – Automatically pushes all book metadata to your Apify dataset for easy export.
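The **Rate-Limit Friendly** behavior can be pictured as a loop that sleeps a random interval between consecutive requests. This is a minimal sketch, not the Actor's source: `run_with_random_delays` is a hypothetical helper, the 1 to 3 second bounds are illustrative defaults, and the sleep function is injectable so the loop can be exercised without actually waiting.

```python
import random
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_random_delays(
    items: Iterable[T],
    handler: Callable[[T], R],
    min_delay: float = 1.0,  # illustrative bound, not taken from the Actor
    max_delay: float = 3.0,  # illustrative bound, not taken from the Actor
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[R]:
    """Call handler(item) for each item, sleeping a random interval between calls (hypothetical helper)."""
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("expected 0 <= min_delay <= max_delay")
    rng = rng if rng is not None else random.Random()
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            # Pause only between requests, never before the first one.
            sleep(rng.uniform(min_delay, max_delay))
        results.append(handler(item))
    return results


# Usage: record the delays instead of sleeping, with a seeded RNG for repeatability.
delays: List[float] = []
titles = run_with_random_delays(
    ["Atomic Habits", "The Alchemist", "Stephen King"],
    handler=str.upper,
    sleep=delays.append,
    rng=random.Random(7),
)
print(titles)  # ['ATOMIC HABITS', 'THE ALCHEMIST', 'STEPHEN KING']
print(len(delays))  # 2
```

Randomizing the pause, rather than using a fixed interval, spreads requests out so that several queries do not hit the API in a regular burst pattern.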

---

### How It Works

1. **Input** – Provide a list of book titles or search queries (e.g., `"Atomic Habits"`, `"Stephen King"`).
2. **Search** – The Actor queries the Open Library search API (`/search.json`) for each query.
3. **Deep Fetch** – For each result, it fetches the Works record (`/works/{key}.json`) to retrieve fields missing from the search results, such as the description.
4. **Build Output** – Structures all available metadata into a clean record and pushes it to the dataset.
5. **Repeat** – Processes all queries with a random delay between requests.
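The search and deep-fetch steps above can be outlined with the standard library. This is a sketch under stated assumptions, not the Actor's code: `search_url`, `work_url`, `normalize_description`, and `search_and_enrich` are hypothetical names, and `fetch_json` is injected so the logic runs without network access. The endpoints themselves (`https://openlibrary.org/search.json` with `q` and `limit`, and `https://openlibrary.org/works/{id}.json`) belong to Open Library's public API. One quirk worth handling: a Works record's `description` may be a plain string or an object with a `value` key.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import urlencode

OPEN_LIBRARY = "https://openlibrary.org"
FetchJson = Callable[[str], Dict[str, Any]]  # injected HTTP-GET-and-decode function


def search_url(query: str, limit: int) -> str:
    """Build an Open Library search URL for one query (hypothetical helper)."""
    return f"{OPEN_LIBRARY}/search.json?{urlencode({'q': query, 'limit': limit})}"


def work_url(work_key: str) -> str:
    """Search results carry keys like '/works/OL17930368W'; append '.json'."""
    return f"{OPEN_LIBRARY}{work_key}.json"


def normalize_description(raw: Any) -> str:
    """Works descriptions are either a string or {'type': '/type/text', 'value': ...}."""
    if isinstance(raw, dict):
        raw = raw.get("value", "")
    return raw if isinstance(raw, str) else ""


def search_and_enrich(query: str, limit: int, fetch_json: FetchJson) -> List[Dict[str, Any]]:
    """Search one query, then deep-fetch each result's Works record for its description."""
    docs = fetch_json(search_url(query, limit)).get("docs", [])[:limit]
    enriched = []
    for doc in docs:
        work: Dict[str, Any] = {}
        key = doc.get("key", "")
        if key.startswith("/works/"):
            work = fetch_json(work_url(key))
        enriched.append({**doc, "description": normalize_description(work.get("description"))})
    return enriched


# Usage with canned responses instead of real HTTP calls.
def fake_fetch(url: str) -> Dict[str, Any]:
    if "/search.json" in url:
        return {"docs": [{"key": "/works/OL17930368W", "title": "Atomic Habits"}]}
    return {"description": {"type": "/type/text", "value": "No matter your goals..."}}


print(search_and_enrich("Atomic Habits", 5, fake_fetch)[0]["description"])  # No matter your goals...
```

In a real run, `fetch_json` could wrap `urllib.request.urlopen(url, timeout=30)` and `json.load`, or an equivalent `requests` call; keeping it injectable separates the traversal logic from transport, retries, and proxies.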

---

### Input

| Field                | Type             | Default              | Description                                                        |
|----------------------|------------------|----------------------|--------------------------------------------------------------------|
| `queries`            | Array of strings | `["Atomic Habits"]`  | List of book titles or search keywords.                            |
| `max_results`        | Integer          | `10`                 | Maximum number of results to return per query.                     |
| `proxyConfiguration` | Object           | `{ "useApifyProxy": true }` | Apify proxy configuration (e.g., `{ "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }`). |

**Example input:**

```json
{
  "queries": ["Atomic Habits", "The Alchemist", "Stephen King"],
  "max_results": 5,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}
````
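Before searching, an implementation would typically merge the user's input with the input-schema defaults and reject invalid values. The sketch below is hypothetical: `normalize_input` is an illustrative name, the defaults mirror the Actor input schema (`max_results` of 10, `proxyConfiguration` of `{"useApifyProxy": true}`), and stripping whitespace and dropping blank queries are choices made here, not documented Actor behavior.

```python
from typing import Any, Dict, List

# Defaults mirror the Actor input schema; "queries" is required and has no default.
DEFAULTS: Dict[str, Any] = {
    "max_results": 10,
    "proxyConfiguration": {"useApifyProxy": True},
}


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Merge raw Actor input with schema defaults and validate it (hypothetical helper)."""
    queries = raw.get("queries")
    if not isinstance(queries, list):
        raise ValueError("'queries' is required and must be an array of strings")
    # Strip whitespace and drop blank entries, preserving order (illustrative choice).
    cleaned: List[str] = [q.strip() for q in queries if isinstance(q, str) and q.strip()]
    if not cleaned:
        raise ValueError("'queries' must contain at least one non-empty string")

    max_results = raw.get("max_results", DEFAULTS["max_results"])
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(max_results, int) or isinstance(max_results, bool) or max_results < 1:
        raise ValueError("'max_results' must be a positive integer")

    proxy = raw.get("proxyConfiguration")
    if proxy is None:
        proxy = DEFAULTS["proxyConfiguration"]
    if not isinstance(proxy, dict):
        raise ValueError("'proxyConfiguration' must be an object")

    # Copy the proxy dict so callers cannot mutate the shared default.
    return {"queries": cleaned, "max_results": max_results, "proxyConfiguration": dict(proxy)}


print(normalize_input({"queries": [" Atomic Habits ", ""]}))
```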

***

### Output

Each book is pushed as a separate dataset record with the following fields:

| Field            | Type    | Description                                              |
|------------------|---------|----------------------------------------------------------|
| `title`          | string  | Book title.                                              |
| `author`         | string  | Author name(s), comma-separated.                         |
| `isbn`           | string  | ISBN-13 (preferred) or first available ISBN.             |
| `publisher`      | string  | Publisher name.                                          |
| `published_date` | string  | First publish year.                                      |
| `language`       | string  | Language code (uppercase, e.g., `"ENG"`).                |
| `pages`          | integer | Median page count (if available).                        |
| `categories`     | array   | Up to 5 subject/category tags.                           |
| `description`    | string  | First 500 characters of the book description.            |
| `average_rating` | float   | Average reader rating (if available).                    |
| `ratings_count`  | integer | Total number of ratings.                                 |
| `price`          | string  | Price (empty by default, because Open Library does not provide pricing). |
| `currency`       | string  | Currency code (default: `"USD"`).                        |
| `availability`   | string  | Availability status (default: `"In Stock"`).             |
| `cover_image`    | string  | Direct URL to the book's cover image (large size).       |
| `preview_link`   | string  | URL to the book's Open Library page.                     |
| `source`         | string  | Data source (always `"Open Library"`).                   |
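The field mapping in the table above can be sketched as a function that turns one Open Library search result, plus a description from its Works record, into an output record. The Open Library field names used here (`author_name`, `isbn`, `publisher`, `first_publish_year`, `language`, `number_of_pages_median`, `subject`, `ratings_average`, `ratings_count`, `cover_i`, `key`) are real search-result fields, but which ones the Actor actually reads is an assumption, and some of them may only be returned when requested through the search API's `fields` parameter. `build_record` and `pick_isbn` are hypothetical names.

```python
from typing import Any, Dict, List


def pick_isbn(isbns: List[str]) -> str:
    """Prefer the first 13-digit ISBN; otherwise fall back to the first one listed."""
    for isbn in isbns:
        if len(isbn) == 13 and isbn.isdigit():
            return isbn
    return isbns[0] if isbns else ""


def build_record(doc: Dict[str, Any], description: str = "") -> Dict[str, Any]:
    """Map one Open Library search doc onto the documented output fields (sketch)."""
    languages = doc.get("language") or []
    publishers = doc.get("publisher") or []
    year = doc.get("first_publish_year")
    cover_id = doc.get("cover_i")
    key = doc.get("key", "")
    return {
        "title": doc.get("title", ""),
        "author": ", ".join(doc.get("author_name") or []),
        "isbn": pick_isbn(doc.get("isbn") or []),
        "publisher": publishers[0] if publishers else "",
        "published_date": str(year) if year is not None else "",
        "language": languages[0].upper() if languages else "",
        "pages": doc.get("number_of_pages_median"),
        "categories": (doc.get("subject") or [])[:5],  # up to 5 tags
        "description": description[:500],  # first 500 characters
        "average_rating": doc.get("ratings_average"),
        "ratings_count": doc.get("ratings_count", 0),
        "price": "",
        "currency": "USD",
        "availability": "In Stock",
        # "-L" selects the large size from the Open Library covers service.
        "cover_image": f"https://covers.openlibrary.org/b/id/{cover_id}-L.jpg" if cover_id else "",
        "preview_link": f"https://openlibrary.org{key}" if key else "",
        "source": "Open Library",
    }
```

For example, a search doc with `"isbn": ["0735211299", "9780735211292"]` and `"language": ["eng"]` would yield `"isbn": "9780735211292"` and `"language": "ENG"`, matching the example output.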

**Example output:**

```json
{
  "title": "Atomic Habits",
  "author": "James Clear",
  "isbn": "9780735211292",
  "publisher": "Avery",
  "published_date": "2018",
  "language": "ENG",
  "pages": 320,
  "categories": ["Self-Help", "Habits", "Psychology", "Productivity", "Nonfiction"],
  "description": "No matter your goals, Atomic Habits offers a proven framework for improving every day...",
  "average_rating": 4.4,
  "ratings_count": 28000,
  "price": "",
  "currency": "USD",
  "availability": "In Stock",
  "cover_image": "https://covers.openlibrary.org/b/id/10527843-L.jpg",
  "preview_link": "https://openlibrary.org/works/OL17930368W",
  "source": "Open Library"
}
```

***

### Use Cases

- **Book Databases** – Build and maintain a structured catalog of books and metadata.
- **Recommendation Engines** – Power book recommendation systems with rich metadata.
- **Publishing Research** – Analyze publishing trends, authors, and categories.
- **E-commerce** – Enrich product listings with book descriptions, covers, and ISBNs.
- **Academic Research** – Collect structured book data for literature or data science projects.
- **Content Aggregation** – Aggregate book information for blogs, apps, or reading platforms.

***

### Quick Start

1. **Open on Apify** – Visit the Actor page and click **Try for free**.
2. **Set Input** – Enter book titles or search keywords in the `queries` field.
3. **Adjust Settings** – Set `max_results` and optional proxy configuration.
4. **Run the Actor** – Start the run and monitor progress in the logs.
5. **Download Results** – Export the dataset as JSON, CSV, or Excel once finished.
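The Console export handles CSV for you. If you instead fetch items through the API (as in the API section) and want CSV locally, the list-valued `categories` field needs flattening first. A minimal standard-library sketch; `items_to_csv` and the `"; "` separator are illustrative choices, and the column order simply follows the output table.

```python
import csv
import io
from typing import Any, Dict, Iterable, List

# Column order follows the output table; missing fields are written as empty cells.
FIELDS: List[str] = [
    "title", "author", "isbn", "publisher", "published_date", "language", "pages",
    "categories", "description", "average_rating", "ratings_count", "price",
    "currency", "availability", "cover_image", "preview_link", "source",
]


def items_to_csv(items: Iterable[Dict[str, Any]], separator: str = "; ") -> str:
    """Serialize dataset items to CSV text, joining list values such as categories."""
    buffer = io.StringIO()
    # extrasaction="ignore" skips any keys not listed in FIELDS.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = {
            name: separator.join(map(str, value)) if isinstance(value, list) else value
            for name, value in item.items()
        }
        writer.writerow(row)
    return buffer.getvalue()


print(items_to_csv([{"title": "Atomic Habits", "categories": ["Self-Help", "Habits"]}]))
```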

***

### Technical Stack

- **Data Source** – [Open Library API](https://openlibrary.org/developers/api) (free, public)
- **HTTP Client** – `requests` with custom headers and optional proxy support
- **Proxy** – Apify Proxy (residential or datacenter)
- **Platform** – Apify Actor — serverless, scalable, integrated with Dataset and Key-Value Store
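The stack lists `requests` with custom headers and optional proxy support. The sketch below only builds the keyword arguments such a call would take, so it runs without `requests` installed. The User-Agent value, the 30-second timeout, and the `build_request_kwargs` name are assumptions; the `proxies` mapping with `http` and `https` keys is the shape `requests` expects. On Apify, the proxy URL would typically come from the SDK's proxy configuration rather than being hard-coded.

```python
from typing import Any, Dict, Optional

# Hypothetical headers; the Actor's actual header values are not documented.
DEFAULT_HEADERS: Dict[str, str] = {
    "User-Agent": "book-metadata-scraper-example/1.0",
    "Accept": "application/json",
}


def build_request_kwargs(proxy_url: Optional[str] = None, timeout: float = 30.0) -> Dict[str, Any]:
    """Keyword arguments for requests.get(url, **kwargs); proxy_url is optional."""
    kwargs: Dict[str, Any] = {"headers": dict(DEFAULT_HEADERS), "timeout": timeout}
    if proxy_url:
        # requests routes each URL scheme through the proxy listed under that key.
        kwargs["proxies"] = {"http": proxy_url, "https": proxy_url}
    return kwargs


print(build_request_kwargs("http://user:pass@proxy.example:8000")["proxies"])
```

With `requests` installed, a call would look like `requests.get(url, **build_request_kwargs(proxy_url))`.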

***

### Related Tools

| Actor | Description |
|-------|-------------|
| [Goodreads Scraper](https://apify.com/store) | Extracts book ratings, reviews, and reading lists from Goodreads. |
| [Amazon Book Scraper](https://apify.com/store) | Scrapes book listings, prices, and reviews from Amazon. |
| [Google Books Scraper](https://apify.com/store) | Fetches book metadata and previews via the Google Books API. |
| [ISBN Lookup Tool](https://apify.com/store) | Looks up detailed book info by ISBN from multiple data sources. |
| [Book Price Comparator](https://apify.com/store) | Compares book prices across major online retailers. |

***

### Changelog

**v1.0.0 – Initial Release**

- Multi-query search via Open Library API
- Deep metadata fetching from Works endpoint
- ISBN-13 prioritization
- Cover image and preview link generation
- Proxy configuration support
- Dataset integration with random request delays

***

### Pricing

- **Rental fee** of $8.00/month, subtracted from your prepaid usage after the free trial period.
- **Apify platform usage** is billed on top of the rental and gets cheaper on higher Apify subscription plans.
- **Proxy usage** is charged if residential proxies are enabled.

***

### Support & Feedback

- **Issues & Ideas** – Open an issue in the **Issues** tab of the Actor's Apify Store page.
- **Documentation** – Visit [Apify Docs](https://docs.apify.com) for platform guides.
- **API Notes** – This Actor uses the Open Library public API. Please use it responsibly and avoid excessive request rates.

***

> **Disclaimer:** This Actor accesses publicly available data from Open Library. Please ensure your usage complies with Open Library's terms of service. This Actor is intended for research and informational purposes only.

# Actor input schema

## `queries` (type: `array`):

List of books or keywords to search for.

## `max_results` (type: `integer`):

Max results per search query.

## `proxyConfiguration` (type: `object`):

Recommended to avoid API rate limits.

## Actor input object example

```json
{
  "queries": [
    "Atomic Habits"
  ],
  "max_results": 10,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# API

You can run this Actor programmatically using the Apify API. Below are examples for the JavaScript client, the Python client, and the Apify CLI, followed by the MCP server setup and the OpenAPI specification.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "queries": [
        "Atomic Habits"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("datapilot/book-metadata-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "queries": ["Atomic Habits"] }

# Run the Actor and wait for it to finish
run = client.actor("datapilot/book-metadata-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
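If you'd rather not add the client library, the `run-sync-get-dataset-items` endpoint from the OpenAPI specification runs the Actor and returns its dataset items in a single request. This standard-library sketch sends the token in an `Authorization: Bearer` header (the specification lists the `token` query parameter, which also works); `build_run_request` and `run_and_get_items` are hypothetical names, and the 300-second socket timeout is an arbitrary choice.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "datapilot~book-metadata-scraper"  # "/" in the Actor name becomes "~" in API paths


def build_run_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """POST request that runs the Actor and returns its dataset items when it finishes."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_and_get_items(token: str, run_input: Dict[str, Any], timeout: float = 300.0) -> List[Dict[str, Any]]:
    """Send the request and decode the JSON array of dataset items."""
    with urllib.request.urlopen(build_run_request(token, run_input), timeout=timeout) as response:
        return json.load(response)
```

Usage: `run_and_get_items("<YOUR_API_TOKEN>", {"queries": ["Atomic Habits"], "max_results": 5})` returns a list of records shaped like the example output. For long runs, the asynchronous `/runs` endpoint is the safer choice.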

## CLI example

```bash
echo '{
  "queries": [
    "Atomic Habits"
  ]
}' |
apify call datapilot/book-metadata-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=datapilot/book-metadata-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Book Metadata Scraper",
        "description": "Book Metadata Scraper uses the Open Library API to collect detailed book data by query. It extracts title, author, ISBN, publisher, publish year, pages, categories, ratings, description, cover image, and preview link. Outputs structured JSON for catalogs, apps, and research use.",
        "version": "0.0",
        "x-build-id": "R6VaAXNZRmhcke9gx"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/datapilot~book-metadata-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-datapilot-book-metadata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/datapilot~book-metadata-scraper/runs": {
            "post": {
                "operationId": "runs-sync-datapilot-book-metadata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/datapilot~book-metadata-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-datapilot-book-metadata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "queries"
                ],
                "properties": {
                    "queries": {
                        "title": "Book Titles / Authors",
                        "type": "array",
                        "description": "List of books or keywords to search for.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "max_results": {
                        "title": "Max Results",
                        "type": "integer",
                        "description": "Max results per search query.",
                        "default": 10
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Recommended to avoid API rate limits.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
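The `/runs` endpoint above starts a run without waiting; the `data` object in its response carries the run `id`, `status`, and `defaultDatasetId`. A hedged sketch of turning that response into follow-up URLs, using the Apify API v2 `GET /actor-runs/{runId}` and `GET /datasets/{datasetId}/items` endpoints, which are not part of this specification. `next_urls` is a hypothetical helper, and the set of terminal statuses is an assumption to verify against the Apify API reference.

```python
from typing import Any, Dict
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"
# Statuses after which a run no longer changes (assumed; check the Apify API reference).
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def next_urls(runs_response: Dict[str, Any]) -> Dict[str, Any]:
    """Given the JSON body returned by POST /acts/{actorId}/runs, build follow-up URLs."""
    run = runs_response["data"]
    run_id = quote(run["id"], safe="")
    dataset_id = quote(run["defaultDatasetId"], safe="")
    return {
        "finished": run.get("status") in TERMINAL_STATUSES,
        "poll_url": f"{API_BASE}/actor-runs/{run_id}",
        "items_url": f"{API_BASE}/datasets/{dataset_id}/items?format=json",
    }


print(next_urls({"data": {"id": "RUN_ID", "status": "READY", "defaultDatasetId": "DATASET_ID"}}))
```

Poll `poll_url` with the same token until its `status` is terminal, then fetch `items_url` to download the results.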
