# Goodreads Scraper (`datapilot/goodreads-scraper`) Actor

Goodreads Scraper uses the Open Library API to collect detailed book data by query. It extracts title, author, ISBN, publisher, publish year, pages, categories, ratings, description, cover image, and preview link. Outputs structured JSON for catalogs, apps, and research use.

- **URL**: https://apify.com/datapilot/goodreads-scraper.md
- **Developed by:** [Data Pilot](https://apify.com/datapilot) (community)
- **Categories:** Other, Social media
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$8.00/month + usage

To use this Actor, you pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for Apify platform usage, which gets cheaper on higher Apify subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#rental-actors

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Goodreads Scraper

### Overview

The **Goodreads Scraper** is an Apify Actor that extracts book metadata directly from [Goodreads](https://www.goodreads.com) book pages. Provide one or more Goodreads URLs and the Actor returns structured data including title, author, rating, description, and cover image. Whether you're building a book database, analyzing ratings, or powering a recommendation engine, this Actor delivers Goodreads data in a consistent format.

With proxy support and built-in anti-blocking delays, it is designed to keep access to Goodreads pages reliable across a run.

---

### Features

- **Direct Page Scraping** – Extracts data straight from Goodreads book pages using HTML parsing.
- **Rich Metadata** – Returns title, author, rating, description, and cover image for each book.
- **Batch Processing** – Processes multiple Goodreads URLs in a single run.
- **Proxy Support** – Optionally uses Apify residential proxies to avoid IP blocking.
- **Anti-Blocking Delays** – Adds random delays between requests to mimic human browsing.
- **Error Handling** – Logs errors and continues processing remaining URLs.
- **Dataset Integration** – Automatically pushes all scraped data to your Apify dataset for easy export.
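
The batch processing, delay, and error-handling features above can be pictured with a small loop. This is a minimal sketch, not the Actor's source: `scrape_all`, `scrape_one`, and the delay bounds are hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

import random
import time
from typing import Callable, Iterable


def scrape_all(
    urls: Iterable[str],
    scrape_one: Callable[[str], dict],  # hypothetical per-URL scraper
    min_delay: float = 1.0,             # assumed bounds, not the Actor's values
    max_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[list[dict], list[tuple[str, str]]]:
    """Scrape each URL in turn, pausing a random interval between requests.

    Failures are recorded and skipped so one bad URL does not stop the run.
    """
    records: list[dict] = []
    errors: list[tuple[str, str]] = []
    for index, url in enumerate(urls):
        if index > 0:
            # Random pause before every request except the first.
            sleep(random.uniform(min_delay, max_delay))
        try:
            records.append(scrape_one(url))
        except Exception as exc:
            # Record the failure and continue with the remaining URLs.
            errors.append((url, str(exc)))
    return records, errors
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.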

---

### How It Works

1. **Input** – Provide a list of Goodreads book page URLs.
2. **Fetch Page** – The Actor requests each URL with browser-like headers and an optional proxy.
3. **Parse HTML** – It uses BeautifulSoup to extract book metadata from the page structure.
4. **Build Output** – Structures all available data into a clean record and pushes it to the dataset.
5. **Repeat** – Processes all URLs with a random delay between requests.
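
The parse and build steps can be sketched as follows. To stay dependency-free, this example swaps BeautifulSoup for the standard library's `html.parser`, and it assumes the page exposes Open Graph `<meta>` tags (`og:title`, `og:description`, `og:image`). The selectors the Actor really uses are not documented here, so treat `OpenGraphParser` and `build_record` as hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:*" content="..."> pairs from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first occurrence of each property.
            self.tags.setdefault(prop[3:], content)


def build_record(html: str, url: str) -> dict:
    """Build a partial output record from already-fetched HTML."""
    parser = OpenGraphParser()
    parser.feed(html)
    og = parser.tags
    return {
        "title": og.get("title"),
        "description": og.get("description"),
        "cover_image": og.get("image"),
        "source": "Goodreads",
        "preview_link": url,
        "url": url,
    }
```

Fields that are missing from the page come back as `None` rather than raising, which matches the "log and continue" behavior described in the feature list.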

---

### Input

| Field                | Type             | Default | Description                                                              |
|----------------------|------------------|---------|--------------------------------------------------------------------------|
| `urls`               | Array of strings | `[]`    | **Required.** List of Goodreads book page URLs to scrape.                |
| `proxyConfiguration` | Object           | `{ "useApifyProxy": true }` | Apify proxy configuration (e.g., `{ "proxyGroups": ["RESIDENTIAL"] }`). |

**Example input:**

```json
{
  "urls": [
    "https://www.goodreads.com/book/show/40121378-atomic-habits",
    "https://www.goodreads.com/book/show/865.The_Alchemist"
  ],
  "proxyConfiguration": {
    "proxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}
````
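
Before starting a run, it can help to check the input locally. The sketch below assumes book page URLs follow the `https://www.goodreads.com/book/show/<numeric id>` shape seen in the examples; `validate_input` and the `BOOK_URL` pattern are hypothetical helpers, not part of the Actor.

```python
import re

# Assumed URL shape, inferred from the example URLs in this README.
BOOK_URL = re.compile(r"^https://www\.goodreads\.com/book/show/\d+")


def validate_input(actor_input: dict) -> list:
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    urls = actor_input.get("urls")
    if not isinstance(urls, list) or not urls:
        problems.append("'urls' must be a non-empty array")
    else:
        for url in urls:
            if not isinstance(url, str) or not BOOK_URL.match(url):
                problems.append(f"not a Goodreads book URL: {url!r}")
    # proxyConfiguration is optional, but must be an object when present.
    proxy = actor_input.get("proxyConfiguration", {})
    if not isinstance(proxy, dict):
        problems.append("'proxyConfiguration' must be an object")
    return problems
```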

***

### Output

Each book is pushed as a separate dataset record with the following fields:

| Field          | Type   | Description                                      |
|----------------|--------|--------------------------------------------------|
| `title`        | string | Book title.                                      |
| `authorName`   | string | Author's full name.                              |
| `rating`       | string | Average Goodreads rating (e.g., `"4.37"`).       |
| `description`  | string | Book description/synopsis.                       |
| `language`     | string | Language code (default: `"ENG"`).                |
| `currency`     | string | Currency code (default: `"USD"`).                |
| `cover_image`  | string | Direct URL to the book's cover image.            |
| `source`       | string | Data source (always `"Goodreads"`).              |
| `preview_link` | string | The original Goodreads page URL.                 |
| `url`          | string | The original Goodreads page URL.                 |

**Example output:**

```json
{
  "title": "Atomic Habits",
  "authorName": "James Clear",
  "rating": "4.37",
  "description": "No matter your goals, Atomic Habits offers a proven framework for improving every day...",
  "language": "ENG",
  "currency": "USD",
  "cover_image": "https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1655988385l/40121378.jpg",
  "source": "Goodreads",
  "preview_link": "https://www.goodreads.com/book/show/40121378-atomic-habits",
  "url": "https://www.goodreads.com/book/show/40121378-atomic-habits"
}
```
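
Because `rating` arrives as a string, downstream code usually converts it before sorting or filtering. A minimal sketch, assuming ratings fall on a 0 to 5 scale; `parse_rating` and `top_rated` are hypothetical helper names.

```python
def parse_rating(record: dict):
    """Return the record's rating as a float, or None if missing or malformed."""
    raw = record.get("rating")
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    # Treat anything outside the assumed 0-5 scale as bad data.
    return value if 0.0 <= value <= 5.0 else None


def top_rated(records: list, minimum: float = 4.0) -> list:
    """Titles of records rated at or above `minimum`, highest first."""
    rated = [(parse_rating(r), r.get("title")) for r in records]
    kept = [(score, title) for score, title in rated
            if score is not None and score >= minimum]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in kept]
```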

***

### Use Cases

- **Book Databases** – Build and maintain a structured catalog with Goodreads metadata.
- **Recommendation Engines** – Power book recommendation systems using ratings and descriptions.
- **Publishing Research** – Analyze Goodreads ratings and reader trends across genres.
- **E-commerce Enrichment** – Enrich product listings with Goodreads descriptions and cover images.
- **Academic Research** – Collect structured Goodreads data for literature or data science projects.
- **Content Aggregation** – Aggregate Goodreads book data for blogs, apps, or reading platforms.

***

### Quick Start

1. **Open on Apify** – Visit the Actor page and click **Try for free**.
2. **Set Input** – Paste your Goodreads book page URLs into the `urls` field.
3. **Enable Proxy (Optional)** – Configure proxy groups to avoid rate limiting.
4. **Run the Actor** – Start the run and monitor progress in the logs.
5. **Download Results** – Export the dataset as JSON, CSV, or Excel once finished.
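
The Apify Console handles exports for you. If you instead fetch records through an API client and want a CSV locally, a conversion along these lines may be enough. The column order follows the Output table in this README; `records_to_csv` is a hypothetical helper.

```python
import csv
import io

# Column order taken from the Output table in this README.
FIELDS = ["title", "authorName", "rating", "description", "language",
          "currency", "cover_image", "source", "preview_link", "url"]


def records_to_csv(records) -> str:
    """Serialize dataset records to CSV text, ignoring unknown keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

Missing fields become empty cells, so partially scraped books still produce a well-formed row.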

***

### Technical Stack

- **Data Source** – [Goodreads](https://www.goodreads.com) (HTML scraping)
- **HTML Parser** – `BeautifulSoup` with `lxml` backend
- **HTTP Client** – `requests` with browser-like headers and optional proxy support
- **Proxy** – Apify Proxy (residential or datacenter)
- **Platform** – Apify Actor — serverless, scalable, integrated with Dataset and Key-Value Store

***

### Related Tools

| Actor | Description |
|-------|-------------|
| [Book Metadata Scraper](https://apify.com/store) | Extracts rich book metadata from the Open Library database. |
| [Amazon Book Scraper](https://apify.com/store) | Scrapes book listings, prices, and reviews from Amazon. |
| [Google Books Scraper](https://apify.com/store) | Fetches book metadata and previews via the Google Books API. |
| [ISBN Lookup Tool](https://apify.com/store) | Looks up detailed book info by ISBN from multiple data sources. |
| [Book Price Comparator](https://apify.com/store) | Compares book prices across major online retailers. |

***

### Changelog

**v1.0.0 – Initial Release**

- Direct HTML scraping of Goodreads book pages
- Title, author, rating, description, and cover image extraction
- Proxy configuration support
- Anti-blocking random delays
- Dataset integration with error handling

***

### Pricing

- **Free** for basic usage on Apify (up to certain compute limits).
- **Paid plans** available for higher volume, priority support, and longer runs.
- Proxy credits consumed if residential proxies are enabled.

***

### Support & Feedback

- **Issues & Ideas** – Open a ticket on the Apify Actor issue tracker.
- **Documentation** – Visit [Apify Docs](https://docs.apify.com) for platform guides.
- **Scraping Notes** – Use proxies and keep request rates low to avoid blocks from Goodreads.

***

> **Disclaimer:** This Actor scrapes publicly visible data from Goodreads. Please ensure your usage complies with Goodreads' terms of service. This Actor is intended for research and informational purposes only.

# Actor input Schema

## `urls` (type: `array`):

List of Goodreads book page URLs to scrape.

## `proxyConfiguration` (type: `object`):

Recommended to bypass Goodreads bot detection.

## Actor input object example

```json
{
  "urls": [
    "https://www.goodreads.com/book/show/40605285-atomic-habits"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://www.goodreads.com/book/show/40605285-atomic-habits"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("datapilot/goodreads-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "urls": ["https://www.goodreads.com/book/show/40605285-atomic-habits"] }

# Run the Actor and wait for it to finish
run = client.actor("datapilot/goodreads-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://www.goodreads.com/book/show/40605285-atomic-habits"
  ]
}' |
apify call datapilot/goodreads-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=datapilot/goodreads-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Goodreads Scraper",
        "description": "Goodreads Scraper r uses the Open Library API to collect detailed book data by query. It extracts title, author, ISBN, publisher, publish year, pages, categories, ratings, description, cover image, and preview link. Outputs structured JSON for catalogs, apps, and research use.",
        "version": "0.0",
        "x-build-id": "M8PugFBcd0pRccUF1"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/datapilot~goodreads-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-datapilot-goodreads-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/datapilot~goodreads-scraper/runs": {
            "post": {
                "operationId": "runs-sync-datapilot-goodreads-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/datapilot~goodreads-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-datapilot-goodreads-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "Goodreads URLs",
                        "type": "array",
                        "description": "List of Goodreads book page URLs to scrape.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Recommended to bypass Goodreads bot detection.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
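
For projects that do not use an Apify client, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below only builds the request with the standard library and does not send it; `build_run_request` is a hypothetical helper, while the path, the `token` query parameter, and the `urls` body field come from the spec.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/datapilot~goodreads-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, urls: list) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would look like `urllib.request.urlopen(build_run_request(token, urls))`. Since the endpoint waits for the run to finish, a generous timeout is advisable for long URL lists.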
