# Movie Script Finder & Extractor (`thescrapelab/screenplay-script-scraper`) Actor

Find publicly accessible movie scripts and screenplays, extract clean metadata, and output script text in separate chunk rows for research, indexing, and analysis.

- **URL**: https://apify.com/thescrapelab/screenplay-script-scraper.md
- **Developed by:** [Inus Grobler](https://apify.com/thescrapelab) (community)
- **Categories:** Developer tools, Automation, AI
- **Stats:** 6 total users, 2 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

From $12.00 per 1,000 movie scripts

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
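As a rough illustration of the arithmetic, the sketch below estimates a run's cost from the listed starting price. It assumes the "from $12.00 per 1,000" figure applies as a flat rate and that each matched script counts as one billable event; actual tiers and event definitions may differ. `estimate_cost_usd` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Listed starting price: $12.00 per 1,000 movie scripts (assumed flat here).
PRICE_PER_1000_SCRIPTS_USD = Decimal("12.00")


def estimate_cost_usd(script_count: int) -> Decimal:
    """Estimate the pay-per-event cost for a number of matched scripts."""
    if script_count < 0:
        raise ValueError("script_count must be non-negative")
    return PRICE_PER_1000_SCRIPTS_USD * script_count / Decimal(1000)


# Example: estimated cost of a batch that matches 250 scripts.
print(estimate_cost_usd(250))
```

Under these assumptions a single script costs about 1.2 cents, so the price scales linearly with the number of matched scripts rather than with run time.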

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Movie Script Finder & Extractor

At a glance: this Actor finds public movie scripts and extracts screenplay metadata and text chunks. Input is either one movie title or a list of search terms; output is metadata rows plus screenplay chunk rows. Typical use cases are research, indexing, and LLM workflows. Limitations, troubleshooting, and cost notes are covered below.

Find publicly available movie scripts and screenplays by title, extract clean metadata, and return screenplay text in structured dataset rows that are ready for research, indexing, enrichment, and analysis workflows.

This Actor is designed for clients who need script data without building and maintaining their own crawler. It searches supported public screenplay sources automatically, emits one metadata row per matched script, and streams script text as chunk rows while the run is still in progress.

### What You Get

- Public screenplay discovery from supported script sources
- Movie title, writers, genres, source URLs, format, draft details when available
- Plain-text screenplay chunks for sources that expose readable HTML or TXT script text
- Compact metadata rows for PDF, external, or metadata-only matches
- Error rows for unsupported inputs, extraction failures, or no-result searches
- Low-cost defaults: no browser, no proxy by default, 128 MB for single-title runs

### Best For

- Screenplay research datasets
- Movie script search and cataloging
- LLM or vector-index preparation
- Writer, genre, and structure analysis
- Building internal screenplay reference tools
- Finding public source links for scripts at scale

### Supported Sources

The Actor automatically checks supported public sources. You do not need to choose a source.

| Source | Support |
| --- | --- |
| IMSDb | Metadata and HTML script text |
| The Daily Script | Metadata, HTML text, and TXT text |
| SimplyScripts | Metadata, TXT links, PDF links, and conservative external-link handling |
| Script Slug | Metadata and public PDF links when available |

PDF text extraction is not enabled by default. PDF-only matches are returned as metadata/link rows.

### Input

Use one of the two public input fields.

#### One Movie

Use `movieName` when you want one best-match screenplay.

```json
{
  "movieName": "The Matrix"
}
````

#### Multiple Searches

Use `searches` when you want results for multiple movie titles or search terms.

```json
{
  "searches": ["The Matrix", "Alien", "Terminator"]
}
```

#### Input Notes

- If `movieName` and `searches` are both filled, `movieName` takes priority.
- Keep movie titles specific for best matching.
- Results are pushed to the dataset as they are scraped, not only after the run finishes.
- Single-title runs use the cheapest defaults. Multi-search runs use more memory because they can return many scripts and chunks.
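Because `movieName` overrides `searches`, it can help to build the input so that only one of the two fields is ever sent. The following is a minimal sketch; `build_run_input` is a hypothetical helper, and the trimming and de-duplication of search terms are design choices of this example, not documented Actor behavior.

```python
from typing import Optional, Sequence


def build_run_input(
    movie_name: Optional[str] = None,
    searches: Optional[Sequence[str]] = None,
) -> dict:
    """Build an input object that sets exactly one of the two public fields."""
    name = (movie_name or "").strip()
    if name:
        # movieName takes priority, so any searches sent alongside it would be ignored.
        return {"movieName": name}
    # Drop blank entries and duplicates while preserving the original order.
    terms = list(dict.fromkeys(s.strip() for s in (searches or []) if s.strip()))
    if not terms:
        raise ValueError("Provide a movie name or at least one search term")
    return {"searches": terms}


print(build_run_input(searches=["The Matrix", "Alien", "Terminator"]))
```

Raising early on empty input avoids paying for a run that can only produce an error row.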

### Output

Results are available in the default dataset. The Actor emits these row types:

| Type | Meaning |
| --- | --- |
| `script_metadata` | One summary row for each matched script |
| `script_chunk` | Plain-text screenplay content split into ordered chunks |
| `script_analysis` | Optional analysis row in advanced runs |
| `error` | Invalid input, no results, unsupported source, or extraction failure |

Unknown or unavailable success fields are omitted instead of filled with `null`.
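Since all row types share one dataset and optional fields may be absent, a consumer typically groups rows by `type` and reads fields defensively. A minimal sketch, assuming the items have already been downloaded as a list of dictionaries (`group_rows_by_type` is a hypothetical helper and the sample rows are illustrative):

```python
from collections import defaultdict


def group_rows_by_type(items):
    """Group dataset rows by their `type` field."""
    grouped = defaultdict(list)
    for item in items:
        # Rows without a type are kept under "unknown" rather than dropped.
        grouped[item.get("type", "unknown")].append(item)
    return dict(grouped)


# Sample rows shaped like the examples in this README (values are illustrative).
sample = [
    {"type": "script_metadata", "title": "The Matrix", "wordCount": 23137},
    {"type": "script_chunk", "chunkIndex": 1, "chunkText": "..."},
    {"type": "error", "errorType": "NO_RESULTS", "retryable": False},
]
grouped = group_rows_by_type(sample)
for row in grouped.get("script_metadata", []):
    # Optional fields may be omitted, so read them with .get() and a default.
    print(row.get("title"), row.get("sceneCount", "n/a"))
```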

### Metadata Row Example

```json
{
  "type": "script_metadata",
  "source": "imsdb",
  "scrapedAt": "2026-06-08T07:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "writers": ["Larry Wachowski", "Andy Wachowski"],
  "genres": ["Action", "Sci-Fi", "Thriller"],
  "scriptFormat": "html",
  "hasScriptText": true,
  "chunkCount": 8,
  "wordCount": 23137,
  "characterCount": 143493,
  "sceneCount": 119
}
```

The metadata row does not contain the full script text.

### Chunk Row Example

```json
{
  "type": "script_chunk",
  "source": "imsdb",
  "scrapedAt": "2026-06-08T07:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "chunkIndex": 1,
  "chunkMode": "fixed_size",
  "chunkTitle": "Chunk 1",
  "chunkText": "THE MATRIX\n\nWritten by Larry and Andy Wachowski...",
  "chunkCharacterCount": 19995,
  "chunkWordCount": 3300,
  "nextChunkIndex": 2
}
```

The default chunking is optimized for cost by using larger chunks, so fewer dataset rows are created while preserving the full extracted script text.
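To rebuild the full text of one script from its chunk rows, the rows can be filtered by `scriptId` and ordered by `chunkIndex`. This sketch assumes chunks do not overlap and can be concatenated directly; if your runs show otherwise, adjust the `separator`. `assemble_script_text` is a hypothetical helper and the chunk texts are placeholders.

```python
def assemble_script_text(items, script_id, separator=""):
    """Join the chunk rows of one script into a single string, in chunk order."""
    chunks = [
        row for row in items
        if row.get("type") == "script_chunk" and row.get("scriptId") == script_id
    ]
    # Rows may arrive in any order, so sort explicitly by chunkIndex.
    chunks.sort(key=lambda row: row["chunkIndex"])
    return separator.join(row.get("chunkText", "") for row in chunks)


# Placeholder rows; real chunkText values contain screenplay text.
rows = [
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 2, "chunkText": "second part"},
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 1, "chunkText": "first part, "},
    {"type": "script_metadata", "scriptId": "imsdb-the-matrix", "title": "The Matrix"},
]
full_text = assemble_script_text(rows, "imsdb-the-matrix")
print(full_text)
```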

### Error Row Example

```json
{
  "type": "error",
  "source": "unknown",
  "scrapedAt": "2026-06-08T07:00:00.000Z",
  "url": "https://apify.com/actors/thescrapelab/screenplay-script-scraper",
  "status": "failed",
  "errorType": "NO_RESULTS",
  "errorMessage": "No matching screenplay results found for: Example Missing Movie",
  "retryable": false
}
```
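Error rows can be summarized by `errorType`, and the `retryable` flag can decide which ones are worth a rerun. A minimal sketch over already-downloaded items; `summarize_errors` is a hypothetical helper, and `EXAMPLE_TRANSIENT` is a made-up placeholder value, not a documented error type.

```python
from collections import Counter


def summarize_errors(items):
    """Count error rows by errorType and collect the retryable ones."""
    errors = [row for row in items if row.get("type") == "error"]
    counts = Counter(row.get("errorType", "UNKNOWN") for row in errors)
    retryable = [row for row in errors if row.get("retryable") is True]
    return counts, retryable


sample = [
    {"type": "error", "errorType": "NO_RESULTS", "retryable": False},
    # Hypothetical error type, used only to show the retryable branch.
    {"type": "error", "errorType": "EXAMPLE_TRANSIENT", "retryable": True},
    {"type": "script_metadata", "title": "The Matrix"},
]
counts, retryable = summarize_errors(sample)
print(dict(counts), len(retryable))
```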

### How To Use The Results

1. Start the Actor from Apify Console.
2. Enter either a single `movieName` or a `searches` list.
3. Open the dataset while the run is active to see rows appear during scraping.
4. Use `script_metadata` rows for cataloging and filtering.
5. Use `script_chunk` rows for text indexing, search, LLM workflows, or downstream analysis.
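The same "watch rows appear during the run" flow from step 3 can be approximated in code by starting the run without waiting and polling the dataset. The sketch below is one possible approach, not the Actor author's method. It assumes a client object with the `apify-client` Python interface (`actor(...).start()`, `run(...).get()`, `dataset(...).list_items()`); the polling interval and page size are arbitrary, and `stream_rows` is a hypothetical helper.

```python
import time

# Run statuses after which no further rows are expected.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def stream_rows(client, run_input, poll_secs=5.0, page_size=100):
    """Start the Actor and yield dataset rows as they become available."""
    run = client.actor("thescrapelab/screenplay-script-scraper").start(run_input=run_input)
    dataset = client.dataset(run["defaultDatasetId"])
    offset = 0
    while True:
        # Read the status before fetching, so the last pass runs after the run ended.
        status = client.run(run["id"]).get()["status"]
        while True:
            page = dataset.list_items(offset=offset, limit=page_size)
            if not page.items:
                break
            yield from page.items
            offset += len(page.items)
        if status in TERMINAL_STATUSES:
            return
        time.sleep(poll_secs)
```

With the real client this would be called as `stream_rows(ApifyClient("YOUR_APIFY_TOKEN"), {"movieName": "The Matrix"})` and iterated in a `for` loop. Passing the client in as a parameter keeps the function easy to exercise with a stub.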

### Python API Example

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run_input = {
    "movieName": "The Matrix",
}

run = client.actor("thescrapelab/screenplay-script-scraper").call(run_input=run_input)

dataset_id = run["defaultDatasetId"]
items = client.dataset(dataset_id).list_items(clean=True).items

metadata_rows = [item for item in items if item.get("type") == "script_metadata"]
chunk_rows = [item for item in items if item.get("type") == "script_chunk"]

print(f"Scripts found: {len(metadata_rows)}")
print(f"Text chunks: {len(chunk_rows)}")

for row in metadata_rows:
    print(row.get("title"), row.get("scriptUrl"), row.get("wordCount"))
```

For multiple searches:

```python
run_input = {
    "searches": ["The Matrix", "Alien", "Terminator"],
}
```
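A multi-search run mixes rows from several scripts, so it is usually convenient to index them by `scriptId` first. The sketch below also flags scripts whose received chunk rows do not match the metadata `chunkCount`; treating `chunkCount` as the exact number of chunk rows is an assumption of this example. Both helpers are hypothetical, and `imsdb-alien` is an illustrative ID.

```python
def index_scripts(items):
    """Group metadata and chunk rows under their shared scriptId."""
    scripts = {}
    for row in items:
        script_id = row.get("scriptId")
        if script_id is None:
            continue  # rows without a scriptId, such as error rows, are skipped
        entry = scripts.setdefault(script_id, {"metadata": None, "chunks": []})
        if row.get("type") == "script_metadata":
            entry["metadata"] = row
        elif row.get("type") == "script_chunk":
            entry["chunks"].append(row)
    return scripts


def scripts_with_missing_chunks(scripts):
    """Return script IDs whose chunk rows differ from the metadata chunkCount."""
    missing = []
    for script_id, entry in scripts.items():
        expected = (entry["metadata"] or {}).get("chunkCount", 0)
        if len(entry["chunks"]) != expected:
            missing.append(script_id)
    return missing


sample = [
    {"type": "script_metadata", "scriptId": "imsdb-the-matrix", "chunkCount": 2},
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 1},
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 2},
    {"type": "script_metadata", "scriptId": "imsdb-alien", "chunkCount": 3},
    {"type": "script_chunk", "scriptId": "imsdb-alien", "chunkIndex": 1},
    {"type": "error", "errorType": "NO_RESULTS"},
]
scripts = index_scripts(sample)
print(scripts_with_missing_chunks(scripts))
```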

### Cost And Performance

The Actor is tuned to keep run costs low:

- Uses lightweight HTTP crawling, not a browser
- Uses direct public requests by default, not a proxy
- Uses 128 MB memory for single-title runs
- Uses larger text chunks by default to reduce dataset item count
- Streams rows as they are found

For a typical single-title screenplay such as `The Matrix`, the Actor returns one metadata row plus a small number of chunk rows while preserving the full extracted script text.

### Practical Tips

- Use `movieName` for the cheapest, most focused run.
- Use `searches` when you want broader discovery across multiple titles.
- Prefer exact titles over broad words.
- Expect metadata-only rows for PDF-only or external sources.
- Check `hasScriptText` and `chunkCount` to identify rows with extracted screenplay text.
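The last tip can be sketched as a small filter over metadata rows. It treats a row as having text only when `hasScriptText` is truthy and `chunkCount` is positive; since missing fields are omitted rather than set to `null`, both are read with defaults. `split_by_text_availability` is a hypothetical helper, and the `"pdf"` format value in the sample is an assumption.

```python
def split_by_text_availability(metadata_rows):
    """Separate metadata rows with extracted text from link-only rows."""
    with_text, link_only = [], []
    for row in metadata_rows:
        has_text = bool(row.get("hasScriptText")) and row.get("chunkCount", 0) > 0
        (with_text if has_text else link_only).append(row)
    return with_text, link_only


sample = [
    {"type": "script_metadata", "title": "The Matrix", "scriptFormat": "html",
     "hasScriptText": True, "chunkCount": 8},
    # A PDF-only match; the "pdf" format value is assumed for illustration.
    {"type": "script_metadata", "title": "Alien", "scriptFormat": "pdf"},
]
with_text, link_only = split_by_text_availability(sample)
print([r["title"] for r in with_text], [r["title"] for r in link_only])
```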

### Limitations

- The Actor only uses publicly accessible pages.
- It does not bypass paywalls, logins, CAPTCHAs, or access controls.
- Source websites can change their layout, availability, or robots rules.
- Some public sources expose only PDF or external links; those may return metadata rows rather than script text.
- Search matching is title-oriented and may return related sequels, remakes, or same-franchise scripts.
- Word counts, scene counts, and draft detection are approximate.
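Because title-oriented matching can return sequels or remakes, a client-side filter on the `title` field is one way to keep only exact matches. This is a sketch of that idea under a simple normalization (lowercase, punctuation collapsed to spaces); `normalize_title` and `exact_title_matches` are hypothetical helpers, and stricter or looser rules may suit your data better.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def exact_title_matches(metadata_rows, query):
    """Keep only metadata rows whose normalized title equals the normalized query."""
    wanted = normalize_title(query)
    return [
        row for row in metadata_rows
        if normalize_title(row.get("title", "")) == wanted
    ]


sample = [
    {"type": "script_metadata", "title": "The Matrix"},
    {"type": "script_metadata", "title": "The Matrix Reloaded"},
    {"type": "script_metadata", "title": "Matrix"},
]
matches = exact_title_matches(sample, "the matrix!")
print([row["title"] for row in matches])
```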

### Legal And Ethical Notice

Movie scripts and screenplays may be copyrighted. This Actor is intended for indexing, metadata extraction, research, discovery, and analysis of publicly available pages.

You are responsible for ensuring that your use complies with copyright law, source website terms, robots.txt, and applicable regulations. The Actor is not a piracy tool and does not bypass access controls.

### Support

If a title does not return the expected script, try a more exact movie title. If a source changes or a result looks wrong, rerun with a narrower query and review the `source`, `scriptUrl`, `errorType`, and `errorMessage` fields in the dataset.

# Actor input Schema

## `movieName` (type: `string`):

Enter one movie title to get the single best screenplay match with script chunks. If this is filled, it takes priority over `searches`.

## `searches` (type: `array`):

Enter several movie titles or search terms to find multiple matching scripts. When public script text is available, the Actor also returns `script_chunk` rows for each matched script. If a source only exposes metadata or a public file link, that result is still returned as a metadata row. If `movieName` is also filled, this list is ignored.

## Actor input object example

```json
{
  "movieName": "The Matrix"
}
```

# Actor output Schema

## `results` (type: `string`):

Overview dataset rows containing screenplay metadata, script text chunks, and error rows when applicable.

## `debugHtml` (type: `string`):

Optional debug HTML records saved for troubleshooting extraction from supported sources.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "movieName": "The Matrix"
};

// Run the Actor and wait for it to finish
const run = await client.actor("thescrapelab/screenplay-script-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "movieName": "The Matrix" }

# Run the Actor and wait for it to finish
run = client.actor("thescrapelab/screenplay-script-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "movieName": "The Matrix"
}' |
apify call thescrapelab/screenplay-script-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=thescrapelab/screenplay-script-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Movie Script Finder & Extractor",
        "description": "Find publicly accessible movie scripts and screenplays, extract clean metadata, and output script text in separate chunk rows for research, indexing, and analysis.",
        "version": "0.4",
        "x-build-id": "EYROYZLGLpRxmFriS"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/thescrapelab~screenplay-script-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-thescrapelab-screenplay-script-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/thescrapelab~screenplay-script-scraper/runs": {
            "post": {
                "operationId": "runs-sync-thescrapelab-screenplay-script-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/thescrapelab~screenplay-script-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-thescrapelab-screenplay-script-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "movieName": {
                        "title": "Movie Name",
                        "type": "string",
                        "description": "Enter one movie title to get the single best screenplay match with script chunks. If this is filled, it takes priority over searches."
                    },
                    "searches": {
                        "title": "Search List",
                        "type": "array",
                        "description": "Enter several movie titles or search terms to find multiple matching scripts. When public script text is available, the actor also returns script_chunk rows for each matched script. If a source only exposes metadata or a public file link, that result is still returned as a metadata row. If movieName is also filled, this list will be ignored.",
                        "items": {
                            "type": "string"
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
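For projects without a client library, the `run-sync-get-dataset-items` path from the specification above can be called with plain HTTP. The sketch below only builds the request with the Python standard library and leaves sending commented out; it assumes the token is passed as the `token` query parameter, as the specification describes, and that the response body contains the dataset items as JSON. `build_sync_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "thescrapelab~screenplay-script-scraper"


def build_sync_request(token, run_input):
    """Build (but do not send) a run-sync-get-dataset-items POST request."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request("<YOUR_API_TOKEN>", {"movieName": "The Matrix"})
print(request.get_method(), request.full_url)

# To send it (requires network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)
```

Synchronous calls hold the HTTP connection open for the whole run, so for long multi-search runs the asynchronous `/runs` path may be the safer choice.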
