Movie Script Finder & Extractor
Pricing
from $12.00 per 1,000 movie scripts
Find publicly accessible movie scripts and screenplays, extract clean metadata, and output script text in separate chunk rows for research, indexing, and analysis.
Developer
Inus Grobler
Maintained by Community
At a glance: this Actor finds public movie scripts and extracts screenplay metadata and text chunks. Input is either one movie title or a list of search terms; output is metadata rows plus screenplay chunk rows. Typical uses are research, indexing, and LLM workflows. Limitations, troubleshooting, and cost notes are covered below.
Find publicly available movie scripts and screenplays by title, extract clean metadata, and return screenplay text in structured dataset rows that are ready for research, indexing, enrichment, and analysis workflows.
This Actor is designed for clients who need script data without building and maintaining their own crawler. It searches supported public screenplay sources automatically, emits one metadata row per matched script, and streams script text as chunk rows while the run is still in progress.
What You Get
- Public screenplay discovery from supported script sources
- Movie title, writers, genres, source URLs, format, and draft details when available
- Plain-text screenplay chunks for sources that expose readable HTML or TXT script text
- Compact metadata rows for PDF, external, or metadata-only matches
- Error rows for unsupported inputs, extraction failures, or no-result searches
- Low-cost defaults: no browser, no proxy by default, 128 MB for single-title runs
Best For
- Screenplay research datasets
- Movie script search and cataloging
- LLM or vector-index preparation
- Writer, genre, and structure analysis
- Building internal screenplay reference tools
- Finding public source links for scripts at scale
Supported Sources
The Actor automatically checks supported public sources. You do not need to choose a source.
| Source | Support |
|---|---|
| IMSDb | Metadata and HTML script text |
| The Daily Script | Metadata, HTML text, and TXT text |
| SimplyScripts | Metadata, TXT links, PDF links, and conservative external-link handling |
| Script Slug | Metadata and public PDF links when available |
PDF text extraction is not enabled by default. PDF-only matches are returned as metadata/link rows.
Input
Use one of the two public input fields.
One Movie
Use `movieName` when you want one best-match screenplay.

```json
{"movieName": "The Matrix"}
```
Multiple Searches
Use `searches` when you want results for multiple movie titles or search terms.

```json
{"searches": ["The Matrix", "Alien", "Terminator"]}
```
Input Notes
- If `movieName` and `searches` are both filled, `movieName` takes priority.
- Keep movie titles specific for best matching.
- Results are pushed to the dataset as they are scraped, not only after the run finishes.
- Single-title runs use the cheapest defaults. Multi-search runs use more memory because they can return many scripts and chunks.
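
The priority rule above can be mirrored on the client side before a run starts. The sketch below is a hypothetical helper (`build_run_input` is not part of the Actor or the Apify client); it assumes the Actor accepts exactly the `movieName` and `searches` keys shown above. Trimming and de-duplicating search terms is a client-side choice, not documented Actor behavior.

```python
from typing import Optional, Sequence


def build_run_input(
    movie_name: Optional[str] = None,
    searches: Optional[Sequence[str]] = None,
) -> dict:
    """Build an Actor input that contains exactly one of the two public fields.

    Hypothetical helper: mirrors the documented rule that movieName
    takes priority when both fields are filled.
    """
    name = (movie_name or "").strip()
    if name:
        return {"movieName": name}

    # Drop empty terms and de-duplicate while keeping the original order.
    terms = list(dict.fromkeys(s.strip() for s in (searches or []) if s.strip()))
    if not terms:
        raise ValueError("Provide either movie_name or at least one search term")
    return {"searches": terms}


print(build_run_input(movie_name="The Matrix", searches=["Alien"]))
print(build_run_input(searches=["The Matrix", " Alien ", "The Matrix"]))
```

Sending only one field keeps the request unambiguous and avoids relying on the Actor to resolve the conflict.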
Output
Results are available in the default dataset. The Actor emits these row types:
| Type | Meaning |
|---|---|
| `script_metadata` | One summary row for each matched script |
| `script_chunk` | Plain-text screenplay content split into ordered chunks |
| `script_analysis` | Optional analysis row in advanced runs |
| `error` | Invalid input, no results, unsupported source, or extraction failure |
On successful rows, unknown or unavailable fields are omitted rather than set to null.
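
Because all row types share one dataset and optional fields may be absent, consumers usually split rows by `type` and read fields defensively. A minimal sketch, assuming the items have already been downloaded as a list of dictionaries; `group_rows_by_type` and the sample rows are illustrative, not part of the Actor.

```python
from collections import defaultdict
from typing import Iterable


def group_rows_by_type(items: Iterable[dict]) -> dict:
    """Group dataset rows by their "type" field."""
    grouped = defaultdict(list)
    for item in items:
        # Rows without a "type" are not expected, but are kept under "unknown".
        grouped[item.get("type", "unknown")].append(item)
    return dict(grouped)


# Illustrative rows only; real rows carry more fields.
sample_items = [
    {"type": "script_metadata", "scriptId": "imsdb-the-matrix", "title": "The Matrix"},
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 1},
    {"type": "error", "errorType": "NO_RESULTS"},
]

grouped = group_rows_by_type(sample_items)
for row in grouped.get("script_metadata", []):
    # Omitted fields are simply absent, so use .get() with a fallback.
    print(row["title"], row.get("writers", []), row.get("wordCount", "n/a"))
```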
Metadata Row Example
```json
{
  "type": "script_metadata",
  "source": "imsdb",
  "scrapedAt": "2026-06-08T07:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "writers": ["Larry Wachowski", "Andy Wachowski"],
  "genres": ["Action", "Sci-Fi", "Thriller"],
  "scriptFormat": "html",
  "hasScriptText": true,
  "chunkCount": 8,
  "wordCount": 23137,
  "characterCount": 143493,
  "sceneCount": 119
}
```
The metadata row does not contain the full script text.
Chunk Row Example
```json
{
  "type": "script_chunk",
  "source": "imsdb",
  "scrapedAt": "2026-06-08T07:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "chunkIndex": 1,
  "chunkMode": "fixed_size",
  "chunkTitle": "Chunk 1",
  "chunkText": "THE MATRIX\n\nWritten by Larry and Andy Wachowski...",
  "chunkCharacterCount": 19995,
  "chunkWordCount": 3300,
  "nextChunkIndex": 2
}
```
The default chunking is optimized for cost by using larger chunks, so fewer dataset rows are created while preserving the full extracted script text.
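
Since the metadata row carries no script text, the full screenplay has to be rebuilt from its chunk rows. The sketch below groups chunks by `scriptId` and joins them in `chunkIndex` order. It assumes consecutive chunks do not overlap and can be concatenated directly, which the source does not state explicitly; `reassemble_scripts` and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def reassemble_scripts(items: Iterable[dict], separator: str = "") -> dict:
    """Return {scriptId: full_text} built from script_chunk rows."""
    chunks_by_script = defaultdict(list)
    for item in items:
        if item.get("type") == "script_chunk":
            chunks_by_script[item["scriptId"]].append(item)

    return {
        script_id: separator.join(
            chunk["chunkText"]
            # Rows may arrive out of order, so sort by chunkIndex first.
            for chunk in sorted(chunks, key=lambda c: c["chunkIndex"])
        )
        for script_id, chunks in chunks_by_script.items()
    }


# Illustrative rows only, deliberately out of order.
sample_items = [
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 2, "chunkText": "Neo wakes."},
    {"type": "script_metadata", "scriptId": "imsdb-the-matrix", "title": "The Matrix"},
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 1, "chunkText": "FADE IN. "},
]

full_texts = reassemble_scripts(sample_items)
print(full_texts["imsdb-the-matrix"])
```

If chunks turn out to need a delimiter between them, pass one through the `separator` argument.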
Error Row Example
```json
{
  "type": "error",
  "source": "unknown",
  "scrapedAt": "2026-06-08T07:00:00.000Z",
  "url": "https://apify.com/actors/thescrapelab/screenplay-script-scraper",
  "status": "failed",
  "errorType": "NO_RESULTS",
  "errorMessage": "No matching screenplay results found for: Example Missing Movie",
  "retryable": false
}
```
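
Error rows can be triaged with the `retryable` flag before deciding whether to rerun. A minimal sketch, assuming a missing `retryable` field should be treated as not retryable; `split_error_rows` is a hypothetical helper, and only `NO_RESULTS` is an error type confirmed by the example above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_error_rows(items: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (retryable, permanent) error rows from a dataset."""
    retryable, permanent = [], []
    for item in items:
        if item.get("type") != "error":
            continue
        # Treat a missing "retryable" flag as not retryable (an assumption).
        (retryable if item.get("retryable") else permanent).append(item)
    return retryable, permanent


sample_items = [
    {"type": "error", "errorType": "NO_RESULTS", "retryable": False,
     "errorMessage": "No matching screenplay results found for: Example Missing Movie"},
    # "EXAMPLE_TRANSIENT" is a made-up value for illustration only.
    {"type": "error", "errorType": "EXAMPLE_TRANSIENT", "retryable": True},
    {"type": "script_metadata", "title": "The Matrix"},
]

retryable, permanent = split_error_rows(sample_items)
print("Retry candidates:", Counter(e.get("errorType") for e in retryable))
print("Permanent failures:", Counter(e.get("errorType") for e in permanent))
```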
How To Use The Results
- Start the Actor from Apify Console.
- Enter either a single `movieName` or a `searches` list.
- Open the dataset while the run is active to see rows appear during scraping.
- Use `script_metadata` rows for cataloging and filtering.
- Use `script_chunk` rows for text indexing, search, LLM workflows, or downstream analysis.
Python API Example
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run_input = {
    "movieName": "The Matrix",
}

# Start the Actor and wait for the run to finish.
run = client.actor("thescrapelab/screenplay-script-scraper").call(run_input=run_input)

# Fetch all rows from the run's default dataset.
dataset_id = run["defaultDatasetId"]
items = client.dataset(dataset_id).list_items(clean=True).items

# Split rows by type.
metadata_rows = [item for item in items if item.get("type") == "script_metadata"]
chunk_rows = [item for item in items if item.get("type") == "script_chunk"]

print(f"Scripts found: {len(metadata_rows)}")
print(f"Text chunks: {len(chunk_rows)}")

for row in metadata_rows:
    print(row.get("title"), row.get("scriptUrl"), row.get("wordCount"))
```
For multiple searches:
```python
run_input = {
    "searches": ["The Matrix", "Alien", "Terminator"],
}
```
Cost And Performance
The Actor is tuned to keep run costs low:
- Uses lightweight HTTP crawling, not a browser
- Uses direct public requests by default, not a proxy
- Uses 128 MB memory for single-title runs
- Uses larger text chunks by default to reduce dataset item count
- Streams rows as they are found
For a typical single-title screenplay such as The Matrix, the Actor returns one metadata row plus a small number of chunk rows while preserving the full extracted script text.
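
One way to confirm that the full text arrived is to compare each metadata row's `chunkCount` with the number of chunk rows received for the same `scriptId`. This sketch assumes `chunkCount` equals the number of `script_chunk` rows emitted for that script, which the field name suggests but the source does not confirm; `find_incomplete_scripts` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def find_incomplete_scripts(items: List[dict]) -> Dict[str, Tuple[int, int]]:
    """Return {scriptId: (expected, received)} for scripts whose counts differ."""
    received = Counter(
        item["scriptId"] for item in items if item.get("type") == "script_chunk"
    )
    mismatches = {}
    for row in items:
        if row.get("type") != "script_metadata":
            continue
        # Metadata-only rows may omit chunkCount; treat that as zero chunks.
        expected = row.get("chunkCount", 0)
        got = received.get(row["scriptId"], 0)
        if expected != got:
            mismatches[row["scriptId"]] = (expected, got)
    return mismatches


# Illustrative rows only: the metadata row promises two chunks, one arrived.
sample_items = [
    {"type": "script_metadata", "scriptId": "imsdb-the-matrix", "chunkCount": 2},
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 1},
]

print(find_incomplete_scripts(sample_items))
```

Because rows are streamed during the run, a check like this is only meaningful once the run has finished.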
Practical Tips
- Use `movieName` for the cheapest, most focused run.
- Use `searches` when you want broader discovery across multiple titles.
- Prefer exact titles over broad words.
- Expect metadata-only rows for PDF-only or external sources.
- Check `hasScriptText` and `chunkCount` to identify rows with extracted screenplay text.
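
The last tip can be applied with a small filter over the metadata rows. A minimal sketch, assuming an omitted `hasScriptText` or `chunkCount` means no text was extracted; `split_by_text_availability` and the sample rows are hypothetical.

```python
from typing import Iterable, List, Tuple


def split_by_text_availability(items: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (with_text, metadata_only) lists of script_metadata rows."""
    with_text, metadata_only = [], []
    for row in items:
        if row.get("type") != "script_metadata":
            continue
        # Omitted fields count as "no text" because unknown values are left out.
        if row.get("hasScriptText") and row.get("chunkCount", 0) > 0:
            with_text.append(row)
        else:
            metadata_only.append(row)
    return with_text, metadata_only


# Illustrative rows only.
sample_items = [
    {"type": "script_metadata", "title": "The Matrix", "hasScriptText": True, "chunkCount": 8},
    {"type": "script_metadata", "title": "Example PDF-Only Title", "hasScriptText": False},
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 1},
]

with_text, metadata_only = split_by_text_availability(sample_items)
print("With text:", [row["title"] for row in with_text])
print("Metadata only:", [row["title"] for row in metadata_only])
```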
Limitations
- The Actor only uses publicly accessible pages.
- It does not bypass paywalls, logins, CAPTCHAs, or access controls.
- Source websites can change their layout, availability, or robots rules.
- Some public sources expose only PDF or external links; those may return metadata rows rather than script text.
- Search matching is title-oriented and may return related sequels, remakes, or same-franchise scripts.
- Word counts, scene counts, and draft detection are approximate.
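
Because matching is title-oriented, a post-filter on the returned `title` field can drop sequels and other near matches. The sketch below compares titles after a simple normalization (case-folding and collapsing punctuation); it is an illustrative client-side step, not how the Actor matches titles internally, and both function names are hypothetical.

```python
import re
from typing import Iterable, List


def normalize_title(title: str) -> str:
    """Case-fold a title and collapse punctuation and whitespace to single spaces."""
    return " ".join(re.sub(r"[\W_]+", " ", title.casefold()).split())


def filter_exact_title(items: Iterable[dict], wanted_title: str) -> List[dict]:
    """Keep only script_metadata rows whose title matches after normalization."""
    target = normalize_title(wanted_title)
    return [
        row
        for row in items
        if row.get("type") == "script_metadata"
        and normalize_title(row.get("title", "")) == target
    ]


# Illustrative rows only.
sample_items = [
    {"type": "script_metadata", "title": "The Matrix"},
    {"type": "script_metadata", "title": "The Matrix Reloaded"},
    {"type": "script_metadata", "title": "the matrix."},
]

exact = filter_exact_title(sample_items, "The Matrix")
print([row["title"] for row in exact])
```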
Legal And Ethical Notice
Movie scripts and screenplays may be copyrighted. This Actor is intended for indexing, metadata extraction, research, discovery, and analysis of publicly available pages.
You are responsible for ensuring that your use complies with copyright law, source website terms, robots.txt, and applicable regulations. The Actor is not a piracy tool and does not bypass access controls.
Support
If a title does not return the expected script, try a more exact movie title. If a source changes or a result looks wrong, rerun with a narrower query and review the `source`, `scriptUrl`, `errorType`, and `errorMessage` fields in the dataset.