Movie Script Finder & Extractor

Pricing

from $12.00 per 1,000 movie scripts

Find publicly accessible movie scripts and screenplays, extract clean metadata, and output script text in separate chunk rows for research, indexing, and analysis.

Rating

0.0 (0)

Developer

Inus Grobler

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 6
  • Monthly active users: 2
  • Last modified: 4 days ago

At a glance:

  • What it does: finds public movie scripts and extracts screenplay metadata and text chunks
  • Input: one movie title or a list of search terms
  • Output: metadata rows and screenplay chunk rows
  • Use cases: research, indexing, and LLM workflows
  • Limitations, troubleshooting, and pricing/cost notes are covered below

Find publicly available movie scripts and screenplays by title, extract clean metadata, and return screenplay text in structured dataset rows that are ready for research, indexing, enrichment, and analysis workflows.

This Actor is designed for clients who need script data without building and maintaining their own crawler. It searches supported public screenplay sources automatically, emits one metadata row per matched script, and streams script text as chunk rows while the run is still in progress.

What You Get

  • Public screenplay discovery from supported script sources
  • Movie title, writers, genres, source URLs, format, and draft details when available
  • Plain-text screenplay chunks for sources that expose readable HTML or TXT script text
  • Compact metadata rows for PDF, external, or metadata-only matches
  • Error rows for unsupported inputs, extraction failures, or no-result searches
  • Low-cost defaults: no browser, no proxy by default, 128 MB for single-title runs

Best For

  • Screenplay research datasets
  • Movie script search and cataloging
  • LLM or vector-index preparation
  • Writer, genre, and structure analysis
  • Building internal screenplay reference tools
  • Finding public source links for scripts at scale

Supported Sources

The Actor automatically checks supported public sources. You do not need to choose a source.

Source | Support
IMSDb | Metadata and HTML script text
The Daily Script | Metadata, HTML text, and TXT text
SimplyScripts | Metadata, TXT links, PDF links, and conservative external-link handling
Script Slug | Metadata and public PDF links when available

PDF text extraction is not enabled by default. PDF-only matches are returned as metadata/link rows.

Input

Use one of the two public input fields.

One Movie

Use movieName when you want one best-match screenplay.

{
  "movieName": "The Matrix"
}

Multiple Searches

Use searches when you want results for multiple movie titles or search terms.

{
  "searches": ["The Matrix", "Alien", "Terminator"]
}

Input Notes

  • If movieName and searches are both filled, movieName takes priority.
  • Keep movie titles specific for best matching.
  • Results are pushed to the dataset as they are scraped, not only after the run finishes.
  • Single-title runs use the cheapest defaults. Multi-search runs use more memory because they can return many scripts and chunks.
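The input rules above can be sketched as a small helper that picks the right field for you. This is a minimal sketch, not part of the Actor: `build_run_input` is a hypothetical name, and it only produces the two input shapes documented above.

```python
from typing import Any


def build_run_input(titles: list[str]) -> dict[str, Any]:
    """Return an Actor input dict (hypothetical helper, not part of the Actor).

    One title  -> {"movieName": ...}  (single best-match run)
    Many titles -> {"searches": [...]} (multi-search run)
    Only one field is ever set, so the movieName-priority rule never applies.
    """
    cleaned = [title.strip() for title in titles if title and title.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty title is required")
    if len(cleaned) == 1:
        return {"movieName": cleaned[0]}
    return {"searches": cleaned}


single_input = build_run_input(["The Matrix"])
multi_input = build_run_input(["The Matrix", "Alien", "Terminator"])
```

Setting only one of the two fields avoids relying on the priority rule and keeps single-title runs on the cheapest defaults.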

Output

Results are available in the default dataset. The Actor emits these row types:

Type | Meaning
script_metadata | One summary row for each matched script
script_chunk | Plain-text screenplay content split into ordered chunks
script_analysis | Optional analysis row in advanced runs
error | Invalid input, no results, unsupported source, or extraction failure

On successful rows, fields whose values are unknown or unavailable are omitted rather than set to null.
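Because every row carries a type field and optional fields may be missing entirely, downstream code usually groups rows by type and reads optional fields defensively. A minimal sketch, assuming rows shaped like the examples in this section; `group_rows_by_type` and the sample rows are illustrative only:

```python
from collections import defaultdict


def group_rows_by_type(items):
    """Group dataset rows by their "type" field (hypothetical helper)."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.get("type", "unknown")].append(item)
    return dict(grouped)


# Illustrative rows only; real rows contain more fields.
sample_items = [
    {"type": "script_metadata", "title": "The Matrix", "wordCount": 23137},
    {"type": "script_chunk", "title": "The Matrix", "chunkIndex": 1},
    {"type": "error", "errorType": "NO_RESULTS"},
]

grouped = group_rows_by_type(sample_items)

# Optional fields can be absent, so use .get() instead of indexing.
scene_count = grouped["script_metadata"][0].get("sceneCount")  # None here
```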

Metadata Row Example

{
  "type": "script_metadata",
  "source": "imsdb",
  "scrapedAt": "2026-06-08T07:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "writers": ["Larry Wachowski", "Andy Wachowski"],
  "genres": ["Action", "Sci-Fi", "Thriller"],
  "scriptFormat": "html",
  "hasScriptText": true,
  "chunkCount": 8,
  "wordCount": 23137,
  "characterCount": 143493,
  "sceneCount": 119
}

The metadata row does not contain the full script text.

Chunk Row Example

{
  "type": "script_chunk",
  "source": "imsdb",
  "scrapedAt": "2026-06-08T07:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "chunkIndex": 1,
  "chunkMode": "fixed_size",
  "chunkTitle": "Chunk 1",
  "chunkText": "THE MATRIX\n\nWritten by Larry and Andy Wachowski...",
  "chunkCharacterCount": 19995,
  "chunkWordCount": 3300,
  "nextChunkIndex": 2
}

The default chunking is optimized for cost by using larger chunks, so fewer dataset rows are created while preserving the full extracted script text.
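To get a full script back from chunk rows, group them by scriptId and join them in chunkIndex order. The sketch below is one way to do that; it assumes chunks do not overlap and can be concatenated without a separator, which the page does not state explicitly. `reassemble_scripts` is a hypothetical helper name.

```python
def reassemble_scripts(items):
    """Rebuild full script text per scriptId from script_chunk rows.

    Assumption: chunks are non-overlapping, so concatenating chunkText in
    chunkIndex order reproduces the extracted text.
    """
    chunks_by_script = {}
    for item in items:
        if item.get("type") != "script_chunk":
            continue  # skip metadata, analysis, and error rows
        chunks_by_script.setdefault(item["scriptId"], []).append(item)

    return {
        script_id: "".join(
            chunk.get("chunkText", "")
            for chunk in sorted(chunks, key=lambda c: c["chunkIndex"])
        )
        for script_id, chunks in chunks_by_script.items()
    }


# Illustrative rows, deliberately out of order.
sample_items = [
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix",
     "chunkIndex": 2, "chunkText": "FADE IN:"},
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix",
     "chunkIndex": 1, "chunkText": "THE MATRIX\n\n"},
    {"type": "script_metadata", "scriptId": "imsdb-the-matrix"},
]

full_text = reassemble_scripts(sample_items)
```

Sorting explicitly matters because rows are pushed while the run is still in progress, so dataset order is not guaranteed to match chunk order when several scripts are scraped in one run.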

Error Row Example

{
  "type": "error",
  "source": "unknown",
  "scrapedAt": "2026-06-08T07:00:00.000Z",
  "url": "https://apify.com/actors/thescrapelab/screenplay-script-scraper",
  "status": "failed",
  "errorType": "NO_RESULTS",
  "errorMessage": "No matching screenplay results found for: Example Missing Movie",
  "retryable": false
}
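Error rows sit in the same dataset as successful rows, so it helps to separate them early and use the retryable flag to decide what to rerun. A minimal sketch; `split_errors` is a hypothetical helper, and the second sample row uses a made-up errorType purely for illustration (only NO_RESULTS appears in the example above).

```python
def split_errors(items):
    """Return (retryable_errors, permanent_errors) from dataset rows."""
    errors = [item for item in items if item.get("type") == "error"]
    retryable = [err for err in errors if err.get("retryable")]
    permanent = [err for err in errors if not err.get("retryable")]
    return retryable, permanent


sample_items = [
    {"type": "error", "errorType": "NO_RESULTS", "retryable": False,
     "errorMessage": "No matching screenplay results found for: Example Missing Movie"},
    # Hypothetical errorType, not taken from the Actor's documentation.
    {"type": "error", "errorType": "EXAMPLE_TRANSIENT_ERROR", "retryable": True,
     "errorMessage": "illustrative transient failure"},
    {"type": "script_metadata", "title": "The Matrix"},
]

retryable_errors, permanent_errors = split_errors(sample_items)
```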

How To Use The Results

  1. Start the Actor from Apify Console.
  2. Enter either a single movieName or a searches list.
  3. Open the dataset while the run is active to see rows appear during scraping.
  4. Use script_metadata rows for cataloging and filtering.
  5. Use script_chunk rows for text indexing, search, LLM workflows, or downstream analysis.

Python API Example

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

run_input = {
    "movieName": "The Matrix",
}

# Start the Actor and wait for the run to finish.
run = client.actor("thescrapelab/screenplay-script-scraper").call(run_input=run_input)

# Fetch all rows from the run's default dataset.
dataset_id = run["defaultDatasetId"]
items = client.dataset(dataset_id).list_items(clean=True).items

# Split rows by type.
metadata_rows = [item for item in items if item.get("type") == "script_metadata"]
chunk_rows = [item for item in items if item.get("type") == "script_chunk"]

print(f"Scripts found: {len(metadata_rows)}")
print(f"Text chunks: {len(chunk_rows)}")
for row in metadata_rows:
    print(row.get("title"), row.get("scriptUrl"), row.get("wordCount"))

For multiple searches:

run_input = {
    "searches": ["The Matrix", "Alien", "Terminator"],
}

Cost And Performance

The Actor is tuned to keep run costs low:

  • Uses lightweight HTTP crawling, not a browser
  • Uses direct public requests by default, not a proxy
  • Uses 128 MB memory for single-title runs
  • Uses larger text chunks by default to reduce dataset item count
  • Streams rows as they are found

For a typical single-title screenplay such as The Matrix, the Actor returns one metadata row plus a small number of chunk rows while preserving the full extracted script text.
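For rough planning, the row count and charge for a run can be estimated with simple arithmetic. This sketch rests on two assumptions that are not confirmed on this page: that fixed-size chunks hold about 20,000 characters (inferred from the chunk row example), and that billing is per matched script at the listed starting price of $12.00 per 1,000 scripts, with any platform usage costs excluded.

```python
import math

PRICE_PER_1000_SCRIPTS_USD = 12.00  # listed starting price on this page
ASSUMED_CHUNK_CHARS = 20_000        # assumption inferred from the chunk example


def estimate_row_count(character_count, chunk_chars=ASSUMED_CHUNK_CHARS):
    """One metadata row plus enough fixed-size chunks to cover the text."""
    return 1 + math.ceil(character_count / chunk_chars)


def estimate_script_charge(script_count, price_per_1000=PRICE_PER_1000_SCRIPTS_USD):
    """Estimated charge in USD, assuming per-script billing at the listed rate."""
    return round(script_count * price_per_1000 / 1000, 2)


# The Matrix metadata example reports characterCount = 143493.
matrix_rows = estimate_row_count(143493)    # 1 metadata row + 8 chunk rows = 9
batch_charge = estimate_script_charge(250)  # 250 scripts -> 3.0 USD
```

The estimate of 8 chunk rows agrees with the chunkCount in the metadata row example, but actual chunk sizes and billing should be checked against a real run.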

Practical Tips

  • Use movieName for the cheapest, most focused run.
  • Use searches when you want broader discovery across multiple titles.
  • Prefer exact titles over broad words.
  • Expect metadata-only rows for PDF-only or external sources.
  • Check hasScriptText and chunkCount to identify rows with extracted screenplay text.
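The last tip can be sketched as a filter over metadata rows. This is a minimal sketch under the assumption that metadata-only rows either omit hasScriptText and chunkCount or set them to false and 0; the second sample row, including its scriptId and "pdf" format value, is hypothetical.

```python
def has_extracted_text(row):
    """True when a metadata row reports extracted screenplay text."""
    return bool(row.get("hasScriptText")) and row.get("chunkCount", 0) > 0


sample_metadata_rows = [
    {"type": "script_metadata", "scriptId": "imsdb-the-matrix",
     "scriptFormat": "html", "hasScriptText": True, "chunkCount": 8},
    # Hypothetical PDF-only match: text-related fields are simply absent.
    {"type": "script_metadata", "scriptId": "example-pdf-script",
     "scriptFormat": "pdf"},
]

with_text = [r["scriptId"] for r in sample_metadata_rows if has_extracted_text(r)]
links_only = [r["scriptId"] for r in sample_metadata_rows if not has_extracted_text(r)]
```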

Limitations

  • The Actor only uses publicly accessible pages.
  • It does not bypass paywalls, logins, CAPTCHAs, or access controls.
  • Source websites can change their layout, availability, or robots rules.
  • Some public sources expose only PDF or external links; those may return metadata rows rather than script text.
  • Search matching is title-oriented and may return related sequels, remakes, or same-franchise scripts.
  • Word counts, scene counts, and draft detection are approximate.

Movie scripts and screenplays may be copyrighted. This Actor is intended for indexing, metadata extraction, research, discovery, and analysis of publicly available pages.

You are responsible for ensuring that your use complies with copyright law, source website terms, robots.txt, and applicable regulations. The Actor is not a piracy tool and does not bypass access controls.

Support

If a title does not return the expected script, try a more exact movie title. If a source changes or a result looks wrong, rerun with a narrower query and review the source, scriptUrl, errorType, and errorMessage fields in the dataset.