YouTube Transcript Scraper
Pricing
Pay per event
Extract clean transcript text, timestamps, captions, and public video metadata from YouTube URLs or video IDs for AI, SEO, and research workflows.
Developer
Hanna Nosova
Maintained by Community
Extract clean transcripts, captions, timestamps, and basic metadata from public YouTube videos.
What does YouTube Transcript Scraper do?
YouTube Transcript Scraper turns public YouTube captions into structured data.
It accepts YouTube watch URLs, Shorts URLs, youtu.be links, embed URLs, live URLs, or raw video IDs.
For every video, it returns a dataset row with the video ID, URL, title, channel name, caption language, transcript text, and optional timestamped segments.
Videos without public captions are handled gracefully with a clear error message instead of failing the whole run.
Who is it for?
SEO and content teams
Use transcripts to repurpose videos into briefs, blog drafts, quote libraries, keyword research, and content audits.
Researchers and analysts
Collect spoken content from public videos for media monitoring, qualitative research, public-interest analysis, or education datasets.
LLM and RAG builders
Create clean text chunks from public video captions for search, summarization, classification, embeddings, and retrieval workflows.
Sales and marketing teams
Extract talks, interviews, demos, webinars, and competitor videos into searchable text for faster review.
Journalists and fact checkers
Create searchable transcript records for public videos, speeches, interviews, and announcements.
Why use this actor?
- ✅ Structured output with one row per video
- ✅ Transcript text plus timestamped segments
- ✅ Public caption language selection
- ✅ Graceful handling for videos with no captions
- ✅ Works with video URLs and video IDs
- ✅ Low-cost runs for captioned public videos
- ✅ Output ready for spreadsheets, APIs, and AI workflows
What data can I extract?
| Field | Description |
|---|---|
| videoId | YouTube video ID |
| videoUrl | Canonical watch URL |
| title | Public video title when available |
| channelName | Public channel name when available |
| language | Caption language used |
| isAutoGenerated | Whether the selected captions appear auto-generated |
| transcriptText | Full transcript as one clean text string |
| segments | Timestamped transcript segments |
| duration | Video duration in seconds when available |
| thumbnailUrl | Public thumbnail URL when available |
| captionsAvailable | Whether public caption tracks were found |
| error | Explanation for unavailable captions or failed videos |
How much does it cost to extract YouTube transcripts?
The actor uses pay-per-event pricing.
There is a small start event for each run and a per-transcript event for every successful transcript extracted.
A first test with one or two videos is inexpensive.
For high-volume work, run batches of known captioned public videos to keep cost predictable.
Final tiered pricing is set on Apify before publication and is visible on the actor page.
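The pricing model described above reduces to one start event plus one transcript event per successful transcript. The sketch below only illustrates that arithmetic. `estimate_run_cost` is a hypothetical helper, and the prices passed to it are placeholders, not the actor's actual rates, which are listed on the actor page.

```python
from decimal import Decimal


def estimate_run_cost(successful_transcripts: int,
                      start_event_price: Decimal,
                      transcript_event_price: Decimal) -> Decimal:
    """Estimate the cost of one run: one start event plus one event per transcript."""
    if successful_transcripts < 0:
        raise ValueError("successful_transcripts must not be negative")
    return start_event_price + successful_transcripts * transcript_event_price


# Placeholder prices for illustration only; check the actor page for real values.
example_cost = estimate_run_cost(4, Decimal("0.50"), Decimal("0.25"))
print(example_cost)
```

Only successful transcripts are counted here, which follows the description of the per-transcript event above.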
How to use YouTube Transcript Scraper
- Open the actor on Apify.
- Paste one or more public YouTube video URLs.
- Optionally set a preferred caption language such as en, es, or de.
- Choose whether to include timestamped segments.
- Click Start.
- Download the dataset as JSON, CSV, Excel, XML, or HTML.
Input example
{
  "videoUrls": [{ "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }],
  "language": "en",
  "includeTimestamps": true,
  "includeMetadata": true,
  "maxVideos": 5
}
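When the URL list comes from a spreadsheet or a script, an input object of this shape can be assembled programmatically. This is a minimal sketch that assumes the field names shown in the input example. `build_run_input` is a hypothetical helper, not part of the actor or the Apify client.

```python
import json
from typing import Iterable


def build_run_input(urls: Iterable[str],
                    language: str = "en",
                    include_timestamps: bool = True,
                    include_metadata: bool = True,
                    max_videos: int = 5) -> dict:
    """Assemble an input object shaped like the input example."""
    seen = set()
    video_urls = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:  # skip blanks and exact duplicates
            seen.add(url)
            video_urls.append({"url": url})
    return {
        "videoUrls": video_urls,
        "language": language,
        "includeTimestamps": include_timestamps,
        "includeMetadata": include_metadata,
        "maxVideos": max_videos,
    }


run_input = build_run_input([
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",  # duplicate, dropped
])
print(json.dumps(run_input, indent=2))
```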
Output example
{"videoId": "dQw4w9WgXcQ","videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ","title": "Example video title","channelName": "Example channel","language": "en","isAutoGenerated": false,"transcriptText": "We're no strangers to love...","segments": [{ "start": 0.0, "duration": 2.1, "text": "We're no strangers to love" }],"duration": 213,"thumbnailUrl": "https://i.ytimg.com/...","captionsAvailable": true}
Supported YouTube URL formats
- https://www.youtube.com/watch?v=VIDEO_ID
- https://youtu.be/VIDEO_ID
- https://www.youtube.com/shorts/VIDEO_ID
- https://www.youtube.com/embed/VIDEO_ID
- https://www.youtube.com/live/VIDEO_ID
- Raw VIDEO_ID values
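To deduplicate or validate inputs before a run, each of the formats above can be normalized to a bare video ID on the client side. The sketch below is one possible approach, not the actor's own parser. `extract_video_id` is a hypothetical helper, and it assumes that video IDs are 11 characters drawn from letters, digits, hyphen, and underscore.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a supported URL or raw ID, or None if not recognized."""
    value = value.strip()
    if _VIDEO_ID.fullmatch(value):
        return value  # already a raw video ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]

    candidate = None
    if host == "youtu.be" and parts:
        candidate = parts[0]
    elif host == "youtube.com":
        if parts == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            candidate = parts[1]

    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    return None


print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))  # dQw4w9WgXcQ
```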
Caption language selection
Set language to your preferred caption language code.
If that exact language is not available, the actor falls back to a related language variant or the first public caption track.
For example, a request for en can resolve to a regional English track when no exact en track is present.
The selected language is returned in the language field.
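The fallback order described above (exact match, then a related variant, then the first public track) can be sketched as follows. The actor's real matching rules are not documented here, so treat this as an illustration only. `pick_caption_language` is a hypothetical function, and the rule that "related" means "same base code before the hyphen" is an assumption.

```python
from typing import Optional, Sequence


def pick_caption_language(requested: str, available: Sequence[str]) -> Optional[str]:
    """Choose a caption track code using exact, related-variant, then first-track order."""
    if not available:
        return None  # no public caption tracks at all

    wanted = requested.strip().lower()
    lowered = [code.lower() for code in available]

    # 1. Exact match on the requested code.
    if wanted in lowered:
        return available[lowered.index(wanted)]

    # 2. Related variant: same base language (assumed to be the part before "-").
    base = wanted.split("-")[0]
    for original, code in zip(available, lowered):
        if code.split("-")[0] == base:
            return original

    # 3. Fall back to the first public caption track.
    return available[0]


print(pick_caption_language("en", ["de", "en-GB"]))  # en-GB
```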
Timestamped transcript segments
Enable includeTimestamps to receive segment-level timing.
Each segment can include:
- start: segment start in seconds
- duration: segment length in seconds
- text: spoken caption text
Disable timestamps when you only need the combined transcript text.
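As an illustration of how the segment fields combine, the sketch below turns segments into readable timestamped lines, computing each end time as start plus duration. The helper names are hypothetical, and the segment shape is assumed to match the output example.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS, dropping fractions of a second."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments: List[Dict]) -> List[str]:
    """Render each segment as '[start - end] text'."""
    lines = []
    for segment in segments:
        start = float(segment["start"])
        end = start + float(segment.get("duration", 0.0))
        lines.append(
            f"[{format_timestamp(start)} - {format_timestamp(end)}] {segment['text']}"
        )
    return lines


example_segments = [
    {"start": 0.0, "duration": 2.1, "text": "We're no strangers to love"},
]
for line in segments_to_lines(example_segments):
    print(line)  # [00:00:00 - 00:00:02] We're no strangers to love
```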
Metadata fields
Enable includeMetadata to include public video details when available.
Metadata can include title, channel name, duration, and thumbnail URL.
Some unavailable or restricted videos may return less metadata.
Handling videos with no captions
Not every public YouTube video has public captions.
When captions are unavailable, the actor still saves a row with:
- captionsAvailable: false
- the video ID and URL
- an error message explaining what happened
This makes batch runs easier to audit because one bad video does not stop the rest of the run.
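Because failed videos still produce rows, a downstream script can separate usable transcripts from rows that need review. This is a minimal sketch, assuming rows shaped like the output example. `split_rows` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def split_rows(rows: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split dataset rows into (usable, needs_review) based on caption availability."""
    usable, needs_review = [], []
    for row in rows:
        # Treat a row as usable only if captions were found and text is non-empty.
        if row.get("captionsAvailable") and row.get("transcriptText"):
            usable.append(row)
        else:
            needs_review.append(row)
    return usable, needs_review


sample_rows = [
    {"videoId": "dQw4w9WgXcQ", "captionsAvailable": True,
     "transcriptText": "We're no strangers to love..."},
    {"videoId": "VIDEO_ID", "captionsAvailable": False,
     "error": "No public caption tracks found"},  # illustrative error text
]
usable, needs_review = split_rows(sample_rows)
for row in needs_review:
    print(row["videoId"], row.get("error"))
```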
Tips for best results
- ✅ Use public videos with captions enabled.
- ✅ Start with a small batch to confirm your input format.
- ✅ Use language when you need a specific caption language.
- ✅ Keep maxVideos low for testing and increase it after validation.
- ✅ Check captionsAvailable before using transcript text in automated workflows.
Integrations
You can connect the dataset to:
- Google Sheets for editorial review
- Zapier or Make for automations
- Vector databases for embeddings and retrieval
- BI tools for media analysis
- Internal dashboards for monitoring public video content
- LLM workflows for summarization, tagging, and question answering
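For embedding and retrieval workflows, the transcript text usually has to be split into chunks before it is indexed. The sketch below shows one simple approach, word-based chunks with overlap. The chunk size and overlap defaults are arbitrary illustrative values, and `chunk_words` is a hypothetical helper rather than anything the actor provides.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of up to chunk_size words, sharing overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

Word-based splitting keeps the example dependency-free. A token-based splitter matched to the embedding model is a common alternative.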
API usage with Node.js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('fetch_cat/youtube-transcript-scraper').call({
    videoUrls: [{ url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' }],
    language: 'en',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].transcriptText);
API usage with Python
import os

from apify_client import ApifyClient

client = ApifyClient(os.environ['APIFY_TOKEN'])

run = client.actor('fetch_cat/youtube-transcript-scraper').call(run_input={
    'videoUrls': [{'url': 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'}],
    'language': 'en',
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[0].get('transcriptText'))
API usage with cURL
curl -X POST "https://api.apify.com/v2/acts/fetch_cat~youtube-transcript-scraper/runs?token=$APIFY_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"videoUrls":[{"url":"https://www.youtube.com/watch?v=dQw4w9WgXcQ"}],"language":"en"}'
MCP and AI agent usage
Use this actor through Apify MCP when you want an AI assistant to fetch public video transcripts.
MCP server URL pattern:
https://mcp.apify.com/?tools=fetch_cat/youtube-transcript-scraper
Claude Code setup:
claude mcp add --transport http apify-youtube-transcripts "https://mcp.apify.com/?tools=fetch_cat/youtube-transcript-scraper"
Claude Desktop JSON config:
{
  "mcpServers": {
    "apify-youtube-transcripts": {
      "url": "https://mcp.apify.com/?tools=fetch_cat/youtube-transcript-scraper"
    }
  }
}
Example prompts:
- "Extract the transcript from this public YouTube video and summarize the key claims."
- "Get transcripts for these five public webinar URLs and make a topic table."
- "Find quotes in this public interview transcript about pricing."
Common use cases
- Video-to-blog repurposing
- Public webinar transcript extraction
- Research corpus creation
- Podcast-style YouTube episode analysis
- Competitive content monitoring
- Training data preparation from public captions
- Subtitle QA and language availability checks
Limitations
This actor extracts public captions only.
It cannot access private videos, members-only videos, deleted videos, region-blocked content unavailable to the runner, or videos without public caption tracks.
Transcript quality depends on the caption track provided for the public video.
Auto-generated captions may contain recognition errors.
Legality and responsible use
Use this actor only for content you are allowed to access and process.
YouTube videos and captions may be protected by copyright or platform terms.
You are responsible for ensuring that your use case, storage, redistribution, and analysis comply with applicable laws, platform rules, and rights-holder requirements.
FAQ
Does it work without a YouTube account?
Yes, the actor is designed for public videos and public caption tracks.
Can it extract transcripts from private videos?
No. Private, members-only, deleted, or otherwise inaccessible videos are outside scope.
Why did a video return captionsAvailable=false?
The video may not have public captions, the video may be unavailable, or YouTube may not expose captions for that video.
Can I choose a language?
Yes. Use the language input with a language code such as en, es, fr, or de.
Are timestamps included?
Yes, when includeTimestamps is enabled.
Why are captions imperfect?
Some videos use auto-generated captions. These can include speech-recognition mistakes.
Troubleshooting
My run succeeded but transcript text is empty
Check the error and captionsAvailable fields. The video probably has no public caption track.
My preferred language was not returned
The requested language may not be available. The actor falls back to another public caption track when needed.
Related scrapers
Explore related actors from this account:
- reddit-scraper
- tiktok-comments-scraper
- website-change-monitor
Changelog
0.1
Initial version with public YouTube transcript extraction, timestamped segments, language selection, metadata fields, and graceful no-caption handling.
Support
If a public captioned video fails unexpectedly, provide the video URL, input JSON, and run ID so the issue can be reproduced.