YouTube Transcript Scraper

Pricing

Pay per event

Extract clean transcript text, timestamps, captions, and public video metadata from YouTube URLs or video IDs for AI, SEO, and research workflows.

Developer

Hanna Nosova

Maintained by Community

Extract clean transcripts, captions, timestamps, and basic metadata from public YouTube videos.

What does YouTube Transcript Scraper do?

YouTube Transcript Scraper turns public YouTube captions into structured data.

It accepts YouTube watch URLs, Shorts URLs, youtu.be links, embed URLs, live URLs, or raw video IDs.

For every video, it returns a dataset row with the video ID, URL, title, channel name, caption language, transcript text, and optional timestamped segments.

Videos without public captions are handled gracefully with a clear error message instead of failing the whole run.

Who is it for?

SEO and content teams

Use transcripts to repurpose videos into briefs, blog drafts, quote libraries, keyword research, and content audits.

Researchers and analysts

Collect spoken content from public videos for media monitoring, qualitative research, public-interest analysis, or education datasets.

LLM and RAG builders

Create clean text chunks from public video captions for search, summarization, classification, embeddings, and retrieval workflows.

Sales and marketing teams

Extract talks, interviews, demos, webinars, and competitor videos into searchable text for faster review.

Journalists and fact checkers

Create searchable transcript records for public videos, speeches, interviews, and announcements.

Why use this actor?

  • ✅ Structured output with one row per video
  • ✅ Transcript text plus timestamped segments
  • ✅ Public caption language selection
  • ✅ Graceful handling for videos with no captions
  • ✅ Works with video URLs and video IDs
  • ✅ Low-cost runs for captioned public videos
  • ✅ Output ready for spreadsheets, APIs, and AI workflows

What data can I extract?

Field | Description
--- | ---
videoId | YouTube video ID
videoUrl | Canonical watch URL
title | Public video title when available
channelName | Public channel name when available
language | Caption language used
isAutoGenerated | Whether the selected captions appear auto-generated
transcriptText | Full transcript as one clean text string
segments | Timestamped transcript segments
duration | Video duration in seconds when available
thumbnailUrl | Public thumbnail URL when available
captionsAvailable | Whether public caption tracks were found
error | Explanation for unavailable captions or failed videos
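If you load dataset items into your own Python code, a typed record makes the optional fields explicit. This is a minimal sketch. The field types are inferred from the output example in this README (for instance, duration as whole seconds). Treating every field except videoId, videoUrl, and captionsAvailable as optional is an assumption. The names Segment, TranscriptRow, and parse_row are hypothetical and not part of the actor.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Segment:
    start: float     # segment start in seconds
    duration: float  # segment length in seconds
    text: str        # caption text for this segment


@dataclass
class TranscriptRow:
    videoId: str
    videoUrl: str
    captionsAvailable: bool
    # Assumption: everything below may be missing, e.g. for no-caption rows.
    title: Optional[str] = None
    channelName: Optional[str] = None
    language: Optional[str] = None
    isAutoGenerated: Optional[bool] = None
    transcriptText: str = ""
    segments: list[Segment] = field(default_factory=list)
    duration: Optional[int] = None
    thumbnailUrl: Optional[str] = None
    error: Optional[str] = None


def parse_row(item: dict[str, Any]) -> TranscriptRow:
    """Convert one dataset item into a TranscriptRow, tolerating missing optional fields."""
    segments = [
        Segment(start=float(s["start"]), duration=float(s["duration"]), text=s["text"])
        for s in item.get("segments") or []
    ]
    return TranscriptRow(
        videoId=item["videoId"],
        videoUrl=item.get("videoUrl") or "",
        captionsAvailable=bool(item.get("captionsAvailable", False)),
        title=item.get("title"),
        channelName=item.get("channelName"),
        language=item.get("language"),
        isAutoGenerated=item.get("isAutoGenerated"),
        transcriptText=item.get("transcriptText") or "",
        segments=segments,
        duration=item.get("duration"),
        thumbnailUrl=item.get("thumbnailUrl"),
        error=item.get("error"),
    )
```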

How much does it cost to extract YouTube transcripts?

The actor uses pay-per-event pricing.

There is a small start event for each run and a per-transcript event for every successful transcript extracted.

A first test with one or two videos is inexpensive.

For high-volume work, run batches of known captioned public videos to keep cost predictable.

Final tiered pricing is set on Apify before publication and is visible on the actor page.
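The pricing model above reduces to simple arithmetic: one start event per run, plus one per-transcript event for each successful transcript. The sketch below estimates spend under that model. Both prices are hypothetical placeholders, so take the real values from the actor page. The function name estimate_cost is illustrative only.

```python
def estimate_cost(start_price: float, per_transcript_price: float,
                  successful_transcripts: int, runs: int = 1) -> float:
    """Estimate total spend for one or more runs.

    start_price and per_transcript_price are placeholders, not real prices.
    Only successful transcripts incur the per-transcript event, following
    the pricing description; every run incurs one start event.
    """
    if successful_transcripts < 0 or runs < 1:
        raise ValueError("need successful_transcripts >= 0 and runs >= 1")
    return runs * start_price + successful_transcripts * per_transcript_price
```

With made-up prices of 0.01 per start event and 0.002 per transcript, 100 transcripts in one run cost about 0.21. Splitting the same 100 videos across 10 runs adds nine more start events, for about 0.30. This is why larger batches keep the per-video cost lower.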

How to use YouTube Transcript Scraper

  1. Open the actor on Apify.
  2. Paste one or more public YouTube video URLs.
  3. Optionally set a preferred caption language such as en, es, or de.
  4. Choose whether to include timestamped segments.
  5. Click Start.
  6. Download the dataset as JSON, CSV, Excel, XML, or HTML.

Input example

{
  "videoUrls": [
    { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
  ],
  "language": "en",
  "includeTimestamps": true,
  "includeMetadata": true,
  "maxVideos": 5
}
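When your video list comes from a spreadsheet or another script, you can generate the input object in code. This sketch produces the same shape as the input example above and drops blank entries and exact duplicates. Two assumptions are involved. Raw video IDs are passed inside the same url objects, because the README says IDs are accepted but does not show how. Defaulting maxVideos to the list length assumes that field caps how many entries are processed. build_input is a hypothetical helper.

```python
import json
from typing import Optional


def build_input(videos: list[str], language: str = "en",
                include_timestamps: bool = True, include_metadata: bool = True,
                max_videos: Optional[int] = None) -> dict:
    """Build an actor input object from a list of URLs or raw video IDs."""
    cleaned = [v.strip() for v in videos if v and v.strip()]
    unique = list(dict.fromkeys(cleaned))  # de-duplicate while keeping order
    return {
        # Assumption: raw IDs go in the same {"url": ...} objects as URLs.
        "videoUrls": [{"url": v} for v in unique],
        "language": language,
        "includeTimestamps": include_timestamps,
        "includeMetadata": include_metadata,
        # Assumption: maxVideos caps how many entries are processed.
        "maxVideos": max_videos if max_videos is not None else len(unique),
    }


if __name__ == "__main__":
    print(json.dumps(build_input(["https://youtu.be/dQw4w9WgXcQ"]), indent=2))
```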

Output example

{
  "videoId": "dQw4w9WgXcQ",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Example video title",
  "channelName": "Example channel",
  "language": "en",
  "isAutoGenerated": false,
  "transcriptText": "We're no strangers to love...",
  "segments": [
    { "start": 0.0, "duration": 2.1, "text": "We're no strangers to love" }
  ],
  "duration": 213,
  "thumbnailUrl": "https://i.ytimg.com/...",
  "captionsAvailable": true
}
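Apify can export the dataset as CSV directly. However, if you have already fetched items through the API, you may want a flat table for a spreadsheet. This sketch writes one CSV row per item with the standard csv module and leaves out the nested segments array, since it does not fit in a single cell. The column list and the name items_to_csv are illustrative choices.

```python
import csv
import io

# Flat columns taken from the output fields; nested "segments" is left out.
COLUMNS = ["videoId", "videoUrl", "title", "channelName", "language",
           "isAutoGenerated", "captionsAvailable", "duration",
           "transcriptText", "error"]


def items_to_csv(items: list[dict]) -> str:
    """Return CSV text with a header row and one row per dataset item."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for item in items:
        # Missing fields (e.g. "error" on successful rows) become empty cells.
        writer.writerow({col: item.get(col, "") for col in COLUMNS})
    return buf.getvalue()
```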

Supported YouTube URL formats

  • https://www.youtube.com/watch?v=VIDEO_ID
  • https://youtu.be/VIDEO_ID
  • https://www.youtube.com/shorts/VIDEO_ID
  • https://www.youtube.com/embed/VIDEO_ID
  • https://www.youtube.com/live/VIDEO_ID
  • Raw VIDEO_ID values
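To pre-validate a list before starting a run, you can normalize each entry to a video ID on your side. This is a hypothetical client-side sketch, not the actor's own parser. It handles the formats listed above and assumes video IDs are 11 characters drawn from letters, digits, "-" and "_". That matches IDs seen in practice, but YouTube does not formally guarantee it.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: IDs are 11 chars of [A-Za-z0-9_-] (observed format, not a spec).
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for a supported URL format or raw ID, else None."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value  # already a raw video ID
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = parsed.hostname or ""  # urlparse lowercases the hostname
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    candidate = None
    if host == "youtu.be":
        # https://youtu.be/VIDEO_ID
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        parts = [p for p in parsed.path.split("/") if p]
        if parts[:1] == ["watch"]:
            # https://www.youtube.com/watch?v=VIDEO_ID
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            # /shorts/VIDEO_ID, /embed/VIDEO_ID, /live/VIDEO_ID
            candidate = parts[1]
    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    return None
```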

Caption language selection

Set language to your preferred caption language code.

If that exact language is not available, the actor falls back to a related language variant or the first public caption track.

For example, en may match English captions when present.

The selected language is returned in the language field.
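The fallback order described above (exact match, then a related variant, then the first public track) can be sketched as follows. This is a hypothetical reconstruction of the described behavior, not the actor's code. The track dictionaries and their languageCode key are assumed shapes. Treating "related variant" as "same base language before the hyphen" is also an assumption.

```python
from typing import Optional


def pick_caption_track(tracks: list[dict], preferred: str) -> Optional[dict]:
    """Pick a caption track by exact code, then base language, then first track.

    tracks: hypothetical list like [{"languageCode": "en-GB"}, ...] in the
    order the caption tracks are listed.
    """
    if not tracks:
        return None
    pref = preferred.lower()
    # 1. Exact language code match, e.g. "en" == "en".
    for track in tracks:
        if track["languageCode"].lower() == pref:
            return track
    # 2. Related variant: same base language, e.g. "en" matches "en-GB".
    base = pref.split("-")[0]
    for track in tracks:
        if track["languageCode"].lower().split("-")[0] == base:
            return track
    # 3. Fall back to the first public caption track.
    return tracks[0]
```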

Timestamped transcript segments

Enable includeTimestamps to receive segment-level timing.

Each segment can include:

  • start — segment start in seconds
  • duration — segment length in seconds
  • text — spoken caption text

Disable timestamps when you only need the combined transcript text.
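Because each segment carries start and duration in seconds, the segments can be converted to a standard subtitle file. This sketch writes SRT: a sequence number, a "HH:MM:SS,mmm --> HH:MM:SS,mmm" time range, the text, and a blank line between cues. It computes each end time as start + duration. The function names are hypothetical.

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 2.1 -> '00:00:02,100'."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Convert actor segments ({start, duration, text}) into SRT text."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = float(seg["start"])
        end = start + float(seg["duration"])
        cues.append(
            f"{index}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{seg['text']}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```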

Metadata fields

Enable includeMetadata to include public video details when available.

Metadata can include title, channel name, duration, and thumbnail URL.

Some unavailable or restricted videos may return less metadata.

Handling videos with no captions

Not every public YouTube video has public captions.

When captions are unavailable, the actor still saves a row with:

  • captionsAvailable: false
  • the video ID and URL
  • an error message explaining what happened

This makes batch runs easier to audit because one bad video does not stop the rest of the run.
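After a batch run, you can split the rows into usable transcripts and an audit list before passing anything downstream. This sketch treats a row as usable only when captionsAvailable is true and transcriptText is non-empty. That second check is a conservative extra assumption. The helper names are hypothetical.

```python
def split_results(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate rows with usable transcript text from rows that need review."""
    ok, failed = [], []
    for item in items:
        has_text = bool((item.get("transcriptText") or "").strip())
        if item.get("captionsAvailable") and has_text:
            ok.append(item)
        else:
            failed.append(item)
    return ok, failed


def failure_report(failed: list[dict]) -> list[str]:
    """One line per failed row: the video ID and the actor's error message."""
    return [
        f"{item.get('videoId', '?')}: {item.get('error') or 'no transcript text'}"
        for item in failed
    ]
```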

Tips for best results

  • ✅ Use public videos with captions enabled.
  • ✅ Start with a small batch to confirm your input format.
  • ✅ Use language when you need a specific caption language.
  • ✅ Keep maxVideos low for testing and increase it after validation.
  • ✅ Check captionsAvailable before using transcript text in automated workflows.

Integrations

You can connect the dataset to:

  • Google Sheets for editorial review
  • Zapier or Make for automations
  • Vector databases for embeddings and retrieval
  • BI tools for media analysis
  • Internal dashboards for monitoring public video content
  • LLM workflows for summarization, tagging, and question answering
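For vector databases and retrieval, transcripts are usually split into chunks before embedding. Here is a minimal sketch, assuming includeTimestamps was enabled so segments are present. It greedily packs consecutive segments into chunks of at most max_chars characters and keeps each chunk's start and end times, so a search hit can point back to a moment in the video. The 1000-character default is an arbitrary placeholder. A single segment longer than max_chars becomes its own oversized chunk rather than being split. chunk_segments is a hypothetical helper.

```python
def chunk_segments(segments: list[dict], max_chars: int = 1000) -> list[dict]:
    """Group consecutive segments into text chunks with start/end times."""
    chunks: list[dict] = []
    texts: list[str] = []
    start = end = 0.0
    length = 0  # length of " ".join(texts)
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue
        # Close the current chunk if adding this segment would exceed the limit.
        if texts and length + 1 + len(text) > max_chars:
            chunks.append({"start": start, "end": end, "text": " ".join(texts)})
            texts, length = [], 0
        if not texts:
            start = float(seg["start"])
        length += len(text) + (1 if texts else 0)
        texts.append(text)
        end = float(seg["start"]) + float(seg.get("duration", 0))
    if texts:
        chunks.append({"start": start, "end": end, "text": " ".join(texts)})
    return chunks
```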

API usage with Node.js

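// Run as an ES module (e.g. index.mjs) so import and top-level await work.
import { ApifyClient } from 'apify-client';

// Read the API token from the environment instead of hard-coding it.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('fetch_cat/youtube-transcript-scraper').call({
    videoUrls: [{ url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' }],
    language: 'en',
});

// Fetch the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].transcriptText);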

API usage with Python

import os

from apify_client import ApifyClient

# Read the API token from the environment instead of hard-coding it.
client = ApifyClient(os.environ['APIFY_TOKEN'])

# Start the actor and wait for the run to finish.
run = client.actor('fetch_cat/youtube-transcript-scraper').call(run_input={
    'videoUrls': [{'url': 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'}],
    'language': 'en',
})

# Fetch the results from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[0].get('transcriptText'))

API usage with cURL

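curl -X POST "https://api.apify.com/v2/acts/fetch_cat~youtube-transcript-scraper/runs?token=$APIFY_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"videoUrls":[{"url":"https://www.youtube.com/watch?v=dQw4w9WgXcQ"}],"language":"en"}'

This request starts the run and returns the run object immediately, without waiting for the run to finish. Once the run succeeds, read the results from its default dataset.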

MCP and AI agent usage

Use this actor through Apify MCP when you want an AI assistant to fetch public video transcripts.

MCP server URL pattern:

https://mcp.apify.com/?tools=fetch_cat/youtube-transcript-scraper

Claude Code setup:

claude mcp add --transport http apify-youtube-transcripts "https://mcp.apify.com/?tools=fetch_cat/youtube-transcript-scraper"

Claude Desktop JSON config:

{
  "mcpServers": {
    "apify-youtube-transcripts": {
      "url": "https://mcp.apify.com/?tools=fetch_cat/youtube-transcript-scraper"
    }
  }
}

Example prompts:

  • "Extract the transcript from this public YouTube video and summarize the key claims."
  • "Get transcripts for these five public webinar URLs and make a topic table."
  • "Find quotes in this public interview transcript about pricing."

Common use cases

  • Video-to-blog repurposing
  • Public webinar transcript extraction
  • Research corpus creation
  • Podcast-style YouTube episode analysis
  • Competitive content monitoring
  • Training data preparation from public captions
  • Subtitle QA and language availability checks

Limitations

This actor extracts public captions only.

It cannot access private videos, members-only videos, deleted videos, region-blocked content unavailable to the runner, or videos without public caption tracks.

Transcript quality depends on the caption track provided for the public video.

Auto-generated captions may contain recognition errors.

Legality and responsible use

Use this actor only for content you are allowed to access and process.

YouTube videos and captions may be protected by copyright or platform terms.

You are responsible for ensuring that your use case, storage, redistribution, and analysis comply with applicable laws, platform rules, and rights-holder requirements.

FAQ

Does it work without a YouTube account?

Yes, the actor is designed for public videos and public caption tracks.

Can it extract transcripts from private videos?

No. Private, members-only, deleted, or otherwise inaccessible videos are out of scope.

Why did a video return captionsAvailable=false?

The video may not have public captions, the video may be unavailable, or YouTube may not expose captions for that video.

Can I choose a language?

Yes. Use the language input with a language code such as en, es, fr, or de.

Are timestamps included?

Yes, when includeTimestamps is enabled.

Why are captions imperfect?

Some videos use auto-generated captions. These can include speech-recognition mistakes.

Troubleshooting

My run succeeded but transcript text is empty

Check the error and captionsAvailable fields. The video probably has no public caption track.

My preferred language was not returned

The requested language may not be available. The actor falls back to another public caption track when needed.
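To find out which videos fell back to a different track, compare the language field of each row with the code you requested. This sketch flags every captioned row whose language differs from the request. It does an exact comparison, so a related variant such as en-GB for a request of en is also flagged. language_fallbacks is a hypothetical helper.

```python
def language_fallbacks(items: list[dict], requested: str) -> list[tuple]:
    """Return (videoId, language) for captioned rows not in the requested language."""
    wanted = requested.lower()
    flagged = []
    for item in items:
        returned = (item.get("language") or "").lower()
        # Rows without captions are skipped; they have no language to compare.
        if item.get("captionsAvailable") and returned != wanted:
            flagged.append((item.get("videoId"), item.get("language")))
    return flagged
```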

Explore related actors from this account:

  • reddit-scraper
  • tiktok-comments-scraper
  • website-change-monitor

Changelog

0.1

Initial version with public YouTube transcript extraction, timestamped segments, language selection, metadata fields, and graceful no-caption handling.

Support

If a public captioned video fails unexpectedly, provide the video URL, input JSON, and run ID so the issue can be reproduced.