# YouTube Transcript & Subtitles Scraper - No API Key Required (`george.the.developer/youtube-transcript-scraper`) Actor

Download YouTube video transcripts and subtitles in bulk. No API key needed. Supports any language and auto-generated captions.

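Automatically extracts YouTube video subtitles and transcripts. Supports multiple languages and auto-generated subtitles, with timestamps included. 154 users and 1,700+ runs. Suited to researchers, content creators, and AI training data collection. Each extraction costs only $0.005.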

- **URL**: https://apify.com/george.the.developer/youtube-transcript-scraper.md
- **Developed by:** [George Kioko](https://apify.com/george.the.developer) (community)
- **Categories:** AI, Videos
- **Stats:** 363 total users, 77 monthly users, 99.8% of runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
```

```powershell
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Transcript & Subtitles Scraper

### What does it do?

The YouTube Transcript Scraper extracts transcripts, subtitles, and captions from YouTube videos in bulk -- no API key or YouTube Data API quota required. It handles both auto-generated captions and manually uploaded subtitles in any language YouTube provides, and it accepts individual videos, playlists, channels, and Shorts, so you can pull spoken content at scale. Feed the output directly into LLMs, RAG pipelines, or content repurposing workflows.

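### Summary

Automatically extracts YouTube video subtitles and transcripts. Supports multiple languages and auto-generated subtitles, with timestamps included. 154 users. Suited to researchers, content creators, and AI training data collection.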

### What data does it extract?

- **Full transcript text** -- complete spoken content of the video as plain text
- **Timed segments** -- individual caption segments with start time, end time, and duration
- **Video title** -- the title of the YouTube video
- **Channel name** -- the channel that published the video
- **Video URL** -- direct link to the source video
- **View count** -- total number of views
- **Upload date** -- when the video was published
- **Video duration** -- total length of the video
- **Language** -- detected or selected transcript language
- **Thumbnail URL** -- video thumbnail image
- **Description** -- video description text

### Use cases

1. **RAG and LLM fine-tuning** -- Extract transcripts from hundreds of educational or domain-specific YouTube videos to build a knowledge base for retrieval-augmented generation (RAG). Use the structured text to fine-tune language models on specialized topics like finance, medicine, or engineering.

2. **Content repurposing at scale** -- Convert YouTube video content into blog posts, social media threads, newsletters, or podcast show notes. Marketing teams use this to transform a single video into 10+ pieces of written content across platforms.

3. **Video SEO and competitor analysis** -- Analyze the spoken content of top-ranking YouTube videos in your niche. Identify keyword patterns, topic coverage, and content gaps to optimize your own video scripts and descriptions for better search rankings.
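For RAG, transcripts usually need to be split into retrievable chunks first. The sketch below is one possible approach, not something the Actor does for you: it assumes the `segments` output shown in the output example (with `includeTimestamps` enabled) and groups consecutive segments into chunks of roughly `max_words` words, keeping start and end times so every chunk can be traced back to its position in the video. `chunk_segments` and the 200-word default are illustrative, hypothetical choices.

```python
from typing import TypedDict


class Segment(TypedDict):
    # Shape assumed from the "segments" array in the output example
    text: str
    start: float
    end: float
    duration: float


def chunk_segments(segments: list[Segment], max_words: int = 200) -> list[dict]:
    """Group consecutive segments into chunks of at most ~max_words words.

    Each chunk keeps the start time of its first segment and the end time
    of its last one. A single segment longer than max_words becomes its own chunk.
    """
    chunks: list[dict] = []
    current: list[Segment] = []
    word_count = 0
    for seg in segments:
        words = len(seg["text"].split())
        # Close the current chunk if adding this segment would exceed the budget
        if current and word_count + words > max_words:
            chunks.append(_make_chunk(current))
            current, word_count = [], 0
        current.append(seg)
        word_count += words
    if current:
        chunks.append(_make_chunk(current))
    return chunks


def _make_chunk(segs: list[Segment]) -> dict:
    # Join the segment texts and span the time range they cover
    return {
        "text": " ".join(s["text"] for s in segs),
        "start": segs[0]["start"],
        "end": segs[-1]["end"],
    }
```

Word counts are only a rough proxy for tokens; if your embedding model has a hard token limit, count with its tokenizer instead.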

### How to use

1. Navigate to the [YouTube Transcript Scraper](https://apify.com/george.the.developer/youtube-transcript-scraper) on Apify Store and click "Try for free."
2. In the **URLs** field, paste YouTube video URLs, playlist URLs, channel URLs, or raw video IDs. You can mix and match formats.
3. Select your preferred **Language** (default: English). If no transcript exists in that language, the scraper falls back to another available language.
4. Choose an **Output Format**: `full-text` (plain text block), `segments` (timestamped chunks), or `both`.
5. Toggle **Include Timestamps** and **Include Metadata** as needed.
6. Click **Start**. Transcripts are extracted and saved to the Dataset tab.
7. Export results in JSON, CSV, or Excel, or integrate via API for automated pipelines.

### Input parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `urls` | Array | Yes | YouTube video URLs, playlist URLs, channel URLs, or video IDs |
| `language` | String | No | Preferred transcript language code (default: `en`) |
| `outputFormat` | Enum | No | `full-text`, `segments`, or `both` (default: `both`) |
| `includeTimestamps` | Boolean | No | Include start/end times for each segment (default: `true`) |
| `maxVideos` | Integer | No | Maximum videos to process, up to 5,000 (default: 50) |
| `includeMetadata` | Boolean | No | Include video title, channel, views, etc. (default: `true`) |
| `maxConcurrency` | Integer | No | Concurrent requests, 1-20 (default: 5) |
| `proxyConfiguration` | Object | No | Apify Proxy country routing only. The actor always uses `BUYPROXIES94952`; custom proxy URLs and alternate groups are ignored |
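If you generate inputs programmatically, a small helper can enforce the limits from this table before a run is started. The sketch below is hypothetical (`build_input` is not part of any Apify library); it mirrors the documented defaults and ranges, and it leaves out `proxyConfiguration` because, per the table, the Actor applies its own proxy group regardless.

```python
ALLOWED_FORMATS = {"full-text", "segments", "both"}


def build_input(
    urls: list[str],
    language: str = "en",
    output_format: str = "both",
    max_videos: int = 50,
    max_concurrency: int = 5,
    include_timestamps: bool = True,
    include_metadata: bool = True,
) -> dict:
    """Build an Actor input dict, checking the documented limits first."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= max_videos <= 5000:
        raise ValueError("maxVideos must be between 1 and 5000")
    if not 1 <= max_concurrency <= 20:
        raise ValueError("maxConcurrency must be between 1 and 20")
    # Keys match the input parameter names in the table
    return {
        "urls": list(urls),
        "language": language,
        "outputFormat": output_format,
        "maxVideos": max_videos,
        "maxConcurrency": max_concurrency,
        "includeTimestamps": include_timestamps,
        "includeMetadata": include_metadata,
    }
```

The result can be passed as `run_input` to `client.actor("george.the.developer/youtube-transcript-scraper").call(...)`, as in the Python example in the API section.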

### Output example

```json
{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "How to Build a RAG Pipeline in 2026",
  "channel": "AI Engineering Academy",
  "viewCount": 245000,
  "uploadDate": "2026-02-15",
  "duration": "14:32",
  "language": "en",
  "fullText": "Welcome to this tutorial on building a retrieval-augmented generation pipeline. Today we'll cover vector databases, embedding models, and...",
  "segments": [
    {
      "text": "Welcome to this tutorial on building a retrieval-augmented generation pipeline.",
      "start": 0.0,
      "end": 4.2,
      "duration": 4.2
    },
    {
      "text": "Today we'll cover vector databases, embedding models, and chunking strategies.",
      "start": 4.2,
      "end": 8.8,
      "duration": 4.6
    }
  ],
  "thumbnailUrl": "https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg"
}
````
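If you need subtitle files rather than JSON, the `segments` array can be converted to SubRip (SRT). A minimal sketch, assuming `start` and `end` are in seconds as in the example above and that the run used `outputFormat` `segments` or `both` with `includeTimestamps` enabled; `segments_to_srt` is a hypothetical helper, not an Actor option.

```python
def format_srt_time(seconds: float) -> str:
    """Convert seconds (e.g. 4.2) to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render the Actor's `segments` array as SRT text (numbered cues, blank-line separated)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)
```

Write the returned string to a `.srt` file with UTF-8 encoding to keep non-English captions intact.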

### Pricing

- **Start event**: $0.005 per run
- **Per transcript**: $0.004 per video transcript extracted

Approximate cost: **$4 per 1,000 transcripts**, plus the $0.005 start fee per run. No API key, no YouTube Data API quota, and no monthly subscription -- beyond the start fee, you pay only for successful extractions.
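The total follows directly from the two listed events: runs × $0.005 + transcripts × $0.004. A small estimator, assuming these list prices (the constants are copied from the Pricing section and may change; `estimate_cost` is a hypothetical helper):

```python
# Prices copied from the Pricing section; check the Store page for current rates.
START_EVENT_USD = 0.005     # charged once per run
PER_TRANSCRIPT_USD = 0.004  # charged per transcript extracted


def estimate_cost(transcripts: int, runs: int = 1) -> float:
    """Estimated pay-per-event cost in USD for a batch of transcripts."""
    if transcripts < 0 or runs < 1:
        raise ValueError("transcripts must be >= 0 and runs >= 1")
    return runs * START_EVENT_USD + transcripts * PER_TRANSCRIPT_USD


if __name__ == "__main__":
    # Same 1,000 videos: batched into one run vs. one run per video
    print(f"1 run:     ${estimate_cost(1000, runs=1):.3f}")
    print(f"1000 runs: ${estimate_cost(1000, runs=1000):.3f}")
```

At these prices, 1,000 transcripts in a single run come to about $4.005, while submitting the same videos as 1,000 separate runs adds $5.00 in start fees (about $9.00 total), so batching URLs into one run is cheaper.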

### FAQ

**Q: Do I need a YouTube Data API key?**
A: No. This scraper works without any API key or Google account. It extracts transcripts directly, bypassing YouTube API quotas entirely.

**Q: Does it work with auto-generated captions?**
A: Yes. The scraper handles both manually uploaded subtitles and YouTube's auto-generated captions in any language.

**Q: Can I scrape entire playlists or channels?**
A: Yes. Pass a playlist URL or channel URL and the scraper will automatically discover and process all videos, up to your configured `maxVideos` limit.

**Q: What languages are supported?**
A: All languages that YouTube provides transcripts for are supported. Set your preferred language code and the scraper will use it if available, or fall back to the best available alternative.

**Q: How do I handle geo-restricted videos?**
A: Use the `proxyConfiguration.countryCode` parameter to route requests through Apify Proxy in the appropriate country. The actor always enforces the `BUYPROXIES94952` proxy group for reliability.

**Q: Can I use this for LLM training data?**
A: Yes. The full-text output format is ideal for LLM fine-tuning datasets. Process up to 5,000 videos per run to build large-scale training corpora.
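Training tools commonly read JSON Lines, one record per line. The sketch below turns dataset items into such lines; it assumes the run used `outputFormat` `full-text` or `both`, and the record shape `{"text", "source", "language"}` is a hypothetical choice to adapt to whatever your fine-tuning tool expects. `to_training_records` and `write_jsonl` are illustrative helpers.

```python
import json
from typing import Iterable


def to_training_records(items: Iterable[dict], min_chars: int = 200) -> list[str]:
    """Turn Actor dataset items into JSON Lines strings for a text corpus."""
    lines = []
    for item in items:
        text = (item.get("fullText") or "").strip()
        if len(text) < min_chars:
            continue  # skip missing or very short transcripts
        record = {
            "text": text,
            "source": item.get("videoUrl"),
            "language": item.get("language"),
        }
        # ensure_ascii=False keeps non-English transcripts readable in the file
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


def write_jsonl(lines: list[str], path: str) -> None:
    """Write one JSON record per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
```

With the Python client, the items can come straight from `client.dataset(run["defaultDatasetId"]).iterate_items()`, as in the API section.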

### Why choose this over alternatives?

- **No API key needed** -- Zero setup friction. No Google Cloud project, no API quota limits, no OAuth tokens.
- **Massive scale** -- Process up to 5,000 videos per run with configurable concurrency up to 20 parallel requests.
- **Multiple input formats** -- Videos, playlists, channels, Shorts, and raw video IDs all accepted in a single run.
- **LLM-ready output** -- Full-text and segmented formats designed for direct ingestion into RAG pipelines and fine-tuning workflows.
- **High success rate** -- 99.8% of runs succeeded, according to the Store stats at the top of this page.
- **Auto-generated caption support** -- Works even when video creators haven't uploaded manual subtitles.

# Actor input schema

## `urls` (type: `array`):

List of YouTube video URLs, playlist URLs, or channel URLs. Supports: youtube.com/watch?v=, youtu.be/, youtube.com/playlist?list=, youtube.com/@channel, youtube.com/shorts/
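Because each extracted transcript is billed, it can be worth removing duplicate videos that appear in different URL forms (for example a `youtu.be/` link and a `watch?v=` link to the same video) before starting a run. The sketch below handles the single-video forms listed above plus raw IDs, and assumes YouTube video IDs are 11 characters from `[A-Za-z0-9_-]`; playlist and channel URLs pass through untouched. `extract_video_id` and `dedupe_urls` are hypothetical helpers, and the Actor's own URL parsing may differ.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: current YouTube video IDs are 11 URL-safe characters
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for single-video inputs, else None (playlists, channels)."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value  # already a raw video ID
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").removeprefix("www.").removeprefix("m.")
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" and parsed.path == "/watch":
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif host == "youtube.com" and parsed.path.startswith("/shorts/"):
        candidate = parsed.path.split("/")[2]
    else:
        return None
    return candidate if VIDEO_ID_RE.fullmatch(candidate) else None


def dedupe_urls(urls: list[str]) -> list[str]:
    """Drop repeats of the same video in different forms; keep order and non-video URLs."""
    seen = set()
    result = []
    for url in urls:
        vid = extract_video_id(url)
        if vid is not None:
            if vid in seen:
                continue
            seen.add(vid)
        result.append(url)
    return result
```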

## `language` (type: `string`):

Preferred transcript language code (e.g., 'en', 'es', 'fr', 'de', 'ja'). Falls back to auto-generated captions if manual captions are unavailable. Leave empty for the default language.

## `includeTimestamps` (type: `boolean`):

Include start time and duration for each transcript segment. Useful for building video indexes or jumping to specific parts.
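For jumping to specific parts, each segment's `start` can be turned into a YouTube link using the `t` query parameter. A minimal sketch, assuming the item shape from the output example (a `watch?v=` style `videoUrl` and `segments` with `start` in seconds); `timestamp_link` and `build_index` are hypothetical helpers, not part of the Actor.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def timestamp_link(video_url: str, start_seconds: float) -> str:
    """Return a watch URL that starts playback at the given second."""
    # Assumes a watch?v= URL, as in the output example
    video_id = parse_qs(urlparse(video_url).query)["v"][0]
    # Truncate to whole seconds so playback starts at or just before the segment
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def build_index(item: dict) -> list[tuple[str, str]]:
    """Pair each segment's text with a link that jumps to it."""
    return [
        (seg["text"], timestamp_link(item["videoUrl"], seg["start"]))
        for seg in item.get("segments", [])
    ]
```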

## `outputFormat` (type: `string`):

How to format the transcript text. 'full-text' = single concatenated string (best for AI/LLM input). 'segments' = array of timestamped segments. 'both' = includes both formats.

## `maxVideos` (type: `integer`):

Maximum number of videos to process. Useful when scraping playlists or channels to limit costs.

## `includeMetadata` (type: `boolean`):

Include video title, channel name, description, view count, publish date, duration, and tags alongside the transcript.

## `maxConcurrency` (type: `integer`):

Maximum number of concurrent requests. Higher = faster but more likely to hit rate limits.

## `proxyConfiguration` (type: `object`):

Apify Proxy country routing only. This actor always uses BUYPROXIES94952; custom proxy URLs and alternate groups are ignored.

## Actor input object example

```json
{
  "urls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "language": "en",
  "includeTimestamps": true,
  "outputFormat": "both",
  "maxVideos": 50,
  "includeMetadata": true,
  "maxConcurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "BUYPROXIES94952"
    ]
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    ],
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "BUYPROXIES94952"
        ]
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("george.the.developer/youtube-transcript-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["BUYPROXIES94952"],
    },
}

# Run the Actor and wait for it to finish
run = client.actor("george.the.developer/youtube-transcript-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "BUYPROXIES94952"
    ]
  }
}' |
apify call george.the.developer/youtube-transcript-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=george.the.developer/youtube-transcript-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript & Subtitles Scraper - No API Key Required",
        "description": "Download YouTube video transcripts and subtitles in bulk. No API key needed. Supports any language and auto-generated captions.\n\n自动提取YouTube视频字幕和转录文本。支持多语言，自动生成字幕，包含时间戳。154位用户，1,700+次运行。适合研究人员、内容创作者和AI训练数据收集。每次提取仅需$0.005。",
        "version": "1.0",
        "x-build-id": "zxGUTNFsseG5YwD7g"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/george.the.developer~youtube-transcript-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-george.the.developer-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/george.the.developer~youtube-transcript-scraper/runs": {
            "post": {
                "operationId": "runs-sync-george.the.developer-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/george.the.developer~youtube-transcript-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-george.the.developer-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "YouTube URLs",
                        "type": "array",
                        "description": "List of YouTube video URLs, playlist URLs, or channel URLs. Supports: youtube.com/watch?v=, youtu.be/, youtube.com/playlist?list=, youtube.com/@channel, youtube.com/shorts/",
                        "items": {
                            "type": "string"
                        }
                    },
                    "language": {
                        "title": "Preferred Language",
                        "type": "string",
                        "description": "Preferred transcript language code (e.g., 'en', 'es', 'fr', 'de', 'ja'). Falls back to auto-generated captions if manual captions unavailable. Leave empty for default language.",
                        "default": "en"
                    },
                    "includeTimestamps": {
                        "title": "Include Timestamps",
                        "type": "boolean",
                        "description": "Include start time and duration for each transcript segment. Useful for building video indexes or jumping to specific parts.",
                        "default": true
                    },
                    "outputFormat": {
                        "title": "Output Format",
                        "enum": [
                            "full-text",
                            "segments",
                            "both"
                        ],
                        "type": "string",
                        "description": "How to format the transcript text. 'full-text' = single concatenated string (best for AI/LLM input). 'segments' = array of timestamped segments. 'both' = includes both formats.",
                        "default": "both"
                    },
                    "maxVideos": {
                        "title": "Max Videos",
                        "minimum": 1,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Maximum number of videos to process. Useful when scraping playlists or channels to limit costs.",
                        "default": 50
                    },
                    "includeMetadata": {
                        "title": "Include Video Metadata",
                        "type": "boolean",
                        "description": "Include video title, channel name, description, view count, publish date, duration, and tags alongside the transcript.",
                        "default": true
                    },
                    "maxConcurrency": {
                        "title": "Max Concurrency",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Maximum number of concurrent requests. Higher = faster but more likely to hit rate limits.",
                        "default": 5
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Apify Proxy country routing only. This actor always uses BUYPROXIES94952; custom proxy URLs and alternate groups are ignored."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```