# Youtube Transcript Scraper (`thedoor/youtube-transcript-scraper`) Actor

Extract full YouTube transcripts instantly. Bulk video support, precise timestamps, and multiple export formats (CSV, Excel, JSON). Perfect for AI training, SEO, and content analysis.

- **URL**: https://apify.com/thedoor/youtube-transcript-scraper.md
- **Developed by:** [TheDoor](https://apify.com/thedoor) (community)
- **Categories:** Videos, Social media, News
- **Stats:** 26 total users, 4 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $0.70 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
Since this Actor supports Apify Store discounts, the price gets lower the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools that run on the Apify platform and cover all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
The word "Actor" is written with a capital "A".

## How to integrate an Actor?

This section helps developers integrate Actors into their projects, adapting to their stack and producing integrations that are safe, well documented, and production-ready.
The recommended ways to integrate Actors are as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you can use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Transcript Scraper

Fetch transcripts and captions from YouTube videos using Apify Proxy with session management and automatic retry logic.

### Input

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `videos` | array | ✅ | - | List of videos with URLs and preferred languages |
| `includeTimestamps` | boolean | ❌ | `true` | Include start time and duration for each snippet |

#### Video Object

Each video in the `videos` array has:

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | string | ✅ | - | YouTube video URL (regular videos, Shorts, and `youtu.be` short links are supported) |
| `languages` | array | ❌ | `["en"]` | Preferred transcript languages for this video |

#### Example Input

```json
{
    "videos": [
        {
            "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
            "languages": ["en"]
        },
        {
            "url": "https://www.youtube.com/watch?v=9bZkp7q19f0",
            "languages": ["ko", "en"]
        },
        {
            "url": "https://www.youtube.com/shorts/97IwoIqBCZk",
            "languages": ["en"]
        },
        {
            "url": "https://youtu.be/kJQP7kiw5Fk",
            "languages": ["es", "en"]
        }
    ],
    "includeTimestamps": true
}
````
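The required fields and defaults from the input tables above can also be applied client-side before a run is started. The sketch below is a hypothetical helper (`normalize_input` is not part of the Actor or of the Apify client); it only mirrors the documented rules (`videos` and `url` required, `languages` defaulting to `["en"]`, `includeTimestamps` defaulting to `true`), and the Actor itself may validate input differently.

```python
from typing import Any

# Defaults as documented in the input tables.
DEFAULT_LANGUAGES = ["en"]
DEFAULT_INCLUDE_TIMESTAMPS = True


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate an Actor input dict and fill in documented defaults (hypothetical helper)."""
    videos = raw.get("videos")
    if not isinstance(videos, list):
        raise ValueError("`videos` is required and must be an array")

    normalized_videos = []
    for index, video in enumerate(videos):
        url = video.get("url") if isinstance(video, dict) else None
        if not isinstance(url, str) or not url:
            raise ValueError(f"videos[{index}].url is required")
        # Assumption: a missing or empty `languages` list means "use the default".
        languages = video.get("languages") or DEFAULT_LANGUAGES
        normalized_videos.append({"url": url, "languages": list(languages)})

    return {
        "videos": normalized_videos,
        "includeTimestamps": raw.get("includeTimestamps", DEFAULT_INCLUDE_TIMESTAMPS),
    }
```

Normalizing locally makes the effective input explicit in logs, which helps when comparing runs.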

### Supported URL Formats

- Regular: `https://www.youtube.com/watch?v=VIDEO_ID`
- Shorts: `https://www.youtube.com/shorts/VIDEO_ID`
- Short link: `https://youtu.be/VIDEO_ID`
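The `videoId` output field is extracted from these URLs. The following is a minimal sketch of how the three formats map to an ID; it is an illustration under stated assumptions, not the Actor's actual parser, and it deliberately ignores other variants (embed links, mobile hosts, extra path segments).

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for the three documented URL formats, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path_parts = [part for part in parsed.path.split("/") if part]

    if host == "youtu.be":
        # https://youtu.be/VIDEO_ID
        return path_parts[0] if path_parts else None

    # Assumption: the bare "youtube.com" host is treated like "www.youtube.com".
    if host in ("youtube.com", "www.youtube.com"):
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=VIDEO_ID
            return parse_qs(parsed.query).get("v", [None])[0]
        if len(path_parts) >= 2 and path_parts[0] == "shorts":
            # https://www.youtube.com/shorts/VIDEO_ID
            return path_parts[1]

    return None
```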

### Supported Languages

The Actor supports any transcript language available for the video. Common language codes:

| Code | Language |
|------|----------|
| `en` | English |
| `es` | Spanish |
| `fr` | French |
| `de` | German |
| `pt` | Portuguese |
| `ja` | Japanese |
| `ko` | Korean |
| `zh` | Chinese |
| `ar` | Arabic |
| `hi` | Hindi |
| `vi` | Vietnamese |

If the first preferred language is not available, the Actor tries the next language in the list; if none of the listed languages is available, it falls back to the video's default transcript.
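The fallback order described above amounts to a first-match search over the `languages` list. This sketch assumes the available caption languages and the video's default transcript language are already known (the hypothetical `available` and `default` parameters); how the Actor discovers them internally is not documented here.

```python
from typing import Optional, Sequence


def pick_transcript_language(
    preferred: Sequence[str],
    available: Sequence[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the first preferred language that is available, else the default."""
    available_set = set(available)
    for code in preferred:
        if code in available_set:
            return code
    # No preferred language matched: fall back to the video's default transcript, if any.
    return default
```

For the input `"languages": ["ko", "en"]` on a video that only has English and Japanese captions, this picks `"en"`.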

### Output Format

The Actor stores one JSON object per video in the run's default dataset, with the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `videoId` | string | YouTube video ID extracted from URL |
| `videoUrl` | string | Full YouTube video URL |
| `language` | string | Language of the fetched transcript |
| `isGenerated` | boolean | `true` if auto-generated captions, `false` if manually created |
| `transcriptText` | string | Full transcript text (with timestamps if enabled) |
| `snippetCount` | number | Total number of transcript snippets |
| `snippets` | array or null | Array of snippet objects with `text`, `start`, and `duration`; `null` when `includeTimestamps` is `false` |
| `success` | boolean | `true` if transcript was fetched successfully |
| `error` | string | Error message (only present when `success: false`) |
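Because each output item carries its own `success` flag, consumers usually separate successful transcripts from failed videos. A minimal sketch over a list of dataset items (for example, the items returned by `iterate_items()` in the Python example in the API section); `split_results` and `summarize_failures` are hypothetical helper names.

```python
from typing import Any, Iterable


def split_results(
    items: Iterable[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate successful transcript items from failed ones using the `success` flag."""
    succeeded: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for item in items:
        if item.get("success"):
            succeeded.append(item)
        else:
            failed.append(item)
    return succeeded, failed


def summarize_failures(failed: Iterable[dict[str, Any]]) -> list[str]:
    """Build one readable line per failed video from `videoUrl` and `error`."""
    return [f"{item.get('videoUrl')}: {item.get('error', 'unknown error')}" for item in failed]
```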

#### Example Output (with timestamps)

```json
{
    "videoId": "dQw4w9WgXcQ",
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "language": "en",
    "isGenerated": true,
    "transcriptText": "[0.00s] Hello world\n[2.50s] Welcome to the video",
    "snippetCount": 2,
    "snippets": [
        {"text": "Hello world", "start": 0.0, "duration": 2.5},
        {"text": "Welcome to the video", "start": 2.5, "duration": 3.0}
    ],
    "success": true
}
```

#### Example Output (without timestamps)

```json
{
    "videoId": "dQw4w9WgXcQ",
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "language": "en",
    "isGenerated": false,
    "transcriptText": "Hello world Welcome to the video",
    "snippetCount": 2,
    "snippets": null,
    "success": true
}
```
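The two examples suggest how `transcriptText` relates to the snippets: with timestamps, one line per snippet prefixed with its start time; without, the snippet texts joined by spaces. The sketch below reproduces both examples, but the format is inferred from them only, and the Actor's exact formatting may differ. Since `snippets` is `null` when timestamps are disabled, this is mainly useful for deriving the plain-text variant locally from a run made with `includeTimestamps: true`.

```python
from typing import Any, Sequence


def render_transcript_text(snippets: Sequence[dict[str, Any]], include_timestamps: bool) -> str:
    """Rebuild a `transcriptText`-style string from snippet objects (inferred format)."""
    if include_timestamps:
        # One line per snippet, prefixed with its start time in seconds (two decimals).
        return "\n".join(f"[{s['start']:.2f}s] {s['text']}" for s in snippets)
    # Without timestamps, snippet texts are joined with single spaces.
    return " ".join(s["text"] for s in snippets)
```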

### Recommended Memory

512 MB (default)
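The memory limit can also be set explicitly per run. This sketch assumes the official Python client (`apify-client`), whose `ActorClient.call()` accepts a `memory_mbytes` keyword; `run_transcripts` is a hypothetical wrapper, and the client is passed in so the function stays independent of how the token is configured.

```python
from typing import Any

ACTOR_ID = "thedoor/youtube-transcript-scraper"


def run_transcripts(
    client: Any, run_input: dict[str, Any], memory_mbytes: int = 512
) -> list[dict[str, Any]]:
    """Run the Actor with an explicit memory limit and return all dataset items.

    `client` is expected to be an `apify_client.ApifyClient` instance.
    """
    # `call` starts the run and waits for it to finish.
    run = client.actor(ACTOR_ID).call(run_input=run_input, memory_mbytes=memory_mbytes)
    if run is None:
        raise RuntimeError("The Actor run did not return run information")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Usage: `run_transcripts(ApifyClient("<YOUR_API_TOKEN>"), {"videos": [{"url": "https://youtu.be/kJQP7kiw5Fk"}]})`, with `ApifyClient` imported from `apify_client`.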

### Scraping Policy

This actor respects YouTube's terms of service and is designed for legitimate use cases:

- ✅ Fetches only publicly available transcripts/captions
- ✅ Does not bypass any authentication or access controls
- ✅ Uses rate limiting and proxy rotation to avoid overloading servers
- ✅ Intended for research, accessibility, content analysis, and archival purposes

**Do not use this actor to:**

- ❌ Scrape private or unlisted video transcripts without permission
- ❌ Violate YouTube's Terms of Service
- ❌ Redistribute copyrighted content without authorization

### Apify Platform Policy

This actor runs on the [Apify platform](https://apify.com) and adheres to:

- [Apify Terms of Service](https://apify.com/terms-of-service)
- [Apify Acceptable Use Policy](https://apify.com/acceptable-use-policy)
- [Apify Privacy Policy](https://apify.com/privacy-policy)

Users are responsible for ensuring their use of this actor complies with all applicable laws and the terms of service of both Apify and YouTube.

### License

This project is licensed under the **MIT License**.

```
MIT License

Copyright (c) 2024

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

### Support

For issues or feature requests, open an issue on the Actor's GitHub repository or contact the author through the Apify Console.

# Actor input schema

## `videos` (type: `array`):

List of videos with their URLs and preferred languages

## `includeTimestamps` (type: `boolean`):

Include start time and duration for each transcript snippet

## Actor input object example

```json
{
  "videos": [
    {
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
      "languages": [
        "en"
      ]
    },
    {
      "url": "https://www.youtube.com/watch?v=9bZkp7q19f0",
      "languages": [
        "ko",
        "en"
      ]
    },
    {
      "url": "https://www.youtube.com/shorts/97IwoIqBCZk",
      "languages": [
        "en"
      ]
    },
    {
      "url": "https://www.youtube.com/watch?v=JGwWNGJdvx8",
      "languages": [
        "en",
        "es"
      ]
    },
    {
      "url": "https://youtu.be/kJQP7kiw5Fk",
      "languages": [
        "es",
        "en"
      ]
    }
  ],
  "includeTimestamps": true
}
```

# API

You can run this Actor programmatically through the Apify API. Below are code examples in JavaScript, Python, and the Apify CLI, followed by the MCP server setup and the OpenAPI specification.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "videos": [
        {
            "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
            "languages": [
                "en"
            ]
        },
        {
            "url": "https://www.youtube.com/watch?v=9bZkp7q19f0",
            "languages": [
                "ko",
                "en"
            ]
        },
        {
            "url": "https://www.youtube.com/shorts/97IwoIqBCZk",
            "languages": [
                "en"
            ]
        },
        {
            "url": "https://www.youtube.com/watch?v=JGwWNGJdvx8",
            "languages": [
                "en",
                "es"
            ]
        },
        {
            "url": "https://youtu.be/kJQP7kiw5Fk",
            "languages": [
                "es",
                "en"
            ]
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("thedoor/youtube-transcript-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "videos": [
        {
            "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
            "languages": ["en"],
        },
        {
            "url": "https://www.youtube.com/watch?v=9bZkp7q19f0",
            "languages": ["ko", "en"],
        },
        {
            "url": "https://www.youtube.com/shorts/97IwoIqBCZk",
            "languages": ["en"],
        },
        {
            "url": "https://www.youtube.com/watch?v=JGwWNGJdvx8",
            "languages": ["en", "es"],
        },
        {
            "url": "https://youtu.be/kJQP7kiw5Fk",
            "languages": ["es", "en"],
        },
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("thedoor/youtube-transcript-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "videos": [
    {
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
      "languages": [
        "en"
      ]
    },
    {
      "url": "https://www.youtube.com/watch?v=9bZkp7q19f0",
      "languages": [
        "ko",
        "en"
      ]
    },
    {
      "url": "https://www.youtube.com/shorts/97IwoIqBCZk",
      "languages": [
        "en"
      ]
    },
    {
      "url": "https://www.youtube.com/watch?v=JGwWNGJdvx8",
      "languages": [
        "en",
        "es"
      ]
    },
    {
      "url": "https://youtu.be/kJQP7kiw5Fk",
      "languages": [
        "es",
        "en"
      ]
    }
  ]
}' |
apify call thedoor/youtube-transcript-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=thedoor/youtube-transcript-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Youtube Transcript Scraper",
        "description": "Extract full YouTube transcripts instantly. Bulk video support, precise timestamps, and multiple export formats (CSV, Excel, JSON). Perfect for AI training, SEO, and content analysis.",
        "version": "1.0",
        "x-build-id": "VnNYrwn9Q60iEdXSb"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/thedoor~youtube-transcript-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-thedoor-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/thedoor~youtube-transcript-scraper/runs": {
            "post": {
                "operationId": "runs-sync-thedoor-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/thedoor~youtube-transcript-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-thedoor-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "videos"
                ],
                "properties": {
                    "videos": {
                        "title": "Videos",
                        "type": "array",
                        "description": "List of videos with their URLs and preferred languages",
                        "items": {
                            "type": "object",
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "Video URL",
                                    "description": "YouTube video URL"
                                },
                                "languages": {
                                    "type": "array",
                                    "title": "Languages",
                                    "description": "Preferred languages for this video",
                                    "items": {
                                        "type": "string"
                                    }
                                }
                            },
                            "required": [
                                "url"
                            ]
                        }
                    },
                    "includeTimestamps": {
                        "title": "Include Timestamps",
                        "type": "boolean",
                        "description": "Include start time and duration for each transcript snippet",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
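For projects that cannot use an Apify client library, the `run-sync-get-dataset-items` endpoint from the specification above can be called directly. This is a minimal standard-library sketch: it follows the spec in passing the token as the `token` query parameter and the Actor input as the JSON request body; `build_run_sync_request` and `run_sync_get_items` are hypothetical helper names, and the client-side `timeout` value is an arbitrary assumption.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "thedoor~youtube-transcript-scraper"


def build_run_sync_request(token: str, run_input: dict[str, Any]) -> urllib.request.Request:
    """Build a POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync_get_items(
    token: str, run_input: dict[str, Any], timeout: float = 300.0
) -> list[dict[str, Any]]:
    """Send the request and decode the JSON array of dataset items."""
    request = build_run_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `run_sync_get_items("<YOUR_API_TOKEN>", {"videos": [{"url": "https://youtu.be/kJQP7kiw5Fk"}]})` blocks until the run finishes and returns the same items the client examples print.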
