# YouTube Transcript Scraper (Multiple Language) (`dead00/youtube-transcript-scraper-multiple-language`) Actor

A powerful Actor that extracts transcripts/captions from YouTube videos with built-in translation support for 100+ languages.

- **URL**: https://apify.com/dead00/youtube-transcript-scraper-multiple-language.md
- **Developed by:** [Dead](https://apify.com/dead00) (community)
- **Categories:** Videos, Developer tools, AI
- **Stats:** 3 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$20.00/month + usage

To use this Actor, you pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for Apify platform usage, which gets cheaper on higher Apify subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#rental-actors

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Transcript Scraper

A powerful Actor that extracts transcripts/captions from YouTube videos with built-in translation support for 100+ languages.

### 🌟 Features

- **Extract YouTube Transcripts**: Get captions/subtitles from any YouTube video
- **Multi-Language Translation**: Translate transcripts to 100+ languages using free Google Translate
- **Batch Processing**: Process multiple videos in a single run
- **Smart Caption Selection**: Automatically finds the best available captions
- **Multiple Output Formats**: Get results in JSON or plain text format
- **Proxy Support**: Built-in Apify proxy support to avoid IP blocking
- **Fast Translation**: Optimized batch translation for speed (5-10x faster than individual translation)
- **Progress Tracking**: Clean progress indicators to monitor translation status

### 📋 Input

The Actor accepts the following input parameters:

#### Required Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `videos` | Array of strings | List of YouTube video URLs or video IDs |

#### Optional Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `translate_to` | String | `""` (none) | Target language code for translation (e.g., "en", "es", "hi"). Leave empty for original language |
| `output_format` | String | `"json"` | Output format: `"json"` (structured) or `"txt"` (plain text) |
| `proxyConfiguration` | Object | Residential proxy enabled | Proxy settings for YouTube requests |
| `delay_seconds` | Integer | `2` | Delay in seconds between processing videos (0-60) |
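
The tables above can be turned into a small input builder that catches obvious mistakes before a run starts. This is a minimal sketch, not part of the Actor: `build_actor_input` is a hypothetical helper, and rejecting an empty `videos` list is an assumption (the tables only say the parameter is required).

```python
from typing import Any

# Values listed for `output_format` in the table above.
ALLOWED_FORMATS = {"json", "txt"}


def build_actor_input(
    videos: list[str],
    translate_to: str = "",
    output_format: str = "json",
    delay_seconds: int = 2,
) -> dict[str, Any]:
    """Assemble an input object using the parameter names from the tables above."""
    # Assumption: an empty list is treated as a missing required parameter.
    if not videos:
        raise ValueError("'videos' is required and must not be empty")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    # The table documents a 0-60 second range for the delay.
    if not 0 <= delay_seconds <= 60:
        raise ValueError("delay_seconds must be between 0 and 60")
    return {
        "videos": list(videos),
        "translate_to": translate_to,
        "output_format": output_format,
        "delay_seconds": delay_seconds,
    }
```

The returned dictionary can be passed as `run_input` to the Python client shown in the API section.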

### 🚀 Usage

#### Input Example

```json
{
  "videos": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/jNQXAC9IVRw",
    "dQw4w9WgXcQ"
  ],
  "translate_to": "en",
  "output_format": "json",
  "delay_seconds": 2
}
````

#### Output Example (JSON Format)

```json
{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
  "transcript": [
    {
      "text": "We're no strangers to love",
      "start": 0.0,
      "duration": 3.5
    },
    {
      "text": "You know the rules and so do I",
      "start": 3.5,
      "duration": 4.2
    }
  ],
  "output_format": "json",
  "metadata": {
    "available_languages": [
      {
        "language": "English",
        "code": "en",
        "type": "auto-generated"
      }
    ],
    "selected_language": "en",
    "translated_to": "es",
    "translation_attempted": true,
    "translation_success": true,
    "translation_method": "Google Translate (deep-translator)"
  },
  "status": "success"
}
```
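
Because each JSON segment carries `text`, `start`, and `duration`, the transcript can be converted to a subtitle file on the client side. The sketch below assumes `start` and `duration` are seconds, as the example values suggest, and writes SubRip (SRT) blocks; `format_timestamp` and `transcript_to_srt` are hypothetical helpers, not something the Actor provides.

```python
def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def transcript_to_srt(transcript: list[dict]) -> str:
    """Turn a list of {text, start, duration} segments into SRT text."""
    blocks = []
    for index, segment in enumerate(transcript, start=1):
        start = segment["start"]
        end = start + segment["duration"]
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{segment['text']}"
        )
    # SRT separates cues with a blank line.
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Pass the `transcript` array of a dataset item whose `output_format` is `"json"`; the text format has no timestamps to work from.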

#### Output Example (Text Format)

```json
{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
  "transcript": "We're no strangers to love\nYou know the rules and so do I\nA full commitment's what I'm thinking of...",
  "output_format": "txt",
  "metadata": { ... },
  "status": "success"
}
```

#### Supported Languages

The full list of supported language codes is given by the `translate_to` field in the input schema.

### 🔧 How It Works

1. **Video Processing**: The Actor extracts video IDs from URLs or accepts direct video IDs
2. **Caption Discovery**: Searches for available captions/transcripts in the video
3. **Caption Retrieval**: Fetches the best available caption (auto-generated or manual)
4. **Translation** (if enabled): Translates captions using free Google Translate
   - Uses batch translation for speed (20 segments per batch)
   - Progress tracking at 10%, 25%, 50%, 75%, 90%, 100%
   - Fallback to individual translation if batch fails
5. **Output Formatting**: Returns data in JSON or text format
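
Step 1 can be illustrated with a short sketch of how a video ID might be pulled out of the input forms shown in the input example. This is not the Actor's own code: `extract_video_id` is a hypothetical helper, it handles only `watch?v=` and `youtu.be` URLs plus bare IDs, and it assumes IDs are 11 characters from `A-Z`, `a-z`, `0-9`, `_`, and `-`.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 URL-safe characters.
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for a bare ID or a supported URL, else None."""
    value = value.strip()
    if VIDEO_ID_RE.match(value):
        return value  # already a bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch URLs carry the ID in the `v` query parameter.
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return candidate if VIDEO_ID_RE.match(candidate) else None
```

Normalizing inputs this way before a run also makes it easy to drop duplicates, such as the same video given once as a URL and once as an ID.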

### ⚡ Performance

- **Without Translation**: ~1-2 seconds per video
- **With Translation**:
  - Small videos (50-100 segments): ~5-10 seconds
  - Medium videos (500-1000 segments): ~30-60 seconds
  - Large videos (2000+ segments): ~2-3 minutes

**Optimization**: Batch translation makes it **5-10x faster** than translating individual segments!
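
The batch-then-fallback pattern described in "How It Works" can be sketched as follows. This is an illustration under stated assumptions rather than the Actor's implementation: `translate_batch` and `translate_one` are hypothetical callables supplied by the caller, and only the batch size of 20 comes from this README.

```python
from typing import Callable

BATCH_SIZE = 20  # segments per batch, as stated in "How It Works"


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def translate_segments(
    texts: list[str],
    translate_batch: Callable[[list[str]], list[str]],
    translate_one: Callable[[str], str],
    batch_size: int = BATCH_SIZE,
) -> list[str]:
    """Translate batch by batch, retrying a failed batch one segment at a time."""
    translated: list[str] = []
    for batch in chunked(texts, batch_size):
        try:
            result = translate_batch(batch)
            if len(result) != len(batch):
                raise ValueError("batch result length mismatch")
        except Exception:
            # Fallback: translate each segment of this batch individually.
            result = [translate_one(text) for text in batch]
        translated.extend(result)
    return translated
```

With this shape, a 1000-segment transcript needs 50 batch calls instead of 1000 individual ones, which is where the speedup comes from.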

### 🛡️ Proxy Configuration

#### Recommended Settings (Default)

```json
{
  "useApifyProxy": true,
  "apifyProxyGroups": ["RESIDENTIAL"],
  "apifyProxyCountry": "US"
}
```

**Why use proxies?**

- YouTube may block IP addresses making too many requests
- Residential proxies are recommended to avoid detection
- The Actor rotates proxies between videos automatically

### 📊 Use Cases

- **Content Analysis**: Analyze video content at scale
- **Accessibility**: Make existing captions available as text or in another language
- **Translation**: Translate video content to reach global audiences
- **Research**: Extract data from educational or documentary videos
- **SEO**: Generate text content from video for search optimization
- **Subtitles**: Create subtitle files for videos
- **Data Mining**: Extract information from video tutorials or courses

### ⚠️ Limitations

1. **Caption Availability**: Videos must have captions/subtitles available (auto-generated or manual)
2. **Translation Quality**: Uses Google Translate - quality varies by language pair
3. **Rate Limiting**: Free Google Translate may have rate limits for heavy usage
4. **Video Access**: Cannot access private or age-restricted videos
5. **Disabled Captions**: Some videos have captions disabled by the creator

### 🐛 Error Handling

The Actor handles the following error cases:

| Error | Reason | Solution |
|-------|--------|----------|
| "Transcripts disabled" | Video creator disabled captions | Try another video |
| "Video unavailable" | Video is private/deleted | Check video URL |
| "No transcripts available" | No captions exist for this video | YouTube may add auto-captions later |
| "Translation failed" | Translation service error | Original captions will be returned |

### 💡 Tips for Best Results

1. **Use Residential Proxies**: Prevents IP blocking from YouTube
2. **Add Delay Between Videos**: Set `delay_seconds` to 2-5 for better reliability
3. **Batch Processing**: Process multiple videos in one run for efficiency
4. **Check Metadata**: The output includes info about available languages and translation status
5. **JSON Format**: Use JSON format if you need timestamps and structured data
6. **Text Format**: Use text format for simple transcript reading

# Actor input Schema

## `videos` (type: `array`):

List of YouTube video URLs or video IDs to scrape transcripts from

## `output_format` (type: `string`):

Format for the transcript output

## `translate_to` (type: `string`):

Select a language to translate captions to. Leave as 'None' to get captions in their original language.

## `proxyConfiguration` (type: `object`):

Select proxies to use for YouTube requests. Residential proxies are highly recommended to avoid YouTube IP blocking.

## `delay_seconds` (type: `integer`):

Delay in seconds between processing videos to avoid rate limiting

## Actor input object example

```json
{
  "videos": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/jNQXAC9IVRw",
    "dQw4w9WgXcQ"
  ],
  "output_format": "txt",
  "translate_to": "",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "US"
  },
  "delay_seconds": 2
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "videos": [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    ],
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ],
        "apifyProxyCountry": "US"
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("dead00/youtube-transcript-scraper-multiple-language").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "videos": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
        "apifyProxyCountry": "US",
    },
}

# Run the Actor and wait for it to finish
run = client.actor("dead00/youtube-transcript-scraper-multiple-language").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "videos": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "US"
  }
}' |
apify call dead00/youtube-transcript-scraper-multiple-language --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=dead00/youtube-transcript-scraper-multiple-language",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript Scraper (Multiple Language)",
        "description": "A powerful actor that extracts transcripts/captions from YouTube videos with built-in translation support for 100+ languages.",
        "version": "0.0",
        "x-build-id": "A2FIltE1ldppUSpa9"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/dead00~youtube-transcript-scraper-multiple-language/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-dead00-youtube-transcript-scraper-multiple-language",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/dead00~youtube-transcript-scraper-multiple-language/runs": {
            "post": {
                "operationId": "runs-sync-dead00-youtube-transcript-scraper-multiple-language",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/dead00~youtube-transcript-scraper-multiple-language/run-sync": {
            "post": {
                "operationId": "run-sync-dead00-youtube-transcript-scraper-multiple-language",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "videos"
                ],
                "properties": {
                    "videos": {
                        "title": "YouTube Videos",
                        "type": "array",
                        "description": "List of YouTube video URLs or video IDs to scrape transcripts from",
                        "items": {
                            "type": "string"
                        }
                    },
                    "output_format": {
                        "title": "Output Format",
                        "enum": [
                            "txt"
                        ],
                        "type": "string",
                        "description": "Format for the transcript output",
                        "default": "txt"
                    },
                    "translate_to": {
                        "title": "Translate To Language",
                        "enum": [
                            "",
                            "en",
                            "es",
                            "hi",
                            "fr",
                            "de",
                            "pt",
                            "it",
                            "ru",
                            "ja",
                            "ko",
                            "zh-CN",
                            "zh-TW",
                            "ar",
                            "bn",
                            "nl",
                            "tr",
                            "pl",
                            "vi",
                            "th",
                            "id",
                            "uk",
                            "ro",
                            "el",
                            "cs",
                            "sv",
                            "hu",
                            "da",
                            "fi",
                            "no",
                            "sk",
                            "bg",
                            "hr",
                            "ms",
                            "fa",
                            "he",
                            "ur",
                            "ta",
                            "te",
                            "mr",
                            "ml",
                            "kn",
                            "gu",
                            "pa",
                            "af",
                            "sq",
                            "am",
                            "hy",
                            "az",
                            "eu",
                            "be",
                            "bs",
                            "ca",
                            "ceb",
                            "ny",
                            "co",
                            "eo",
                            "et",
                            "tl",
                            "fy",
                            "gl",
                            "ka",
                            "ht",
                            "ha",
                            "haw",
                            "hmn",
                            "is",
                            "ig",
                            "ga",
                            "jw",
                            "kk",
                            "km",
                            "rw",
                            "ku",
                            "ky",
                            "lo",
                            "la",
                            "lv",
                            "lt",
                            "lb",
                            "mk",
                            "mg",
                            "mt",
                            "mi",
                            "mn",
                            "my",
                            "ne",
                            "ps",
                            "sm",
                            "gd",
                            "sr",
                            "st",
                            "sn",
                            "sd",
                            "si",
                            "sl",
                            "so",
                            "su",
                            "sw",
                            "tg",
                            "tt",
                            "tk",
                            "cy",
                            "ug",
                            "uz",
                            "xh",
                            "yi",
                            "yo",
                            "zu"
                        ],
                        "type": "string",
                        "description": "Select a language to translate captions to. Leave as 'None' to get captions in their original language.",
                        "default": ""
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Select proxies to use for YouTube requests. Residential proxies are highly recommended to avoid YouTube IP blocking.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ]
                        }
                    },
                    "delay_seconds": {
                        "title": "Delay Between Videos (seconds)",
                        "minimum": 0,
                        "maximum": 60,
                        "type": "integer",
                        "description": "Delay in seconds between processing videos to avoid rate limiting",
                        "default": 2
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
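
The `run-sync-get-dataset-items` path in the specification above takes the Actor input as a JSON body and the API token as a `token` query parameter. A minimal sketch of building that request with only the Python standard library follows; `build_run_request` is a hypothetical helper, and sending the request and handling the response are left to the caller.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Server URL and path copied from the OpenAPI specification above.
API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = (
    "/acts/dead00~youtube-transcript-scraper-multiple-language"
    "/run-sync-get-dataset-items"
)


def build_run_request(token: str, actor_input: dict) -> Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    # The specification declares `token` as a required query parameter.
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The resulting object can be passed to `urllib.request.urlopen` to perform the call.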
