# YouTube Transcript, Comment, and Metadata Scraper (`visita/youtube-scraper`) Actor

This Actor scrapes YouTube videos for full transcripts (captions), the first page of comments, and key metadata (title, channel, views, and likes). It can discover videos based on search queries or scrape a specific list of video IDs.

- **URL**: https://apify.com/visita/youtube-scraper.md
- **Developed by:** [Visita Intelligence](https://apify.com/visita) (community)
- **Categories:** Videos
- **Stats:** 91 total users, 11 monthly users, 100.0% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

From $12.00 / 1,000 captions, comments & metadata retrieved

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Transcript, Comment, and Metadata Scraper

This Actor scrapes YouTube videos for full transcripts (captions), the first page of comments, and key metadata (title, channel, views, and likes). It can discover videos based on search queries or scrape a specific list of video IDs.

This Actor uses a hybrid approach:
* **Playwright** is used to load the page, handle popups, scroll, and scrape metadata and comments.
* **`youtube-caption-extractor` library** is used to reliably fetch transcripts directly, avoiding common browser-based scraping failures.

### Features

* Scrapes full video transcripts (captions) in your chosen language.
* Scrapes the first page of comments (approx. 20 comments).
* Scrapes metadata: title, channel, view count, and like count.
* **Discover Mode:** Finds videos to scrape based on search queries.
* **Scrape Mode:** Scrapes a specific, user-provided list of video IDs.

### Input Configuration

The Actor's behavior is controlled by the input, which has the following fields:

| Field | Type | Description |
| :--- | :--- | :--- |
| `runMode` | String | **Required.** Choose the Actor's operating mode.<br>• **`discover`**: Find new videos using search.<br>• **`scrape`**: Scrape specific videos from `videoIDs`. |
| `discoverConfig` | Object | Configuration for **Discover Mode**. |
| `scrapeConfig` | Object | Configuration for **Scrape Mode**. |
| `lang` | String | The language code for the transcript you want (e.g., `en`, `es`, `fr`). Defaults to `en`. |

#### `discoverConfig` Settings

| Field | Type | Description |
| :--- | :--- | :--- |
| `searchQueries` | Array | **Required.** A list of search terms to find videos. The Actor will use the first one. |
| `searchCategory` | String | *Optional.* A category keyword (e.g., "Sport", "News") to append to the search. |
| `uploadDate` | String | *Optional. This filter is not yet implemented in the code.* |
| `videoDuration` | String | *Optional. This filter is not yet implemented in the code.* |
| `maxResultsPerQuery` | Integer | The maximum number of videos to find for the search query. Defaults to `5`. |
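
As a rough illustration of how the settings above might combine, the sketch below builds the search string from the first query and the optional category. The function name `effective_query` and the single-space join are assumptions made for illustration; the Actor's actual query construction is not documented here.

```python
from typing import Optional


def effective_query(search_queries: list[str], search_category: Optional[str] = "") -> str:
    """Return the search string a discover run would plausibly use (hypothetical helper)."""
    if not search_queries:
        raise ValueError("searchQueries must contain at least one term")
    # Per the table above, only the first search term is used.
    query = search_queries[0].strip()
    # Assumption: the category keyword is appended after a single space.
    category = (search_category or "").strip()
    return f"{query} {category}" if category else query


print(effective_query(["rugby highlights", "cricket"], "Sport"))
```

If this assumption holds, extra entries in `searchQueries` have no effect, so it may be simpler to start one run per query.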

#### `scrapeConfig` Settings

| Field | Type | Description |
| :--- | :--- | :--- |
| `videoIDs` | Array | **Required.** A list of YouTube video IDs (e.g., `xZCbAki4puY`) to scrape. |
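
The two modes take differently shaped inputs, so it can help to build and check them in code before starting a run. The following is a minimal sketch: the helper names (`build_discover_input`, `build_scrape_input`, `find_problems`) are hypothetical, and the checks only mirror the fields marked **Required** in the tables above.

```python
from typing import Any


def build_discover_input(queries: list[str], max_results: int = 5, lang: str = "en") -> dict[str, Any]:
    """Input for Discover Mode: find videos through YouTube search."""
    return {
        "runMode": "discover",
        "discoverConfig": {
            "searchQueries": list(queries),
            "maxResultsPerQuery": max_results,
        },
        "lang": lang,
    }


def build_scrape_input(video_ids: list[str], lang: str = "en") -> dict[str, Any]:
    """Input for Scrape Mode: scrape a known list of video IDs."""
    return {
        "runMode": "scrape",
        "scrapeConfig": {"videoIDs": list(video_ids)},
        "lang": lang,
    }


def find_problems(run_input: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the input looks usable."""
    problems = []
    mode = run_input.get("runMode")
    if mode not in ("discover", "scrape"):
        problems.append("runMode must be 'discover' or 'scrape'")
    if mode == "discover" and not run_input.get("discoverConfig", {}).get("searchQueries"):
        problems.append("discoverConfig.searchQueries is required in discover mode")
    if mode == "scrape" and not run_input.get("scrapeConfig", {}).get("videoIDs"):
        problems.append("scrapeConfig.videoIDs is required in scrape mode")
    return problems


print(find_problems(build_scrape_input(["xZCbAki4puY"])))
print(find_problems({"runMode": "discover"}))
```

Either dictionary can be passed as the run input to the client examples in the [API](#api) section.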

### Output Structure

The Actor saves its results to the run's default dataset, which is displayed in the **Output** tab. Each item represents one scraped video.

| Field | Type | Description |
| :--- | :--- | :--- |
| `videoId` | String | The unique ID of the scraped video. |
| `title` | String | The full title of the video. |
| `channel` | String | The name of the YouTube channel. |
| `views` | String | The view count (e.g., "1.2M views"). |
| `likes` | String | The like count (e.g., "10K likes"). |
| `transcriptMerged` | String | The full, merged transcript as a single block of text. |
| `comments` | String | A JSON string containing an array of comment objects. Each object has `{ author, text, likes }`. |
| `_chargeStatus` | String | A status message showing what you were charged for (e.g., "Metadata: Charged, Captions: Charged..."). |
| `error` | String | If an error occurred for this video, it will be noted here. |
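
Because `comments` arrives as a JSON string and the counts arrive as display text, some post-processing is usually needed. The sketch below decodes the comments and converts strings such as "1.2M views" into approximate integers. The helper names and the sample item are hypothetical, and the `K`/`M`/`B` suffix handling is an assumption based on the examples in the table above.

```python
import json
import re
from typing import Any, Optional

_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: Optional[str]) -> Optional[int]:
    """Convert display text such as '1.2M views' to an approximate integer."""
    match = re.search(r"(\d[\d.,]*)\s*([KMB])?\b", text or "")
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    multiplier = _SUFFIXES.get(match.group(2) or "", 1)
    return int(round(number * multiplier))


def parse_item(item: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a dataset item with decoded comments and numeric counts."""
    return {
        "videoId": item.get("videoId"),
        "views": parse_count(item.get("views")),
        "likes": parse_count(item.get("likes")),
        # `comments` is a JSON string holding a list of {author, text, likes} objects.
        "comments": json.loads(item.get("comments") or "[]"),
        "failed": bool(item.get("error")),
    }


# Hypothetical item, shaped like the table above.
sample = {
    "videoId": "xZCbAki4puY",
    "views": "1.2M views",
    "likes": "10K likes",
    "comments": '[{"author": "someone", "text": "Great video", "likes": "3"}]',
}
parsed = parse_item(sample)
print(parsed["views"], parsed["likes"], len(parsed["comments"]))
```

Abbreviated counts lose precision at the source, so the resulting integers are best treated as estimates.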

### Limitations

* **Comments:** The Actor currently scrapes only the first page of comments (approx. 20). It does not perform infinite scrolling to load all comments.
* **Discover Filters:** The `uploadDate` and `videoDuration` filters in "Discover Mode" are not yet implemented. The Actor will find the top results regardless of these settings.

### 💰 Pricing (Pay-Per-Event)

This Actor uses a **Pay-Per-Event (PPE)** pricing model. You pay a tiny fee to start the Actor, and then a separate, small fee for each piece of data you *successfully* retrieve for each video.

This gives you granular control over your costs. If you only scrape metadata, you only pay for metadata.

| Event Name | Title | Description |
| :--- | :--- | :--- |
| **`video-data-retrieved`** | Video Data Retrieved | Charged **per video** if metadata, captions (if available), and comments (if available) are successfully scraped. |
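
For budgeting, a back-of-the-envelope estimate can be derived from the listed "from $12.00 / 1,000" price. The sketch below assumes that rate applies to each `video-data-retrieved` event, that items carrying an `error` are not charged, and it leaves out the start fee because its amount is not stated here. Plan discounts may lower the real figure.

```python
from decimal import Decimal
from typing import Any

# Listed starting price per 1,000 events; actual pricing depends on your plan.
PRICE_PER_1000 = Decimal("12.00")


def estimate_cost(items: list[dict[str, Any]], price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Estimate the event charges in USD for a list of dataset items."""
    # Assumption: only items without an `error` field count as charged events.
    billable = sum(1 for item in items if not item.get("error"))
    return (price_per_1000 / 1000 * billable).quantize(Decimal("0.001"))


# Hypothetical run: 250 successful videos and 5 failures.
items = [{"videoId": str(i)} for i in range(250)] + [{"error": "failed"}] * 5
print(estimate_cost(items))
```

The `_chargeStatus` field on each item remains the authoritative record of what was actually charged.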

# Actor input Schema

## `runMode` (type: `string`):

Choose whether to discover videos by search or scrape specific video URLs.

## `discoverConfig` (type: `object`):

Settings used when discovering videos via YouTube search.

## `scrapeConfig` (type: `object`):

Settings used when scraping known YouTube videos directly.

## `lang` (type: `string`):

Preferred language code for transcript extraction (e.g. en, fr, es).

## Actor input object example

```json
{
  "runMode": "discover",
  "lang": "en"
}
````

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

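// Prepare Actor input (`runMode` is required by the input schema)
const input = {
    runMode: 'scrape',
    scrapeConfig: { videoIDs: ['xZCbAki4puY'] },
    lang: 'en',
};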

// Run the Actor and wait for it to finish
const run = await client.actor("visita/youtube-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

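# Prepare the Actor input (`runMode` is required by the input schema)
run_input = {
    "runMode": "scrape",
    "scrapeConfig": {"videoIDs": ["xZCbAki4puY"]},
    "lang": "en",
}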

# Run the Actor and wait for it to finish
run = client.actor("visita/youtube-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"runMode": "scrape", "scrapeConfig": {"videoIDs": ["xZCbAki4puY"]}, "lang": "en"}' |
apify call visita/youtube-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=visita/youtube-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript, Comment, and Metadata Scraper",
        "description": "This actor scrapes YouTube videos for full transcripts (captions), the first page of comments, and key metadata (title, channel, views, and likes). It can discover videos based on search queries or scrape a specific list of video IDs.",
        "version": "0.0",
        "x-build-id": "SM2lHQa0wPOIFjifv"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/visita~youtube-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-visita-youtube-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/visita~youtube-scraper/runs": {
            "post": {
                "operationId": "runs-sync-visita-youtube-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/visita~youtube-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-visita-youtube-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "runMode"
                ],
                "properties": {
                    "runMode": {
                        "title": "Run Mode",
                        "enum": [
                            "discover",
                            "scrape"
                        ],
                        "type": "string",
                        "description": "Choose whether to discover videos by search or scrape specific video URLs.",
                        "default": "discover"
                    },
                    "discoverConfig": {
                        "title": "Discover Mode Settings",
                        "type": "object",
                        "description": "Settings used when discovering videos via YouTube search.",
                        "properties": {
                            "searchQueries": {
                                "title": "🔍 Search Queries",
                                "type": "array",
                                "editor": "stringList",
                                "prefill": [
                                    "rugby highlights"
                                ],
                                "description": "List of search terms to find videos on YouTube."
                            },
                            "searchCategory": {
                                "title": "🏷️ Append Category",
                                "type": "string",
                                "enum": [
                                    "",
                                    "News",
                                    "Sport",
                                    "Music",
                                    "Gaming",
                                    "Education"
                                ],
                                "default": "Sport",
                                "description": "Optional category keyword to append to the search query."
                            },
                            "uploadDate": {
                                "title": "📅 Upload Date",
                                "type": "string",
                                "enum": [
                                    "",
                                    "Hour",
                                    "Today",
                                    "This week",
                                    "This month",
                                    "This year"
                                ],
                                "default": "",
                                "description": "Filter videos by upload date."
                            },
                            "videoDuration": {
                                "title": "⏱️ Duration",
                                "type": "string",
                                "enum": [
                                    "",
                                    "Short (<4 mins)",
                                    "Medium (4–20 mins)",
                                    "Long (>20 mins)"
                                ],
                                "default": "",
                                "description": "Filter videos by approximate duration."
                            },
                            "maxResultsPerQuery": {
                                "title": "📊 Video Limit",
                                "type": "integer",
                                "default": 5,
                                "description": "Maximum number of videos to collect per search query."
                            }
                        }
                    },
                    "scrapeConfig": {
                        "title": "Scrape Mode Settings",
                        "type": "object",
                        "description": "Settings used when scraping known YouTube videos directly.",
                        "properties": {
                            "videoIDs": {
                                "title": "📹 YouTube Video IDs",
                                "type": "array",
                                "editor": "stringList",
                                "prefill": [
                                    "xZCbAki4puY"
                                ],
                                "description": "Provide specific YouTube video IDs to scrape directly."
                            }
                        }
                    },
                    "lang": {
                        "title": "🗣️ Transcript Language",
                        "type": "string",
                        "description": "Preferred language code for transcript extraction (e.g. en, fr, es).",
                        "default": "en"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
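
For projects that cannot use a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The following is a minimal sketch using only the Python standard library. The function names, the `APIFY_TOKEN` environment variable, and the 300-second timeout are assumptions made for illustration.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/visita~youtube-scraper/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict[str, Any]) -> urllib.request.Request:
    """Build the POST request: token as a query parameter, input as the JSON body."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{BASE_URL}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, run_input: dict[str, Any], timeout: float = 300) -> Any:
    """Run the Actor, wait for it to finish, and return the decoded dataset items."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only contacts the API when a token is present (hypothetical variable name).
token = os.environ.get("APIFY_TOKEN")
if token:
    run_input = {"runMode": "scrape", "scrapeConfig": {"videoIDs": ["xZCbAki4puY"]}, "lang": "en"}
    for item in run_sync(token, run_input):
        print(item.get("videoId"), item.get("title"))
```

A synchronous call keeps the HTTP connection open for the whole run, so the asynchronous `/runs` path may suit longer jobs better.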
