# Bluesky Jetstream Scraper (`skyscraping/bluesky-jetstream-scraper`) Actor

Bluesky Social Feed Scraper collects posts from Bluesky's Jetstream API. Filter by hashtags, usernames, or languages to gather targeted data. Includes media attachments, user profiles, and reply context. Perfect for social research, trend analysis, and content monitoring on the platform.

- **URL**: https://apify.com/skyscraping/bluesky-jetstream-scraper.md
- **Developed by:** [june](https://apify.com/skyscraping) (community)
- **Categories:** Social media, Developer tools, Other
- **Stats:** 15 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: 1.00 out of 5 stars

## Pricing

$0.07 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🌊 Bluesky Jetstream Scraper

The Bluesky Jetstream Scraper is a tool built for Apify to collect and analyze real-time data from the Bluesky social network using the ATProto Firehose (Jetstream). This scraper allows you to filter posts by various criteria and customize the output format.

> **🔄 Jetstream vs. Crawling**: This scraper uses Bluesky's Jetstream (firehose) API, which provides a continuous stream of real-time data directly from Bluesky's servers. Unlike traditional crawling methods that make numerous API requests to gather posts (which face rate limits and higher resource usage), the Jetstream approach is more efficient, providing access to the full stream of content as it's created without the limitations of crawling individual endpoints. This makes it ideal for large-scale data collection, trend analysis, and real-time monitoring.
>
> **⚠️ Real-Time vs. Historical**: The Jetstream approach is designed for collecting **current, real-time data only** and is not suitable for historical data collection or analyzing posts over extended periods of time. It captures the content stream as it happens but cannot access posts from the past. If you need historical data analysis or content from specific time periods in the past, you would need to use different methods such as the Bluesky Query API (with appropriate rate limiting).
>
> **📣 Platform Notice**: It's important to note that Bluesky and its API infrastructure are still evolving platforms. API specifications, data formats, and endpoints may change over time. While we strive to keep this scraper up-to-date with any platform changes, users should be aware that occasional updates may be necessary to maintain compatibility as the Bluesky ecosystem continues to develop.

---

### 📋 Input Schema Parameters

This section describes in detail how each input parameter affects the behavior of the scraper and the resulting output.

#### 🔍 Filtering Parameters

##### `hashtags`
- **Type**: Array of strings
- **Description**: A list of hashtags to filter posts by (without the # symbol)
- **Behavior**: The scraper will only collect posts that contain at least one of the specified hashtags. When multiple hashtags are provided, posts matching ANY of these hashtags will be included (OR logic).
- **Example**: If you set `["apify", "scraping"]`, the output will include all posts containing either #apify OR #scraping.

##### `usernames`
- **Type**: Array of strings
- **Description**: A list of Bluesky usernames to filter posts by (will be resolved to DIDs for efficient filtering)
- **Behavior**: The scraper will only collect posts authored by the specified users. When multiple usernames are provided, posts from ANY of these users will be included (OR logic).
- **Example**: If you set `["user1.bsky.social", "user2.bsky.social"]`, the output will include all posts from either user1 OR user2.

##### `languages`
- **Type**: Array of strings
- **Description**: Languages to filter posts by (multiple selection allowed)
- **Behavior**: The scraper will only collect posts in the specified languages. When multiple languages are provided, posts in ANY of these languages will be included (OR logic). If a post doesn't have a language field, the scraper can auto-detect its language (if `detectLanguage` is enabled).
- **Example**: If you set `["en", "pt"]` (English and Portuguese), the output will include all posts in either English OR Portuguese.

##### `wantedCollections`
- **Type**: Array of strings
- **Description**: Specific Bluesky collections to filter from Jetstream (defaults to feed posts)
- **Behavior**: Controls what types of content are collected from the Bluesky firehose. Options include:
  - `app.bsky.feed.post`: Regular posts
  - `app.bsky.feed.like`: Like interactions
  - `app.bsky.feed.repost`: Repost interactions
  - `app.bsky.graph.follow`: Follow relationships
  - `app.bsky.graph.block`: Block relationships
  - `app.bsky.actor.profile`: Profile updates
- **Example**: If you set `["app.bsky.feed.post", "app.bsky.feed.repost"]`, the output will include both original posts AND reposts.

#### 📊 Content Inclusion Parameters

##### `includeMedia`
- **Type**: Boolean
- **Description**: Whether to include URLs for media attachments
- **Behavior**: When set to `true`, the output will include media URLs from posts. When set to `false`, media URLs will be excluded: the `mediaUrl` and `mediaThumbnailUrl` fields will be empty, `hasMedia` will be false, and `mediaCount` will be 0.
- **Example**: If set to `false` with a language filter of `["pt"]`, the output will include Portuguese-language posts but without any media URLs or media-related fields populated.

##### `includeImages`
- **Type**: Boolean
- **Description**: Whether to include URLs for images in the output
- **Behavior**: When set to `true`, the output will include image URLs from posts. When set to `false`, image URLs will be excluded: the `imageUrl` field will be empty and `hasImages` will be false.
- **Example**: If set to `false`, posts with images will still be included in the output, but image URLs won't be extracted or included in the result fields.

##### `includeReplies`
- **Type**: Boolean
- **Description**: Whether to include reply information in collected posts
- **Behavior**: When set to `true`, the output will include information about which posts are replies, and to which posts they are replying. When set to `false`, this information will be excluded.
- **Example**: If set to `true`, posts that are replies will have `isReply` set to true, along with `replyToRoot` and `replyToParent` fields containing the URIs of the root and parent posts.
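
Once reply information is present, the `isReply`, `replyToRoot`, and `replyToParent` fields can be used to rebuild threads on the client side. The following is a minimal sketch, not part of the Actor itself: the sample items and the `uri` field are hypothetical and only illustrate the shape described above.

```python
from collections import defaultdict


def group_replies_by_root(items):
    """Group reply items by the URI of the thread root they belong to."""
    threads = defaultdict(list)
    for item in items:
        # Only items flagged as replies and carrying a root URI are grouped.
        if item.get("isReply") and item.get("replyToRoot"):
            threads[item["replyToRoot"]].append(item)
    return dict(threads)


# Hypothetical dataset items, trimmed to the reply-related fields.
ROOT = "at://did:plc:aaa/app.bsky.feed.post/1"
sample_items = [
    {"uri": ROOT, "isReply": False},
    {"uri": "at://did:plc:bbb/app.bsky.feed.post/2", "isReply": True,
     "replyToRoot": ROOT, "replyToParent": ROOT},
    {"uri": "at://did:plc:ccc/app.bsky.feed.post/3", "isReply": True,
     "replyToRoot": ROOT,
     "replyToParent": "at://did:plc:bbb/app.bsky.feed.post/2"},
]

threads = group_replies_by_root(sample_items)
```

Grouping by `replyToRoot` collects a whole conversation; grouping by `replyToParent` instead would give the direct children of each post.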

#### 🗣️ Language Settings

##### `detectLanguage`
- **Type**: Boolean
- **Description**: Whether to automatically detect the language of posts that don't specify one
- **Behavior**: When set to `true`, the scraper will use language detection to determine the language of posts that don't include language metadata. This is particularly useful when filtering by language. When set to `false`, posts without language metadata will not match any language filter.
- **Example**: If filtering for Japanese posts and this is set to `true`, posts without explicit language metadata might still be included if they contain Japanese text.
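
The interaction between `languages` and `detectLanguage` can be sketched as follows. This is an illustration of the behavior described above, not the Actor's actual code: the function name, its arguments, and the pluggable `detect` callable are all hypothetical.

```python
def language_matches(post_langs, wanted, text="", detect=None):
    """Return True if a post passes the language filter.

    post_langs: language codes declared in the post metadata (may be empty)
    wanted:     language codes from the `languages` input (empty = no filter)
    detect:     optional callable text -> language code, standing in for
                whatever detector is used when `detectLanguage` is enabled
    """
    if not wanted:
        return True  # no language filter configured
    if post_langs:
        return bool(set(post_langs) & set(wanted))  # OR logic across languages
    if detect is None:
        return False  # no metadata and detection disabled: never matches
    return detect(text) in wanted


# A stand-in detector for the example; a real one would inspect the text.
always_japanese = lambda text: "ja"
```

With detection disabled, a post that lacks language metadata never matches a language filter, which is why enabling `detectLanguage` matters most when filtering by language.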

#### 👤 User Profile Settings

##### `enrichUserProfiles`
- **Type**: Boolean
- **Description**: Whether to fetch additional user profile information for post authors
- **Behavior**: When set to `true`, the output will include extended information about post authors, such as their description, follower/following counts, post counts, and avatar URLs. When set to `false`, only basic author information (DID, handle, name) will be included.
- **Example**: If set to `true`, each post in the output will include additional fields like `authorDescription`, `authorFollowersCount`, etc.

#### ⏱️ Data Collection Parameters

##### `maxPosts`
- **Type**: Integer
- **Description**: Maximum number of posts to collect (0 for unlimited)
- **Behavior**: Controls how many posts will be collected before the scraper stops. Setting to 0 means the scraper will continue until the time limit is reached.
- **Example**: If set to `100`, the scraper will stop after collecting 100 posts that match the filter criteria.

##### `timeLimit`
- **Type**: Integer
- **Description**: Maximum time to run the scraper in minutes
- **Behavior**: Controls how long the scraper will run before stopping, regardless of how many posts have been collected.
- **Example**: If set to `30`, the scraper will stop after 30 minutes, even if it hasn't reached the `maxPosts` limit.
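
Taken together, `maxPosts` and `timeLimit` act as two independent stop conditions, and whichever is reached first ends the run. A minimal sketch of that rule follows, assuming a hypothetical helper named `should_stop`; the Actor's internal implementation may differ.

```python
import time


def should_stop(collected, started_at, max_posts, time_limit_minutes, now=None):
    """Return True once either the post limit or the time limit is reached.

    max_posts == 0 means "unlimited", so only the time limit applies.
    Times are in seconds, as returned by time.monotonic().
    """
    now = time.monotonic() if now is None else now
    reached_max = max_posts > 0 and collected >= max_posts
    elapsed_minutes = (now - started_at) / 60
    return reached_max or elapsed_minutes >= time_limit_minutes
```

For example, `should_stop(100, 0.0, 100, 30, now=10.0)` is true because the post limit was hit after ten seconds, while with `max_posts=0` only the 30-minute limit can end the run.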

#### 🔌 Connection Settings

##### `region`
- **Type**: String enum ("us-east" or "us-west")
- **Description**: Region for the Jetstream server
- **Behavior**: Controls which regional Bluesky Jetstream server the scraper connects to. This can affect latency and potentially the volume of data received.
- **Example**: If you're collecting data from the US West Coast, selecting `us-west` might provide lower latency.

##### `instance`
- **Type**: Integer (1 or 2)
- **Description**: Instance number for the Jetstream server
- **Behavior**: Selects which specific Jetstream instance to connect to within the selected region.
- **Example**: If experiencing connection issues with instance 1, switching to instance 2 might help.
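
For orientation, `region` and `instance` plausibly map onto Bluesky's public Jetstream hostnames, which follow the pattern `jetstream{instance}.{region}.bsky.network`. The sketch below builds such a WebSocket URL; it assumes the Actor composes its endpoint this way, which the README does not confirm, and `jetstream_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

VALID_REGIONS = ("us-east", "us-west")
VALID_INSTANCES = (1, 2)


def jetstream_url(region="us-east", instance=2,
                  wanted_collections=("app.bsky.feed.post",)):
    """Build a Jetstream subscribe URL for the given region and instance."""
    if region not in VALID_REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    if instance not in VALID_INSTANCES:
        raise ValueError(f"instance must be 1 or 2, got {instance!r}")
    # Jetstream accepts one wantedCollections query parameter per collection.
    query = urlencode([("wantedCollections", c) for c in wanted_collections])
    return f"wss://jetstream{instance}.{region}.bsky.network/subscribe?{query}"


default_url = jetstream_url()
```

Switching `instance` from 1 to 2 only changes the hostname, so it is a cheap first step when one endpoint is unreachable.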

##### `autoReconnect`
- **Type**: Boolean
- **Description**: Whether to automatically reconnect if the connection is lost
- **Behavior**: When set to `true`, the scraper will attempt to reconnect to Jetstream if the connection drops. When set to `false`, the scraper will terminate on connection loss.
- **Example**: For long-running data collection jobs, setting this to `true` helps ensure continuous data collection despite temporary network issues.

##### `maxRetries`
- **Type**: Integer
- **Description**: Maximum number of reconnection attempts
- **Behavior**: Controls how many times the scraper will try to reconnect before giving up.
- **Example**: If set to `5`, the scraper will make up to 5 reconnection attempts before terminating.

#### ⚙️ Advanced Settings

##### `saveCheckpoints`
- **Type**: Boolean
- **Description**: Whether to periodically save collected data to prevent loss on errors
- **Behavior**: When set to `true`, the scraper will periodically save collected data to disk, allowing recovery from a checkpoint if the process is interrupted.
- **Example**: If set to `true` and the scraper crashes after collecting 400 posts, you might be able to recover 350 of them from the last checkpoint.

##### `proxy`
- **Type**: Object
- **Description**: Proxy configuration for the scraper
- **Behavior**: Controls whether and how the scraper uses Apify proxies for connections.
- **Example**: Setting `useApifyProxy` to `true` allows the scraper to use Apify's proxy infrastructure, which can help avoid rate limiting.

##### `debugMode`
- **Type**: Boolean
- **Description**: Whether to enable detailed logging for troubleshooting
- **Behavior**: When set to `true`, the scraper will output more detailed logs about its operation, which can help diagnose issues.
- **Example**: If you're not seeing the expected output, setting this to `true` can provide insights into what's happening.

##### `verboseDebug`
- **Type**: Boolean
- **Description**: Whether to enable extremely detailed logging for message format diagnostics
- **Behavior**: When set to `true`, the scraper will output extremely detailed logs, including raw message contents. This generates large log files.
- **Example**: Useful only for advanced debugging when developing or modifying the scraper.

---

### 🎨 Customizing Output Format

The scraper allows you to customize the data fields included in the output through several parameters:

#### Field Selection Controls

These parameters control which data fields are included in the output:

- **`includeMedia`**: Controls whether media URLs and related fields are included
- **`includeImages`**: Controls whether image URLs and related fields are included  
- **`includeReplies`**: Controls whether reply information fields are included
- **`enrichUserProfiles`**: Controls whether extended author profile fields are included

#### Output Format Options

On the Apify platform, you can download your dataset in several formats:

1. **JSON**: The default format with complete data structure
2. **CSV**: Tabular format suitable for spreadsheet applications
3. **Excel**: Direct Excel file download
4. **RSS**: For feed readers
5. **HTML**: For web viewing

To change the download format:
1. Navigate to the "Storage" tab in your Apify account
2. Select the dataset from your Actor run
3. Click the "Download" dropdown menu
4. Choose your preferred format

For customized data processing, you can also use the Apify API to retrieve the data programmatically in your preferred format.
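
As a rough illustration of programmatic retrieval, the Apify API exposes dataset items at `/v2/datasets/{datasetId}/items` and accepts a `format` query parameter. The helper below only builds the URL; its name and the dataset ID in the usage line are placeholders, and a private dataset additionally requires your API token.

```python
from urllib.parse import urlencode

# Export formats accepted by the dataset items endpoint.
EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "xml", "rss", "html"}


def dataset_items_url(dataset_id, fmt="json"):
    """Return the API URL that exports a dataset in the requested format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


# "abc123" is a placeholder; use the defaultDatasetId of your own run.
csv_url = dataset_items_url("abc123", "csv")
```

The resulting URL can be fetched with any HTTP client, which is convenient when the consumer of the data is a spreadsheet import or a scheduled job rather than the Apify client libraries.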

---

### 🔄 Combining Filters

When multiple filter types are used together (hashtags, usernames, languages), the scraper applies AND logic between different filter types:

- If you set both hashtags and languages, posts must match BOTH criteria (contain one of the hashtags AND be in one of the languages).
- If you set both usernames and languages, posts must be authored by one of the specified users AND be in one of the specified languages.
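
The rule above (OR within a filter type, AND between filter types) can be sketched as a small predicate. This is not the Actor's code: the post field names `hashtags`, `authorHandle`, and `languages` are hypothetical, and matching is exact for simplicity.

```python
def matches_filters(post, hashtags=(), usernames=(), languages=()):
    """Return True if a post passes every configured filter type.

    An empty filter is treated as "not configured" and always passes.
    """
    # OR logic inside each filter type: any overlap is enough.
    if hashtags and not set(hashtags) & set(post.get("hashtags", [])):
        return False
    if usernames and post.get("authorHandle") not in usernames:
        return False
    if languages and not set(languages) & set(post.get("languages", [])):
        return False
    # AND logic between filter types: every configured filter passed.
    return True


# Hypothetical post, trimmed to the fields the filters look at.
post = {
    "hashtags": ["tech"],
    "authorHandle": "user1.bsky.social",
    "languages": ["en"],
}
```

With no filters at all the predicate returns `True` for every post, which matches the unfiltered default behavior.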

---

### ⚪ Default Behavior (No Filters)

When no filter options (hashtags, usernames, languages) are selected:

- The scraper will collect **all posts** from the Bluesky Jetstream without any filtering
- All posts will match the filter criteria automatically
- The only limits will be the `maxPosts` parameter and/or the `timeLimit` parameter
- You'll get a diverse, unfiltered stream of Bluesky content
- Other inclusion settings like `includeMedia` and `includeImages` will still be applied
- Collection types will be limited to what's specified in `wantedCollections` (defaults to feed posts)

This approach is useful for general data collection when you want to analyze the overall Bluesky content without focusing on specific topics, users, or languages.

---

### 📝 Example Scenarios

<details>
<summary><b>Scenario 1: Language Filtering with Media Exclusion</b></summary>
<br>
If you set:

- `languages` to `["pt"]` (Portuguese)
- `includeMedia` to `false`
- `includeImages` to `false`

The output will include all Portuguese-language posts, but all media-related fields (imageUrl, mediaUrl, etc.) will be empty, and `hasMedia`/`hasImages` flags will be false.
</details>

<details>
<summary><b>Scenario 2: Hashtag and Username Filtering</b></summary>
<br>
If you set:

- `hashtags` to `["tech", "AI"]`
- `usernames` to `["user1.bsky.social", "user2.bsky.social"]`

The output will include only posts that:
1. Contain either #tech OR #AI hashtags, AND
2. Are authored by either user1 OR user2
</details>

<details>
<summary><b>Scenario 3: Multiple Collections</b></summary>
<br>
If you set:

- `wantedCollections` to `["app.bsky.feed.post", "app.bsky.feed.like"]`

The output will include both original posts and like interactions from the Bluesky network.
</details>

---

### 🤝 Bluesky Firehose Scraping Etiquette

When using the Bluesky Jetstream (firehose), it's important to follow these ethical guidelines and best practices:

#### 📜 Official Guidelines

- **Respect the Terms of Service**: Always adhere to Bluesky's official [Terms of Service](https://bsky.app/support/tos) and [API Usage Guidelines](https://docs.bsky.app).
- **Attribution**: When publishing research or analysis based on Bluesky data, properly attribute the source.
- **Privacy Awareness**: Though the data is publicly available, be mindful that users may not expect their content to be analyzed at scale.

#### 🔧 Technical Best Practices

- **Rate Limiting**: The scraper already implements rate limiting, but be cautious about running multiple instances simultaneously.
- **Efficient Filtering**: Use the filtering options to collect only the data you need rather than scraping everything.
- **Connection Management**: Use the `autoReconnect` and `maxRetries` settings responsibly to avoid creating excessive connection attempts.
- **Data Storage**: Handle collected data securely and in compliance with relevant privacy regulations like GDPR.

#### 🔍 Responsible Usage

- **Research Purpose**: Clearly define your research or business purpose before collecting data.
- **Minimize Collection**: Only collect the data fields necessary for your analysis.
- **Respect Boundaries**: Avoid excessive scraping that might impact the platform's performance.
- **Consider Opt-Out**: When presenting results, consider providing ways for users to opt-out of having their content included.

#### ⚖️ Legal Considerations

- **Data Protection**: Comply with applicable data protection laws in your jurisdiction.
- **User Privacy**: Even though posts are public, respect user privacy by anonymizing data when possible.
- **Terms Changes**: Regularly check for updates to Bluesky's terms as the platform is evolving.

Following these guidelines ensures ethical use of the Bluesky firehose while maintaining a positive relationship with the platform and its community.

# Actor input Schema

## `hashtags` (type: `array`):

List of hashtags to filter posts (without the # symbol)
## `usernames` (type: `array`):

List of Bluesky usernames to filter posts (will be resolved to DIDs for efficient filtering)
## `languages` (type: `array`):

Select languages to filter posts (multiple selection allowed)
## `wantedCollections` (type: `array`):

Specific collections to filter from Jetstream (defaults to feed posts)
## `region` (type: `string`):

Region for the Jetstream server
## `instance` (type: `integer`):

Instance number for the Jetstream server (1 or 2)
## `includeMedia` (type: `boolean`):

Whether to include URLs for media attachments
## `includeImages` (type: `boolean`):

Whether to include URLs for images in the output
## `detectLanguage` (type: `boolean`):

Automatically detect language for posts that don't specify one
## `enrichUserProfiles` (type: `boolean`):

Fetch additional user profile information for authors
## `includeReplies` (type: `boolean`):

Include reply information in collected posts
## `maxPosts` (type: `integer`):

Maximum number of posts to collect (0 for unlimited)
## `timeLimit` (type: `integer`):

Maximum time to run the scraper in minutes
## `autoReconnect` (type: `boolean`):

Automatically reconnect if connection is lost
## `maxRetries` (type: `integer`):

Maximum number of times to attempt reconnection
## `saveCheckpoints` (type: `boolean`):

Periodically save data to prevent loss on errors
## `proxy` (type: `object`):

Proxy settings for the scraper
## `debugMode` (type: `boolean`):

Enable detailed logging for troubleshooting
## `verboseDebug` (type: `boolean`):

Enable extremely detailed logging for message format diagnostics (warning: generates large logs)

## Actor input object example

```json
{
  "wantedCollections": [
    "app.bsky.feed.post"
  ],
  "region": "us-east",
  "instance": 2,
  "includeMedia": true,
  "includeImages": true,
  "detectLanguage": true,
  "enrichUserProfiles": true,
  "includeReplies": true,
  "maxPosts": 500,
  "timeLimit": 60,
  "autoReconnect": true,
  "maxRetries": 5,
  "saveCheckpoints": true,
  "proxy": {
    "useApifyProxy": true
  },
  "debugMode": false,
  "verboseDebug": false
}
````
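
Before starting a run, it can help to check an input object against the constraints published in the OpenAPI specification: `timeLimit` is required, `region` and `wantedCollections` are enumerations, and `instance` must be 1 or 2. The following is a minimal client-side sketch with a hypothetical `validate_input` helper; the platform performs its own validation and remains the source of truth.

```python
ALLOWED_REGIONS = {"us-east", "us-west"}
ALLOWED_COLLECTIONS = {
    "app.bsky.feed.post",
    "app.bsky.feed.like",
    "app.bsky.feed.repost",
    "app.bsky.graph.follow",
    "app.bsky.graph.block",
    "app.bsky.actor.profile",
}


def validate_input(actor_input):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if "timeLimit" not in actor_input:
        errors.append("timeLimit is required")
    # Fall back to the schema defaults when a field is omitted.
    if actor_input.get("region", "us-east") not in ALLOWED_REGIONS:
        errors.append("region must be us-east or us-west")
    if actor_input.get("instance", 2) not in (1, 2):
        errors.append("instance must be 1 or 2")
    unknown = set(actor_input.get("wantedCollections", [])) - ALLOWED_COLLECTIONS
    if unknown:
        errors.append("unknown collections: " + ", ".join(sorted(unknown)))
    return errors


problems = validate_input({"region": "eu-west", "wantedCollections": ["app.bsky.feed.quote"]})
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue to the user in a single pass.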

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input (the input schema marks `timeLimit` as required)
const input = {
    "maxPosts": 500,
    "timeLimit": 60
};

// Run the Actor and wait for it to finish
const run = await client.actor("skyscraping/bluesky-jetstream-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input (the input schema marks `timeLimit` as required)
run_input = {
    "maxPosts": 500,
    "timeLimit": 60,
}

# Run the Actor and wait for it to finish
run = client.actor("skyscraping/bluesky-jetstream-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"maxPosts": 500, "timeLimit": 60}' |
apify call skyscraping/bluesky-jetstream-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=skyscraping/bluesky-jetstream-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Bluesky Jetstream Scraper",
        "description": "Bluesky Social Feed Scraper collects posts from Bluesky's Jetstream API. Filter by hashtags, usernames, or languages to gather targeted data. Includes media attachments, user profiles, and reply context. Perfect for social research, trend analysis, and content monitoring on the platform.",
        "version": "0.0",
        "x-build-id": "Xu8sMzu1SvkSeMlMU"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/skyscraping~bluesky-jetstream-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-skyscraping-bluesky-jetstream-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/skyscraping~bluesky-jetstream-scraper/runs": {
            "post": {
                "operationId": "runs-sync-skyscraping-bluesky-jetstream-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/skyscraping~bluesky-jetstream-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-skyscraping-bluesky-jetstream-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "timeLimit"
                ],
                "properties": {
                    "hashtags": {
                        "title": "Hashtags",
                        "type": "array",
                        "description": "List of hashtags to filter posts (without the # symbol)",
                        "items": {
                            "type": "string"
                        }
                    },
                    "usernames": {
                        "title": "Usernames",
                        "type": "array",
                        "description": "List of Bluesky usernames to filter posts (will be resolved to DIDs for efficient filtering)",
                        "items": {
                            "type": "string"
                        }
                    },
                    "languages": {
                        "title": "Languages",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Select languages to filter posts (multiple selection allowed)",
                        "items": {
                            "type": "string",
                            "enum": [
                                "en",
                                "ar",
                                "cs",
                                "zh",
                                "da",
                                "nl",
                                "fi",
                                "fr",
                                "de",
                                "el",
                                "he",
                                "hi",
                                "hu",
                                "id",
                                "it",
                                "ja",
                                "ko",
                                "no",
                                "pl",
                                "pt",
                                "ru",
                                "es",
                                "sv",
                                "th",
                                "tr",
                                "uk",
                                "vi"
                            ],
                            "enumTitles": [
                                "English",
                                "Arabic",
                                "Czech",
                                "Chinese",
                                "Danish",
                                "Dutch",
                                "Finnish",
                                "French",
                                "German",
                                "Greek",
                                "Hebrew",
                                "Hindi",
                                "Hungarian",
                                "Indonesian",
                                "Italian",
                                "Japanese",
                                "Korean",
                                "Norwegian",
                                "Polish",
                                "Portuguese",
                                "Russian",
                                "Spanish",
                                "Swedish",
                                "Thai",
                                "Turkish",
                                "Ukrainian",
                                "Vietnamese"
                            ]
                        }
                    },
                    "wantedCollections": {
                        "title": "Wanted Collections",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Specific collections to filter from Jetstream (defaults to feed posts)",
                        "items": {
                            "type": "string",
                            "enum": [
                                "app.bsky.feed.post",
                                "app.bsky.feed.like",
                                "app.bsky.feed.repost",
                                "app.bsky.graph.follow",
                                "app.bsky.graph.block",
                                "app.bsky.actor.profile"
                            ],
                            "enumTitles": [
                                "Posts",
                                "Likes",
                                "Reposts",
                                "Follows",
                                "Blocks",
                                "Profiles"
                            ]
                        },
                        "default": [
                            "app.bsky.feed.post"
                        ]
                    },
                    "region": {
                        "title": "Jetstream Region",
                        "enum": [
                            "us-east",
                            "us-west"
                        ],
                        "type": "string",
                        "description": "Region of the Jetstream server to connect to",
                        "default": "us-east"
                    },
                    "instance": {
                        "title": "Jetstream Instance",
                        "minimum": 1,
                        "maximum": 2,
                        "type": "integer",
                        "description": "Instance number for the Jetstream server (1 or 2)",
                        "default": 2
                    },
                    "includeMedia": {
                        "title": "Include Media Links",
                        "type": "boolean",
                        "description": "Whether to include URLs for media attachments",
                        "default": true
                    },
                    "includeImages": {
                        "title": "Include Image Links",
                        "type": "boolean",
                        "description": "Whether to include URLs for images in the output",
                        "default": true
                    },
                    "detectLanguage": {
                        "title": "Auto-detect Language",
                        "type": "boolean",
                        "description": "Automatically detect language for posts that don't specify one",
                        "default": true
                    },
                    "enrichUserProfiles": {
                        "title": "Enrich User Profiles",
                        "type": "boolean",
                        "description": "Fetch additional user profile information for authors",
                        "default": true
                    },
                    "includeReplies": {
                        "title": "Include Replies",
                        "type": "boolean",
                        "description": "Include reply information in collected posts",
                        "default": true
                    },
                    "maxPosts": {
                        "title": "Maximum Posts",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of posts to collect (0 for unlimited)",
                        "default": 500
                    },
                    "timeLimit": {
                        "title": "Time Limit (minutes)",
                        "minimum": 1,
                        "maximum": 180,
                        "type": "integer",
                        "description": "Maximum time to run the scraper in minutes",
                        "default": 60
                    },
                    "autoReconnect": {
                        "title": "Auto Reconnect",
                        "type": "boolean",
                        "description": "Automatically reconnect if connection is lost",
                        "default": true
                    },
                    "maxRetries": {
                        "title": "Max Reconnection Attempts",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Maximum number of times to attempt reconnection",
                        "default": 5
                    },
                    "saveCheckpoints": {
                        "title": "Save Checkpoints",
                        "type": "boolean",
                        "description": "Periodically save data to prevent loss on errors",
                        "default": true
                    },
                    "proxy": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Proxy settings for the scraper",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "debugMode": {
                        "title": "Debug Mode",
                        "type": "boolean",
                        "description": "Enable detailed logging for troubleshooting",
                        "default": false
                    },
                    "verboseDebug": {
                        "title": "Verbose Debug",
                        "type": "boolean",
                        "description": "Enable extremely detailed logging for message format diagnostics (warning: generates large logs)",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
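
The input schema above constrains several fields: `region` and `wantedCollections` are enums, and `instance`, `maxPosts`, `timeLimit`, and `maxRetries` have numeric bounds. The sketch below shows one way to check an input object against those constraints before starting a run. The helper name `build_input` and its structure are illustrative assumptions, not part of the Actor or of any Apify client. Only the limits and defaults are taken from the schema.

```python
# Limits and defaults copied from the input schema above.
# build_input itself is a hypothetical helper, not part of the Actor.
ALLOWED_REGIONS = {"us-east", "us-west"}
ALLOWED_COLLECTIONS = {
    "app.bsky.feed.post",
    "app.bsky.feed.like",
    "app.bsky.feed.repost",
    "app.bsky.graph.follow",
    "app.bsky.graph.block",
    "app.bsky.actor.profile",
}
# (minimum, maximum); None means the schema sets no upper bound.
INT_RANGES = {
    "instance": (1, 2),
    "maxPosts": (0, None),
    "timeLimit": (1, 180),
    "maxRetries": (1, 10),
}
DEFAULTS = {
    "wantedCollections": ["app.bsky.feed.post"],
    "region": "us-east",
    "instance": 2,
    "maxPosts": 500,
    "timeLimit": 60,
    "maxRetries": 5,
}


def build_input(**overrides):
    """Merge overrides into the schema defaults and validate the result."""
    run_input = {**DEFAULTS, **overrides}
    # Copy the list so callers never share the default list object.
    run_input["wantedCollections"] = list(run_input["wantedCollections"])
    errors = []

    if run_input["region"] not in ALLOWED_REGIONS:
        errors.append(f"region must be one of {sorted(ALLOWED_REGIONS)}")

    collections = run_input["wantedCollections"]
    unknown = sorted(set(collections) - ALLOWED_COLLECTIONS)
    if unknown:
        errors.append(f"unknown collections: {unknown}")
    if len(set(collections)) != len(collections):
        errors.append("wantedCollections must not contain duplicates")

    for name, (low, high) in INT_RANGES.items():
        value = run_input[name]
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{name} must be an integer")
        elif value < low or (high is not None and value > high):
            errors.append(f"{name} is outside the allowed range")

    if errors:
        raise ValueError("; ".join(errors))
    return run_input
```

Calling `build_input(timeLimit=30, region="us-west")` returns a dictionary that could be passed as the run input, while a value such as `timeLimit=181` raises `ValueError` locally instead of failing on the platform.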

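The `runsResponseSchema` above wraps the run object in a top-level `data` property. As a minimal sketch, the function below pulls out the fields most integrations need: the run ID, its status, the default dataset ID, and the cost figures. The function name `summarize_run` and the sample IDs are hypothetical. The field names and the example values for `status` and `usageTotalUsd` come from the schema.

```python
def summarize_run(response: dict) -> dict:
    """Extract the commonly used fields from a run response body."""
    data = response.get("data", {})
    usage_usd = data.get("usageUsd", {})
    return {
        "run_id": data.get("id"),
        "status": data.get("status"),
        "dataset_id": data.get("defaultDatasetId"),
        "total_usd": data.get("usageTotalUsd", 0.0),
        # Keep only the usage categories that actually cost something.
        "nonzero_usd": {key: cost for key, cost in usage_usd.items() if cost},
    }


# Sample payload: the IDs are made up, the other values are the
# examples given in the schema.
sample_response = {
    "data": {
        "id": "example-run-id",
        "status": "READY",
        "defaultDatasetId": "example-dataset-id",
        "usageTotalUsd": 0.00005,
        "usageUsd": {
            "ACTOR_COMPUTE_UNITS": 0,
            "KEY_VALUE_STORE_WRITES": 0.00005,
            "DATASET_WRITES": 0,
        },
    }
}

summary = summarize_run(sample_response)
print(summary["status"], summary["dataset_id"], summary["nonzero_usd"])
```

The `dataset_id` value is the one to use when fetching the collected posts, since the schema lists `defaultDatasetId` as the run's dataset reference.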