# S3 Uploader (`apify/s3-uploader`) Actor

Upload data from an Apify dataset to an Amazon S3 bucket. Providing various filters and transformation options, this Actor allows precise control over data structure, formatting, and upload settings to ensure seamless integration into your data pipeline.

- **URL**: https://apify.com/apify/s3-uploader.md
- **Developed by:** [Apify](https://apify.com/apify) (Apify)
- **Categories:** Integrations, Automation, Developer tools
- **Stats:** 50 total users, 14 monthly users, 100.0% runs succeeded, 3 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you only pay for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

This integration-ready Apify Actor uploads the content of an Apify dataset to an Amazon S3 bucket. You can use it to store data extracted by other Actors as either an integration or a standalone Actor.

### Features
- Uploads data in various formats (JSON, CSV, XML, etc.).
- Supports variables for dynamic S3 object keys.
- Supports various filtering and transformation options (select, omit, unwind, flatten, offset, limit, clean only, ...).

### AWS IAM User Requirement
To use this Actor, you will need an AWS IAM user with the necessary permissions. If you do not have one already, you can create a new IAM user by following the [official AWS guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html).
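
The exact permissions are not listed here. As a rough sketch, assuming the Actor only needs to write objects into the target bucket, an IAM policy granting `s3:PutObject` could look like the one generated below. The helper name `build_upload_policy`, the bucket name, and the prefix are hypothetical, and the Actor may require additional permissions.

```python
import json


def build_upload_policy(bucket: str, prefix: str = "") -> dict:
    """Return an IAM policy document that allows uploads to one bucket.

    Assumptions: s3:PutObject is sufficient for the Actor (not confirmed
    by its documentation) and the bucket lives in the standard "aws"
    partition (China and GovCloud regions use different ARN partitions).
    """
    resource = f"arn:aws:s3:::{bucket}/{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [resource],
            }
        ],
    }


# Print the policy as JSON, ready to paste into the IAM console
policy = build_upload_policy("my-example-bucket", "exports/")
print(json.dumps(policy, indent=2))
```

Restricting the resource to a key prefix keeps the credentials you pass to the Actor as narrow as possible.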

### Input Parameters

| Parameter       | Type    | Required | Description |
|---------------|---------|----------|-------------|
| `accessKeyId`    | string  | ✅        | Your AWS access key ID used for authorization of the upload. |
| `secretAccessKey`| string  | ✅        | Your AWS secret access key used for authorization of the upload. |
| `region`         | string  | ✅        | The AWS region where the target S3 bucket is located. |
| `bucket`         | string  | ✅        | The name of the target S3 bucket. |
| `key`            | string  | ✅        | The object key, which serves as an identifier for the uploaded data in the S3 bucket. It can include an optional prefix. If an object with the same key already exists, it will be overwritten with the uploaded data. |
| `datasetId`      | string  | ✅        | The Apify dataset ID from which data will be retrieved for the upload. |
| `format`         | string  | ❌        | The format of the uploaded data. Options: `json`, `jsonl`, `html`, `csv`, `xml`, `xlsx`, `rss`. Default: `json`. |
| `fields`         | array   | ❌        | Fields to include in the output. If not specified, all fields will be included. |
| `omit`           | array   | ❌        | Fields to exclude from the output. |
| `unwind`         | array   | ❌        | Fields to unwind. If the field is an array, every element becomes a separate record and is merged with the parent object. If the unwound field is an object, it is merged with the parent object. If the unwound field is missing, or its value is neither an array nor an object and therefore cannot be merged with a parent object, the item is preserved as is. If you specify multiple fields, they are unwound in the order you specify. |
| `flatten`        | array   | ❌        | Fields to transform from nested objects into a flat structure. |
| `offset`         | integer | ❌        | Number of items to skip from the beginning of the dataset. Minimum: `0`. |
| `limit`          | integer | ❌        | Maximum number of items to upload. Minimum: `1`. |
| `clean`          | boolean | ❌        | If enabled, only clean dataset items and their non-hidden fields will be uploaded. See the [documentation](https://docs.apify.com/platform/storage/dataset#hidden-fields) for details. Default: `true`. |
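
The constraints in the table above can be checked locally before starting a run. The following is a minimal sketch; the helper `validate_input` and all placeholder values are hypothetical, and the Actor performs its own validation regardless.

```python
REQUIRED_FIELDS = ("accessKeyId", "secretAccessKey", "region", "bucket", "key", "datasetId")
ALLOWED_FORMATS = ("json", "jsonl", "html", "csv", "xml", "xlsx", "rss")


def validate_input(actor_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    # Every required parameter must be present and non-empty
    for name in REQUIRED_FIELDS:
        if not actor_input.get(name):
            problems.append(f"missing required field: {name}")
    # The format falls back to the documented default, "json"
    fmt = actor_input.get("format", "json")
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    # Documented minimums: offset >= 0, limit >= 1
    offset = actor_input.get("offset")
    if offset is not None and offset < 0:
        problems.append("offset must be >= 0")
    limit = actor_input.get("limit")
    if limit is not None and limit < 1:
        problems.append("limit must be >= 1")
    return problems


example_input = {
    "accessKeyId": "<YOUR_AWS_ACCESS_KEY_ID>",
    "secretAccessKey": "<YOUR_AWS_SECRET_ACCESS_KEY>",
    "region": "us-east-1",
    "bucket": "my-example-bucket",
    "key": "exports/products.csv",
    "datasetId": "<YOUR_DATASET_ID>",
    "format": "csv",
    "fields": ["title", "price"],
    "limit": 1000,
}
print(validate_input(example_input))  # prints []
```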

### How It Works
1. The Actor retrieves the items of the specified Apify dataset, transformed according to the provided input parameters (format, clean only, etc.).
2. The data is uploaded to the specified S3 bucket as an object stored under the provided key.
3. If an object with the same key already exists, it is replaced with the new upload.
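
Step 1 can be pictured as a request to the Apify API's dataset items endpoint, whose query parameters mirror the Actor's transformation options. Whether the Actor uses this endpoint internally is an assumption; the helper `build_items_url` is hypothetical and only builds the URL without sending it.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
LIST_OPTIONS = ("fields", "omit", "unwind", "flatten")


def build_items_url(dataset_id: str, options: dict) -> str:
    """Build a dataset items URL from Actor-style options."""
    query = {"format": options.get("format", "json")}
    # Booleans are sent as lowercase strings; clean defaults to true
    query["clean"] = "true" if options.get("clean", True) else "false"
    # List options are sent as comma-separated values
    for name in LIST_OPTIONS:
        if options.get(name):
            query[name] = ",".join(options[name])
    # Paging options are only sent when explicitly set
    for name in ("offset", "limit"):
        if options.get(name) is not None:
            query[name] = str(options[name])
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(query)}"


url = build_items_url("DATASET_ID", {"format": "csv", "fields": ["title", "price"], "limit": 10})
print(url)
```

Private datasets also require authentication with an Apify API token, which this sketch leaves out.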

### Error Handling
If the Actor encounters an issue, it will log an error and fail. Possible issues include:
- Invalid AWS credentials.
- Incorrect bucket name or permissions.
- Nonexistent Apify dataset ID.
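
Because the Actor fails on these errors, a caller can detect them from the run's `status` field. The sketch below assumes a run object shaped like the API response (a dictionary with `id` and `status`); the names `UploadFailedError` and `ensure_run_succeeded` are hypothetical.

```python
# Statuses in which an Apify Actor run is no longer executing
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


class UploadFailedError(RuntimeError):
    """Hypothetical error raised when the upload run did not succeed."""


def ensure_run_succeeded(run: dict) -> dict:
    """Return the run unchanged if it succeeded, otherwise raise."""
    run_id = run.get("id")
    status = run.get("status")
    if status not in TERMINAL_STATUSES:
        raise UploadFailedError(f"run {run_id} has not finished yet (status: {status})")
    if status != "SUCCEEDED":
        raise UploadFailedError(f"run {run_id} ended with status {status}; check the run log")
    return run


# Example with a hand-written run object instead of a real API response
try:
    ensure_run_succeeded({"id": "example-run", "status": "FAILED"})
except UploadFailedError as error:
    print(error)
```

The run log in Apify Console typically shows which of the issues above caused the failure.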

### Help & Support
The S3 Uploader is actively maintained.
If you have any feedback or feature ideas, feel free to [submit an issue](https://console.apify.com/actors/.../issues).

# Actor input Schema

## `accessKeyId` (type: `string`):

Your AWS access key ID used for authorization of the upload. You can get it from *AWS Console* -> *IAM* -> *Users* -> *Create user / Select existing user* -> *Security credentials* -> *Access keys*.
## `secretAccessKey` (type: `string`):

Your AWS secret access key used for authorization of the upload. You can get it from *AWS Console* -> *IAM* -> *Users* -> *Create user / Select existing user* -> *Security credentials* -> *Access keys* -> *Create access key*. The secret access key will be displayed only once, upon creation of the access key.
## `region` (type: `string`):

The AWS region where the target S3 bucket is located. You can get it from *AWS Console* -> *S3* -> *Create bucket / Select existing bucket* -> *Properties*.
## `bucket` (type: `string`):

The name of the target S3 bucket.
## `key` (type: `string`):

The object key, which serves as an identifier for the uploaded data in the S3 bucket. It can include an optional prefix. If an object with the same key already exists, it will be overwritten with the uploaded data.
## `datasetId` (type: `string`):

The Apify dataset ID from which data will be retrieved for the upload.
## `format` (type: `string`):

The format of the uploaded data.
## `fields` (type: `array`):

Fields to include in the output. If not specified, all fields will be included.
## `omit` (type: `array`):

Fields to exclude from the output.
## `unwind` (type: `array`):

Fields to unwind. If the field is an array, every element becomes a separate record and is merged with the parent object. If the unwound field is an object, it is merged with the parent object. If the unwound field is missing, or its value is neither an array nor an object and therefore cannot be merged with a parent object, the item is preserved as is. If you specify multiple fields, they are unwound in the order you specify.
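
To make these rules concrete, here is a local emulation of unwinding. It is a sketch of the described behavior, not the platform's implementation; the handling of non-object array elements and of empty arrays is an assumption.

```python
def unwind_item(item: dict, field: str) -> list[dict]:
    """Unwind one field of one item into zero or more records."""
    value = item.get(field)
    parent = {k: v for k, v in item.items() if k != field}
    if isinstance(value, list) and value:
        records = []
        for element in value:
            if isinstance(element, dict):
                # Object elements are merged with the parent object
                records.append({**parent, **element})
            else:
                # Assumption: other elements stay under the original field name
                records.append({**parent, field: element})
        return records
    if isinstance(value, dict):
        # An object is merged with the parent object
        return [{**parent, **value}]
    # Missing or scalar values (and, by assumption, empty arrays): keep the item as is
    return [item]


def unwind(items: list[dict], fields: list[str]) -> list[dict]:
    """Unwind several fields in the order they are given."""
    for field in fields:
        items = [record for item in items for record in unwind_item(item, field)]
    return items


items = [{"shop": "A", "products": [{"name": "pen"}, {"name": "ink"}]}]
print(unwind(items, ["products"]))
# prints [{'shop': 'A', 'name': 'pen'}, {'shop': 'A', 'name': 'ink'}]
```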
## `flatten` (type: `array`):

Fields to transform from nested objects into a flat structure.
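
As an illustration, the sketch below flattens selected fields locally, assuming dot-separated keys (for example, `{"foo": {"bar": "hello"}}` becomes `{"foo.bar": "hello"}`). The helper names are hypothetical, and the recursive handling of deeper nesting is an assumption.

```python
def flatten_value(prefix: str, value, out: dict) -> None:
    """Write value into out, expanding nested objects into dotted keys."""
    if isinstance(value, dict) and value:
        for key, nested in value.items():
            flatten_value(f"{prefix}.{key}", nested, out)
    else:
        # Scalars, lists, and empty objects are stored unchanged
        out[prefix] = value


def flatten_item(item: dict, fields: list[str]) -> dict:
    """Flatten only the listed top-level fields of an item."""
    flat = {}
    for key, value in item.items():
        if key in fields:
            flatten_value(key, value, flat)
        else:
            flat[key] = value
    return flat


print(flatten_item({"id": 1, "foo": {"bar": "hello"}}, ["foo"]))
# prints {'id': 1, 'foo.bar': 'hello'}
```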
## `offset` (type: `integer`):

Number of items to skip from the beginning of the dataset.
## `limit` (type: `integer`):

Maximum number of items to upload.
## `clean` (type: `boolean`):

If enabled, only clean dataset items and their non-hidden fields will be uploaded.

## Actor input object example

```json
{
  "format": "json",
  "clean": true
}
````

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

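// Prepare Actor input (replace the placeholders with your own values)
const input = {
    accessKeyId: '<YOUR_AWS_ACCESS_KEY_ID>',
    secretAccessKey: '<YOUR_AWS_SECRET_ACCESS_KEY>',
    region: 'us-east-1',
    bucket: '<YOUR_BUCKET_NAME>',
    key: '<YOUR_OBJECT_KEY>',
    datasetId: '<YOUR_DATASET_ID>',
};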

// Run the Actor and wait for it to finish
const run = await client.actor("apify/s3-uploader").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

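# Prepare the Actor input (replace the placeholders with your own values)
run_input = {
    "accessKeyId": "<YOUR_AWS_ACCESS_KEY_ID>",
    "secretAccessKey": "<YOUR_AWS_SECRET_ACCESS_KEY>",
    "region": "us-east-1",
    "bucket": "<YOUR_BUCKET_NAME>",
    "key": "<YOUR_OBJECT_KEY>",
    "datasetId": "<YOUR_DATASET_ID>",
}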

# Run the Actor and wait for it to finish
run = client.actor("apify/s3-uploader").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call apify/s3-uploader --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=apify/s3-uploader",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "S3 Uploader",
        "description": "Upload data from an Apify dataset to an Amazon S3 bucket. Providing various filters and transformation options, this Actor allows precise control over data structure, formatting, and upload settings to ensure seamless integration into your data pipeline.",
        "version": "0.0",
        "x-build-id": "UkWwyMJI88NAoH8Kx"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/apify~s3-uploader/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-apify-s3-uploader",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/apify~s3-uploader/runs": {
            "post": {
                "operationId": "runs-sync-apify-s3-uploader",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/apify~s3-uploader/run-sync": {
            "post": {
                "operationId": "run-sync-apify-s3-uploader",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "accessKeyId",
                    "secretAccessKey",
                    "region",
                    "bucket",
                    "datasetId",
                    "key"
                ],
                "properties": {
                    "accessKeyId": {
                        "title": "Access key ID",
                        "type": "string",
                        "description": "Your AWS access key ID used for authorization of the upload. You can get it from *AWS Console* -> *IAM* -> *Users* -> *Create user / Select existing user* -> *Security credentials* -> *Access keys*."
                    },
                    "secretAccessKey": {
                        "title": "Secret access key",
                        "type": "string",
                        "description": "Your AWS secret access key used for authorization of the upload. You can get it from *AWS Console* -> *IAM* -> *Users* -> *Create user / Select existing user* -> *Security credentials* -> *Access keys* -> *Create access key*. The secret access key will be displayed only once, upon creation of the access key."
                    },
                    "region": {
                        "title": "Region",
                        "enum": [
                            "af-south-1",
                            "ap-east-1",
                            "ap-northeast-1",
                            "ap-northeast-2",
                            "ap-northeast-3",
                            "ap-south-1",
                            "ap-south-2",
                            "ap-southeast-1",
                            "ap-southeast-2",
                            "ap-southeast-3",
                            "ap-southeast-4",
                            "ap-southeast-5",
                            "ap-southeast-7",
                            "ca-central-1",
                            "ca-west-1",
                            "cn-north-1",
                            "cn-northwest-1",
                            "eu-central-1",
                            "eu-central-2",
                            "eu-north-1",
                            "eu-south-1",
                            "eu-south-2",
                            "eu-west-1",
                            "eu-west-2",
                            "eu-west-3",
                            "il-central-1",
                            "me-central-1",
                            "me-south-1",
                            "mx-central-1",
                            "sa-east-1",
                            "us-east-1",
                            "us-east-2",
                            "us-gov-east-1",
                            "us-gov-west-1",
                            "us-west-1",
                            "us-west-2"
                        ],
                        "type": "string",
                        "description": "The AWS region where the target S3 bucket is located. You can get it from *AWS Console* -> *S3* -> *Create bucket / Select existing bucket* -> *Properties*."
                    },
                    "bucket": {
                        "title": "Bucket",
                        "type": "string",
                        "description": "The name of the target S3 bucket."
                    },
                    "key": {
                        "title": "Key",
                        "type": "string",
                        "description": "The object key, which serves as an identifier for the uploaded data in the S3 bucket. It can include an optional prefix. If an object with the same key already exists, it will be overwritten with the uploaded data."
                    },
                    "datasetId": {
                        "title": "Dataset ID",
                        "type": "string",
                        "description": "The Apify dataset ID from which data will be retrieved for the upload."
                    },
                    "format": {
                        "title": "Format",
                        "enum": [
                            "json",
                            "jsonl",
                            "html",
                            "csv",
                            "xml",
                            "xlsx",
                            "rss"
                        ],
                        "type": "string",
                        "description": "The format of the uploaded data.",
                        "default": "json"
                    },
                    "fields": {
                        "title": "Select",
                        "type": "array",
                        "description": "Fields to include in the output. If not specified, all fields will be included."
                    },
                    "omit": {
                        "title": "Omit",
                        "type": "array",
                        "description": "Fields to exclude from the output."
                    },
                    "unwind": {
                        "title": "Unwind fields",
                        "type": "array",
                        "description": "Fields to unwind. If the field is an array, every element will become a separate record and merged with the parent object. If the unwound field is an object, it is merged with the parent object. If the unwound field is missing or its value is neither an array nor an object, it cannot be merged with a parent object, and the item gets preserved as is. If you specify multiple fields, they are unwound in the order you specify."
                    },
                    "flatten": {
                        "title": "Flatten fields",
                        "type": "array",
                        "description": "Fields to transform from nested objects into a flat structure."
                    },
                    "offset": {
                        "title": "Offset",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Number of items to skip from the beginning of the dataset."
                    },
                    "limit": {
                        "title": "Limit",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of items to upload."
                    },
                    "clean": {
                        "title": "Clean only",
                        "type": "boolean",
                        "description": "If enabled, only clean dataset items and their non-hidden fields will be uploaded.",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
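
For projects without an Apify client library, the `/acts/apify~s3-uploader/runs` operation from the specification above can be called directly. The sketch below only builds the HTTP request with the Python standard library; the helper `build_run_request` and the placeholder values are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"


def build_run_request(token: str, actor_input: dict) -> Request:
    """Build a POST request that starts a run of apify/s3-uploader."""
    # The specification passes the API token as a query parameter
    url = f"{API_BASE}/acts/apify~s3-uploader/runs?{urlencode({'token': token})}"
    return Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("<YOUR_API_TOKEN>", {"bucket": "my-example-bucket"})
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. This endpoint returns information about the initiated run rather than waiting for it to finish, so the run's `status` has to be checked separately.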
