# Save To S3 (`drinksight/save-to-s3`) Actor

Designed to be run from an ACTOR.RUN.SUCCEEDED webhook, this actor downloads a task run's default dataset and saves it to an S3 bucket.

- **URL**: https://apify.com/drinksight/save-to-s3.md
- **Developed by:** [Richard Weaver](https://apify.com/drinksight) (community)
- **Categories:** Automation, Open source
- **Stats:** 101 total users, 12 monthly users, 14.2% runs succeeded, 3 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you only pay for the Apify platform usage, which gets cheaper the higher your subscription plan.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## save-to-s3

An [Apify](https://apify.com) actor to save the default dataset of a run to an S3 bucket.

It is designed to be called from the ACTOR.RUN.SUCCEEDED webhook of the actor that has generated the dataset.

This actor is compatible with API v2 - I made it because I couldn't get the [Crawler Results To S3](https://apify.com/apify/crawler-results-to-s3) actor to work with v2 actors.

### Usage

AWS credentials and options for formatting the dataset are set on this actor's input, which is merged with the webhook's post data. You'll therefore need to create a task for your uploads so you can save common config such as your AWS credentials and dataset format details.

#### 1. Create the task

Create a new task using the save-to-s3 actor. This allows you to specify input to use every time the task is run. The webhook's post data will be merged with this at runtime - the values are those from the [get actor run API endpoint](https://apify.com/docs/api/v2#/reference/actors/run-object/get-run), all grouped under a `resource` property.
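As a rough illustration of that merge, the sketch below combines a saved task input with a trimmed webhook payload. It assumes a shallow merge in which the webhook's keys win on conflict; the actor's real merge logic is not shown in this README, and the `merge_input` helper, the credential placeholders, and the dataset ID are all hypothetical.

```python
import json


def merge_input(task_input: dict, webhook_payload: dict) -> dict:
    """Shallow-merge the webhook's post data over the saved task input.

    Assumption: keys from the webhook payload win on conflict.
    """
    return {**task_input, **webhook_payload}


# Saved task input (placeholder values, not real credentials).
task_input = {
    "accessKeyId": "EXAMPLE_ACCESS_KEY_ID",
    "secretAccessKey": "EXAMPLE_SECRET_ACCESS_KEY",
    "region": "eu-west-2",
    "bucket": "my-bucket",
    "format": "json",
}

# Trimmed, hypothetical webhook payload; a real run object has more fields.
webhook_payload = {
    "resource": {
        "id": "SBNgQGmp87LtspHF1",
        "startedAt": "2019-05-15T07:25:00.414Z",
        "defaultDatasetId": "HYPOTHETICAL_DATASET_ID",
    }
}

merged = merge_input(task_input, webhook_payload)
print(json.dumps(merged, indent=2))
```

The merged object is what the actor would see as its input under this assumption: the task's saved config plus the run details under `resource`.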

The properties you can specify in your Input for the task:

| Property          | Description                                                                                                                                                                                                                                                                                                                                                                                      |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `accessKeyId`     | The access key for the AWS user to connect with                                                                                                                                                                                                                                                                                                                                                  |
| `secretAccessKey` | The secret access key for the AWS user to connect with                                                                                                                                                                                                                                                                                                                                           |
| `region`          | The AWS region your bucket is located in (eg `eu-west-2`)                                                                                                                                                                                                                                                                                                                                        |
| `bucket`          | The bucket name to save files to                                                                                                                                                                                                                                                                                                                                                                 |
| `objectKeyFormat` | A string to specify the key (i.e. filename) for the S3 object you will save. You can specify any property from the `input` object using dot notation in a syntax similar to JavaScript template literals. For example, the default value `${resource.id}_${resource.startedAt}.${format}` will yield an S3 object with a name something like `SBNgQGmp87LtspHF1_2019-05-15T07:25:00.414Z.json`.  |
| `format`          | Maps to the `format` parameter of the [get dataset items API endpoint](https://apify.com/docs/api/v2#/reference/datasets/item-collection) and accepts any of the valid string values                                                                                                                                                                                                             |
| `clean`           | Maps to the `clean` parameter of the [get dataset items API endpoint](https://apify.com/docs/api/v2#/reference/datasets/item-collection)                                                                                                                                                                                                                                                         |
| `datasetOptions`  | An object that allows you to specify any of the other parameters of the [get dataset items API endpoint](https://apify.com/docs/api/v2#/reference/datasets/item-collection), for example `{ "offset": "10" }` is the equivalent of setting `?offset=10` in the API call                                                                                                                          |
| `debugLog`        | A `boolean` indicating whether to use debug level logging                                                                                                                                                                                                                                                                                                                                        |
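To make the `objectKeyFormat` syntax concrete, here is a minimal sketch of how such a template could be resolved against the merged input. It is not the actor's own code: `render_object_key` is a hypothetical helper, and raising `KeyError` for a missing property is an assumption about a sensible failure mode.

```python
import re

# Matches ${some.dotted.path} placeholders.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_object_key(template: str, data: dict) -> str:
    """Replace each ${a.b.c} placeholder with the value found at data["a"]["b"]["c"]."""

    def lookup(match: re.Match) -> str:
        value = data
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError if the property is missing
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


# Merged input as the actor might see it (values taken from the table above).
actor_input = {
    "format": "json",
    "resource": {
        "id": "SBNgQGmp87LtspHF1",
        "startedAt": "2019-05-15T07:25:00.414Z",
    },
}

key = render_object_key("${resource.id}_${resource.startedAt}.${format}", actor_input)
print(key)  # SBNgQGmp87LtspHF1_2019-05-15T07:25:00.414Z.json
```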

#### 2. Create the webhook

Go to your save-to-s3 task's API tab and copy the URL for the Run Task endpoint, which will be in the format: `https://api.apify.com/v2/actor-tasks/TASK_NAME_HERE/runs?token=YOUR_TOKEN_HERE`

Go to either the actor or (more likely) the actor task you want to add save-to-s3 functionality to. In the Webhooks tab, add a webhook with the URL you just copied. For Event types, select ACTOR.RUN.SUCCEEDED. Then Save.
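If you would rather script this step than click through the Console, the sketch below builds the Run Task URL and a webhook definition you could submit through the Apify API. The field names (`eventTypes`, `condition.actorTaskId`, `requestUrl`) reflect my understanding of the Apify webhook object and should be checked against the API reference; the task names and token are placeholders, and both helper functions are hypothetical.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def run_task_url(task_name: str, token: str) -> str:
    """Build the Run Task endpoint URL in the format shown above."""
    query = urlencode({"token": token})
    return f"{API_BASE}/actor-tasks/{quote(task_name, safe='~')}/runs?{query}"


def webhook_definition(source_task_id: str, save_task_name: str, token: str) -> dict:
    """Describe a webhook that runs the save-to-s3 task when the source task succeeds.

    Assumption: the field names follow the Apify webhook object.
    """
    return {
        "eventTypes": ["ACTOR.RUN.SUCCEEDED"],
        "condition": {"actorTaskId": source_task_id},
        "requestUrl": run_task_url(save_task_name, token),
    }


# Placeholder identifiers; substitute your own task names and token.
definition = webhook_definition(
    source_task_id="SOURCE_TASK_ID",
    save_task_name="my-user~save-to-s3-task",
    token="YOUR_TOKEN_HERE",
)
print(definition["requestUrl"])
```

Note that the request URL embeds your API token, so treat the webhook definition as a secret.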

### Security

Because you store your AWS user's key and secret as part of this actor's input, it is _strongly recommended_ that you create an AWS IAM user specifically for Apify, and only grant access to the specific bucket you are using.

An example policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetBucketLocation", "s3:ListAllMyBuckets"],
      "Resource": "arn:aws:s3:::*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": ["arn:aws:s3:::YOUR-BUCKET", "arn:aws:s3:::YOUR-BUCKET/*"]
    }
  ]
}
````
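If you manage several buckets, a small helper can fill the bucket name into the policy above so the two ARNs never drift apart. This is only a convenience sketch: `bucket_policy` is a hypothetical helper, the bucket name is a placeholder, and the statements are copied from the example policy rather than narrowed.

```python
import json


def bucket_policy(bucket: str) -> dict:
    """Return the example policy above with the given bucket name filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListAllMyBuckets"],
                "Resource": "arn:aws:s3:::*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:*",
                # Both the bucket itself and every object inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
        ],
    }


policy = bucket_policy("my-apify-exports")  # placeholder bucket name
print(json.dumps(policy, indent=2))
```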

# Actor input Schema

## `accessKeyId` (type: `string`):

Enter the access key ID for the AWS user

## `secretAccessKey` (type: `string`):

Enter the secret access key for the AWS user

## `region` (type: `string`):

Enter the AWS region your S3 bucket is located in

## `bucket` (type: `string`):

Enter the name of the S3 bucket to use

## `objectKeyFormat` (type: `string`):

The key to use for the filename

## `format` (type: `string`):

The data format to download the dataset in

## `clean` (type: `boolean`):

Whether to download only clean items. Maps to the `clean` parameter of the dataset get items API request.

## `datasetOptions` (type: `object`):

An object whose properties will be enumerated and added to the dataset get items API request. See https://apify.com/docs/api/v2#/reference/datasets/item-collection/get-items.
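The sketch below shows one way `format`, `clean`, and `datasetOptions` could end up as query parameters on the get items request. It is an illustration under assumptions, not the actor's code: `dataset_items_url` is a hypothetical helper, the dataset ID is a placeholder, and letting `datasetOptions` override `format` and `clean` on conflict is a guess.

```python
from urllib.parse import urlencode


def dataset_items_url(dataset_id, fmt="json", clean=False, dataset_options=None):
    """Build a get dataset items URL with datasetOptions appended as query parameters."""
    params = {"format": fmt, "clean": "true" if clean else "false"}
    # Each property of datasetOptions becomes one query parameter.
    params.update(dataset_options or {})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"


url = dataset_items_url(
    "HYPOTHETICAL_DATASET_ID",
    fmt="csv",
    clean=True,
    dataset_options={"offset": "10"},
)
print(url)
```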

## `debugLog` (type: `boolean`):

Debug messages will be included in the log. Use `context.log.debug('message')` to log your own debug messages.

## Actor input object example

```json
{
  "objectKeyFormat": "${resource.id}_${resource.startedAt}.${format}",
  "format": "json",
  "clean": false,
  "datasetOptions": {},
  "debugLog": false
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "accessKeyId": "",
    "secretAccessKey": "",
    "region": "",
    "bucket": "",
    "objectKeyFormat": "${resource.id}_${resource.startedAt}.${format}",
    "datasetOptions": {}
};

// Run the Actor and wait for it to finish
const run = await client.actor("drinksight/save-to-s3").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "accessKeyId": "",
    "secretAccessKey": "",
    "region": "",
    "bucket": "",
    "objectKeyFormat": "${resource.id}_${resource.startedAt}.${format}",
    "datasetOptions": {},
}

# Run the Actor and wait for it to finish
run = client.actor("drinksight/save-to-s3").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
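The examples above leave the AWS fields empty and pass no `resource` object, which the webhook would normally supply. The sketch below builds a fuller `run_input` from environment variables instead of hard-coded secrets. The environment variable names, the `build_run_input` helper, and the idea that the actor reads `resource.id`, `resource.startedAt`, and `resource.defaultDatasetId` when called directly are all assumptions, so verify them against a test run.

```python
import os

REQUIRED_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "S3_BUCKET"]


def build_run_input(resource, env=None):
    """Build the Actor input from environment variables plus a run `resource` object."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "accessKeyId": env["AWS_ACCESS_KEY_ID"],
        "secretAccessKey": env["AWS_SECRET_ACCESS_KEY"],
        "region": env["AWS_REGION"],
        "bucket": env["S3_BUCKET"],
        "objectKeyFormat": "${resource.id}_${resource.startedAt}.${format}",
        "format": "json",
        # Assumption: mimics what the webhook would post for the source run.
        "resource": resource,
    }


# Demo with a fake environment; omit `env` to read from os.environ instead.
example_env = {
    "AWS_ACCESS_KEY_ID": "EXAMPLE_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY": "EXAMPLE_SECRET_ACCESS_KEY",
    "AWS_REGION": "eu-west-2",
    "S3_BUCKET": "my-bucket",
}
example_resource = {
    "id": "SBNgQGmp87LtspHF1",
    "startedAt": "2019-05-15T07:25:00.414Z",
    "defaultDatasetId": "HYPOTHETICAL_DATASET_ID",
}
run_input = build_run_input(example_resource, env=example_env)
print(sorted(run_input))
```

The resulting dictionary can be passed as `run_input` to `client.actor("drinksight/save-to-s3").call(...)` exactly as in the example above.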

## CLI example

```bash
echo '{
  "accessKeyId": "",
  "secretAccessKey": "",
  "region": "",
  "bucket": "",
  "objectKeyFormat": "${resource.id}_${resource.startedAt}.${format}",
  "datasetOptions": {}
}' |
apify call drinksight/save-to-s3 --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=drinksight/save-to-s3",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Save To S3",
        "description": "Designed to be run from an ACTOR.RUN.SUCCEEDED webhook, this actor downloads a task run's default dataset and saves it to an S3 bucket.",
        "version": "0.1",
        "x-build-id": "K6mwi4jeedzGpb3Tp"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/drinksight~save-to-s3/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-drinksight-save-to-s3",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/drinksight~save-to-s3/runs": {
            "post": {
                "operationId": "runs-sync-drinksight-save-to-s3",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/drinksight~save-to-s3/run-sync": {
            "post": {
                "operationId": "run-sync-drinksight-save-to-s3",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "accessKeyId": {
                        "title": "AWS Key ID",
                        "type": "string",
                        "description": "Enter the access key ID for the AWS user"
                    },
                    "secretAccessKey": {
                        "title": "AWS Secret Access Key",
                        "type": "string",
                        "description": "Enter the secret access key for the AWS user"
                    },
                    "region": {
                        "title": "AWS region",
                        "type": "string",
                        "description": "Enter the AWS region your S3 bucket is located in"
                    },
                    "bucket": {
                        "title": "AWS bucket",
                        "type": "string",
                        "description": "Enter the name of the S3 bucket to use"
                    },
                    "objectKeyFormat": {
                        "title": "Object key format",
                        "type": "string",
                        "description": "The key to use for the filename",
                        "default": "${resource.id}_${resource.startedAt}.${format}"
                    },
                    "format": {
                        "title": "Data format",
                        "enum": [
                            "json",
                            "jsonl",
                            "xml",
                            "html",
                            "csv",
                            "xslx",
                            "rss"
                        ],
                        "type": "string",
                        "description": "The data format to download the dataset in",
                        "default": "json"
                    },
                    "clean": {
                        "title": "Clean items only",
                        "type": "boolean",
                        "description": "Crawler will ignore SSL certificate errors.",
                        "default": false
                    },
                    "datasetOptions": {
                        "title": "Dataset options",
                        "type": "object",
                        "description": "An object whose properties will be enumerated and added to the dataset get items API request. See https://apify.com/docs/api/v2#/reference/datasets/item-collection/get-items.",
                        "default": {}
                    },
                    "debugLog": {
                        "title": "Debug log",
                        "type": "boolean",
                        "description": "Debug messages will be included in the log. Use <code>context.log.debug('message')</code> to log your own debug messages.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
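For projects that call the REST API directly, the following sketch builds a request for the `/acts/drinksight~save-to-s3/runs` path from the specification above, using only the Python standard library. The request is constructed but not sent; `build_run_request` is a hypothetical helper, and the token and input values are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

RUNS_URL = "https://api.apify.com/v2/acts/drinksight~save-to-s3/runs"


def build_run_request(token: str, actor_input: dict) -> Request:
    """Build a POST request that starts a run, with the token as a query parameter."""
    url = f"{RUNS_URL}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("YOUR_API_TOKEN", {"format": "json", "clean": False})
print(request.get_method(), request.full_url)
```

To actually start a run, pass the request to `urllib.request.urlopen` and parse the JSON response, which should follow `runsResponseSchema`.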
