# RAG Website Crawler - Clean Markdown for LLMs & AI (`themineworks/rag-crawler`) Actor

Crawl any website and extract clean, chunked Markdown ready for RAG pipelines and LLM context. Returns page text, titles and URLs. No API key. Works in Claude, ChatGPT & any MCP-compatible AI agent.

- **URL:** https://apify.com/themineworks/rag-crawler.md
- **Developed by:** [The Mine Works](https://apify.com/themineworks) (community)
- **Categories:** Business, Developer tools, MCP servers
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating:** No ratings yet

## Pricing

Pay per usage

This Actor is billed by platform usage. The Actor itself is free to use; you pay only for the Apify platform resources it consumes, and those rates get cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## RAG-Ready Website Crawler — Pre-Chunked Markdown with Token Counts

Turn any website into a clean, chunked, token-counted markdown corpus — ready to drop into a **RAG pipeline**, vector database, or LLM context window. No wasted spend: you are only charged for pages that successfully crawl and produce usable content.

---

### What It Does

Most website-to-markdown scrapers dump raw HTML noise into your pipeline and charge you regardless of output quality. RAG-Crawler is designed from the ground up for **AI-native ingestion workflows**:

- Crawls one or more seed URLs and follows internal links up to your page limit.
- Strips boilerplate — nav, header, footer, ads, sidebars — using **Mozilla Readability** before converting to clean Markdown.
- Splits each page into **heading-based chunks** with an approximate **token count** per chunk, so you can feed them directly into embedding or retrieval pipelines without a pre-processing step.
- Returns structured JSON ready for **vector databases** (Pinecone, Weaviate, Qdrant, pgvector) or **MCP tool servers**.
- Supports **SPA / JavaScript-rendered** pages via Playwright and static HTML via Cheerio.
- **Never charges for a failed page.** If a page times out, errors, or returns no content, it is recorded in the dataset as `status: "failed"` with a `charged: false` flag.

---

### Key Features

- **SPA / JS rendering** — Playwright handles React, Vue, Next.js and other JS-rendered sites. Disable for pure static HTML to cut cost and latency.
- **Heading-based chunking** — Splits content along `#` / `##` / `###` hierarchy. Oversized sections are further split by paragraph to respect your token limit. Chunk heading path is preserved so retrievers know the document context.
- **Token counts per chunk** — Every chunk carries a `token_count` field (word-count × 1.35 approximation — no external tokenizer dependency). Accurate to ±8% on English prose.
- **Boilerplate stripping** — Mozilla Readability extracts the article body before conversion. CSS selector override available for non-article pages (docs, product pages).
- **Configurable output formats** — `chunks` (RAG-ready split output), `full` (single markdown blob), or `both`.
- **Fail-loud shortfall reporting.** Every run emits a summary item (`"_type": "summary"` in the output schema) with `pages_crawled`, `pages_failed`, `total_tokens`, and `charged_for`. If zero pages are crawled, the Actor fails with a plain-English `shortfall_reason` so you know exactly what went wrong.
- **Zero charge on failure** — Charging happens only after content is confirmed extracted. Every failed-page record carries `"charged": false` for your audit trail.
- **URL exclusion patterns** — Regex-based exclude list. Skip PDFs, author pages, tag archives, or any URL pattern before they are even enqueued.

---

### Output Schema

Each successfully crawled page produces one item. Example with `outputFormat: "chunks"`:

```json
{
  "url": "https://docs.example.com/getting-started/installation",
  "canonical": "https://docs.example.com/getting-started/installation",
  "title": "Installation — Example Docs",
  "description": "How to install the Example SDK in your project.",
  "language": "en",
  "word_count": 412,
  "token_count": 556,
  "crawled_at": "2026-06-05T09:14:22.000Z",
  "status": "success",
  "chunks": [
    {
      "heading_path": ["Getting Started", "Installation"],
      "text": "## Installation\n\nRun the following command to install the SDK:\n\n```bash\nnpm install example-sdk\n```",
      "token_count": 28,
      "chunk_index": 0
    },
    {
      "heading_path": ["Getting Started", "Installation", "Requirements"],
      "text": "### Requirements\n\nNode.js 18 or later is required. The SDK does not support CommonJS — use ESM (`\"type\": \"module\"` in package.json).",
      "token_count": 41,
      "chunk_index": 1
    }
  ]
}
````

Failed pages:

```json
{
  "url": "https://docs.example.com/legacy-page",
  "status": "failed",
  "reason": "timeout",
  "message": "Page did not reach load state within timeout.",
  "charged": false
}
```

Run summary (always the final item):

```json
{
  "_type": "summary",
  "pages_requested": 25,
  "pages_crawled": 23,
  "pages_failed": 2,
  "total_tokens": 48210,
  "charged_for": 23
}
```
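Because one dataset mixes three item shapes, it can help to separate them before further processing. The following is a minimal sketch, assuming the items look exactly like the examples above (`status` on page items, `_type` on the summary); `split_results` is a hypothetical helper name, not part of the Actor or the Apify client.

```python
from typing import Any


def split_results(items: list[dict[str, Any]]) -> dict[str, Any]:
    """Partition dataset items into crawled pages, failed pages, and the run summary."""
    pages, failed, summary = [], [], None
    for item in items:
        if item.get("_type") == "summary":
            summary = item
        elif item.get("status") == "failed":
            failed.append(item)
        elif item.get("status") == "success":
            pages.append(item)
    return {"pages": pages, "failed": failed, "summary": summary}


# Sample items shaped like the documented examples
items = [
    {"url": "https://docs.example.com/a", "status": "success", "chunks": []},
    {"url": "https://docs.example.com/legacy-page", "status": "failed",
     "reason": "timeout", "charged": False},
    {"_type": "summary", "pages_requested": 2, "pages_crawled": 1,
     "pages_failed": 1, "total_tokens": 0, "charged_for": 1},
]
result = split_results(items)
```

Comparing `len(result["pages"])` with the summary's `charged_for` gives a quick audit of what the run billed.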

***

### Pricing

**You are charged per successfully crawled page — nothing else.**

| Scenario | Charge |
|---|---|
| Page crawled, content extracted | 1 credit per page |
| Page timed out | 0 credits |
| Page returned no content | 0 credits |
| Page blocked / 403 | 0 credits |
| Entire run produces zero pages | 0 credits; the Actor fails with a reason |

No hidden platform fees, no per-token surcharges. Compare this to **Firecrawl**, which charges per scrape attempt regardless of content quality, and **other Apify scrapers** that charge on request rather than on result. RAG-Crawler only charges when your pipeline actually gets something useful.

***

### Use Cases

- **RAG pipelines** — Ingest product docs, knowledge bases, or competitor sites into your retrieval system. Chunks arrive pre-sized and pre-labelled with heading paths — no custom splitting code required.
- **LLM context injection** — Feed a full documentation site into an LLM in one run. Use `outputFormat: "full"` for single-page context or `outputFormat: "chunks"` for precise retrieval.
- **AI agents via MCP** — RAG-Crawler is MCP-native. Trigger it from any MCP tool server and pipe the JSON output directly into your agent's knowledge tool.
- **Vector database ingestion** — Output maps directly to Pinecone, Weaviate, Qdrant, or pgvector upsert payloads. `token_count` helps you stay under embedding model input limits.
- **Documentation indexing** — Crawl versioned docs sites. Heading paths give you natural document hierarchy for structured retrieval.
- **Competitor content analysis** — Crawl competitor sites to LLM-analyse content gaps, SEO positioning, or product messaging.
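
For the vector database use case, each chunk typically becomes one record with an ID, the text to embed, and some metadata. The sketch below shows one possible flattening of a page item; the record layout (`id`, `text`, `metadata`), the `url#chunk_index` ID scheme, and the `max_tokens` cut-off are illustrative assumptions, not a format required by the Actor or by any particular database.

```python
from typing import Any, Iterator


def chunk_records(page: dict[str, Any], max_tokens: int = 8000) -> Iterator[dict[str, Any]]:
    """Yield one embedding-ready record per chunk of a successfully crawled page."""
    for chunk in page.get("chunks", []):
        if chunk["token_count"] > max_tokens:
            continue  # skip chunks above the (assumed) embedding model input limit
        yield {
            "id": f'{page["url"]}#{chunk["chunk_index"]}',
            "text": chunk["text"],
            "metadata": {
                "url": page["url"],
                "title": page.get("title", ""),
                "heading_path": " > ".join(chunk["heading_path"]),
                "token_count": chunk["token_count"],
            },
        }


# Trimmed copy of the page item from the Output Schema section
page = {
    "url": "https://docs.example.com/getting-started/installation",
    "title": "Installation",
    "status": "success",
    "chunks": [
        {"heading_path": ["Getting Started", "Installation"],
         "text": "## Installation", "token_count": 28, "chunk_index": 0},
        {"heading_path": ["Getting Started", "Installation", "Requirements"],
         "text": "### Requirements", "token_count": 41, "chunk_index": 1},
    ],
}
records = list(chunk_records(page, max_tokens=30))
```

The records can then be embedded and passed to whichever upsert call your database client provides.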

***

### Technical Notes

#### SPA Handling

When `renderJs: true`, the Actor runs headless Chromium through Playwright. Wait strategy options:

- `networkidle` (default): waits until there have been no network connections for at least 500 ms. Most reliable for SPAs that load data via API.
- `domcontentloaded`: fires when the HTML is parsed. Faster, but may miss dynamically rendered content.
- `load`: waits for all resources, including images. Slowest; rarely needed.

If the primary wait strategy times out (30 s), the Actor automatically retries with `domcontentloaded`. If `document.body` is null (which can happen on route-transition frames in some SPAs), the Actor waits 2 seconds and retries HTML extraction once before marking the page as failed.
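
The timeout fallback can be pictured with a small Python sketch. This is an illustration of the retry order only, not the Actor's source: `goto_with_fallback` is a hypothetical helper, and the navigation function is injected so the example runs without a browser. With Playwright for Python you would pass `page.goto` as `goto` and `playwright.sync_api.TimeoutError` as `timeout_error`. The `document.body` retry is not covered here.

```python
from typing import Callable


def goto_with_fallback(
    goto: Callable[..., object],
    url: str,
    wait_strategy: str = "networkidle",
    timeout_ms: int = 30_000,
    timeout_error: type[BaseException] = TimeoutError,
) -> str:
    """Navigate with the primary wait strategy; on timeout, retry once with
    domcontentloaded. Returns the strategy that succeeded."""
    try:
        goto(url, wait_until=wait_strategy, timeout=timeout_ms)
        return wait_strategy
    except timeout_error:
        if wait_strategy == "domcontentloaded":
            raise  # nothing weaker to fall back to
        goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
        return "domcontentloaded"


# Stand-in for page.goto that always times out on networkidle
calls = []


def fake_goto(url, wait_until, timeout):
    calls.append(wait_until)
    if wait_until == "networkidle":
        raise TimeoutError("simulated timeout")


used = goto_with_fallback(fake_goto, "https://example.com")
```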

#### Chunking Algorithm

1. The markdown is walked line-by-line. Each ATX heading (`#` through `######`) opens a new section.
2. Content before the first heading is grouped as a section with an empty heading path.
3. If a section's token count is within `maxTokensPerChunk`, it becomes one chunk.
4. If a section exceeds the limit, its body is split by paragraph (blank-line boundaries) and paragraphs are greedily packed into sub-chunks. The heading line is repeated at the top of each sub-chunk to preserve retrieval context.
5. A single paragraph that exceeds `maxTokensPerChunk` is emitted as its own chunk, which can therefore exceed the limit (paragraphs are never split mid-sentence).
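
The five steps above can be sketched in Python as follows. This is a reading of the description, not the Actor's code: the function names are hypothetical, tokens are estimated with the word-count × 1.35 rule, heading-only sections are assumed to produce no chunk, and `#` lines inside fenced code blocks are not treated specially.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")  # ATX headings, levels 1-6


def approx_tokens(text: str) -> int:
    # Integer form of ceil(word_count * 1.35)
    return (len(text.split()) * 135 + 99) // 100


def join_parts(heading, paragraphs):
    # Put the heading line (if any) on top of the given paragraphs.
    return "\n\n".join(p for p in [heading, *paragraphs] if p)


def pack_paragraphs(heading, body, max_tokens):
    """Steps 4-5: greedily pack paragraphs, repeating the heading on each sub-chunk."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    pieces, current = [], []
    for para in paragraphs:
        if current and approx_tokens(join_parts(heading, current + [para])) > max_tokens:
            pieces.append(join_parts(heading, current))
            current = []
        current.append(para)  # an oversized lone paragraph still becomes its own chunk
    if current:
        pieces.append(join_parts(heading, current))
    return pieces


def chunk_markdown(markdown: str, max_tokens: int = 800) -> list[dict]:
    # Steps 1-2: walk the lines; text before the first heading has an empty path.
    sections = [{"path": [], "heading": None, "lines": []}]
    stack = []  # (level, title) of the headings currently open
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2).strip()
            while stack and stack[-1][0] >= level:
                stack.pop()  # close headings at the same or a deeper level
            stack.append((level, title))
            sections.append({"path": [t for _, t in stack], "heading": line, "lines": []})
        else:
            sections[-1]["lines"].append(line)

    chunks = []
    for section in sections:
        body = "\n".join(section["lines"]).strip()
        if not body:
            continue  # assumption: heading-only sections produce no chunk
        text = join_parts(section["heading"], [body])
        if approx_tokens(text) <= max_tokens:
            pieces = [text]  # step 3
        else:
            pieces = pack_paragraphs(section["heading"], body, max_tokens)
        for piece in pieces:
            chunks.append({
                "heading_path": list(section["path"]),
                "text": piece,
                "token_count": approx_tokens(piece),
                "chunk_index": len(chunks),
            })
    return chunks


doc = "# Guide\n\nIntro text here.\n\n## Install\n\n" + "word " * 30 + "\n\n" + "more " * 30
chunks = chunk_markdown(doc, max_tokens=50)
```

In this run the `## Install` section is too large for one chunk, so it is split into two sub-chunks that both start with the heading line.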

#### Token Approximation

RAG-Crawler uses `Math.ceil(wordCount × 1.35)` — no tiktoken or external tokenizer dependency. Accuracy:

| Content type | Error vs. cl100k\_base |
|---|---|
| English prose | ±8% |
| Mixed code + prose | ±15% |
| Dense code | ±20% |

For RAG chunking the error is acceptable — at worst a chunk boundary lands one paragraph off the "true" token boundary. For billing-critical token counting, run a local tiktoken pass on the output.
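
A Python equivalent of the approximation might look like the sketch below. Two points are assumptions: words are taken to be whitespace-separated, and the ceiling is computed in integer arithmetic to avoid floating-point rounding, so edge cases may differ slightly from the Actor's `Math.ceil` result.

```python
def approx_tokens(text: str) -> int:
    """Approximate a token count as ceil(word_count * 1.35)."""
    word_count = len(text.split())  # assumption: words are whitespace-separated
    # Integer form of ceil(word_count * 135 / 100)
    return (word_count * 135 + 99) // 100


sample = "Run the following command to install the SDK:"
estimate = approx_tokens(sample)  # 8 words
```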

#### Custom Content Selector

If Readability misidentifies the main content area (common on documentation sites with complex layouts), set `customCss` to a CSS selector:

```
customCss: "article.docs-content"
customCss: "#main-content"
customCss: ".prose"
```

The selector takes precedence over Readability. If the selector matches nothing, the Actor falls back to Readability, then to full-body conversion.
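
The fallback order can be expressed as "first extractor that returns content wins". The sketch below shows that pattern only; `extract_main_content` and the three stand-in extractors are hypothetical, and real extractors would wrap a CSS selector query, Readability, and a full-body conversion.

```python
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


def extract_main_content(html: str, extractors: list[tuple[str, Extractor]]) -> tuple[str, str]:
    """Try each named extractor in order; return (name, content) for the first
    one that yields non-empty content."""
    for name, extract in extractors:
        content = extract(html)
        if content and content.strip():
            return name, content
    raise ValueError("no extractor produced content")


# Stand-in extractors: the selector matches nothing, Readability succeeds
chain = [
    ("customCss", lambda html: None),
    ("readability", lambda html: "Article body"),
    ("full-body", lambda html: html),
]
source, content = extract_main_content("<body>Article body and nav</body>", chain)
```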

***

### Keywords

website to markdown · RAG · LLM · Firecrawl alternative · MCP · vector database · chunking · token count · web scraper · SPA crawler · Playwright · Cheerio · Readability · AI pipeline · document ingestion · Apify Actor

# Actor input Schema

## `startUrls` (type: `array`):

One or more URLs to start crawling from. Each entry is a Crawlee RequestOptions object (at minimum {url: 'https://...'}).

## `maxPages` (type: `integer`):

Maximum number of pages to crawl. You will only be charged for pages that successfully crawl.

## `renderJs` (type: `boolean`):

Use Playwright for JS-rendered SPAs. Disable for static HTML — faster and cheaper.

## `waitStrategy` (type: `string`):

Playwright wait event before extracting content. networkidle is most reliable for SPAs; domcontentloaded is faster for static pages.

## `outputFormat` (type: `string`):

chunks: heading-split chunks with token counts (ideal for RAG). full: single markdown blob per page. both: include both.

## `maxTokensPerChunk` (type: `integer`):

Target maximum tokens per chunk when outputFormat is 'chunks' or 'both'. Oversized heading sections are split by paragraph.

## `stripBoilerplate` (type: `boolean`):

Use Mozilla Readability to remove navigation, headers, footers, ads and sidebars before converting to markdown.

## `excludeUrlPatterns` (type: `array`):

List of regex patterns. Any discovered URL matching any pattern is skipped and never enqueued.
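
To preview which URLs a pattern list would skip, you can test the patterns locally. The sketch below assumes a pattern may match anywhere in the URL (search rather than full match) and uses Python's `re` module, whose syntax differs from JavaScript regular expressions in some advanced features; `make_url_filter` is a hypothetical helper.

```python
import re
from typing import Callable


def make_url_filter(patterns: list[str]) -> Callable[[str], bool]:
    """Return a function that reports whether a URL matches any exclude pattern."""
    compiled = [re.compile(p) for p in patterns]

    def is_excluded(url: str) -> bool:
        return any(rx.search(url) for rx in compiled)

    return is_excluded


is_excluded = make_url_filter([
    r"\.(pdf|zip|png|jpg|jpeg|gif|svg|mp4|mp3|woff2?)$",
    "/tag/",
    "/author/",
])
```

Note that the `$` anchor means a URL such as `https://example.com/file.pdf?download=1` is not caught by the file-extension pattern.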

## `customCss` (type: `string`):

Optional CSS selector to scope content extraction (e.g. 'article.main-content', '#docs-body'). Overrides Readability when set.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "maxPages": 10,
  "renderJs": true,
  "waitStrategy": "networkidle",
  "outputFormat": "chunks",
  "maxTokensPerChunk": 800,
  "stripBoilerplate": true,
  "excludeUrlPatterns": [
    "\\.(pdf|zip|png|jpg|jpeg|gif|svg|mp4|mp3|woff2?)$",
    "/tag/",
    "/author/"
  ],
  "customCss": ""
}
```
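
Validating input on the client side can catch mistakes before a run starts. The following sketch builds an input object and checks it against the `minimum`, `maximum`, and `enum` values listed in the OpenAPI specification in this document; `build_input` is a hypothetical helper, and rejecting an empty URL list is an added assumption.

```python
from typing import Any

WAIT_STRATEGIES = ("networkidle", "domcontentloaded", "load")
OUTPUT_FORMATS = ("chunks", "full", "both")


def build_input(
    start_urls: list[str],
    *,
    max_pages: int = 10,
    wait_strategy: str = "networkidle",
    output_format: str = "chunks",
    max_tokens_per_chunk: int = 800,
) -> dict[str, Any]:
    """Build an Actor input dict, rejecting values outside the documented ranges."""
    if not start_urls:
        raise ValueError("startUrls needs at least one URL")  # assumption, see lead-in
    if not 1 <= max_pages <= 500:
        raise ValueError("maxPages must be between 1 and 500")
    if not 50 <= max_tokens_per_chunk <= 4000:
        raise ValueError("maxTokensPerChunk must be between 50 and 4000")
    if wait_strategy not in WAIT_STRATEGIES:
        raise ValueError(f"waitStrategy must be one of {WAIT_STRATEGIES}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"outputFormat must be one of {OUTPUT_FORMATS}")
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "maxPages": max_pages,
        "waitStrategy": wait_strategy,
        "outputFormat": output_format,
        "maxTokensPerChunk": max_tokens_per_chunk,
    }


actor_input = build_input(["https://example.com"], max_pages=25, output_format="both")
```

The resulting dict can be passed as `run_input` to the Python client call shown in the API section.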

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://example.com"
        }
    ],
    "excludeUrlPatterns": [
        "\\.(pdf|zip|png|jpg|jpeg|gif|svg|mp4|mp3|woff2?)$",
        "/tag/",
        "/author/"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("themineworks/rag-crawler").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://example.com" }],
    "excludeUrlPatterns": [
        "\\.(pdf|zip|png|jpg|jpeg|gif|svg|mp4|mp3|woff2?)$",
        "/tag/",
        "/author/",
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("themineworks/rag-crawler").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
# A quoted heredoc passes the regex backslashes through unchanged in any POSIX shell
cat <<'EOF' | apify call themineworks/rag-crawler --silent --output-dataset
{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "excludeUrlPatterns": [
    "\\.(pdf|zip|png|jpg|jpeg|gif|svg|mp4|mp3|woff2?)$",
    "/tag/",
    "/author/"
  ]
}
EOF

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=themineworks/rag-crawler",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "RAG Website Crawler - Clean Markdown for LLMs & AI",
        "description": "Crawl any website and extract clean, chunked Markdown ready for RAG pipelines and LLM context. Returns page text, titles and URLs. No API key. Works in Claude, ChatGPT & any MCP-compatible AI agent.",
        "version": "0.1",
        "x-build-id": "DydVmEkZxqspzORIz"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/themineworks~rag-crawler/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-themineworks-rag-crawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/themineworks~rag-crawler/runs": {
            "post": {
                "operationId": "runs-sync-themineworks-rag-crawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/themineworks~rag-crawler/run-sync": {
            "post": {
                "operationId": "run-sync-themineworks-rag-crawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "One or more URLs to start crawling from. Each entry is a Crawlee RequestOptions object (at minimum {url: 'https://...'}).",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxPages": {
                        "title": "Max pages",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of pages to crawl. You will only be charged for pages that successfully crawl.",
                        "default": 10
                    },
                    "renderJs": {
                        "title": "Render JavaScript (SPA support)",
                        "type": "boolean",
                        "description": "Use Playwright for JS-rendered SPAs. Disable for static HTML — faster and cheaper.",
                        "default": true
                    },
                    "waitStrategy": {
                        "title": "Page wait strategy",
                        "enum": [
                            "networkidle",
                            "domcontentloaded",
                            "load"
                        ],
                        "type": "string",
                        "description": "Playwright wait event before extracting content. networkidle is most reliable for SPAs; domcontentloaded is faster for static pages.",
                        "default": "networkidle"
                    },
                    "outputFormat": {
                        "title": "Output format",
                        "enum": [
                            "chunks",
                            "full",
                            "both"
                        ],
                        "type": "string",
                        "description": "chunks: heading-split chunks with token counts (ideal for RAG). full: single markdown blob per page. both: include both.",
                        "default": "chunks"
                    },
                    "maxTokensPerChunk": {
                        "title": "Max tokens per chunk",
                        "minimum": 50,
                        "maximum": 4000,
                        "type": "integer",
                        "description": "Target maximum tokens per chunk when outputFormat is 'chunks' or 'both'. Oversized heading sections are split by paragraph.",
                        "default": 800
                    },
                    "stripBoilerplate": {
                        "title": "Strip boilerplate",
                        "type": "boolean",
                        "description": "Use Mozilla Readability to remove navigation, headers, footers, ads and sidebars before converting to markdown.",
                        "default": true
                    },
                    "excludeUrlPatterns": {
                        "title": "Exclude URL patterns",
                        "type": "array",
                        "description": "List of regex patterns. Any discovered URL matching any pattern is skipped and never enqueued.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "customCss": {
                        "title": "Custom content selector (CSS)",
                        "type": "string",
                        "description": "Optional CSS selector to scope content extraction (e.g. 'article.main-content', '#docs-body'). Overrides Readability when set.",
                        "default": ""
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```