# Rag Architect (`ai_solutionist/rag-architect`) Actor

Transform any website into vector-store-ready knowledge chunks for Pinecone, Weaviate, LangChain, LlamaIndex, Supabase, n8n & more. AI-generated Q\&A pairs, smart chunking, PII scrubbing. Build hallucination-free RAG chatbots in minutes.

- **URL**: https://apify.com/ai_solutionist/rag-architect.md
- **Developed by:** [Jason Pellerin](https://apify.com/ai_solutionist) (community)
- **Categories:** AI, Developer tools, Automation
- **Stats:** 2 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $5.00 / 1,000 knowledge chunks

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## RAG-Architect: Automated Knowledge Engineering Factory

Transform raw web content into high-fidelity, vector-store-ready knowledge chunks with AI-generated Q&A pairs, structure-aware chunking, and PII scrubbing. The cleanroom for AI data.

### Why RAG-Architect?

Most AI projects fail not because the LLM is "dumb," but because the Knowledge Base is garbage.

**The Problems:**
- **Table Shredder**: Fixed-token chunking shreds tables mid-row, confusing your AI
- **Context Blindness**: Chunks lose their parent context ("The fee is $50" without knowing "For Florida Residents Only")
- **Metadata Rot**: Old and new policies on the same site confuse the AI
- **Synthetic Hallucinations**: LLM-generated Q&A without grounding checks make up unanswerable questions

**RAG-Architect solves all of this.**

### Features

#### Structure-Aware Chunking
- Splits by Markdown headers (`#`, `##`, `###`) - not fixed tokens
- **Table Guard**: Keeps tables whole, never splits mid-row
- **Code Guard**: Preserves code blocks as atomic units
- Configurable min/max chunk size with overlap
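
The header-based splitting described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual implementation: the function name `split_by_headers` and the returned dictionary shape are assumptions, and size limits and overlap are left out. Because only header lines outside code fences start a new chunk, tables and code blocks stay whole.

```python
import re

# Matches Markdown ATX headers such as "## Title".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # code-fence marker, built this way to keep the example readable


def split_by_headers(markdown, split_on=("##", "###")):
    """Split Markdown into sections at the given header levels (hypothetical helper).

    Lines inside fenced code blocks are never treated as headers, so a
    "## ..." line in a code sample does not start a new chunk.
    """
    sections = []
    title, body = None, []
    in_fence = False

    def flush():
        # Emit the current section only if it has a title or non-blank text.
        if title is not None or any(line.strip() for line in body):
            sections.append({"section": title, "text": "\n".join(body).strip()})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match and match.group(1) in split_on:
            flush()
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return sections


if __name__ == "__main__":
    sample = "## Plans\n| Tier | Price |\n|---|---|\n| Pro | $50 |\n### Limits\nSome text."
    for section in split_by_headers(sample):
        print(section["section"], "->", len(section["text"]), "chars")
```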

#### Parent-Child Context Injection
Every chunk gets a context header:
````
[Source: example.com | Page: Pricing Plans | Section: Enterprise Tier | Updated: 2025-01-15]

The Enterprise plan includes unlimited API calls, dedicated support...
````

Your AI never loses its place.
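
A context header in the format shown above could be built along these lines. The helper names `build_context_header` and `inject_context` are hypothetical, and treating the `Updated` field as optional is an assumption based on the shorter header in the n8n output sample.

```python
def build_context_header(source, page, section, updated=None):
    """Return a one-line header like "[Source: ... | Page: ... | Section: ...]"."""
    parts = [f"Source: {source}", f"Page: {page}", f"Section: {section}"]
    if updated:  # assumption: the date is omitted when unknown
        parts.append(f"Updated: {updated}")
    return "[" + " | ".join(parts) + "]"


def inject_context(chunk_text, **meta):
    """Prefix a chunk with its context header, separated by a blank line."""
    return build_context_header(**meta) + "\n\n" + chunk_text


if __name__ == "__main__":
    print(inject_context(
        "The fee is $50.",
        source="example.com",
        page="Fees",
        section="For Florida Residents Only",
    ))
```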

#### Ground Truth Q&A Generator
For every chunk, generates 3-5 "battle-tested" questions using GPT-4o-mini:
1. Generate candidate questions based ONLY on the chunk text
2. **Self-Reflection Audit**: "Can this question be answered 100% by this chunk?"
3. Filter out low-confidence Q&A (threshold: 0.8)
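
The final filtering step can be pictured as below. This is only a sketch: the `q`/`a`/`confidence` keys follow the n8n output sample in this README, while the function name and the choice of an inclusive comparison at 0.8 are assumptions.

```python
def filter_grounded(candidates, threshold=0.8):
    """Keep Q&A pairs whose audit confidence meets the threshold.

    Pairs without a confidence score are treated as ungrounded (assumption).
    """
    return [qa for qa in candidates if qa.get("confidence", 0.0) >= threshold]


if __name__ == "__main__":
    candidates = [
        {"q": "What is included in the Enterprise plan?",
         "a": "Unlimited API calls and dedicated support", "confidence": 0.95},
        {"q": "How much does the Enterprise plan cost?",
         "a": "Not stated in this chunk", "confidence": 0.4},
    ]
    for qa in filter_grounded(candidates):
        print(qa["q"])
```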

#### PII Scrubbing
Automatically detects and redacts:
- Email addresses
- Phone numbers
- Social Security Numbers
- Credit card numbers
- Custom patterns (regex)
- Whitelist support for domains to preserve
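
To illustrate how regex-based redaction with a whitelist might work, here is a small sketch covering emails and Social Security Numbers only. The patterns, the placeholder tokens (`[EMAIL_REDACTED]`, `[SSN_REDACTED]`), and the glob-style whitelist matching are all assumptions; the Actor's real patterns are not documented here.

```python
import re
from fnmatch import fnmatch

# Deliberately simple patterns; real-world PII detection needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scrub(text, whitelist=()):
    """Redact emails and SSNs, keeping emails that match a whitelist glob."""

    def redact_email(match):
        address = match.group(0)
        if any(fnmatch(address.lower(), pattern.lower()) for pattern in whitelist):
            return address  # whitelisted, leave untouched
        return "[EMAIL_REDACTED]"

    text = EMAIL_RE.sub(redact_email, text)
    return SSN_RE.sub("[SSN_REDACTED]", text)


if __name__ == "__main__":
    raw = "Contact jane@mycompany.com or bob@gmail.com, SSN 123-45-6789"
    print(scrub(raw, whitelist=["*@mycompany.com"]))
```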

#### 12 Output Formats

Drop directly into your stack of choice:

##### Universal Formats
| Format | Description | Best For |
|--------|-------------|----------|
| **raw** | Universal JSON with full metadata | Any custom integration |
| **csv** | Spreadsheet format with Q&A columns | Google Sheets, Excel, Airtable |
| **markdown** | Human-readable knowledge base docs | Documentation, wikis |

##### RAG Framework Formats
| Format | Description | Best For |
|--------|-------------|----------|
| **langchain** | LangChain Document format | Python LangChain pipelines |
| **llamaindex** | TextNode with relationships | LlamaIndex node graphs |

##### Vector Database Formats
| Format | Description | Best For |
|--------|-------------|----------|
| **n8n** | Vector Store Node compatible JSON | n8n workflow automation |
| **pinecone** | Vectors with rich metadata | Managed serverless vector search |
| **weaviate** | Class objects with properties | GraphQL-powered semantic search |
| **supabase** | pgvector rows with JSONB metadata | Postgres + vector search |
| **chroma** | Documents with embeddings-ready format | Local/embedded vector DB |
| **qdrant** | Points with payload | High-performance vector search |
| **milvus** | Entities for collection insert | Enterprise-scale vector DB |

### Quick Start

#### Input

```json
{
  "urls": [
    "https://example.com/pricing",
    "https://example.com/features"
  ],
  "openaiApiKey": "sk-...",
  "outputFormat": "n8n",
  "generateQA": true,
  "questionsPerChunk": 5,
  "chunkingConfig": {
    "splitOn": ["##", "###"],
    "maxChunkSize": 2000,
    "preserveTables": true,
    "preserveCodeBlocks": true
  },
  "piiConfig": {
    "enabled": true,
    "redactEmails": true,
    "redactPhones": true,
    "whitelist": ["*@mycompany.com"]
  }
}
````

#### Output (n8n format)

```json
{
  "documents": [
    {
      "id": "chunk_abc123",
      "text": "[Source: example.com | Page: Pricing | Section: Enterprise]\n\nThe Enterprise plan includes...",
      "metadata": {
        "source_url": "https://example.com/pricing",
        "title": "Pricing Plans",
        "section": "Enterprise",
        "parent_path": "Pricing > Enterprise",
        "word_count": 156,
        "chunk_index": 3,
        "total_chunks": 12
      },
      "questions": [
        {
          "q": "What is included in the Enterprise plan?",
          "a": "Unlimited API calls and dedicated support",
          "confidence": 0.95
        }
      ]
    }
  ],
  "summary": {
    "total_documents": 12,
    "total_questions": 48,
    "pii_redacted_count": 3,
    "processing_time_ms": 4521
  }
}
```
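
As a rough illustration of consuming this payload, the sketch below flattens each document into a row that is convenient to embed or inspect. It assumes the payload has exactly the shape shown in the sample above; `flatten_documents` is a hypothetical helper, not part of any Apify or n8n library.

```python
def flatten_documents(payload):
    """Turn the n8n-format payload into flat rows (id, text, source, questions)."""
    rows = []
    for doc in payload.get("documents", []):
        metadata = doc.get("metadata", {})
        rows.append({
            "id": doc["id"],
            "text": doc["text"],
            "source_url": metadata.get("source_url"),
            # Keep only the question strings; answers stay in the original payload.
            "questions": [qa["q"] for qa in doc.get("questions", [])],
        })
    return rows


if __name__ == "__main__":
    payload = {
        "documents": [{
            "id": "chunk_abc123",
            "text": "[Source: example.com | Page: Pricing | Section: Enterprise]\n\nThe Enterprise plan includes...",
            "metadata": {"source_url": "https://example.com/pricing"},
            "questions": [{"q": "What is included in the Enterprise plan?",
                           "a": "Unlimited API calls and dedicated support",
                           "confidence": 0.95}],
        }]
    }
    for row in flatten_documents(payload):
        print(row["id"], row["source_url"], row["questions"])
```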

### Vector Database Formats

#### Pinecone

```json
{
  "vectors": [
    {
      "id": "chunk_abc123",
      "metadata": {
        "text": "The Enterprise plan includes...",
        "source_url": "https://example.com/pricing",
        "section": "Enterprise"
      }
    }
  ]
}
```
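
The sample above shows only `id` and `metadata`, so it is assumed here that embedding values still need to be attached before the records are upserted. The sketch below does that with a caller-supplied `embed` function, which is hypothetical and stands in for whatever embedding model you use.

```python
def attach_embeddings(payload, embed):
    """Add a "values" list to each record by embedding its metadata text.

    `embed` is any callable mapping a string to a list of floats (assumption).
    """
    records = []
    for vector in payload.get("vectors", []):
        records.append({
            "id": vector["id"],
            "values": embed(vector["metadata"]["text"]),
            "metadata": vector["metadata"],
        })
    return records


if __name__ == "__main__":
    # Toy embedding for demonstration only: one dimension, the text length.
    def fake_embed(text):
        return [float(len(text))]

    payload = {"vectors": [{"id": "chunk_abc123",
                            "metadata": {"text": "The Enterprise plan includes...",
                                         "section": "Enterprise"}}]}
    print(attach_embeddings(payload, fake_embed))
```

The resulting list of `id`/`values`/`metadata` records is the general shape vector stores expect for an upsert; check your client's documentation for the exact call.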

#### Qdrant

```json
{
  "points": [
    {
      "id": "chunk_abc123",
      "payload": {
        "content": "The Enterprise plan includes...",
        "source_url": "https://example.com/pricing",
        "questions": [...]
      }
    }
  ]
}
```

#### Chroma

```json
{
  "documents": ["The Enterprise plan includes..."],
  "metadatas": [{"source_url": "...", "section": "..."}],
  "ids": ["chunk_abc123"]
}
```

#### Milvus

```json
{
  "entities": [
    {
      "id": "chunk_abc123",
      "content": "The Enterprise plan includes...",
      "metadata": {...}
    }
  ]
}
```

### Configuration Options

#### Chunking Config

| Option | Default | Description |
|--------|---------|-------------|
| `splitOn` | `["##", "###"]` | Markdown header levels to split on |
| `minChunkSize` | `100` | Minimum characters per chunk |
| `maxChunkSize` | `2000` | Maximum characters per chunk |
| `overlapSize` | `50` | Characters to overlap between chunks |
| `preserveTables` | `true` | Keep tables as atomic units |
| `preserveCodeBlocks` | `true` | Keep code blocks as atomic units |

#### PII Config

| Option | Default | Description |
|--------|---------|-------------|
| `enabled` | `true` | Enable PII scrubbing |
| `redactEmails` | `true` | Redact email addresses |
| `redactPhones` | `true` | Redact phone numbers |
| `redactSSN` | `true` | Redact Social Security Numbers |
| `redactCreditCards` | `true` | Redact credit card numbers |
| `whitelist` | `[]` | Patterns to preserve (e.g., `*@company.com`) |
| `customPatterns` | `[]` | Custom regex patterns to redact |

#### Other Options

| Option | Default | Description |
|--------|---------|-------------|
| `outputFormat` | `raw` | Output format (12 options - see above) |
| `generateQA` | `true` | Generate Q\&A pairs for each chunk |
| `questionsPerChunk` | `5` | Number of Q\&A pairs per chunk (1-10) |
| `stealthLevel` | `2` | Anti-bot protection (1-3) |
| `waitForTimeout` | `30000` | Page load timeout in ms |

**Note**: OpenAI API key is only required when `generateQA: true`. Set `generateQA: false` for faster, cheaper runs without Q\&A generation.
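
Before starting a run, the constraints described in this section can be checked locally. The sketch below is an assumption-labeled helper (`validate_input` is not part of the Apify client); it only encodes the rules stated above: an OpenAI key is needed when `generateQA` is true, `questionsPerChunk` is 1-10, and `stealthLevel` is 1-3.

```python
def validate_input(cfg):
    """Return a list of human-readable problems with an Actor input dict."""
    errors = []

    # generateQA defaults to true, so a missing flag still requires a key.
    if cfg.get("generateQA", True) and not cfg.get("openaiApiKey"):
        errors.append("openaiApiKey is required when generateQA is true")

    bounds = {"questionsPerChunk": (1, 10), "stealthLevel": (1, 3)}
    for name, (low, high) in bounds.items():
        value = cfg.get(name)
        if value is not None and not (low <= value <= high):
            errors.append(f"{name} must be between {low} and {high}")

    return errors


if __name__ == "__main__":
    run_input = {"urls": ["https://example.com"], "generateQA": False, "stealthLevel": 2}
    problems = validate_input(run_input)
    print("OK" if not problems else problems)
```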

### n8n Integration

RAG-Architect output drops directly into the n8n Vector Store Node:

```
[RAG-Architect Actor] → [HTTP Request] → [Vector Store Node] → [Pinecone/Weaviate/Supabase]
```

#### Example n8n Workflow

1. **HTTP Request Node**: Call RAG-Architect Actor
2. **Split In Batches**: Process documents in batches
3. **OpenAI Embeddings**: Generate embeddings
4. **Vector Store Insert**: Store in your database
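
Outside n8n, the same four steps can be approximated in plain Python. This is a sketch under stated assumptions: `embed_batch` and `store` are hypothetical callables you would supply (for example, wrappers around your embedding provider and vector database client), and the document shape follows the n8n output sample.

```python
def ingest(documents, embed_batch, store, batch_size=50):
    """Embed and store documents in batches; return how many were stored.

    embed_batch: callable taking a list of strings, returning a list of vectors.
    store: callable taking a list of {"id", "values", "metadata"} records.
    """
    stored = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        vectors = embed_batch([doc["text"] for doc in batch])
        store([
            {"id": doc["id"], "values": vector, "metadata": doc.get("metadata", {})}
            for doc, vector in zip(batch, vectors)
        ])
        stored += len(batch)
    return stored


if __name__ == "__main__":
    docs = [{"id": f"chunk_{i}", "text": f"text {i}", "metadata": {}} for i in range(5)]
    sink = []
    count = ingest(docs, lambda texts: [[float(len(t))] for t in texts], sink.extend, batch_size=2)
    print(count, "documents stored in", len(sink), "records")
```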

### Pricing

**Pay-per-use on Apify platform (compute costs only)**

| Mode | Avg Processing Time | Est. Cost |
|------|---------------------|-----------|
| With Q\&A (generateQA: true) | ~30s per URL | ~$0.02-0.05 per URL |
| Without Q\&A (generateQA: false) | ~8s per URL | ~$0.01 per URL |
| OpenAI API (your key) | N/A | ~$0.002 per chunk |

**Example**: 100 URLs with Q\&A → ~$5 Apify + ~$2 OpenAI = ~$7 total
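
The arithmetic behind that example can be reproduced with a small estimator. The per-URL and per-chunk rates come from the table above (using the upper ~$0.05 per URL figure); the value of 10 chunks per URL is an inference that makes the ~$2 OpenAI figure work out, not a number stated by the author.

```python
def estimate_cost(urls, chunks_per_url, apify_per_url=0.05, openai_per_chunk=0.002):
    """Return (apify_cost, openai_cost, total) in USD, rounded to cents."""
    apify = urls * apify_per_url
    openai = urls * chunks_per_url * openai_per_chunk
    return round(apify, 2), round(openai, 2), round(apify + openai, 2)


if __name__ == "__main__":
    apify, openai, total = estimate_cost(urls=100, chunks_per_url=10)
    print(f"Apify ~${apify:.2f} + OpenAI ~${openai:.2f} = ~${total:.2f}")
```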

### vs. Website Content Crawler

| Feature | Website Content Crawler | RAG-Architect |
|---------|------------------------|---------------|
| Chunking | Fixed token count | Structure-aware (headers) |
| Tables | May split mid-row | Preserved whole |
| Context | Lost between chunks | Injected header |
| Q\&A | None | AI-generated with audit |
| PII | None | Auto-scrubbed |
| Output | Raw text | Vector-store-ready JSON |

### Technical Architecture

```
URL Input
    ↓
┌─────────────────────────────────────┐
│        Playwright Crawler           │
│  (Stealth Mode + Anti-Bot Evasion)  │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│        Content Extraction           │
│  (Readability.js + Metadata)        │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│      Structure-Aware Chunking       │
│  • Header Splitter                  │
│  • Table Guard                      │
│  • Code Guard                       │
│  • Context Injector                 │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│          Enrichment Layer           │
│  • Q&A Generator (GPT-4o-mini)      │
│  • Self-Reflection Audit            │
│  • PII Scrubber                     │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│        Output Formatter (12)        │
│  raw | csv | markdown | langchain   │
│  llamaindex | n8n | pinecone        │
│  weaviate | supabase | chroma       │
│  qdrant | milvus                    │
└─────────────────────────────────────┘
    ↓
Ready for AI
```

### Use Cases

1. **AI Chatbot Knowledge Bases**: Build hallucination-free chatbots
2. **Enterprise RAG Systems**: Clean, compliant knowledge bases
3. **Competitive Intelligence**: Extract structured intel from competitor sites
4. **Documentation Processing**: Convert docs to searchable knowledge
5. **Legal/Medical Compliance**: PII-scrubbed, audit-ready data

### Requirements

- Apify account
- OpenAI API key (for Q\&A generation)
- Vector database (optional)

### Support

- **Author**: Jason Pellerin (AI Solutionist)
- **Issues**: Report on the Apify Actor page
- **Website**: [jasonpellerin.com](https://jasonpellerin.com)

### License

MIT License - Use freely for commercial and personal projects.

***

*Built for the "Nerd" (Agency Owner or Dev) who's drowning in "Data Debt."*
*RAG-Architect: The cleanroom for AI data.*

# Actor input Schema

## `urls` (type: `array`):

List of web page URLs to extract and chunk (one per line).

## `sitemapUrl` (type: `string`):

Process all URLs from a sitemap (takes precedence over individual URLs).

## `openaiApiKey` (type: `string`):

Required for Q\&A generation. Get one at platform.openai.com

## `outputFormat` (type: `string`):

Choose the format that matches your tech stack.

## `generateQA` (type: `boolean`):

Create 3-5 battle-tested questions per chunk using GPT-4o-mini.

## `questionsPerChunk` (type: `integer`):

Number of Q\&A pairs to generate for each chunk.

## `chunkingConfig` (type: `object`):

Configure how content is split into chunks.

## `piiConfig` (type: `object`):

Configure automatic detection and redaction of personal information.

## `stealthLevel` (type: `integer`):

Anti-bot protection level. 1=Basic, 2=Standard (residential proxies), 3=Elite.

## `waitForSelector` (type: `string`):

CSS selector to wait for before extracting (for JS-heavy pages).

## `waitForTimeout` (type: `integer`):

Maximum time to wait for page load.

## `excludeSelectors` (type: `string`):

CSS selectors to remove before extraction (comma-separated).

## `proxyConfiguration` (type: `object`):

Configure proxy for requests. Leave empty to use Apify's automatic proxy.

## Actor input object example

```json
{
  "urls": [
    "https://example.com"
  ],
  "outputFormat": "raw",
  "generateQA": true,
  "questionsPerChunk": 5,
  "chunkingConfig": {
    "splitOn": [
      "##",
      "###"
    ],
    "minChunkSize": 100,
    "maxChunkSize": 2000,
    "overlapSize": 50,
    "preserveTables": true,
    "preserveCodeBlocks": true
  },
  "piiConfig": {
    "enabled": true,
    "redactEmails": true,
    "redactPhones": true,
    "redactSSN": true,
    "redactCreditCards": true,
    "whitelist": []
  },
  "stealthLevel": 2,
  "waitForTimeout": 30000,
  "excludeSelectors": "nav, footer, .ads, .sidebar",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```

# Actor output Schema

## `url` (type: `string`):

Source URL

## `title` (type: `string`):

Document title

## `chunkId` (type: `string`):

Unique chunk identifier

## `chunkText` (type: `string`):

Chunk text content

## `tokenCount` (type: `string`):

Token count for embedding

## `chunkType` (type: `string`):

Type of chunk (paragraph, heading, list)

## `position` (type: `string`):

Position in document

## `totalChunks` (type: `string`):

Total chunks from this document

## `contentHash` (type: `string`):

SHA-256 hash for deduplication

## `processingTimeMs` (type: `string`):

Processing time
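
Since each item carries a `contentHash`, duplicate chunks can be dropped before embedding. The sketch below is illustrative: `dedupe` is a hypothetical helper, and the fallback that hashes `chunkText` with plain SHA-256 over UTF-8 bytes is an assumption, since the Actor's exact hashing input (for example, any whitespace normalization) is not documented here.

```python
import hashlib


def content_hash(text):
    """SHA-256 hex digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dedupe(items):
    """Keep the first item for each contentHash, preserving order."""
    seen = set()
    unique = []
    for item in items:
        # Fall back to hashing the text if the field is missing (assumption).
        key = item.get("contentHash") or content_hash(item.get("chunkText", ""))
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


if __name__ == "__main__":
    items = [
        {"chunkId": "a", "chunkText": "Same text", "contentHash": "h1"},
        {"chunkId": "b", "chunkText": "Same text", "contentHash": "h1"},
        {"chunkId": "c", "chunkText": "Other text", "contentHash": "h2"},
    ]
    print([item["chunkId"] for item in dedupe(items)])
```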

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://example.com"
    ],
    "excludeSelectors": "nav, footer, .ads, .sidebar"
};

// Run the Actor and wait for it to finish
const run = await client.actor("ai_solutionist/rag-architect").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": ["https://example.com"],
    "excludeSelectors": "nav, footer, .ads, .sidebar",
}

# Run the Actor and wait for it to finish
run = client.actor("ai_solutionist/rag-architect").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://example.com"
  ],
  "excludeSelectors": "nav, footer, .ads, .sidebar"
}' |
apify call ai_solutionist/rag-architect --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=ai_solutionist/rag-architect",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Rag Architect",
        "description": "Transform any website into vector-store-ready knowledge chunks for Pinecone, Weaviate, LangChain, LlamaIndex, Supabase, n8n & more. AI-generated Q&A pairs, smart chunking, PII scrubbing. Build hallucination-free RAG chatbots in minutes.",
        "version": "1.0",
        "x-build-id": "7JhfdCZczATgsGueE"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/ai_solutionist~rag-architect/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-ai_solutionist-rag-architect",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/ai_solutionist~rag-architect/runs": {
            "post": {
                "operationId": "runs-sync-ai_solutionist-rag-architect",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/ai_solutionist~rag-architect/run-sync": {
            "post": {
                "operationId": "run-sync-ai_solutionist-rag-architect",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "urls": {
                        "title": "URLs to Process",
                        "type": "array",
                        "description": "List of web page URLs to extract and chunk (one per line).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "sitemapUrl": {
                        "title": "Sitemap URL",
                        "type": "string",
                        "description": "Process all URLs from a sitemap (takes precedence over individual URLs)."
                    },
                    "openaiApiKey": {
                        "title": "OpenAI API Key",
                        "type": "string",
                        "description": "Required for Q&A generation. Get one at platform.openai.com"
                    },
                    "outputFormat": {
                        "title": "Output Format",
                        "enum": [
                            "raw",
                            "langchain",
                            "llamaindex",
                            "n8n",
                            "pinecone",
                            "weaviate",
                            "supabase",
                            "chroma",
                            "qdrant",
                            "milvus",
                            "csv",
                            "markdown"
                        ],
                        "type": "string",
                        "description": "Choose the format that matches your tech stack.",
                        "default": "raw"
                    },
                    "generateQA": {
                        "title": "Generate Q&A Pairs",
                        "type": "boolean",
                        "description": "Create 3-5 battle-tested questions per chunk using GPT-4o-mini.",
                        "default": true
                    },
                    "questionsPerChunk": {
                        "title": "Questions per Chunk",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Number of Q&A pairs to generate for each chunk.",
                        "default": 5
                    },
                    "chunkingConfig": {
                        "title": "Chunking Configuration",
                        "type": "object",
                        "description": "Configure how content is split into chunks.",
                        "properties": {
                            "splitOn": {
                                "title": "Split on Headers",
                                "description": "Markdown header levels to split on (e.g., ##, ###)",
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "minChunkSize": {
                                "title": "Minimum Chunk Size",
                                "description": "Minimum characters per chunk",
                                "type": "integer"
                            },
                            "maxChunkSize": {
                                "title": "Maximum Chunk Size",
                                "description": "Maximum characters per chunk",
                                "type": "integer"
                            },
                            "overlapSize": {
                                "title": "Overlap Size",
                                "description": "Characters to overlap between chunks",
                                "type": "integer"
                            },
                            "preserveTables": {
                                "title": "Preserve Tables",
                                "description": "Keep tables as atomic units",
                                "type": "boolean"
                            },
                            "preserveCodeBlocks": {
                                "title": "Preserve Code Blocks",
                                "description": "Keep code blocks as atomic units",
                                "type": "boolean"
                            }
                        },
                        "default": {
                            "splitOn": [
                                "##",
                                "###"
                            ],
                            "minChunkSize": 100,
                            "maxChunkSize": 2000,
                            "overlapSize": 50,
                            "preserveTables": true,
                            "preserveCodeBlocks": true
                        }
                    },
                    "piiConfig": {
                        "title": "PII Scrubbing",
                        "type": "object",
                        "description": "Configure automatic detection and redaction of personal information.",
                        "properties": {
                            "enabled": {
                                "title": "Enable PII Scrubbing",
                                "description": "Enable automatic detection and redaction of personal information",
                                "type": "boolean"
                            },
                            "redactEmails": {
                                "title": "Redact Emails",
                                "description": "Automatically redact email addresses",
                                "type": "boolean"
                            },
                            "redactPhones": {
                                "title": "Redact Phone Numbers",
                                "description": "Automatically redact phone numbers",
                                "type": "boolean"
                            },
                            "redactSSN": {
                                "title": "Redact SSN",
                                "description": "Automatically redact Social Security Numbers",
                                "type": "boolean"
                            },
                            "redactCreditCards": {
                                "title": "Redact Credit Cards",
                                "description": "Automatically redact credit card numbers",
                                "type": "boolean"
                            },
                            "whitelist": {
                                "title": "Whitelist",
                                "description": "Patterns to preserve (e.g., *@company.com)",
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            }
                        },
                        "default": {
                            "enabled": true,
                            "redactEmails": true,
                            "redactPhones": true,
                            "redactSSN": true,
                            "redactCreditCards": true,
                            "whitelist": []
                        }
                    },
                    "stealthLevel": {
                        "title": "Stealth Level",
                        "minimum": 1,
                        "maximum": 3,
                        "type": "integer",
                        "description": "Anti-bot protection level. 1=Basic, 2=Standard (residential proxies), 3=Elite.",
                        "default": 2
                    },
                    "waitForSelector": {
                        "title": "Wait for Selector",
                        "type": "string",
                        "description": "CSS selector to wait for before extracting (for JS-heavy pages)."
                    },
                    "waitForTimeout": {
                        "title": "Wait Timeout (ms)",
                        "minimum": 1000,
                        "maximum": 60000,
                        "type": "integer",
                        "description": "Maximum time to wait for page load.",
                        "default": 30000
                    },
                    "excludeSelectors": {
                        "title": "Exclude Selectors",
                        "type": "string",
                        "description": "CSS selectors to remove before extraction (comma-separated)."
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Configure proxy for requests. Leave empty to use Apify's automatic proxy.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ]
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
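
The optional input fields at the end of the schema above carry numeric bounds (`stealthLevel` from 1 to 3, `waitForTimeout` from 1000 to 60000 ms) that are cheap to check before starting a run. The following is a minimal sketch of such a check, not part of the Actor itself: the helper name `build_advanced_input` is hypothetical, and only the field names, defaults, and bounds are taken from the schema. Any required fields defined earlier in the schema are left out here and would need to be merged in.

```python
from typing import Any, Dict, Optional

# Bounds and defaults copied from the input schema above.
STEALTH_MIN, STEALTH_MAX = 1, 3
WAIT_MIN_MS, WAIT_MAX_MS = 1000, 60000


def build_advanced_input(
    stealth_level: int = 2,
    wait_for_timeout: int = 30000,
    wait_for_selector: Optional[str] = None,
    exclude_selectors: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the optional input fields, validated locally."""
    if not STEALTH_MIN <= stealth_level <= STEALTH_MAX:
        raise ValueError(f"stealthLevel must be {STEALTH_MIN}-{STEALTH_MAX}")
    if not WAIT_MIN_MS <= wait_for_timeout <= WAIT_MAX_MS:
        raise ValueError(f"waitForTimeout must be {WAIT_MIN_MS}-{WAIT_MAX_MS} ms")

    fields: Dict[str, Any] = {
        "stealthLevel": stealth_level,
        "waitForTimeout": wait_for_timeout,
        # Schema default: Apify Proxy with the RESIDENTIAL group.
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
    # String fields have no default in the schema, so include them only when set.
    if wait_for_selector:
        fields["waitForSelector"] = wait_for_selector
    if exclude_selectors:
        fields["excludeSelectors"] = exclude_selectors
    return fields


if __name__ == "__main__":
    print(build_advanced_input(wait_for_selector="main", exclude_selectors="nav, footer"))
```

Validating locally only catches out-of-range values early; the platform still applies the schema when the run starts.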

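
The `runsResponseSchema` above wraps every run object in a top-level `data` key. As a rough illustration of reading that shape, the sketch below pulls out the fields most integrations need. The function name `summarize_run` and the sample identifiers (`run-123`, `dataset-456`) are made up for the example; the remaining sample values are the `example` entries from the schema.

```python
from typing import Any, Dict


def summarize_run(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: extract key fields from a run response."""
    data = response.get("data", {})
    return {
        "run_id": data.get("id"),
        "status": data.get("status"),
        # Results of the run are stored in this dataset.
        "dataset_id": data.get("defaultDatasetId"),
        "cost_usd": data.get("usageTotalUsd", 0.0),
        "memory_mb": data.get("options", {}).get("memoryMbytes"),
    }


# Sample payload shaped like runsResponseSchema; the IDs are placeholders.
sample_response = {
    "data": {
        "id": "run-123",
        "status": "READY",
        "defaultDatasetId": "dataset-456",
        "usageTotalUsd": 0.00005,
        "options": {"build": "latest", "timeoutSecs": 300, "memoryMbytes": 1024},
    }
}

if __name__ == "__main__":
    print(summarize_run(sample_response))
```

Using `.get` with defaults keeps the helper tolerant of responses where optional fields such as `usageTotalUsd` are missing.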