# SEO Data Extractor (`nocodeventure/seo-data-extractor`) Actor

Extract comprehensive SEO metadata, headings, links, images, Open Graph tags, Twitter Cards, and technical data from websites. Perfect for SEO audits, competitor analysis, and content optimization. Runs on Apify platform with structured JSON output.

- **URL**: https://apify.com/nocodeventure/seo-data-extractor.md
- **Developed by:** [No-Code Venture](https://apify.com/nocodeventure) (community)
- **Categories:** Automation, SEO tools
- **Stats:** 31 total users, 2 monthly users, 100.0% runs succeeded, 2 bookmarks
- **User rating**: No ratings yet

## Pricing

from $2.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage, only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## SEO Data Extractor

> Extract comprehensive SEO metadata, headings, links, images, Open Graph tags, Twitter Cards, and technical data from websites. Perfect for SEO audits, competitor analysis, and content optimization. Runs on Apify platform with structured JSON output.

A comprehensive SEO data extraction tool that runs on the [Apify](https://apify.com/) platform.

### Features

Extract comprehensive SEO data from any webpage including:

- **Meta Information**: Title, description, keywords, robots directives, canonical URL, author, and generator tags, plus character counts for the title and description
- **Heading Structure**: All H1-H6 tags with text content and counts for each level
- **Content Analysis**: Word count, link analysis (total/internal/external), and image audit (total/without alt text)
- **Open Graph Tags**: Complete Open Graph metadata (title, description, image, URL, type, site name)
- **Twitter Cards**: Twitter Card metadata for social sharing
- **Technical SEO**: Status codes, response time, charset, language, viewport settings
- **Structured Data**: JSON-LD detection and schema type identification
- **Branding Assets**: Favicon, Apple touch icon, and theme color detection
- **Sitemap Extraction**: Optionally fetch each domain's sitemap (`/sitemap.xml` by default) and include up to `maxSitemapUrls` of its URLs
- **SSL Certificate Analysis**: Extract SSL/TLS certificate details including issuer, expiry, validity, and protocol version
- **Error Handling**: Graceful handling of HTTP errors (404, 500, etc.) with proper error codes and messages

### Use Cases

- **SEO Monitoring**: Track SEO data for your websites or competitors over time
- **Content Analysis**: Analyze meta tags to optimize webpage content for search engines
- **SEO Audits**: Collect data for comprehensive SEO audits across multiple pages
- **Competitor Analysis**: Compare your pages' metadata, heading structure, and link profiles with those of competitors
- **Bulk Data Extraction**: Process 1 to 100,000+ pages efficiently

### Input Configuration

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `startUrls` | Array | URLs to extract SEO data from, each given as an object `{ "url": "..." }` | `[{ "url": "https://nocodeventure.com" }]` |
| `extractSitemapUrls` | Boolean | Fetch and include sitemap data for each domain | `false` |
| `sitemapUrl` | String | Custom sitemap path (e.g., `sitemap_index.xml` or `/sitemaps/main.xml`) | `/sitemap.xml` |
| `maxSitemapUrls` | Integer | Maximum URLs to include from each sitemap (1-100) | `100` |
| `extractSslInfo` | Boolean | Extract SSL/TLS certificate information | `false` |
| `maxRequestsPerCrawl` | Integer | Maximum pages to scrape (0 = unlimited) | `100` |
| `requestTimeout` | Integer | Request timeout in seconds (1-10) | `5` |
| `maxConcurrency` | Integer | Parallel requests (1-50) | `10` |
| `maxRequestRetries` | Integer | Max retries for failed requests (0-2) | `1` |
| `proxyConfiguration` | Object | Proxy settings for anti-blocking | Apify Proxy disabled |
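
Out-of-range values are easier to catch before a run starts. Below is a minimal sketch, assuming you build the input as a Python dict. The bounds are copied from the Actor's published input schema (the OpenAPI `inputSchema` further down), and `validate_input` is a hypothetical helper, not part of the Actor or of `apify-client`.

```python
# Inclusive (min, max) bounds copied from the Actor's published input schema.
# None means the schema sets no upper bound.
SCHEMA_BOUNDS = {
    "maxSitemapUrls": (1, 100),
    "maxRequestsPerCrawl": (0, None),
    "requestTimeout": (1, 10),
    "maxConcurrency": (1, 50),
    "maxRequestRetries": (0, 2),
}


def validate_input(run_input: dict) -> list:
    """Return a list of problems in run_input; an empty list means it looks valid."""
    problems = []
    start_urls = run_input.get("startUrls")
    # startUrls is required and holds objects shaped like {"url": "https://..."}.
    if not isinstance(start_urls, list) or not start_urls:
        problems.append("startUrls must be a non-empty list")
    else:
        for i, entry in enumerate(start_urls):
            if not isinstance(entry, dict) or not isinstance(entry.get("url"), str):
                problems.append(f"startUrls[{i}] must be an object with a 'url' string")
    for field, (low, high) in SCHEMA_BOUNDS.items():
        if field not in run_input:
            continue  # omitted fields fall back to the Actor's defaults
        value = run_input[field]
        # bool is a subclass of int in Python, so reject True/False explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{field} must be an integer")
        elif value < low or (high is not None and value > high):
            problems.append(f"{field}={value} is outside the allowed range")
    return problems


run_input = {"startUrls": [{"url": "https://nocodeventure.com"}], "maxRequestRetries": 5}
print(validate_input(run_input))  # ['maxRequestRetries=5 is outside the allowed range']
```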

### Output Schema

The Actor returns structured JSON data with the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `url` | String | The URL that was scraped |
| `scrapedAt` | String | ISO 8601 timestamp of when the page was scraped |
| `error` | String (optional) | Error code if scraping failed (e.g., "404", "500", "REQUEST_FAILED") |
| `errorMessage` | String (optional) | Human-readable error message |

#### Meta Information (`meta`)

| Field | Type | Description |
|-------|------|-------------|
| `title` | String | Page title from `<title>` tag |
| `titleLength` | Number | Character count of the title |
| `description` | String | Meta description content |
| `descriptionLength` | Number | Character count of the description |
| `keywords` | String | Meta keywords content |
| `robots` | String | Robots meta directive (e.g., "index, follow") |
| `canonical` | String | Canonical URL from the `<link rel="canonical">` tag |
| `author` | String | Author meta tag content |
| `generator` | String | Generator meta tag content |

#### Headings (`headings`)

| Field | Type | Description |
|-------|------|-------------|
| `h1.text` | String | Combined text content of all H1 tags |
| `h1.count` | Number | Number of H1 tags found |
| `h2.text` | String | Combined text content of all H2 tags |
| `h2.count` | Number | Number of H2 tags found |
| `h3.text` | String | Combined text content of all H3 tags |
| `h3.count` | Number | Number of H3 tags found |
| `h4.text` | String | Combined text content of all H4 tags |
| `h4.count` | Number | Number of H4 tags found |
| `h5.text` | String | Combined text content of all H5 tags |
| `h5.count` | Number | Number of H5 tags found |
| `h6.text` | String | Combined text content of all H6 tags |
| `h6.count` | Number | Number of H6 tags found |
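
Because each heading level reports a count, basic structure checks need only a few lines. The sketch below assumes the nested shape shown above (`headings.h1.count` and so on). The rules it applies (exactly one H1, and no level used without the level above it) are common conventions rather than anything the Actor enforces. The counts carry no ordering, so the skipped-level check is coarse. `check_headings` is a hypothetical helper.

```python
HEADING_LEVELS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def check_headings(item: dict) -> list:
    """Flag common heading-structure problems using only the per-level counts."""
    headings = item.get("headings") or {}
    counts = [(headings.get(level) or {}).get("count", 0) for level in HEADING_LEVELS]
    issues = []
    if counts[0] == 0:
        issues.append("no H1")
    elif counts[0] > 1:
        issues.append(f"{counts[0]} H1 tags")
    # A level that is used while the level directly above it is absent suggests
    # a skipped level (e.g. H3 without any H2).
    for i in range(1, len(HEADING_LEVELS)):
        if counts[i] > 0 and counts[i - 1] == 0:
            issues.append(f"{HEADING_LEVELS[i].upper()} used without any "
                          f"{HEADING_LEVELS[i - 1].upper()}")
    return issues


item = {"headings": {"h1": {"text": "Welcome Offers", "count": 2},
                     "h2": {"text": "", "count": 0},
                     "h3": {"text": "Pricing", "count": 1}}}
print(check_headings(item))  # ['2 H1 tags', 'H3 used without any H2']
```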

#### Open Graph Tags (`openGraph`)

| Field | Type | Description |
|-------|------|-------------|
| `title` | String | Open Graph title |
| `description` | String | Open Graph description |
| `image` | String | Open Graph image URL |
| `url` | String | Open Graph URL |
| `type` | String | Open Graph type (e.g., "website", "article") |
| `siteName` | String | Open Graph site name |

#### Twitter Cards (`twitterCard`)

| Field | Type | Description |
|-------|------|-------------|
| `card` | String | Twitter card type (e.g., "summary", "summary_large_image") |
| `title` | String | Twitter card title |
| `description` | String | Twitter card description |
| `image` | String | Twitter card image URL |
| `site` | String | Twitter site handle |

#### Content Analysis (`content`)

| Field | Type | Description |
|-------|------|-------------|
| `wordCount` | Number | Total word count in page body |
| `links.total` | Number | Total number of links found |
| `links.internal` | Number | Number of internal links (same domain) |
| `links.external` | Number | Number of external links (different domain) |
| `images.total` | Number | Total number of images found |
| `images.withoutAlt` | Number | Number of images missing alt text |

#### Technical SEO (`technical`)

| Field | Type | Description |
|-------|------|-------------|
| `statusCode` | Number | HTTP response status code |
| `responseTime` | Number | Response time in milliseconds |
| `charset` | String | Character encoding (e.g., "UTF-8") |
| `language` | String | Page language from HTML lang attribute |
| `viewport` | String | Viewport meta tag content |
| `structuredData.hasStructuredData` | Boolean | Whether JSON-LD structured data was found |
| `structuredData.types` | Array | Array of structured data schema types found |
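
Across many pages, the technical fields are easiest to read as a summary. The sketch below counts status codes, lists slow responses, and lists successful pages without JSON-LD. The 2,000 ms threshold is an arbitrary assumption, and `technical_report` is a hypothetical helper.

```python
from collections import Counter

SLOW_RESPONSE_MS = 2000  # assumed threshold for a "slow" response


def technical_report(items: list) -> dict:
    """Summarise status codes, slow responses and pages without JSON-LD."""
    status_counts = Counter()
    slow_urls = []
    no_structured_data = []
    for item in items:
        technical = item.get("technical") or {}
        status_counts[technical.get("statusCode")] += 1
        if technical.get("responseTime", 0) > SLOW_RESPONSE_MS:
            slow_urls.append(item.get("url"))
        structured = technical.get("structuredData") or {}
        # Error items carry no structured-data info, so only judge successful pages.
        if not item.get("error") and not structured.get("hasStructuredData", False):
            no_structured_data.append(item.get("url"))
    return {
        "statusCounts": dict(status_counts),
        "slowUrls": slow_urls,
        "noStructuredData": no_structured_data,
    }
```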

#### Branding Assets (`branding`)

| Field | Type | Description |
|-------|------|-------------|
| `favicon` | String | Favicon URL |
| `appleTouchIcon` | String | Apple touch icon URL |
| `themeColor` | String | Theme color meta tag content |

#### Sitemap Data (`sitemap`) - Optional

> **Note**: This field is only included when `extractSitemapUrls` is enabled. If the page scrape fails (HTTP error or request failure), the `sitemap` object will **not** be included in the output.

| Field | Type | Description |
|-------|------|-------------|
| `found` | Boolean | Whether a sitemap was found and parsed |
| `sitemapUrl` | String | The sitemap URL that was fetched |
| `isKnownPath` | Boolean | Whether a known/custom sitemap path was used (see below) |
| `urlCount` | Number | Total number of URLs found in the sitemap |
| `urls` | Array | URLs listed in the sitemap, up to `maxSitemapUrls` entries |
| `error` | String (optional) | Error message if sitemap fetch failed |

**Example output with sitemap enabled:**

```json
{
  "url": "https://example.com",
  "meta": { ... },
  "sitemap": {
    "found": true,
    "sitemapUrl": "https://example.com/sitemap.xml",
    "isKnownPath": false,
    "urlCount": 156,
    "urls": [
      "https://example.com/",
      "https://example.com/about",
      "https://example.com/contact",
      ...
    ]
  },
  "scrapedAt": "2025-12-12T10:00:00.000Z"
}
````

**Sitemap caching**: If you have multiple URLs from the same domain, the sitemap is only fetched once and reused for all pages from that domain.
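
Every item from a domain carries the same cached sitemap. Comparing those sitemap URLs with the pages you actually scraped is therefore a quick way to find pages to queue for a follow-up run. The sketch below works on items fetched from the dataset as dicts. `unscraped_sitemap_urls` is a hypothetical helper. It treats URLs that differ only by a trailing slash as the same page, which is an assumption you may want to drop.

```python
def _normalise(url: str) -> str:
    # Assumption: URLs differing only by a trailing slash are the same page.
    return url.rstrip("/")


def unscraped_sitemap_urls(items: list) -> list:
    """Return sitemap URLs (in first-seen order) that were not scraped themselves."""
    scraped = {_normalise(item.get("url", "")) for item in items}
    seen = set()
    missing = []
    for item in items:
        sitemap = item.get("sitemap") or {}
        if not sitemap.get("found"):
            continue
        for url in sitemap.get("urls", []):
            key = _normalise(url)
            # Items from one domain share a cached sitemap, so skip repeats.
            if key in scraped or key in seen:
                continue
            seen.add(key)
            missing.append(url)
    return missing
```

To queue the result for another run, wrap each URL as `{"url": u}` to match the `startUrls` format.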

#### SSL Certificate Data (`ssl`) - Optional

> **Note**: This field is only included when `extractSslInfo` is enabled.

| Field | Type | Description |
|-------|------|-------------|
| `isHttps` | Boolean | Whether the site uses HTTPS |
| `isValid` | Boolean | Whether the SSL certificate is valid |
| `issuer` | String | Certificate issuer organization |
| `issuerCN` | String | Certificate issuer common name |
| `subject` | String | Certificate subject (domain) |
| `validFrom` | String | Certificate valid from date (ISO 8601) |
| `validTo` | String | Certificate expiry date (ISO 8601) |
| `daysUntilExpiry` | Number | Days until certificate expires (negative if expired) |
| `isExpired` | Boolean | Whether certificate is expired |
| `expiresSoon` | Boolean | Whether certificate expires within 30 days |
| `protocol` | String | SSL/TLS protocol version (e.g., "TLSv1.3") |
| `altNames` | Array | Subject Alternative Names (other domains covered) |
| `serialNumber` | String | Certificate serial number |
| `fingerprint` | String | Certificate fingerprint (SHA-256) |
| `error` | String (optional) | Error message if SSL check failed |

**Example output with SSL enabled:**

```json
{
  "url": "https://example.com",
  "meta": { ... },
  "ssl": {
    "isHttps": true,
    "isValid": true,
    "issuer": "Let's Encrypt",
    "issuerCN": "R3",
    "subject": "example.com",
    "validFrom": "2025-01-01T00:00:00.000Z",
    "validTo": "2025-04-01T00:00:00.000Z",
    "daysUntilExpiry": 90,
    "isExpired": false,
    "expiresSoon": false,
    "protocol": "TLSv1.3",
    "altNames": ["example.com", "www.example.com"],
    "serialNumber": "ABC123...",
    "fingerprint": "AA:BB:CC:..."
  },
  "scrapedAt": "2025-01-01T00:00:00.000Z"
}
```

**SSL caching**: If you have multiple URLs from the same domain, the SSL certificate is only checked once and reused for all pages from that domain.
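
To build an expiry watchlist from one or more runs, you can recompute days-to-expiry from `validTo` at report time instead of relying on the value captured at scrape time. In the sketch below, `expiring_certificates` and `parse_iso` are hypothetical helpers, and the 30-day window mirrors the documented `expiresSoon` rule. The Actor's exact rounding isn't documented, so results may differ from `daysUntilExpiry` by a day.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse

EXPIRES_SOON_DAYS = 30  # same window as the documented `expiresSoon` flag


def parse_iso(timestamp: str) -> datetime:
    """Parse timestamps such as '2025-04-01T00:00:00.000Z' into aware datetimes."""
    # datetime.fromisoformat() accepts a trailing 'Z' only from Python 3.11 on.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def expiring_certificates(items: list, now: Optional[datetime] = None) -> list:
    """Return (hostname, days_left) pairs for certificates expiring within 30 days."""
    now = now or datetime.now(timezone.utc)
    seen_hosts = set()
    expiring = []
    for item in items:
        ssl_info = item.get("ssl") or {}
        # Skip plain-HTTP pages, failed SSL checks and items without an expiry date.
        if not ssl_info.get("isHttps") or ssl_info.get("error") or not ssl_info.get("validTo"):
            continue
        host = urlparse(item.get("url", "")).hostname
        # The certificate is cached per domain, so report each host once.
        if host in seen_hosts:
            continue
        seen_hosts.add(host)
        days_left = (parse_iso(ssl_info["validTo"]) - now).days
        if days_left <= EXPIRES_SOON_DAYS:
            expiring.append((host, days_left))
    return expiring
```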

#### Known Sitemap Paths

Some websites don't use the standard `/sitemap.xml` location. For these sites, the Actor uses a built-in sitemap location and reports `isKnownPath: true` in the output.

| Domain | Sitemap Location |
|--------|------------------|
| `amazon.com`, `www.amazon.com`, `aws.amazon.com` | `https://aws.amazon.com/ar/sitemaps/index/` |

When a known path is used, you'll see it in the logs:

```text
Using known sitemap path for www.amazon.com: https://aws.amazon.com/ar/sitemaps/index/
```
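
To predict which sitemap URL a run will request, you can approximate the lookup described in this section as shown below. This is a hypothetical sketch, not the Actor's source code. Two details are assumptions based on the field descriptions above: the precedence (known path first, then the custom `sitemapUrl`, then `/sitemap.xml`) and treating a custom path as `isKnownPath: true`.

```python
from urllib.parse import urlparse

# Known locations from the table above (hostname -> sitemap URL).
KNOWN_SITEMAPS = {
    "amazon.com": "https://aws.amazon.com/ar/sitemaps/index/",
    "www.amazon.com": "https://aws.amazon.com/ar/sitemaps/index/",
    "aws.amazon.com": "https://aws.amazon.com/ar/sitemaps/index/",
}


def resolve_sitemap(page_url: str, custom_path: str = "") -> tuple:
    """Return (sitemap_url, is_known_path) for the domain of page_url (hypothetical)."""
    parsed = urlparse(page_url)
    if parsed.hostname in KNOWN_SITEMAPS:
        return KNOWN_SITEMAPS[parsed.hostname], True
    origin = f"{parsed.scheme}://{parsed.netloc}"
    # Accept both "sitemap_index.xml" and "/sitemaps/main.xml" styles.
    path = "/" + (custom_path or "sitemap.xml").lstrip("/")
    # Assumption: a custom path counts as "known/custom" for isKnownPath.
    return origin + path, bool(custom_path)
```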

#### Error Output Example

When a URL returns an HTTP error (like 404), the Actor returns an error item instead of failing:

```json
{
  "url": "https://example.com/broken-link",
  "meta": {
    "title": "",
    "titleLength": 0,
    "description": "",
    "descriptionLength": 0,
    "keywords": "",
    "robots": "",
    "canonical": "",
    "author": "",
    "generator": ""
  },
  "technical": {
    "statusCode": 404,
    "responseTime": 150
  },
  "error": "404",
  "errorMessage": "Page not found",
  "scrapedAt": "2025-12-11T20:23:04.317Z"
}
```

This allows you to:

- Continue processing other URLs without failing the entire run
- Identify broken links and problematic URLs in your dataset
- Filter error results using the dedicated "Errors" view
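
If you process results in code rather than through the Errors view, the `error` field is enough to separate failures from successes. The sketch below works on items already fetched from the dataset, for example with `iterate_items()` as in the Python example below. The function names are hypothetical.

```python
from collections import defaultdict


def split_results(items: list) -> tuple:
    """Separate successful items from error items (those with an `error` field)."""
    ok, failed = [], []
    for item in items:
        (failed if item.get("error") else ok).append(item)
    return ok, failed


def errors_by_code(items: list) -> dict:
    """Group failed URLs by error code, e.g. {'404': [...], 'REQUEST_FAILED': [...]}."""
    grouped = defaultdict(list)
    for item in split_results(items)[1]:
        grouped[item["error"]].append(item["url"])
    return dict(grouped)
```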

### Output Views

The Actor provides multiple dataset views for different analysis needs:

- **SEO Overview**: Quick summary with URL, error status, title, description, canonical, robots, H1 count, and links
- **Errors**: Dedicated view for URLs that returned HTTP errors (404, 500, etc.) with error codes and messages
- **Heading Structure**: H1-H6 tags with text content and counts for each level
- **Open Graph**: Complete Open Graph metadata for social sharing
- **Twitter Cards**: Twitter Card metadata for social sharing
- **Content Analysis**: Word count, link breakdown (internal/external), and image audit data
- **Technical SEO**: HTTP status, response time, charset, language, viewport, and structured data
- **Branding Assets**: Favicon, Apple touch icon, and theme color information
- **Sitemap Data**: URLs found in each domain's sitemap (when sitemap extraction is enabled)
- **SSL Certificates**: Certificate validity, issuer, expiry dates, protocol version (when SSL extraction is enabled)

### How to Export

1. **Access Results**: After the run finishes, view the collected data in Apify Console
2. **Select Export Option**: Download as CSV, JSON, Excel, or XML
3. **Open in Tools**: Import into Excel, Google Sheets, or your analysis tool
4. **API Access**: Use the Apify API to integrate with your workflows
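
The Console export covers most needs. If you fetch items through the API instead, you may want spreadsheet columns named like the fields in the tables above (`headings.h1.count`, `content.links.total`). The sketch below does this with the standard library only. The helper names and the `|` separator for list values are assumptions.

```python
import csv
import io


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. headings.h1.count."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            # Join list values (e.g. structuredData.types) into one cell.
            flat[name] = "|".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat


def items_to_csv(items: list) -> str:
    """Render dataset items as CSV text using the union of all flattened columns."""
    rows = [flatten(item) for item in items]
    columns = sorted({col for row in rows for col in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)  # columns missing from a row are written as empty cells
    return buffer.getvalue()
```

To save the result, write the returned string to a file opened with `newline=""` so the CSV line endings are preserved.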

### Pricing Model

This Actor uses **Pay-Per-Event (PPE)** pricing with automatic charging via Apify's synthetic events:

- **Actor Start**: Charged automatically when the Actor starts
- **Dataset Item**: Charged automatically for each result pushed to the dataset

#### Error Handling & Billing

URLs that return HTTP errors (404, 500, etc.) are still charged because:

- The Actor had to make a request to discover the error
- Error items are returned with proper error codes and messages
- This allows you to identify broken links without failing the entire run

You can set a maximum spending limit in the Apify Console to control costs.
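
For a rough budget before a run, you can turn the per-result price into an estimate. The sketch below rests on several assumptions:

- `price_per_1000` defaults to the listed starting price of $2.00 per 1,000 results. Your plan may pay a different rate.
- This README doesn't state the Actor Start charge, so you pass it in as `start_fee`.
- Every processed URL, including duplicates and error items, counts as one dataset item, up to `maxRequestsPerCrawl`.
- Retries don't add items.

`estimate_cost` is a hypothetical helper.

```python
def estimate_cost(num_urls: int, max_requests_per_crawl: int = 100,
                  price_per_1000: float = 2.00, start_fee: float = 0.0) -> float:
    """Rough cost in USD of one run; all defaults are assumptions to override."""
    # One dataset item per processed URL (duplicates and error items included),
    # capped by maxRequestsPerCrawl unless it is 0 (unlimited).
    if max_requests_per_crawl == 0:
        items = num_urls
    else:
        items = min(num_urls, max_requests_per_crawl)
    return round(start_fee + items * price_per_1000 / 1000, 4)


print(estimate_cost(5000, max_requests_per_crawl=0))  # 10.0
```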

### What's Included

- **[Apify SDK](https://docs.apify.com/sdk/js)** - Toolkit for building Actors
- **[Input Schema](https://docs.apify.com/platform/actors/development/input-schema)** - Input validation
- **[Dataset](https://docs.apify.com/sdk/js/docs/concepts/storages#dataset)** - Structured data storage
- **[Proxy Configuration](https://docs.apify.com/platform/proxy)** - IP rotation for anti-blocking

### Limitations

**⚠️ JavaScript-Heavy Sites**: This tool primarily extracts data from static HTML. It may not capture content that loads dynamically via JavaScript, potentially resulting in incomplete data extraction.

### FAQ

#### Are duplicate URLs processed multiple times?

Yes. The Actor processes **every URL** in your input list, including duplicates. If you submit the same URL multiple times, it will be processed and charged each time.

**Tip**: Remove duplicates from your input list before running to save costs. In the input below, `page1` is processed and charged twice:

```text
https://example.com/page1  ← processed, charged
https://example.com/page1  ← processed again, charged again
https://example.com/page2  ← processed, charged
```
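
Below is a minimal sketch for deduplicating the `startUrls` list before a run, assuming the `{ "url": ... }` object format from the input example. `dedupe_start_urls` is a hypothetical helper. It treats URLs as duplicates only when they match exactly after trimming whitespace, which is the case the FAQ describes. It leaves variants such as a trailing slash alone because they may be different pages.

```python
def dedupe_start_urls(start_urls: list) -> list:
    """Drop repeated URLs, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for entry in start_urls:
        # Ignore surrounding whitespace; otherwise URLs must match exactly.
        key = entry["url"].strip()
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


start_urls = [
    {"url": "https://example.com/page1"},
    {"url": "https://example.com/page1"},
    {"url": "https://example.com/page2"},
]
print(dedupe_start_urls(start_urls))
# [{'url': 'https://example.com/page1'}, {'url': 'https://example.com/page2'}]
```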

#### Am I charged for failed requests?

Yes. URLs that return HTTP errors (404, 500, etc.) or fail after retries are still charged because the Actor had to make a request to discover the error. However, you receive an error item in your dataset with the error code and message, so you know exactly what happened.

#### How can I control costs?

- Set a **maximum spending limit** in the Apify Console before running
- Use the `maxRequestsPerCrawl` input to limit the number of pages processed
- Remove duplicate URLs from your input list before running
- Set `maxRequestRetries` to 0 if you don't want failed requests to be retried

### Legal Disclaimer

#### ⚠️ Important Legal Notice

This tool is provided for educational and research purposes only. By using this SEO Data Extractor, you agree to:

- **Comply with all applicable laws**: You are solely responsible for ensuring your use of this tool complies with local, national, and international laws, including copyright laws, data protection regulations (such as GDPR, CCPA), and terms of service of target websites.

- **Respect website terms of service**: Many websites prohibit automated scraping in their terms of service. You must review and comply with each website's terms before using this tool.

- **Respect robots.txt**: This tool does not automatically check or respect robots.txt files. You are responsible for checking and honoring robots.txt directives.

- **Rate limiting and ethical use**: Use reasonable request rates and respect website operators. Excessive requests may constitute a denial-of-service attack.

- **Data privacy compliance**: Ensure your data collection and processing activities comply with privacy laws. Do not collect personal data without proper consent and legal basis.

- **No warranties**: This tool is provided "as is" without warranties of any kind. The authors are not responsible for any damages or legal consequences arising from its use.

- **Use at your own risk**: You assume all risks associated with using this tool. The authors disclaim all liability for any direct, indirect, incidental, or consequential damages.

**Before using this tool, consult with legal counsel to ensure compliance with applicable laws and regulations.**

# Actor input Schema

## `startUrls` (type: `array`):

List of URLs to extract SEO data from. Can be 1 to 100,000+ pages.

## `extractSitemapUrls` (type: `boolean`):

When enabled, automatically discovers and extracts all URLs from the website's sitemap.xml. The sitemap is detected from the first URL's domain.

## `sitemapUrl` (type: `string`):

Custom sitemap path or filename (e.g., sitemap\_index.xml or /sitemaps/main.xml). Applied to each domain. Leave blank to use /sitemap.xml.

## `maxSitemapUrls` (type: `integer`):

Maximum number of URLs to include from each sitemap (some sites have 40K+ URLs).

## `extractSslInfo` (type: `boolean`):

When enabled, extracts SSL/TLS certificate information including issuer, expiry date, validity, and protocol version. Useful for security audits.

## `maxRequestsPerCrawl` (type: `integer`):

Maximum number of pages to scrape. Set to 0 for unlimited.

## `requestTimeout` (type: `integer`):

Maximum time in seconds to wait for each page to load.

## `maxConcurrency` (type: `integer`):

Maximum number of pages to scrape in parallel. Higher values finish faster and may use more resources, but can also be cheaper in some cases.

## `maxRequestRetries` (type: `integer`):

Maximum number of retries for failed requests. Set to 0 for no retries.

## `proxyConfiguration` (type: `object`):

Proxy settings to avoid getting blocked. Recommended for large crawls.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://nocodeventure.com"
    }
  ],
  "extractSitemapUrls": false,
  "maxSitemapUrls": 100,
  "extractSslInfo": false,
  "maxRequestsPerCrawl": 100,
  "requestTimeout": 5,
  "maxConcurrency": 10,
  "maxRequestRetries": 1,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

## `headings` (type: `string`):

No description

## `openGraph` (type: `string`):

No description

## `twitterCard` (type: `string`):

No description

## `content` (type: `string`):

No description

## `technical` (type: `string`):

No description

## `branding` (type: `string`):

No description

## `ssl` (type: `string`):

No description

## `fullData` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://nocodeventure.com"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("nocodeventure/seo-data-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://nocodeventure.com" }] }

# Run the Actor and wait for it to finish
run = client.actor("nocodeventure/seo-data-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://nocodeventure.com"
    }
  ]
}' |
apify call nocodeventure/seo-data-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=nocodeventure/seo-data-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "SEO Data Extractor",
        "description": "Extract comprehensive SEO metadata, headings, links, images, Open Graph tags, Twitter Cards, and technical data from websites. Perfect for SEO audits, competitor analysis, and content optimization. Runs on Apify platform with structured JSON output.",
        "version": "0.0",
        "x-build-id": "x6RFKMSOZWsYsHslE"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/nocodeventure~seo-data-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-nocodeventure-seo-data-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/nocodeventure~seo-data-extractor/runs": {
            "post": {
                "operationId": "runs-sync-nocodeventure-seo-data-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/nocodeventure~seo-data-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-nocodeventure-seo-data-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "URLs to Scrape",
                        "type": "array",
                        "description": "List of URLs to extract SEO data from. Can be 1 to 100,000+ pages.",
                        "default": [
                            {
                                "url": "https://nocodeventure.com"
                            }
                        ],
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "extractSitemapUrls": {
                        "title": "Extract URLs from Sitemap",
                        "type": "boolean",
                        "description": "When enabled, automatically discovers and extracts all URLs from the website's sitemap.xml. The sitemap is detected from the first URL's domain.",
                        "default": false
                    },
                    "sitemapUrl": {
                        "title": "Custom Sitemap Path",
                        "type": "string",
                        "description": "Custom sitemap path or filename (e.g., sitemap_index.xml or /sitemaps/main.xml). Applied to each domain. Leave blank to use /sitemap.xml"
                    },
                    "maxSitemapUrls": {
                        "title": "Max Sitemap URLs",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of URLs to include from each sitemap (some sites have 40K+ URLs).",
                        "default": 100
                    },
                    "extractSslInfo": {
                        "title": "Extract SSL Certificate Info",
                        "type": "boolean",
                        "description": "When enabled, extracts SSL/TLS certificate information including issuer, expiry date, validity, and protocol version. Useful for security audits.",
                        "default": false
                    },
                    "maxRequestsPerCrawl": {
                        "title": "Max Requests",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of pages to scrape. Set to 0 for unlimited.",
                        "default": 100
                    },
                    "requestTimeout": {
                        "title": "Request Timeout (seconds)",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Maximum time in seconds to wait for each page to load.",
                        "default": 5
                    },
                    "maxConcurrency": {
                        "title": "Max Concurrency",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Maximum number of pages to scrape in parallel. Higher values finish faster and may use more resources, but can sometimes be cheaper overall.",
                        "default": 10
                    },
                    "maxRequestRetries": {
                        "title": "Max Retries",
                        "minimum": 0,
                        "maximum": 2,
                        "type": "integer",
                        "description": "Maximum number of retries for failed requests. Set to 0 for no retries.",
                        "default": 1
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Proxy settings that reduce the chance of getting blocked. Recommended for large crawls.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
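
Before starting a run, it can help to check an input object against the bounds declared in the input schema above. The following is a minimal local sketch, assuming you build the input as a plain Python dict. The `SCHEMA_LIMITS` table and the `validate_input` helper are hypothetical names, not part of the Actor or the Apify client. The bounds and defaults are copied from the schema, and only the fields shown there are checked.

```python
# Hypothetical helper: (minimum, maximum, default) copied from the input schema above.
# A maximum of None means the schema sets no upper bound.
SCHEMA_LIMITS = {
    "maxSitemapUrls": (1, 100, 100),
    "maxRequestsPerCrawl": (0, None, 100),  # 0 means unlimited
    "requestTimeout": (1, 10, 5),           # seconds
    "maxConcurrency": (1, 50, 10),
    "maxRequestRetries": (0, 2, 1),
}


def validate_input(run_input: dict) -> dict:
    """Return a copy of run_input with schema defaults filled in.

    Raises ValueError if an integer field is missing its type or is out of range.
    """
    result = dict(run_input)
    for name, (minimum, maximum, default) in SCHEMA_LIMITS.items():
        value = result.setdefault(name, default)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if value < minimum or (maximum is not None and value > maximum):
            raise ValueError(f"{name}={value} is outside [{minimum}, {maximum}]")
    # Non-integer defaults from the schema.
    result.setdefault("extractSslInfo", False)
    result.setdefault("proxyConfiguration", {"useApifyProxy": False})
    return result


config = validate_input({"maxConcurrency": 20, "extractSslInfo": True})
print(config["requestTimeout"], config["maxConcurrency"])
```

The check runs locally, so an out-of-range value such as `requestTimeout: 30` fails before any run is started or billed. The platform still validates input on its side.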

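The `runsResponseSchema` above describes the run object returned when a run is started. A short sketch of pulling the useful fields out of such a response follows, assuming it has already been parsed from JSON into a dict. The `summarize_run` helper and the `example_response` values (IDs, status) are hypothetical placeholders, not real output. The schema types most `usageUsd` entries as integers even though costs are fractional, so the sketch treats them as plain numbers and keeps only the non-zero ones.

```python
def summarize_run(response: dict) -> dict:
    """Extract key fields from a run response shaped like runsResponseSchema."""
    run = response["data"]
    usage_usd = run.get("usageUsd", {})
    # Keep only the cost categories that actually contributed to the bill.
    cost_drivers = {name: cost for name, cost in usage_usd.items() if cost}
    return {
        "id": run.get("id"),
        "status": run.get("status"),
        "datasetId": run.get("defaultDatasetId"),  # where extracted SEO results land
        "totalUsd": run.get("usageTotalUsd", 0.0),
        "costDrivers": cost_drivers,
    }


# Hypothetical response using the example values from the schema.
example_response = {
    "data": {
        "id": "hypothetical-run-id",
        "status": "SUCCEEDED",
        "defaultDatasetId": "hypothetical-dataset-id",
        "usageTotalUsd": 0.00005,
        "usageUsd": {
            "ACTOR_COMPUTE_UNITS": 0,
            "DATASET_WRITES": 0,
            "KEY_VALUE_STORE_WRITES": 0.00005,
        },
    }
}

summary = summarize_run(example_response)
print(summary)
```

The `datasetId` returned here is the dataset you would then read items from to get the extracted SEO data.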