# Schema Markup Scraper & SEO Auditor (`autofacts/metadata-scraper`) Actor

Extract JSON-LD, Microdata, RDFa, Open Graph & Twitter Cards. Runs a 0-100 SEO audit — checks canonical, hreflang, headings, image alt, EEAT author signals. Detects 80+ schema.org types including LocalBusiness with NAP, geo coordinates, and Google Place IDs.

- **URL**: https://apify.com/autofacts/metadata-scraper.md
- **Developed by:** [Richard Feng](https://apify.com/autofacts) (community)
- **Categories:** Developer tools, Automation, SEO tools
- **Stats:** 146 total users, 18 monthly users, 100.0% runs succeeded, 8 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Schema Markup Scraper & SEO Analyzer

Extract structured data, metadata, and SEO signals from any web page. Built for technical SEO auditing, local business intelligence, content aggregation, and competitive analysis.

### What it does

This scraper visits one or more URLs and extracts everything a search engine sees: structured data (JSON-LD, Microdata, RDFa), social meta tags (Open Graph, Twitter Cards), and dozens of SEO signals. It then runs an automated audit and returns a 0-100 SEO score with actionable issues — aligned with Google's 2025 ranking signals and EEAT guidelines.

### Key capabilities

#### Structured data extraction
- **JSON-LD** — Parses all `<script type="application/ld+json">` blocks, including nested `@graph` structures
- **Microdata** — Extracts `itemscope`/`itemprop` schema.org markup with full nesting support
- **RDFa** *(opt-in)* — Parses `typeof`/`property`/`vocab` attributes with schema.org vocabulary resolution
- **Schema type detection** — Identifies all schema.org types present (Product, Article, LocalBusiness, BreadcrumbList, etc.)
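
As a rough illustration of what type detection over nested `@graph` structures involves, the sketch below walks already-parsed JSON-LD blocks and collects every `@type` value it encounters. The function name `collect_schema_types` and the sample data are hypothetical; this is not the Actor's actual implementation.

```python
from typing import Any, List, Optional


def collect_schema_types(node: Any, found: Optional[List[str]] = None) -> List[str]:
    """Recursively collect @type values from parsed JSON-LD, in first-seen order."""
    if found is None:
        found = []
    if isinstance(node, list):
        for child in node:
            collect_schema_types(child, found)
    elif isinstance(node, dict):
        types = node.get("@type", [])
        # @type may be a single string or a list of strings
        for type_name in [types] if isinstance(types, str) else types:
            if isinstance(type_name, str) and type_name not in found:
                found.append(type_name)
        # Descend into every value, which covers @graph and nested objects
        for value in node.values():
            collect_schema_types(value, found)
    return found


# Hypothetical JSON-LD block, as it would look after json.loads()
blocks = [
    {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Product", "brand": {"@type": "Brand", "name": "Acme"}},
            {"@type": ["BreadcrumbList"], "itemListElement": [{"@type": "ListItem"}]},
        ],
    }
]
print(collect_schema_types(blocks))
# ['Product', 'Brand', 'BreadcrumbList', 'ListItem']
```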

#### Social & meta tags
- **Open Graph** — All `og:*` properties (title, description, image, type, locale, etc.)
- **Twitter Cards** — All `twitter:*` properties with special handling for `summary_large_image`
- **Dublin Core** — `DC.*` and `DCTerms.*` academic/institutional metadata
- **Standard meta tags** — viewport, description, keywords, robots, theme-color, and all others

#### SEO analysis
- **Canonical URL** — Detects `<link rel="canonical">`
- **Robots meta** — Extracts directives for `robots`, `googlebot`, `bingbot`, etc.
- **Heading hierarchy** — Maps H1–H6 structure, counts H1 tags, detects skipped levels
- **Image alt text audit** — Counts images with/without alt attributes, calculates coverage percentage
- **Viewport & charset** — Verifies mobile-first indexing prerequisites
- **SEO score (0-100)** — Automated audit checking 15 Google ranking signals with error/warning/info severity
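
The heading check can be pictured as a single pass over the heading levels in document order. The sketch below is an assumption about how such a check might work, not the Actor's code; `audit_headings` is a hypothetical name, and the issue wording mirrors the sample output later in this README.

```python
from typing import Any, Dict, List


def audit_headings(levels: List[int]) -> Dict[str, Any]:
    """Count H1 tags and flag jumps of more than one level (e.g. h2 -> h4)."""
    issues = []
    for previous, current in zip(levels, levels[1:]):
        # Going deeper by more than one level skips a heading level;
        # moving back up (e.g. h4 -> h2) is fine.
        if current > previous + 1:
            issues.append(f"Skipped heading level: h{previous} to h{current}")
    return {"h1Count": levels.count(1), "issues": issues}


# Heading levels in document order: h1, h2, h4, h2, h3
print(audit_headings([1, 2, 4, 2, 3]))
# {'h1Count': 1, 'issues': ['Skipped heading level: h2 to h4']}
```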

#### International SEO
- **Hreflang tags** — Extracts all `<link rel="alternate" hreflang="...">` with built-in validation:
  - Flags missing `x-default` fallback
  - Validates language and region codes (ISO 639-1 and ISO 3166-1), catching common mistakes such as `en-UK`, which should be `en-GB`
  - Detects missing self-referencing tags
- **Language detection** — `<html lang>`, `<meta http-equiv="content-language">`, `og:locale`
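
The three hreflang checks listed above can be sketched as follows. This is a minimal illustration under stated assumptions: `validate_hreflang`, the shape regex, and the tiny `REGION_FIXES` lookup are hypothetical, and the Actor's real validation rules may differ. The return value follows the `{ tags, hasXDefault, issues }` shape used in the output.

```python
import re
from typing import Any, Dict, List

# Shape check only: a language subtag with an optional region/script subtag.
HREFLANG_SHAPE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,4})?$", re.IGNORECASE)
# Hypothetical lookup of frequently misused region codes.
REGION_FIXES = {"UK": "GB"}


def validate_hreflang(tags: List[Dict[str, str]], page_url: str) -> Dict[str, Any]:
    issues: List[str] = []
    has_x_default = any(tag["lang"].lower() == "x-default" for tag in tags)
    if tags and not has_x_default:
        issues.append("Missing x-default hreflang tag")
    for tag in tags:
        lang = tag["lang"]
        if lang.lower() == "x-default":
            continue
        if not HREFLANG_SHAPE.match(lang):
            issues.append(f"Malformed hreflang value: {lang}")
            continue
        language, _, region = lang.partition("-")
        fix = REGION_FIXES.get(region.upper())
        if fix:
            issues.append(f"Invalid region in {lang}: use {language}-{fix}")
    # A page should list itself among its alternates (ignoring a trailing slash)
    if tags and all(tag["url"].rstrip("/") != page_url.rstrip("/") for tag in tags):
        issues.append("Missing self-referencing hreflang tag")
    return {"tags": tags, "hasXDefault": has_x_default, "issues": issues}


sample_tags = [
    {"lang": "en-UK", "url": "https://example.com/uk/"},
    {"lang": "de", "url": "https://example.com/de/"},
]
result = validate_hreflang(sample_tags, "https://example.com/")
print(result["issues"])
```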

#### EEAT & author signals
- **Author extraction** — Pulls author info from JSON-LD (`Person` type with `sameAs` links), `<meta name="author">`, and `<a rel="author">`
- **Article metadata** — `datePublished`, `dateModified`, `headline`, `wordCount`, `publisher` from Article/NewsArticle/BlogPosting schema

#### Local / Geo SEO
- **LocalBusiness extraction** — Detects 80+ schema.org LocalBusiness subtypes (Restaurant, Hotel, Dentist, Store, etc.) and extracts NAP, geo coordinates, opening hours, price range
- **NAP (Name/Address/Phone)** — From any Organization or LocalBusiness schema
- **Geo meta tags** — `geo.region`, `geo.placename`, `geo.position`, `ICBM`
- **Google Maps references** — Embedded map iframes, Place IDs, CID numbers
- **hCard/vCard** *(opt-in)* — `.vcard`/`.h-card` microformat contact data
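
To make the Google Maps reference extraction concrete, here is a sketch that pulls Place IDs and CID numbers out of link or iframe URLs. It assumes the common `cid=`, `query_place_id=`, and `q=place_id:` URL forms; `extract_map_references` is a hypothetical helper, and the Actor may recognize additional patterns. The output keys follow the `mapReferences` field.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlparse


def extract_map_references(urls: List[str]) -> Dict[str, List[str]]:
    refs: Dict[str, List[str]] = {"googleMapsEmbeds": [], "placeIds": [], "cids": []}
    for url in urls:
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        is_maps = "google." in host and (
            host.startswith("maps.") or parsed.path.startswith("/maps")
        )
        if not is_maps:
            continue
        if parsed.path.startswith("/maps/embed"):
            refs["googleMapsEmbeds"].append(url)
        query = parse_qs(parsed.query)
        refs["cids"].extend(query.get("cid", []))
        refs["placeIds"].extend(query.get("query_place_id", []))
        # The Embed API passes Place IDs as q=place_id:<id>
        for value in query.get("q", []):
            if value.startswith("place_id:"):
                refs["placeIds"].append(value[len("place_id:"):])
    return refs


# Sample URLs; the key, Place ID, and CID are placeholders
sample_urls = [
    "https://www.google.com/maps/embed/v1/place?key=KEY&q=place_id:ChIJN1t_tDeuEmsRUsoyG83frY4",
    "https://maps.google.com/?cid=1234567890",
    "https://example.com/maps?cid=1",
]
map_refs = extract_map_references(sample_urls)
print(map_refs["placeIds"], map_refs["cids"])
```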

#### Breadcrumb validation
- Extracts `BreadcrumbList` schema items with position, name, and URL
- Validates sequential positions and flags relative URLs
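
A minimal sketch of the two breadcrumb checks, assuming items shaped like `{position, name, url}`. The name `validate_breadcrumbs` and the non-sequential message wording are hypothetical. This sketch reports every relative URL it finds, which may be more verbose than the Actor's own output.

```python
from typing import Any, Dict, List


def validate_breadcrumbs(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    issues = []
    for expected, item in enumerate(items, start=1):
        position, url = item["position"], item["url"]
        # Positions should run 1, 2, 3, ... without gaps
        if position != expected:
            issues.append(f"Non-sequential position: expected {expected}, got {position}")
        # Anything without an http(s) scheme is treated as relative
        if not url.startswith(("http://", "https://")):
            issues.append(f'Relative URL in breadcrumb position {position}: "{url}"')
    return {"items": items, "issues": issues}


crumbs = [
    {"position": 1, "name": "Home", "url": "/"},
    {"position": 3, "name": "Shoes", "url": "https://example.com/shoes"},
]
print(validate_breadcrumbs(crumbs)["issues"])
```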

### Input parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `startUrls` | Array | *(required)* | URLs to scrape |
| `proxy` | Object | Apify Proxy | Proxy configuration |
| `maxRequestsPerCrawl` | Integer | `100` | Maximum pages to scrape (1–100,000) |
| `maxConcurrency` | Integer | `10` | Parallel pages (1–100) |
| `extractMetaTags` | Boolean | `true` | Extract all meta tags |
| `extractSeoAnalysis` | Boolean | `true` | SEO signals: canonical, hreflang, robots, headings, author, images, breadcrumbs, Dublin Core, viewport, charset |
| `extractGeoData` | Boolean | `true` | Geo tags, LocalBusiness, NAP, Google Maps references |
| `computeSeoScore` | Boolean | `true` | Run SEO audit (0-100 score + issues list) |
| `extractRdfa` | Boolean | `false` | RDFa structured data (opt-in) |
| `extractHCard` | Boolean | `false` | hCard/vCard microformats (opt-in) |
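
When building the input programmatically, it can help to check values against the documented ranges before starting a run. The helper below is a hypothetical convenience wrapper (`build_run_input` is not part of any Apify library); the limits and flag names are taken from the table above, and rejecting an empty URL list is an assumption of this sketch.

```python
from typing import Any, Dict, List

# Boolean toggles listed in the parameter table
BOOLEAN_FLAGS = {
    "extractMetaTags", "extractSeoAnalysis", "extractGeoData",
    "computeSeoScore", "extractRdfa", "extractHCard",
}


def build_run_input(urls: List[str], max_requests: int = 100,
                    max_concurrency: int = 10, **flags: bool) -> Dict[str, Any]:
    """Build an input object, rejecting values outside the documented ranges."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    if not 1 <= max_requests <= 100_000:
        raise ValueError("maxRequestsPerCrawl must be between 1 and 100,000")
    if not 1 <= max_concurrency <= 100:
        raise ValueError("maxConcurrency must be between 1 and 100")
    unknown = set(flags) - BOOLEAN_FLAGS
    if unknown:
        raise ValueError(f"Unknown flags: {sorted(unknown)}")
    run_input: Dict[str, Any] = {
        "startUrls": [{"url": url} for url in urls],
        "proxy": {"useApifyProxy": True},
        "maxRequestsPerCrawl": max_requests,
        "maxConcurrency": max_concurrency,
    }
    run_input.update(flags)
    return run_input


run_input = build_run_input(["https://apify.com"], max_requests=50, extractRdfa=True)
print(run_input)
```

The resulting dictionary can be passed as `run_input` to the Python client call shown in the API section.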

### Output fields

Each scraped page produces a JSON object with these fields:

#### Core metadata
| Field | Type | Description |
|-------|------|-------------|
| `url` | String | Final URL after redirects |
| `title` | String | Page `<title>` content |
| `icon` | String | Favicon/apple-touch-icon URL |
| `linkedData` | Array | JSON-LD structured data blocks |
| `microdata` | Array | Microdata (schema.org) items |
| `openGraph` | Object | Open Graph properties |
| `twitterCard` | Object | Twitter Card properties |
| `metaTags` | Object | All other meta tags |

#### SEO analysis (when `extractSeoAnalysis` is enabled)
| Field | Type | Description |
|-------|------|-------------|
| `canonical` | String/null | Canonical URL |
| `robotsMeta` | Object | Robots directives (`{ robots: "index, follow", googlebot: "noarchive" }`) |
| `hreflang` | Object | `{ tags: [{lang, url}], hasXDefault: bool, issues: string[] }` |
| `language` | Object | `{ htmlLang, contentLanguage, ogLocale }` |
| `dublinCore` | Object | Dublin Core metadata |
| `viewport` | String/null | Viewport meta tag content |
| `charset` | String/null | Character encoding |
| `headings` | Object | `{ headings: [{level, text}], h1Count, issues[] }` |
| `imageAudit` | Object | `{ totalImages, imagesWithAlt, imagesWithoutAlt, altTexts[], issues[] }` |
| `authorInfo` | Object/null | `{ name, url, sameAs[], jobTitle, source }` |
| `schemaTypes` | Array | All schema.org types detected (e.g., `["Product", "BreadcrumbList"]`) |
| `articleMetadata` | Object/null | `{ datePublished, dateModified, headline, description, wordCount, publisher }` |
| `breadcrumbs` | Object/null | `{ items: [{position, name, url}], issues[] }` |

#### Geo / Local SEO (when `extractGeoData` is enabled)
| Field | Type | Description |
|-------|------|-------------|
| `geoTags` | Object/null | Geo meta tags (`{ region, placename, position, icbm }`) |
| `localBusiness` | Object/null | LocalBusiness schema data with NAP, geo coordinates, opening hours |
| `nap` | Object/null | Name, Address, Phone from Organization/LocalBusiness |
| `mapReferences` | Object/null | `{ googleMapsEmbeds[], placeIds[], cids[] }` |

#### SEO audit (when `computeSeoScore` is enabled)
| Field | Type | Description |
|-------|------|-------------|
| `seoAudit` | Object | `{ score: 0-100, issues: [{severity, code, message}] }` |

#### Optional extractors
| Field | Type | Description |
|-------|------|-------------|
| `rdfa` | Array | RDFa structured data (when `extractRdfa` is enabled) |
| `hCards` | Array | hCard/vCard contact data (when `extractHCard` is enabled) |
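
Once a run finishes, the dataset items can be post-processed with ordinary dictionary access. The sketch below assumes items shaped like the tables above and uses made-up sample data; `summarize_audits` and the threshold of 90 are illustrative choices, not part of the Actor.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_audits(items: List[Dict[str, Any]], min_score: int = 90
                     ) -> Tuple[List[Tuple[str, int]], Counter]:
    """Return pages scoring below min_score (worst first) and issue-code counts."""
    low_scoring = []
    issue_codes: Counter = Counter()
    for item in items:
        audit = item.get("seoAudit")
        if not audit:  # seoAudit may be absent when computeSeoScore is disabled
            continue
        if audit["score"] < min_score:
            low_scoring.append((item["url"], audit["score"]))
        issue_codes.update(issue["code"] for issue in audit.get("issues", []))
    return sorted(low_scoring, key=lambda pair: pair[1]), issue_codes


# Made-up dataset items for illustration
items = [
    {"url": "https://example.com/a", "seoAudit": {"score": 80, "issues": [
        {"severity": "error", "code": "MISSING_DESCRIPTION", "message": "Page has no meta description"},
        {"severity": "warning", "code": "MISSING_OG_DESCRIPTION", "message": "Missing og:description meta tag"},
    ]}},
    {"url": "https://example.com/b", "seoAudit": {"score": 100, "issues": []}},
    {"url": "https://example.com/c", "seoAudit": {"score": 95, "issues": [
        {"severity": "warning", "code": "MISSING_OG_DESCRIPTION", "message": "Missing og:description meta tag"},
    ]}},
    {"url": "https://example.com/d"},
]
low_scoring, issue_codes = summarize_audits(items)
print(low_scoring)
print(issue_codes.most_common())
```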

### SEO audit checks

The audit starts at 100 and deducts points for each issue found:

| Check | Severity | Points | What it catches |
|-------|----------|--------|-----------------|
| Missing `<title>` | Error | -10 | Core ranking signal |
| Title > 60 chars | Warning | -5 | SERP truncation |
| Missing meta description | Error | -10 | CTR impact |
| Description > 160 chars | Warning | -5 | SERP truncation |
| Missing or multiple H1 | Error | -10 | Content hierarchy |
| Missing canonical URL | Warning | -5 | Duplicate content risk |
| Missing og:title | Warning | -5 | Social share CTR |
| Missing og:description | Warning | -5 | Social share CTR |
| Missing og:image | Warning | -5 | Social CTR (40-60% impact) |
| No structured data | Warning | -5 | Rich results eligibility |
| Missing favicon | Warning | -5 | Brand trust signal |
| Missing viewport | Warning | -5 | Mobile-first indexing |
| Missing hreflang x-default | Warning | -5 | International SEO trust |
| Missing author (on articles) | Warning | -5 | EEAT signal |
| Missing datePublished (on articles) | Warning | -5 | Freshness signal |
| > 50% images without alt | Warning | -5 | Accessibility and AI image understanding |
| No BreadcrumbList (deep pages) | Info | -1 | Navigation hierarchy |
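
In the table above, every Error costs 10 points, every Warning 5, and every Info 1, so a score can be recomputed from an issues list. The sketch below assumes the deduction depends only on severity, which holds for the table but is not confirmed as the Actor's internal logic; `compute_seo_score` is a hypothetical name. The sample issues are the three from the Wikipedia example output in this README.

```python
from typing import Dict, List

# Points deducted per severity, as read from the audit table
POINTS_BY_SEVERITY = {"error": 10, "warning": 5, "info": 1}


def compute_seo_score(issues: List[Dict[str, str]]) -> int:
    """Start at 100 and subtract the deduction for each issue, never going below 0."""
    deducted = sum(POINTS_BY_SEVERITY[issue["severity"]] for issue in issues)
    return max(0, 100 - deducted)


wikipedia_issues = [
    {"severity": "error", "code": "MISSING_DESCRIPTION"},
    {"severity": "warning", "code": "MISSING_OG_DESCRIPTION"},
    {"severity": "warning", "code": "IMAGES_MISSING_ALT"},
]
print(compute_seo_score(wikipedia_issues))  # 100 - 10 - 5 - 5 = 80
```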

### Use cases

#### Technical SEO audit
Crawl your site and get an instant SEO health check across every page. Identify missing titles, broken heading hierarchies, absent structured data, and more — with a prioritized issues list.

#### E-commerce competitive analysis
Extract product schema (pricing, availability, reviews, return policies), breadcrumb structures, and rich snippet eligibility from competitor product pages.

#### Local business intelligence
Scrape LocalBusiness schema from directories, review sites, or business websites. Extract NAP data, opening hours, geo coordinates, Google Place IDs, and CID numbers for lead generation or data enrichment.

#### International SEO validation
Audit hreflang implementations across multilingual sites. Catch the errors that 75% of implementations contain: missing x-default fallbacks, invalid ISO codes, missing self-references.

#### EEAT & content analysis
Extract author information, `sameAs` links to verified profiles, publication dates, and publisher data from article pages. Monitor how well your content signals expertise and authority.

#### Social media preview testing
Verify how pages will appear when shared on Facebook (Open Graph) and Twitter/X (Twitter Cards). Check for missing images, truncated descriptions, and incomplete metadata.

#### Content aggregation
Build news aggregators or content feeds by extracting article metadata, publication dates, authors, and descriptions from multiple sources in a single crawl.

### Example output

#### News article (CNN)

```json
{
  "url": "https://edition.cnn.com/2025/04/18/politics/...",
  "title": "Supreme Court temporarily pauses deportations under Alien Enemies Act",
  "canonical": "https://www.cnn.com/2025/04/18/politics/...",
  "language": {
    "htmlLang": "en",
    "contentLanguage": null,
    "ogLocale": "en_US"
  },
  "hreflang": {
    "tags": [
      { "lang": "en-gb", "url": "https://edition.cnn.com/..." },
      { "lang": "en-us", "url": "https://www.cnn.com/..." },
      { "lang": "x-default", "url": "https://edition.cnn.com/..." }
    ],
    "hasXDefault": true,
    "issues": ["Missing self-referencing hreflang tag"]
  },
  "authorInfo": {
    "name": "Tierney Sneed, John Fritze",
    "url": null,
    "sameAs": [],
    "jobTitle": null,
    "source": "meta"
  },
  "schemaTypes": ["NewsArticle", "Person", "ImageObject", "Organization", "WebPage", "NewsMediaOrganization"],
  "seoAudit": {
    "score": 79,
    "issues": [
      { "severity": "warning", "code": "TITLE_TOO_LONG", "message": "Title is 94 chars (recommended: max 60)" },
      { "severity": "warning", "code": "DESCRIPTION_TOO_LONG", "message": "Meta description is 267 chars (recommended: max 160)" },
      { "severity": "warning", "code": "MISSING_DATE_PUBLISHED", "message": "Article page missing datePublished (impacts freshness signals)" },
      { "severity": "info", "code": "NO_BREADCRUMBS", "message": "Deep page with no BreadcrumbList schema (helps navigation hierarchy)" }
    ]
  }
}
````

#### E-commerce product (Farfetch)

```json
{
  "url": "https://www.farfetch.com/shopping/women/jacquemus-les-doubles-sandals-item-28543291.aspx",
  "title": "Jacquemus Les Doubles Sandals | Brown | FARFETCH",
  "canonical": "https://www.farfetch.com/shopping/women/jacquemus-les-doubles-sandals-item-28543291.aspx",
  "schemaTypes": ["ProductGroup", "ImageObject", "Brand", "Product", "Offer", "MerchantReturnPolicy", "UnitPriceSpecification", "BreadcrumbList", "ListItem"],
  "breadcrumbs": {
    "items": [
      { "position": 1, "name": "Women Home", "url": "/shopping/women/items.aspx" },
      { "position": 2, "name": "Jacquemus", "url": "/shopping/women/jacquemus/items.aspx" },
      { "position": 3, "name": "Shoes", "url": "/shopping/women/jacquemus/shoes-1/items.aspx" },
      { "position": 4, "name": "Heeled Sandals", "url": "/shopping/women/jacquemus/heeled-sandals-1/items.aspx" }
    ],
    "issues": ["Relative URL in breadcrumb position 1: \"/shopping/women/items.aspx\""]
  },
  "headings": {
    "h1Count": 1,
    "issues": ["Skipped heading level: h2 to h4"]
  },
  "imageAudit": {
    "totalImages": 6,
    "imagesWithAlt": 5,
    "imagesWithoutAlt": 1,
    "issues": ["1 image(s) missing alt attribute"]
  },
  "robotsMeta": { "robots": "noindex" },
  "seoAudit": { "score": 100, "issues": [] }
}
```

#### Wikipedia (RDFa + hCard)

```json
{
  "url": "https://en.wikipedia.org/wiki/San_Francisco",
  "title": "San Francisco - Wikipedia",
  "canonical": "https://en.wikipedia.org/wiki/San_Francisco",
  "schemaTypes": ["Article", "Organization", "ImageObject"],
  "authorInfo": {
    "name": "Contributors to Wikimedia projects",
    "source": "json-ld"
  },
  "articleMetadata": {
    "datePublished": "2001-11-13T04:30:40Z",
    "dateModified": "2026-03-28T17:14:57Z",
    "headline": "consolidated city and county in California, United States",
    "publisher": { "name": "Wikimedia Foundation, Inc." }
  },
  "rdfa": ["... 114 RDFa items extracted ..."],
  "hCards": [{ "name": "San Francisco", "... ": "..." }],
  "imageAudit": {
    "totalImages": 119,
    "imagesWithAlt": 43,
    "imagesWithoutAlt": 76,
    "issues": ["64% of images missing alt text (accessibility + AI understanding)"]
  },
  "seoAudit": {
    "score": 80,
    "issues": [
      { "severity": "error", "code": "MISSING_DESCRIPTION", "message": "Page has no meta description" },
      { "severity": "warning", "code": "MISSING_OG_DESCRIPTION", "message": "Missing og:description meta tag" },
      { "severity": "warning", "code": "IMAGES_MISSING_ALT", "message": "64% of images missing alt text" }
    ]
  }
}
```
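
The alt-text percentage in the Wikipedia example follows directly from the counts: 76 of 119 images lack alt text, which rounds to 64% and exceeds the 50% threshold from the audit table. The sketch below reproduces that arithmetic; `audit_image_alt` and the two extra keys it returns are hypothetical and not part of the Actor's output.

```python
from typing import Any, Dict


def audit_image_alt(total_images: int, images_with_alt: int) -> Dict[str, Any]:
    without_alt = total_images - images_with_alt
    percent_missing = round(100 * without_alt / total_images) if total_images else 0
    return {
        "totalImages": total_images,
        "imagesWithAlt": images_with_alt,
        "imagesWithoutAlt": without_alt,
        "percentMissing": percent_missing,  # hypothetical field
        # The audit table deducts points only when more than half lack alt text
        "exceedsAuditThreshold": without_alt * 2 > total_images,  # hypothetical field
    }


print(audit_image_alt(119, 43))  # Wikipedia example: 64% missing, above threshold
print(audit_image_alt(6, 5))     # Farfetch example: 1 missing, below threshold
```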

### Technical details

- **Engine**: CheerioCrawler (server-side HTML parsing, no JavaScript execution)
- **Runtime**: Node.js 22, Apify SDK 3.5.3, Crawlee 3.15.3
- **Performance**: Lightweight and fast — no browser overhead
- **Sessions**: Automatic session rotation with cookie persistence
- **Proxies**: Full proxy support including Apify Proxy residential groups
- **Retries**: Up to 3 retries per request with automatic session rotation on blocks
- **Link following**: Optional crawling with configurable depth

### Limitations

- **No JavaScript rendering** — Pages that require JS to load content (e.g., SPAs, IMDB) will return minimal data. Use a browser-based scraper for these.
- **Anti-bot protection** — Some sites (Yelp, BBC, Medium) may block requests even with residential proxies. Results depend on proxy quality.
- **RDFa complexity** — The RDFa extractor handles the common schema.org vocabulary case. Exotic namespace prefixes may not be fully resolved.

# Actor input Schema

## `startUrls` (type: `array`):

URLs to start crawling from

## `proxy` (type: `object`):

Select proxies to be used by your crawler.

## `maxRequestsPerCrawl` (type: `integer`):

Maximum number of pages that the scraper will open. The crawl will stop when this limit is reached.

## `maxConcurrency` (type: `integer`):

Maximum number of pages that will be processed in parallel.

## `extractMetaTags` (type: `boolean`):

Extract all meta tags from the page

## `extractSeoAnalysis` (type: `boolean`):

Extract SEO signals: canonical URL, hreflang (with validation), robots meta, heading hierarchy, author/EEAT info, image alt audit, article metadata, breadcrumbs, Dublin Core, viewport, and charset

## `extractGeoData` (type: `boolean`):

Extract geo meta tags, LocalBusiness schema, NAP (Name/Address/Phone), and Google Maps/Place ID references

## `computeSeoScore` (type: `boolean`):

Run an SEO audit producing a 0-100 score and list of issues aligned with Google ranking signals

## `extractRdfa` (type: `boolean`):

Extract RDFa structured data (niche format, opt-in)

## `extractHCard` (type: `boolean`):

Extract hCard/vCard microformat contact data (niche format, opt-in)

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://apify.com"
    }
  ],
  "proxy": {
    "useApifyProxy": true
  },
  "maxRequestsPerCrawl": 100,
  "maxConcurrency": 10,
  "extractMetaTags": true,
  "extractSeoAnalysis": true,
  "extractGeoData": true,
  "computeSeoScore": true,
  "extractRdfa": false,
  "extractHCard": false
}
```

# Actor output Schema

## `metadata` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://apify.com"
        }
    ],
    "proxy": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("autofacts/metadata-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://apify.com" }],
    "proxy": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("autofacts/metadata-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://apify.com"
    }
  ],
  "proxy": {
    "useApifyProxy": true
  }
}' |
apify call autofacts/metadata-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=autofacts/metadata-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Schema Markup Scraper & SEO Auditor",
        "description": "Extract JSON-LD, Microdata, RDFa, Open Graph & Twitter Cards. Runs a 0-100 SEO audit — checks canonical, hreflang, headings, image alt, EEAT author signals. Detects 80+ schema.org types including LocalBusiness with NAP, geo coordinates, and Google Place IDs.",
        "version": "0.0",
        "x-build-id": "bKgMIgNqk39gU0BlS"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/autofacts~metadata-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-autofacts-metadata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/autofacts~metadata-scraper/runs": {
            "post": {
                "operationId": "runs-sync-autofacts-metadata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/autofacts~metadata-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-autofacts-metadata-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "URLs to start crawling from",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "proxy": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Select proxies to be used by your crawler."
                    },
                    "maxRequestsPerCrawl": {
                        "title": "Max Requests Per Crawl",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Maximum number of pages that the scraper will open. The crawl will stop when this limit is reached.",
                        "default": 100
                    },
                    "maxConcurrency": {
                        "title": "Max Concurrency",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of pages that will be processed in parallel.",
                        "default": 10
                    },
                    "extractMetaTags": {
                        "title": "Extract Meta Tags",
                        "type": "boolean",
                        "description": "Extract all meta tags from the page",
                        "default": true
                    },
                    "extractSeoAnalysis": {
                        "title": "Extract SEO Analysis",
                        "type": "boolean",
                        "description": "Extract SEO signals: canonical URL, hreflang (with validation), robots meta, heading hierarchy, author/EEAT info, image alt audit, article metadata, breadcrumbs, Dublin Core, viewport, and charset",
                        "default": true
                    },
                    "extractGeoData": {
                        "title": "Extract Geo/Local SEO Data",
                        "type": "boolean",
                        "description": "Extract geo meta tags, LocalBusiness schema, NAP (Name/Address/Phone), and Google Maps/Place ID references",
                        "default": true
                    },
                    "computeSeoScore": {
                        "title": "Compute SEO Score",
                        "type": "boolean",
                        "description": "Run an SEO audit producing a 0-100 score and list of issues aligned with Google ranking signals",
                        "default": true
                    },
                    "extractRdfa": {
                        "title": "Extract RDFa",
                        "type": "boolean",
                        "description": "Extract RDFa structured data (niche format, opt-in)",
                        "default": false
                    },
                    "extractHCard": {
                        "title": "Extract hCard/vCard",
                        "type": "boolean",
                        "description": "Extract hCard/vCard microformat contact data (niche format, opt-in)",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
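
As a rough illustration of how the `runsResponseSchema` above might be consumed, the following Python sketch parses a response of that shape and pulls out the fields a caller typically needs next: the run `id`, its `status`, the `defaultDatasetId` where results are stored, and the cost fields. The sample payload, its placeholder IDs, and the helper name `summarize_run` are assumptions made for this example; they are not part of the Actor or of any Apify client library.

```python
import json

# Hypothetical sample shaped like runsResponseSchema.
# All IDs and values are placeholders, not real run data.
SAMPLE_RESPONSE = """
{
  "data": {
    "id": "run-placeholder",
    "actId": "actor-placeholder",
    "status": "READY",
    "startedAt": "2025-01-08T00:00:00.000Z",
    "defaultDatasetId": "dataset-placeholder",
    "defaultKeyValueStoreId": "store-placeholder",
    "options": {"build": "latest", "timeoutSecs": 300, "memoryMbytes": 1024},
    "usageTotalUsd": 0.00005,
    "usageUsd": {
      "ACTOR_COMPUTE_UNITS": 0,
      "DATASET_WRITES": 0,
      "KEY_VALUE_STORE_WRITES": 0.00005
    }
  }
}
"""


def summarize_run(response: dict) -> dict:
    """Collect the fields of a run response that a caller usually needs next.

    Illustrative helper only. Every field is read with .get() because the
    schema does not mark any property as required.
    """
    data = response.get("data", {})
    usage_usd = data.get("usageUsd", {})
    return {
        "run_id": data.get("id"),
        "status": data.get("status"),
        "dataset_id": data.get("defaultDatasetId"),
        "total_usd": float(data.get("usageTotalUsd", 0)),
        # Keep only the usage categories that actually cost something.
        "billed_items": {name: usd for name, usd in usage_usd.items() if usd},
    }


summary = summarize_run(json.loads(SAMPLE_RESPONSE))
print(summary)
```

Reading every field defensively is a deliberate choice here: since the schema lists properties without a `required` array, a consumer should not assume any of them is present in a given response.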
