# HTML to JSON Smart Parser (`parseforge/html-to-json-smart-parser`) Actor

Convert HTML to structured JSON using AI! Uses OpenAI to extract and structure data from HTML into clean JSON format. Perfect for developers and data analysts who need to transform HTML into structured data without manual parsing.

- **URL**: https://apify.com/parseforge/html-to-json-smart-parser.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** AI, Developer tools, Automation
- **Stats:** 40 total users, 2 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 🧩 HTML to JSON Smart Parser

> 🚀 **Convert HTML into structured JSON in seconds.** Bring your own OpenAI API key. URL fetch, paste HTML, or upload files. No bespoke parsers.

> 🕒 **Last updated:** 2026-05-09 · **🧠 BYO OpenAI key** · **📥 URL / paste / file upload** · **🔑 BYO model selection**

Convert HTML into clean structured JSON without writing a parser per page. Provide one or more URLs, paste HTML directly, or upload HTML files, then specify (or auto-detect) which fields to extract. The Actor sends the HTML to your OpenAI account using your API key, parses the response, and returns one structured record per input. Built for developers who want layout-agnostic HTML extraction without bespoke selector code.

You bring your own OpenAI API key, so all model usage is billed directly to your OpenAI account. Choose the model (gpt-4o, gpt-4o-mini, gpt-3.5-turbo, etc.) based on your accuracy and cost trade-offs.

| 👥 Built for | 🎯 Primary use cases |
|---|---|
| Developers | Skip writing CSS selectors and XPath queries |
| Data engineers | Build layout-agnostic data pipelines |
| AI ops | Convert HTML into structured prompts for LLM workflows |
| Researchers | Index HTML archives without bespoke parsers |
| Content ops | Migrate HTML content into structured DBs |
| Indie devs | Add HTML parsing to side projects without a parser |

---

### 📋 What the HTML to JSON Smart Parser does

- 🌐 **Three input modes.** URL fetch, paste raw HTML, or upload HTML file URLs.
- 🧠 **AI-driven extraction.** Sends HTML to OpenAI with your key for layout-agnostic parsing.
- 🎯 **Field selection.** Specify which fields to extract or let the AI auto-detect.
- 🤖 **Model choice.** gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, or gpt-5 when available.
- ✏️ **Custom prompts.** Optional system prompt to bias the extraction.
- 🆔 **Per-input metadata.** Each record carries the source URL, prompt, and timestamp.

The Actor processes inputs in the order you provide them. Records stream into the dataset as parsing completes.

> 💡 **Why it matters:** writing a parser per page type costs hours and breaks with every layout change. AI-driven extraction adapts to layout variation without code changes, so dev teams can ship structured-data features faster.

---

### 🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing URL input, custom field extraction, and how to feed the output into a downstream pipeline.

---

### ⚙️ Input

| Field | Type | Name | Description |
|---|---|---|---|
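| url | array | URL (Fetch HTML) | URLs to fetch HTML from. The Actor does a plain HTTP GET. |
| htmlContent | string | HTML Content (Paste) | Optional. Paste raw HTML directly. |
| htmlFileUrl | array | HTML File URL (Upload) | Optional. URLs to uploaded HTML files. |
| openAIApiKey | string | OpenAI API Key | Required. Your OpenAI API key. The Actor uses this for the model call. |
| model | enum | OpenAI Model | gpt-4o-mini (default), gpt-4o, gpt-4-turbo, gpt-3.5-turbo, gpt-5. |
| fieldsToExtract | string | Fields to Extract (Optional) | Optional. Field names to extract. If omitted, the AI auto-detects the important fields. |
| systemPrompt | string | System Prompt (Optional) | Optional. Custom system prompt to guide the extraction. If omitted, a default prompt is used. |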

Example 1. URL extraction with default model.

```json
{
  "url": [{"url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"}],
  "openAIApiKey": "sk-...",
  "model": "gpt-4o-mini"
}
````

Example 2. Paste HTML directly.

```json
{
  "htmlContent": "<html><body><h1>Title</h1><p>Body</p></body></html>",
  "openAIApiKey": "sk-...",
  "model": "gpt-4o"
}
```
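
The two examples above can also be assembled in code. The following is a minimal sketch, not part of any Apify SDK: `build_input` is a hypothetical helper name, and it covers only the URL and paste modes because the item shape of `htmlFileUrl` is not documented here. It assumes the `url` field takes a list of `{"url": ...}` objects, as Example 1 shows.

```python
import json
from typing import Optional


def build_input(
    openai_api_key: str,
    urls: Optional[list[str]] = None,
    html_content: Optional[str] = None,
    model: str = "gpt-4o-mini",
) -> dict:
    """Assemble an Actor input dict (hypothetical helper for illustration)."""
    if not openai_api_key:
        raise ValueError("openAIApiKey is required")
    if not (urls or html_content):
        raise ValueError("provide urls or html_content")

    actor_input: dict = {"openAIApiKey": openai_api_key, "model": model}
    if urls:
        # Each URL is wrapped in an object, matching Example 1.
        actor_input["url"] = [{"url": u} for u in urls]
    if html_content:
        actor_input["htmlContent"] = html_content
    return actor_input


# Placeholder key; substitute your own before running the Actor.
example = build_input(
    "sk-...",
    urls=["https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"],
)
print(json.dumps(example, indent=2))
```

Failing fast on a missing key locally avoids starting a run that would only record an error.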

> ⚠️ **Good to Know:** you must supply your own OpenAI API key. All model usage is billed to your OpenAI account.

***

### 📊 Output

The dataset returns one structured record per input. Each record carries the source identifier, extracted JSON, the model used, and a timestamp. Consume the dataset as JSON, CSV, Excel, XML, or RSS via the Apify console or API.
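
As a rough illustration of consuming the dataset through the API, the sketch below builds a download URL for a given export format. It assumes the Apify API v2 dataset-items endpoint with its `format` query parameter; `dataset_items_url` is a hypothetical helper name and `<DATASET_ID>` is a placeholder for your run's default dataset ID.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# Export formats named above, written as the API's `format` values
# (Excel corresponds to "xlsx").
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml", "rss"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Return the dataset-items URL for the requested export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"


print(dataset_items_url("<DATASET_ID>", "csv"))
```

Private datasets also need your Apify API token, which can be sent in an `Authorization: Bearer` header rather than in the URL.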

#### 🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🌐 sourceUrl | string (url) or null | `https://books.toscrape.com/.../1000/index.html` |
| 📦 parsedData | object | `{"title":"A Light in the Attic","price":51.77,"availability":"In stock"}` |
| 🤖 model | string | gpt-4o-mini |
| 🎯 prompt | string | `Extract title, price, and availability` |
| 📅 timestamp | ISO datetime | `2026-05-09T12:00:00.000Z` |
| ❗ error | string or null | null |

#### 📦 Sample records

##### 1. URL extraction (book product page)

```json
{
  "sourceUrl": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
  "parsedData": {
    "title": "A Light in the Attic",
    "price": 51.77,
    "availability": "In stock",
    "rating": "Three",
    "description": "It's hard to imagine a world without A Light in the Attic..."
  },
  "model": "gpt-4o-mini",
  "prompt": "Extract title, price, availability, rating, and description",
  "timestamp": "2026-05-09T12:00:00.000Z",
  "error": null
}
```

##### 2. Pasted HTML (simple page)

```json
{
  "sourceUrl": null,
  "parsedData": {
    "title": "Welcome",
    "body": "Today we launched our new product..."
  },
  "model": "gpt-4o",
  "timestamp": "2026-05-09T12:00:00.000Z",
  "error": null
}
```

##### 3. Failed parse (missing API key)

```json
{
  "sourceUrl": "https://example.com/page.html",
  "parsedData": null,
  "model": "gpt-4o-mini",
  "timestamp": "2026-05-09T12:00:00.000Z",
  "error": "Missing OpenAI API key"
}
```
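
Because failures are reported per record, downstream code usually needs to separate parsed records from failed ones. A minimal sketch, assuming the record shape shown in the samples above (`split_records` is a hypothetical helper name):

```python
from typing import Any


def split_records(items: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Split dataset items into (parsed, failed) using the `error` field."""
    parsed, failed = [], []
    for item in items:
        # A record counts as parsed only if it has no error and carries data.
        if item.get("error") is None and item.get("parsedData") is not None:
            parsed.append(item)
        else:
            failed.append(item)
    return parsed, failed


# Abbreviated versions of sample records 2 and 3.
sample = [
    {"sourceUrl": None, "parsedData": {"title": "Welcome"}, "model": "gpt-4o", "error": None},
    {
        "sourceUrl": "https://example.com/page.html",
        "parsedData": None,
        "model": "gpt-4o-mini",
        "error": "Missing OpenAI API key",
    },
]

parsed, failed = split_records(sample)
for record in failed:
    print(f"failed: {record['sourceUrl']} ({record['error']})")
```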

***

### ✨ Why choose this Actor

| | Capability |
|---|---|
| 🎯 | **Built for the job.** Single-purpose HTML-to-JSON pipeline with sensible defaults. |
| 🧠 | **BYO OpenAI key.** All model usage billed directly to your OpenAI account. |
| ⚙️ | **Model choice.** Pick model based on accuracy and cost trade-offs. |
| 🔁 | **Live processing.** Every run executes end to end with no caching of input HTML. |
| 🌐 | **No infra to manage.** Apify handles compute, scaling, scheduling, and storage. |
| 🛡️ | **Reliable.** Per-input error reporting means one bad URL does not kill the whole run. |
| 🚫 | **No code required.** Configure in the UI, run from the CLI, schedule via cron, or call from any language through the Apify API. |

> 📊 Production-grade HTML-to-JSON conversion without writing or maintaining custom parsers.

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Quality | Setup |
|---|---|---|---|---|---|
| **⭐ HTML to JSON Smart Parser** *(this Actor)* | $5 free credit + your OpenAI usage | Any HTML | **Live per run** | High, layout-agnostic | ⚡ 2 min |
| Hand-written parsers | Engineering hours | Per layout | Whenever you maintain it | High but brittle | 🐢 Days to weeks |
| Paid HTML-extraction SaaS | $$ monthly | Limited | Live | Variable | ⏳ Hours |
| Manual review | Hours per file | One at a time | Stale | Highest | 🕒 Variable |

Pick this Actor when you want flexible, layout-agnostic HTML parsing without owning the model integration.

***

### 🚀 How to use

1. 📝 **Sign up.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp) (takes 2 minutes).
2. 🌐 **Open the Actor.** Go to the HTML to JSON Smart Parser page on the Apify Store.
3. 🎯 **Set inputs.** Provide URLs, paste HTML, or upload files. Add your OpenAI API key.
4. 🚀 **Run it.** Click **Start** and let the Actor parse each input.
5. 📥 **Download.** Grab your results in the **Dataset** tab as CSV, Excel, JSON, or XML.

> ⏱️ Total time from signup to first parsed JSON: **3-5 minutes** for a single URL.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%" valign="top">

#### 📊 Data engineering

- Build layout-agnostic data pipelines
- Skip CSS selectors and XPath queries
- Replace bespoke parsers across products
- Power ETL of HTML archives

</td>
<td width="50%" valign="top">

#### 🏢 AI ops and product

- Convert HTML into structured prompts
- Build LLM-driven content workflows
- Power RAG ingestion from HTML sources
- Surface structured data from emails

</td>
</tr>
<tr>
<td width="50%" valign="top">

#### 🎯 Research and migration

- Index HTML archives without bespoke parsers
- Migrate legacy HTML content into structured DBs
- Build content audits from CMS exports
- Power knowledge-base ingestion

</td>
<td width="50%" valign="top">

#### 🛠️ Engineering and product

- Add HTML parsing to your apps
- Wire parsing into CMS via webhooks
- Build prototype scrapers fast
- Skip the model-integration maintenance entirely

</td>
</tr>
</table>

***

### 🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Empirical datasets for papers, thesis work, and coursework
- Longitudinal studies tracking changes across snapshots
- Reproducible research with cited, versioned data pulls
- Classroom exercises on data analysis and ethical scraping

</td>
<td width="50%">

#### 🎨 Personal and creative

- Side projects, portfolio demos, and indie app launches
- Data visualizations, dashboards, and infographics
- Content research for bloggers, YouTubers, and podcasters
- Hobbyist collections and personal trackers

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Transparency reporting and accountability projects
- Advocacy campaigns backed by public-interest data
- Community-run databases for local issues
- Investigative journalism on public records

</td>
<td width="50%">

#### 🧪 Experimentation

- Prototype AI and machine-learning pipelines with real data
- Validate product-market hypotheses before engineering spend
- Train small domain-specific models on niche corpora
- Test dashboard concepts with live input

</td>
</tr>
</table>

***

### 🔌 Automating HTML to JSON Smart Parser

This Actor exposes a REST endpoint, so you can drive it from any language or workflow tool.

- **Node.js** - call it via the [Apify JS SDK](https://docs.apify.com/sdk/js).
- **Python** - call it via the [Apify Python SDK](https://docs.apify.com/sdk/python).
- **REST** - hit it directly through the [Apify v2 API](https://docs.apify.com/api/v2).

**Schedules.** Use Apify Scheduler to batch-parse a folder of HTML inputs. Combine with webhooks to trigger downstream workflows when parsing completes.
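
For the REST route, here is a minimal standard-library sketch. It assumes the `run-sync-get-dataset-items` endpoint and the `token` query parameter listed in the OpenAPI specification further down; `build_run_request` and `run_sync` are hypothetical helper names, and nothing is sent until `run_sync` is called.

```python
import json
import urllib.request
from urllib.parse import urlencode

ACTOR_ID = "parseforge~html-to-json-smart-parser"
ENDPOINT = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_run_request(apify_token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that runs the Actor."""
    return urllib.request.Request(
        f"{ENDPOINT}?{urlencode({'token': apify_token})}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(apify_token: str, actor_input: dict) -> list:
    """Send the request and return the dataset items (performs a network call)."""
    with urllib.request.urlopen(build_run_request(apify_token, actor_input)) as resp:
        return json.load(resp)


# Usage (placeholders, not executed here):
# items = run_sync("<YOUR_API_TOKEN>", {"htmlContent": "<h1>Title</h1>", "openAIApiKey": "sk-..."})
```

The synchronous endpoint suits short runs; for long batches, starting a run and reading the dataset later is likely the safer pattern.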

***

### ❓ Frequently Asked Questions

<details>
<summary><b>💳 Do I need a paid Apify plan to run this Actor?</b></summary>

No, but you do need an OpenAI API key. You can start the Actor on the free Apify plan (which includes **$5 in monthly credit**), but model calls are billed to your OpenAI account.

</details>

<details>
<summary><b>🚨 What happens if my run fails or returns no results?</b></summary>

Failed runs are not charged on Apify. If a single input fails, the Actor records the error on that record only. If the OpenAI key is invalid or out of credits, the Actor logs the error.

</details>

<details>
<summary><b>🧠 Why do I need to bring my own OpenAI key?</b></summary>

So your model usage is metered against your OpenAI account, with full control over rate limits and billing. We never see or store your key.

</details>

<details>
<summary><b>🤖 Which model should I pick?</b></summary>

gpt-4o-mini is the recommended default for cost. gpt-4o is more accurate for complex layouts. gpt-3.5-turbo is cheapest but less reliable on dense pages.

</details>

<details>
<summary><b>📥 Which input mode should I use?</b></summary>

URLs are simplest for public pages. Paste HTML when you have content not on the public web. Upload HTML files for bulk processing.

</details>

<details>
<summary><b>🧑‍💻 Can I call this Actor from my own code?</b></summary>

Yes. Apify exposes every Actor as a REST endpoint and ships first-class SDKs for [Node.js](https://docs.apify.com/sdk/js) and [Python](https://docs.apify.com/sdk/python).

</details>

<details>
<summary><b>📤 How do I export the data?</b></summary>

Every Apify dataset can be downloaded in one click as CSV, JSON, JSONL, Excel, HTML, XML, or RSS.

</details>

<details>
<summary><b>📅 Can I schedule the Actor to run automatically?</b></summary>

Yes. Use the Apify scheduler to parse new URLs on a cadence. Wire it to webhooks for trigger-driven parsing.

</details>

<details>
<summary><b>🏪 Can I use the data commercially?</b></summary>

Yes. Parsed data is yours to use, subject to your rights to the source HTML.

</details>

<details>
<summary><b>💼 Which plan should I pick for production use?</b></summary>

Apify's Starter and Scale plans are designed for production workloads. OpenAI usage is billed separately to your OpenAI account.

</details>

<details>
<summary><b>🛠️ Can you add other LLM providers?</b></summary>

Open the [contact form](https://tally.so/r/BzdKgA) and tell us about your use case. We add features regularly when there is a clear use case behind the request.

</details>

<details>
<summary><b>⚖️ Is it legal to use this Actor?</b></summary>

Yes, provided you have rights to the source HTML. You are responsible for compliance with OpenAI's terms, source-site terms, and applicable copyright laws.

</details>

***

### 🔌 Integrate with any app

HTML to JSON Smart Parser connects to any cloud service via [Apify integrations](https://apify.com/integrations):

- [**Make**](https://docs.apify.com/platform/integrations/make) - Automate multi-step workflows
- [**Zapier**](https://docs.apify.com/platform/integrations/zapier) - Connect with 5,000+ apps
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - Get run notifications in your channels
- [**Airbyte**](https://docs.apify.com/platform/integrations/airbyte) - Pipe results into your warehouse
- [**GitHub**](https://docs.apify.com/platform/integrations/github) - Trigger runs from commits and releases
- [**Google Drive**](https://docs.apify.com/platform/integrations/drive) - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes.
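
A webhook receiver typically needs the run's dataset ID to fetch results. The sketch below assumes Apify's default webhook payload, in which `eventType` names the event and `resource` holds the run object; `dataset_id_from_webhook` is a hypothetical helper name, and the payload in the example is a trimmed illustration rather than a captured request.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the run's default dataset ID for a succeeded run, else None."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    # `resource` is the run object in the default payload template (assumption).
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


example_body = json.dumps(
    {
        "eventType": "ACTOR.RUN.SUCCEEDED",
        "resource": {"defaultDatasetId": "<DATASET_ID>", "status": "SUCCEEDED"},
    }
)
print(dataset_id_from_webhook(example_body))
```

If you customize the webhook payload template in Apify, adjust the field lookups to match.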

***

### 🔗 Recommended Actors

- [**📄 PDF to JSON Parser**](https://apify.com/parseforge/pdf-to-json-parser) - Convert PDFs into structured JSON
- [**📰 Article Extractor**](https://apify.com/parseforge/article-extractor) - Extract clean article text from any URL
- [**🌐 Website Content Crawler**](https://apify.com/parseforge/website-content-crawler) - Crawl and extract clean content from any site
- [**🔍 RAG Web Browser**](https://apify.com/parseforge/rag-web-browser) - Fetch clean text for AI retrieval pipelines
- [**🎤 Audio Transcriber**](https://apify.com/parseforge/audio-transcriber) - Convert audio recordings to structured text

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more reference-data scrapers.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new Actor, propose a custom project, or report an issue.

***

> ⚠️ **Disclaimer.** This Actor is an independent tool. It processes only HTML you supply by URL, paste, or upload, and is intended for legitimate data-extraction workflows. Users are responsible for ensuring they hold the rights to the source content and for compliance with copyright, OpenAI's terms of service, and applicable law in their jurisdiction.

# Actor input Schema

## `url` (type: `array`):

URL(s) to fetch HTML content from. This will make a simple HTTP GET request to fetch the HTML. You can provide multiple URLs. Leave empty if pasting HTML or uploading files.

## `htmlContent` (type: `string`):

Paste your HTML content here to convert it to JSON. Leave empty if using URLs or uploading files.

## `htmlFileUrl` (type: `array`):

Upload HTML file(s) and paste their URL(s) here. You can provide multiple file URLs. Leave empty if using URLs or pasting HTML content. You can upload files using the file upload button in Apify Console.

## `openAIApiKey` (type: `string`):

Your OpenAI API key. You can get one from https://platform.openai.com/api-keys. If not provided, the Actor logs an error and skips processing.

## `model` (type: `string`):

The OpenAI model to use for conversion. Options: gpt-5, gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo

## `fieldsToExtract` (type: `string`):

Specify which fields you want extracted from the HTML (e.g., \['title', 'price', 'description', 'images', 'specifications']). You can provide multiple field names. If not provided, the AI will automatically extract all important fields it identifies.

## `systemPrompt` (type: `string`):

Optional custom system prompt to guide the AI extraction. If not provided, a smart default prompt will be used that extracts meaningful information.

## Actor input object example

```json
{
  "url": [
    {
      "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
    }
  ],
  "htmlFileUrl": [],
  "model": "gpt-4o-mini"
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": [
        {
            "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
        }
    ],
    "htmlFileUrl": [],
    "openAIApiKey": "<YOUR_OPENAI_API_KEY>", // required; model usage is billed to your OpenAI account
    "model": "gpt-4o-mini",
    "systemPrompt": ""
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/html-to-json-smart-parser").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "url": [{ "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html" }],
    "htmlFileUrl": [],
    "openAIApiKey": "<YOUR_OPENAI_API_KEY>",  # required; model usage is billed to your OpenAI account
    "model": "gpt-4o-mini",
    "systemPrompt": "",
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/html-to-json-smart-parser").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": [
    {
      "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
    }
  ],
  "htmlFileUrl": [],
  "openAIApiKey": "<YOUR_OPENAI_API_KEY>",
  "model": "gpt-4o-mini",
  "systemPrompt": ""
}' |
apify call parseforge/html-to-json-smart-parser --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/html-to-json-smart-parser",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "HTML to JSON Smart Parser",
        "description": "Convert HTML to structured JSON using AI! Uses OpenAI to extract and structure data from HTML into clean JSON format. Perfect for developers and data analysts who need to transform HTML into structured data without manual parsing.",
        "version": "1.0",
        "x-build-id": "mkU15p8OTaIQcRdxY"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~html-to-json-smart-parser/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-html-to-json-smart-parser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~html-to-json-smart-parser/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-html-to-json-smart-parser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~html-to-json-smart-parser/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-html-to-json-smart-parser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "url": {
                        "title": "URL (Fetch HTML)",
                        "type": "array",
                        "description": "URL(s) to fetch HTML content from. This will make a simple HTTP GET request to fetch the HTML. You can provide multiple URLs. Leave empty if pasting HTML or uploading files.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "htmlContent": {
                        "title": "HTML Content (Paste)",
                        "type": "string",
                        "description": "Paste your HTML content here to convert to JSON, if not provided, the AI will automatically extract all important fields it identifies."
                    },
                    "htmlFileUrl": {
                        "title": "HTML File URL (Upload)",
                        "type": "array",
                        "description": "Upload HTML file(s) and paste their URL(s) here. You can provide multiple file URLs. Leave empty if using URLs or pasting HTML content. You can upload files using the file upload button in Apify Console."
                    },
                    "openAIApiKey": {
                        "title": "OpenAI API Key",
                        "type": "string",
                        "description": "Your OpenAI API key (if not provided it would return an error). You can get one from https://platform.openai.com/api-keys. If not provided, the actor will log an error and skip processing."
                    },
                    "model": {
                        "title": "OpenAI Model",
                        "enum": [
                            "gpt-5",
                            "gpt-4o",
                            "gpt-4o-mini",
                            "gpt-4-turbo",
                            "gpt-3.5-turbo"
                        ],
                        "type": "string",
                        "description": "The OpenAI model to use for conversion. Options: gpt-5, gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo",
                        "default": "gpt-4o-mini"
                    },
                    "fieldsToExtract": {
                        "title": "Fields to Extract (Optional)",
                        "type": "string",
                        "description": "Specify which fields you want extracted from the HTML (e.g., ['title', 'price', 'description', 'images', 'specifications']). You can provide multiple field names. If not provided, the AI will automatically extract all important fields it identifies."
                    },
                    "systemPrompt": {
                        "title": "System Prompt (Optional)",
                        "type": "string",
                        "description": "Optional custom system prompt to guide the AI extraction. If not provided, a smart default prompt will be used that extracts meaningful information."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
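
The `fieldsToExtract` and `systemPrompt` inputs and the `runsResponseSchema` above can be exercised with a small sketch like the one below. It is illustrative only: the helper names `build_input` and `summarize_run`, the sample IDs, and the choice to serialize the field list as JSON text (because the schema types `fieldsToExtract` as a string) are assumptions, not part of the Actor. The input property that carries the HTML itself is defined outside this excerpt, so it is left out.

```python
import json
from typing import Any, Dict, List, Optional


def build_input(fields: Optional[List[str]] = None,
                system_prompt: Optional[str] = None) -> Dict[str, Any]:
    """Build only the optional inputs shown in the schema excerpt above."""
    actor_input: Dict[str, Any] = {}
    if fields:
        # Assumption: the schema declares fieldsToExtract as a string,
        # so the list is sent as JSON text rather than as an array.
        actor_input["fieldsToExtract"] = json.dumps(fields)
    if system_prompt:
        actor_input["systemPrompt"] = system_prompt
    return actor_input


def summarize_run(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the fields most callers need from a runsResponseSchema payload."""
    run = response.get("data", {})
    usage_usd = run.get("usageUsd", {})
    return {
        "id": run.get("id"),
        "status": run.get("status"),
        "datasetId": run.get("defaultDatasetId"),
        "totalUsd": run.get("usageTotalUsd", 0.0),
        # Keep only the usage items that actually cost something.
        "billedItems": {name: usd for name, usd in usage_usd.items() if usd},
    }


# Hypothetical payload; example values are taken from the schema,
# the IDs are placeholders.
SAMPLE_RESPONSE: Dict[str, Any] = {
    "data": {
        "id": "run-id-placeholder",
        "status": "READY",
        "defaultDatasetId": "dataset-id-placeholder",
        "usageTotalUsd": 0.00005,
        "usageUsd": {
            "ACTOR_COMPUTE_UNITS": 0,
            "KEY_VALUE_STORE_WRITES": 0.00005,
        },
    }
}

if __name__ == "__main__":
    print(build_input(["title", "price"], "Extract product data only."))
    print(summarize_run(SAMPLE_RESPONSE))
```

Reading the response with `.get` keeps the sketch tolerant of missing properties, since the schema does not mark any of them as required.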
