# PDF To JSON Parser (`parseforge/pdf-to-json-parser`) Actor

Convert PDF documents into structured JSON using AI-powered OCR and smart data extraction. The Actor processes every page to ensure complete coverage, then identifies text, fields, tables, and key details, delivering clean, organized JSON ready for automation or analysis.

- **URL**: https://apify.com/parseforge/pdf-to-json-parser.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** AI, Jobs, Automation
- **Stats:** 56 total users, 6 monthly users, 100.0% runs succeeded, 1 bookmark
- **User rating**: 5.00 out of 5 stars

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 📄 PDF to JSON Parser

> 🚀 **Convert PDFs into structured JSON in seconds.** Upload any PDF and get clean, queryable fields. Optional field selection and custom prompts. No coding, no manual data entry.

> 🕒 **Last updated:** 2026-05-08 · **📊 Per-page parsing** · **🧠 AI-driven extraction** · **🚫 No auth** required

Convert PDF documents into clean, structured JSON without writing custom parsers per document type. Upload one or more PDFs, optionally tell the Actor which fields to extract, and the AI processes every page and returns one record per document with the extracted fields plus full page text. Built for invoice automation, contract review, research-paper indexing, regulatory filings, and any workflow that turns scanned or born-digital PDFs into queryable data.

The output is a structured record per file: a back-reference to the source PDF, the document name, the number of pages, a topic summary, a timestamp, and the extracted fields under `fetchedData`. Hand the dataset off to your database, BI tool, or AI pipeline. Every run is processed live with no caching of input PDFs.

| 👥 Built for | 🎯 Primary use cases |
|---|---|
| Finance and AP teams | Auto-extract invoice fields into accounting systems |
| Legal and contract ops | Pull key terms, dates, parties from contracts |
| Research and academia | Index research papers for full-text search |
| Compliance and regulatory | Convert filings into queryable records |
| HR and recruiting | Parse resumes into structured candidate profiles |
| Data and engineering teams | Replace bespoke PDF parsers across products |

---

### 📋 What the PDF to JSON Parser does

- 📄 **Multi-PDF input.** Upload one or more PDFs via file upload or URL.
- 🧠 **Smart extraction.** Optionally specify the exact fields you want, or let the AI pick the important ones.
- ✏️ **Custom prompts.** Pass a system prompt to bias extraction toward your domain (legal, medical, financial, etc.).
- 📊 **Page-aware.** All pages of every PDF are processed before parsing, so nothing is lost.
- 🆔 **Back-reference.** Every record links back to the original PDF in the dataset.
- ⏱️ **Timestamp.** Every record carries a timestamp so you can rebuild a timeline.

The Actor processes uploads in the order you provide them. Records stream into the dataset as parsing completes, so you can start consuming results before the run is fully finished. Ideal for workflows that need clean structured data from inconsistent PDF layouts.

> 💡 **Why it matters:** PDFs are the universal data format that nobody wants to parse. Bespoke parsers break with every layout change. AI-driven extraction adapts to layout variation without code changes, so finance, legal, and research teams can get from "PDF inbox" to "structured database" in minutes.

---

### 🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing PDF upload, custom field extraction, and how to feed the output into Google Sheets via Apify integrations.

---

### ⚙️ Input

| Field | Type | Name | Description |
|---|---|---|---|
| pdfFile | array of strings | PDF File | Required. One or more PDF file URLs (uploaded via file upload or pre-existing URLs). |
| fieldsToExtract | string | Fields to Extract | Optional. Comma-separated list of fields (e.g. `title, author, date, total, vendor`). Empty = auto-detect. |
| systemPrompt | string | System Prompt | Optional custom prompt to bias the extraction toward your domain. Empty = smart default. |
| maxItems | integer | Max Items | Free users: limited to 10 items (preview). Paid users: optional, max 1,000,000. |

Example 1. Extract specific fields from invoices.

```json
{
  "pdfFile": [
    "https://example.com/invoices/INV-1001.pdf",
    "https://example.com/invoices/INV-1002.pdf"
  ],
  "fieldsToExtract": "vendor, invoiceNumber, date, dueDate, lineItems, total, currency"
}
````

Example 2. Domain-specific extraction with custom prompt (legal contracts).

```json
{
  "pdfFile": [
    "https://example.com/contracts/MSA-2026.pdf"
  ],
  "fieldsToExtract": "parties, effectiveDate, termLength, autoRenewal, governingLaw, terminationClauses",
  "systemPrompt": "You are a contract analyst. Extract the requested fields verbatim from the agreement, preserving dates and numerical values exactly."
}
```

> ⚠️ **Good to Know:** when `fieldsToExtract` is set, the AI prioritizes those fields. When it is empty, the AI infers what is meaningful from the PDF and returns whatever it finds.
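
If you assemble the input in code, a small helper keeps the field names and the comma-separated format consistent. The sketch below is illustrative only: `build_input` is a hypothetical helper, not part of any Apify library, and the `maxItems` bounds are taken from the input schema further down this page.

```python
from typing import Iterable, Optional


def build_input(
    pdf_urls: Iterable[str],
    fields: Optional[Iterable[str]] = None,
    system_prompt: str = "",
    max_items: Optional[int] = None,
) -> dict:
    """Assemble an input object using the field names from the Input table."""
    urls = [u.strip() for u in pdf_urls if u and u.strip()]
    if not urls:
        raise ValueError("pdfFile needs at least one PDF URL")

    payload = {
        "pdfFile": urls,
        # The Actor takes one comma-separated string; empty means auto-detect.
        "fieldsToExtract": ", ".join(f.strip() for f in (fields or []) if f.strip()),
        "systemPrompt": system_prompt,
    }
    if max_items is not None:
        # Bounds mirror the input schema (minimum 1, maximum 1,000,000).
        if not 1 <= max_items <= 1_000_000:
            raise ValueError("maxItems must be between 1 and 1,000,000")
        payload["maxItems"] = max_items
    return payload


example = build_input(
    ["https://example.com/invoices/INV-1001.pdf"],
    fields=["vendor", "invoiceNumber", "total"],
)
print(example["fieldsToExtract"])  # vendor, invoiceNumber, total
```

The returned dictionary can be passed as the run input to whichever client you use.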

***

### 📊 Output

The dataset returns one structured record per PDF. Each record carries the document name, page count, topic, timestamp, and a fetchedData object with the extracted fields. Consume the dataset as JSON, CSV, Excel, XML, or RSS via the Apify console or API.

#### 🧾 Schema

| Field | Type | Example |
|---|---|---|
| 📄 documentName | string | `INV-1001.pdf` |
| 📊 numberOfPages | number | `2` |
| 🏷️ topic | string | `Vendor invoice` |
| 📅 timestamp | ISO datetime | `2026-05-08T12:00:00.000Z` |
| 📦 fetchedData | object | `{ "vendor": "Acme Corp", "invoiceNumber": "INV-1001", ... }` |
| 🔗 sourceUrl | string (url) | `https://example.com/invoices/INV-1001.pdf` |
| ❗ error | string or null | `null` |

#### 📦 Sample records

##### 1. Typical record (invoice with custom fields)

```json
{
  "documentName": "INV-1001.pdf",
  "numberOfPages": 2,
  "topic": "Vendor invoice",
  "timestamp": "2026-05-08T12:00:00.000Z",
  "fetchedData": {
    "vendor": "Acme Corp",
    "invoiceNumber": "INV-1001",
    "date": "2026-04-30",
    "dueDate": "2026-05-30",
    "lineItems": [
      {"description": "Cloud services Q2", "amount": 1200},
      {"description": "Support add-on", "amount": 300}
    ],
    "total": 1500,
    "currency": "USD"
  },
  "sourceUrl": "https://example.com/invoices/INV-1001.pdf",
  "error": null
}
```

##### 2. Auto-detected fields (no fieldsToExtract specified)

```json
{
  "documentName": "research-paper.pdf",
  "numberOfPages": 18,
  "topic": "Research paper",
  "timestamp": "2026-05-08T12:00:00.000Z",
  "fetchedData": {
    "title": "Diffusion-based generative models for tabular data",
    "authors": ["Jane Doe", "Carlos Lee"],
    "abstract": "We present a diffusion-based approach...",
    "keywords": ["diffusion", "tabular", "generative"],
    "publicationYear": 2026,
    "doi": "10.1234/abcd.5678"
  },
  "sourceUrl": "https://example.com/papers/diffusion-2026.pdf",
  "error": null
}
```

##### 3. Failed parse (corrupt PDF)

```json
{
  "documentName": "broken-file.pdf",
  "numberOfPages": null,
  "topic": null,
  "timestamp": "2026-05-08T12:00:00.000Z",
  "fetchedData": null,
  "sourceUrl": "https://example.com/broken-file.pdf",
  "error": "Could not parse PDF: file is encrypted"
}
```
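
Because failures are reported per record rather than per run, downstream code usually needs to separate the two cases. A minimal sketch, assuming records shaped like the samples above; `split_records` and `flatten` are hypothetical helper names:

```python
from typing import Iterable, List, Tuple


def split_records(records: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate parsed records from failed ones using the `error` field."""
    parsed, failed = [], []
    for record in records:
        if record.get("error") is None and record.get("fetchedData") is not None:
            parsed.append(record)
        else:
            failed.append(record)
    return parsed, failed


def flatten(record: dict) -> dict:
    """Merge document metadata and extracted fields into one flat row.

    Extracted fields win if a name collides with a metadata key.
    """
    row = {
        "documentName": record.get("documentName"),
        "numberOfPages": record.get("numberOfPages"),
        "sourceUrl": record.get("sourceUrl"),
    }
    row.update(record.get("fetchedData") or {})
    return row


records = [
    {"documentName": "INV-1001.pdf", "numberOfPages": 2,
     "fetchedData": {"vendor": "Acme Corp", "total": 1500},
     "sourceUrl": "https://example.com/invoices/INV-1001.pdf", "error": None},
    {"documentName": "broken-file.pdf", "numberOfPages": None, "fetchedData": None,
     "sourceUrl": "https://example.com/broken-file.pdf",
     "error": "Could not parse PDF: file is encrypted"},
]
parsed, failed = split_records(records)
print([flatten(r) for r in parsed])
print([(r["documentName"], r["error"]) for r in failed])
```

Flattening is optional; it is convenient when the rows go into a spreadsheet or a SQL table, less so when `fetchedData` contains nested lists such as `lineItems`.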

***

### ✨ Why choose this Actor

| | Capability |
|---|---|
| 🎯 | **Built for the job.** Single-purpose PDF-to-JSON pipeline with sensible defaults. |
| 🧠 | **AI-driven extraction.** Adapts to layout variation without code changes. |
| ⚙️ | **Configurable.** Specify fields or pass a custom prompt for domain-specific extraction. |
| 🔁 | **Live processing.** Every run executes end to end with no caching of input PDFs. |
| 🌐 | **No infra to manage.** Apify handles compute, scaling, scheduling, and storage. |
| 🛡️ | **Reliable.** Per-file error reporting means one bad PDF does not kill the whole run. |
| 🚫 | **No code required.** Configure in the UI, run from CLI, schedule via cron, or call from any language with the Apify SDK. |

> 📊 Production-grade PDF parsing without writing or maintaining custom parsers per document type.

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Accuracy | Setup |
|---|---|---|---|---|---|
| **⭐ PDF to JSON Parser** *(this Actor)* | $5 free credit, then pay-per-use | Any PDF | **Live per run** | High, layout-agnostic | ⚡ 2 min |
| Hand-written parsers | Engineering hours | Per layout | Whenever you maintain it | High but brittle | 🐢 Days to weeks |
| OCR-only tools | $$ monthly | Text extraction only | Live | Medium | ⏳ Hours |
| Manual data entry | Hours per file | Limited | Stale | Variable | 🕒 Variable |

Pick this Actor when you want flexible, layout-agnostic PDF parsing without owning the infrastructure.

***

### 🚀 How to use

1. 📝 **Sign up.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp) (takes 2 minutes).
2. 🌐 **Open the Actor.** Go to the PDF to JSON Parser page on the Apify Store.
3. 🎯 **Upload your PDFs.** Drop one or more PDFs and (optionally) list the fields you need.
4. 🚀 **Run it.** Click **Start** and let the Actor extract structured data.
5. 📥 **Download.** Grab your results in the **Dataset** tab as CSV, Excel, JSON, or XML.

> ⏱️ Total time from signup to first parsed PDF: **3-5 minutes** for a short document.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%" valign="top">

#### 📊 Finance and AP automation

- Auto-extract invoice data into accounting systems
- Parse expense reports for reimbursement workflows
- Pull line items from vendor PDFs for analysis
- Build searchable archives of financial documents

</td>
<td width="50%" valign="top">

#### 🏢 Legal and contract ops

- Extract parties, dates, and key clauses from contracts
- Build searchable contract repositories
- Surface auto-renewal triggers and termination dates
- Power contract intelligence and review workflows

</td>
</tr>
<tr>
<td width="50%" valign="top">

#### 🎯 Research and compliance

- Index research papers for full-text search
- Convert regulatory filings into queryable records
- Build literature databases for systematic review
- Power KYC and due-diligence workflows from filings

</td>
<td width="50%" valign="top">

#### 🛠️ Engineering and product

- Replace bespoke PDF parsers across products
- Add document intelligence to SaaS tools
- Wire datasets into your apps via the Apify API or webhooks
- Skip the layout-handling and OCR maintenance entirely

</td>
</tr>
</table>

***

### 🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Empirical datasets for papers, thesis work, and coursework
- Longitudinal studies tracking changes across snapshots
- Reproducible research with cited, versioned data pulls
- Classroom exercises on data analysis and ethical scraping

</td>
<td width="50%">

#### 🎨 Personal and creative

- Side projects, portfolio demos, and indie app launches
- Data visualizations, dashboards, and infographics
- Content research for bloggers, YouTubers, and podcasters
- Hobbyist collections and personal trackers

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Transparency reporting and accountability projects
- Advocacy campaigns backed by public-interest data
- Community-run databases for local issues
- Investigative journalism on public records

</td>
<td width="50%">

#### 🧪 Experimentation

- Prototype AI and machine-learning pipelines with real data
- Validate product-market hypotheses before engineering spend
- Train small domain-specific models on niche corpora
- Test dashboard concepts with live input

</td>
</tr>
</table>

***

### 🔌 Automating PDF to JSON Parser

This Actor exposes a REST endpoint, so you can drive it from any language or workflow tool.

- **Node.js** - call it via the [Apify JS SDK](https://docs.apify.com/sdk/js).
- **Python** - call it via the [Apify Python SDK](https://docs.apify.com/sdk/python).
- **REST** - hit it directly through the [Apify v2 API](https://docs.apify.com/api/v2).
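
For the REST route, the sketch below builds a request against the `run-sync-get-dataset-items` path listed in the OpenAPI specification on this page, using only the Python standard library. It is a sketch under assumptions: the helper names are hypothetical, the token is sent as a bearer header (the specification also lists a `token` query parameter), and the client-side timeout is an arbitrary choice.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "parseforge~pdf-to-json-parser"


def build_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_sync(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the JSON array of records."""
    request = build_sync_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_sync_request(
    "<YOUR_API_TOKEN>",
    {"pdfFile": ["https://example.com/invoices/INV-1001.pdf"]},
)
print(request.get_method(), request.full_url)
```

Calling `run_sync` performs the network request; for long documents, starting the run asynchronously through the `/runs` path and reading the dataset later may be a better fit.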

**Schedules.** Use Apify Scheduler to process a folder of PDFs on a cron cadence. Combine with webhooks to trigger downstream workflows the moment parsing completes.
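
On the receiving end of a webhook, the first step is usually to find the dataset that belongs to the finished run. The sketch below assumes the webhook uses Apify's default payload shape, with an `eventType` string and a `resource` object describing the run; if you customize the payload template, adjust the keys accordingly. `dataset_id_from_webhook` is a hypothetical helper name.

```python
from typing import Optional


def dataset_id_from_webhook(payload: dict) -> Optional[str]:
    """Return the run's default dataset ID for successful runs, else None."""
    # Assumption: default payload with `eventType` and `resource` keys.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


sample_payload = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "run123", "defaultDatasetId": "dataset456"},
}
print(dataset_id_from_webhook(sample_payload))  # dataset456
```

The IDs in `sample_payload` are made up for illustration. Returning `None` for other event types lets the same endpoint receive failure notifications without attempting to read a dataset.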

***

### ❓ Frequently Asked Questions

<details>
<summary><b>💳 Do I need a paid Apify plan to run this Actor?</b></summary>

No. You can start right now on the free Apify plan, which includes **$5 in monthly credit**. That is enough to run the Actor several times and explore the output. Paid plans unlock higher item caps, more concurrent runs, and larger datasets. [Create a free Apify account here](https://console.apify.com/sign-up?fpr=vmoqkp).

</details>

<details>
<summary><b>🚨 What happens if my run fails or returns no results?</b></summary>

Failed runs are not charged. If a single PDF fails (corrupt, encrypted, unreadable URL), the Actor records the error on that record only and continues with the rest of the batch. If the whole run fails, re-run it or open our [contact form](https://tally.so/r/BzdKgA).

</details>

<details>
<summary><b>📏 How large can my PDFs be?</b></summary>

There is no hard cap, but processing time and cost scale with page count. We recommend splitting documents over 100 pages into chunks for faster results and easier downstream review.
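
To plan such a split, you only need the page count. The sketch below computes 1-based, inclusive page ranges; `page_chunks` is a hypothetical helper, and the actual splitting of the file would be done with a PDF library of your choice, which is not shown here.

```python
from typing import List, Tuple


def page_chunks(total_pages: int, chunk_size: int = 100) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    ranges = []
    start = 1
    while start <= total_pages:
        end = min(start + chunk_size - 1, total_pages)
        ranges.append((start, end))
        start = end + 1
    return ranges


print(page_chunks(250))  # [(1, 100), (101, 200), (201, 250)]
```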

</details>

<details>
<summary><b>🧠 How does extraction work?</b></summary>

The Actor sends the PDF content to an AI extraction service together with your field list (or a smart default prompt). The AI returns structured JSON which is then validated and pushed to the dataset.

</details>

<details>
<summary><b>🌐 What languages are supported?</b></summary>

Most major languages are supported, including English, Spanish, French, German, Portuguese, Italian, Japanese, and Chinese. The AI auto-detects the document language; you can also bias it via the system prompt.

</details>

<details>
<summary><b>🧑‍💻 Can I call this Actor from my own code?</b></summary>

Yes. Apify exposes every Actor as a REST endpoint and ships first-class SDKs for [Node.js](https://docs.apify.com/sdk/js) and [Python](https://docs.apify.com/sdk/python). You can start a run, read the dataset, and handle webhooks from your own app in a few lines.

</details>

<details>
<summary><b>📤 How do I export the data?</b></summary>

Every Apify dataset can be downloaded in one click as CSV, JSON, JSONL, Excel, HTML, XML, or RSS. You can also pull results programmatically via the [Apify API](https://docs.apify.com/api/v2) or stream into BigQuery, S3, and other destinations through built-in integrations.
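
For programmatic export, the dataset items endpoint of the Apify API accepts a `format` query parameter. The sketch below only builds the URL; `dataset_export_url` is a hypothetical helper, the dataset ID in the example is made up, and private datasets additionally require your API token.

```python
from urllib.parse import quote, urlencode

# Format identifiers as used by the dataset items endpoint (Excel is "xlsx").
EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "html", "xml", "rss"}


def dataset_export_url(dataset_id: str, fmt: str = "json") -> str:
    """Build a download URL for a dataset in the requested format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported format: {fmt}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{quote(dataset_id, safe='')}/items?{query}"


print(dataset_export_url("abc123", "csv"))
# https://api.apify.com/v2/datasets/abc123/items?format=csv
```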

</details>

<details>
<summary><b>📅 Can I schedule the Actor to run automatically?</b></summary>

Yes. Use the Apify scheduler to run the Actor on any cadence, from hourly to monthly. Drop new PDF URLs into the input each cycle, or wire the Actor to fire on a webhook from your inbox or storage system.

</details>

<details>
<summary><b>🏪 Can I use the data commercially?</b></summary>

Yes. PDFs you have rights to are yours to parse and use in your own internal pipelines, products, and reports.

</details>

<details>
<summary><b>💼 Which plan should I pick for production use?</b></summary>

Apify's Starter and Scale plans are designed for production workloads. They give you faster instances, more concurrent runs, and higher quotas. Pick the plan that matches your document volume and refresh cadence; the in-app pricing calculator will help you size it.

</details>

<details>
<summary><b>🛠️ Can you add tabular extraction or OCR for scanned PDFs?</b></summary>

Open the [contact form](https://tally.so/r/BzdKgA) and tell us about your use case. We add features regularly when there is a clear use case behind the request.

</details>

<details>
<summary><b>⚖️ Is it legal to parse PDFs with this Actor?</b></summary>

Yes, provided you have rights to the PDFs. You are responsible for compliance with copyright, privacy, and licensing laws applicable to the documents you submit.

</details>

***

### 🔌 Integrate with any app

PDF to JSON Parser connects to any cloud service via [Apify integrations](https://apify.com/integrations):

- [**Make**](https://docs.apify.com/platform/integrations/make) - Automate multi-step workflows
- [**Zapier**](https://docs.apify.com/platform/integrations/zapier) - Connect with 5,000+ apps
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - Get run notifications in your channels
- [**Airbyte**](https://docs.apify.com/platform/integrations/airbyte) - Pipe results into your warehouse
- [**GitHub**](https://docs.apify.com/platform/integrations/github) - Trigger runs from commits and releases
- [**Google Drive**](https://docs.apify.com/platform/integrations/drive) - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a parse completes, like firing a summarization Actor or pinging a Slack channel.

***

### 🔗 Recommended Actors

- [**📰 Article Extractor**](https://apify.com/parseforge/article-extractor) - Extract clean article text from any URL
- [**🎤 Audio Transcriber**](https://apify.com/parseforge/audio-transcriber) - Convert audio recordings to structured text
- [**📊 HTML to JSON Smart Parser**](https://apify.com/parseforge/html-to-json-smart-parser) - Parse any HTML page into structured JSON
- [**🎬 YouTube AI Transcriber**](https://apify.com/parseforge/youtube-ai-transcriber) - Transcribe YouTube videos via URL
- [**🌐 Website Content Crawler**](https://apify.com/parseforge/website-content-crawler) - Crawl entire sites and export structured content

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more reference-data scrapers.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new Actor, propose a custom project, or report an issue.

***

> ⚠️ **Disclaimer.** This Actor is an independent tool. It processes only PDFs you supply by URL and is intended for legitimate document automation workflows. Users are responsible for ensuring they hold the rights to parse the PDFs they submit and for compliance with copyright, privacy, and licensing laws in their jurisdiction.

# Actor input Schema

## `pdfFile` (type: `array`):

Upload one or more PDF files using file upload. The files will be automatically processed.

## `fieldsToExtract` (type: `string`):

Specify which fields you want extracted (e.g., "title, author, date, description, price"). If not provided, the AI will automatically extract all important fields it identifies.

## `systemPrompt` (type: `string`):

Optional custom system prompt to guide the AI extraction. If not provided, a smart default prompt will be used that extracts meaningful information from the PDF.

## `maxItems` (type: `integer`):

Free users: Limited to 100. Paid users: Optional, max 1,000,000.

## Actor input object example

```json
{
  "pdfFile": [
    "https://api.apify.com/v2/key-value-stores/Fy9UrApb6rz0dRBqm/records/sample.pdf?signature=1c819r8xjMsE0Fi3AhqnT"
  ],
  "maxItems": 10
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "pdfFile": [
        "https://api.apify.com/v2/key-value-stores/Fy9UrApb6rz0dRBqm/records/sample.pdf?signature=1c819r8xjMsE0Fi3AhqnT"
    ],
    "fieldsToExtract": "",
    "systemPrompt": "",
    "maxItems": 10
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/pdf-to-json-parser").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "pdfFile": ["https://api.apify.com/v2/key-value-stores/Fy9UrApb6rz0dRBqm/records/sample.pdf?signature=1c819r8xjMsE0Fi3AhqnT"],
    "fieldsToExtract": "",
    "systemPrompt": "",
    "maxItems": 10,
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/pdf-to-json-parser").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "pdfFile": [
    "https://api.apify.com/v2/key-value-stores/Fy9UrApb6rz0dRBqm/records/sample.pdf?signature=1c819r8xjMsE0Fi3AhqnT"
  ],
  "fieldsToExtract": "",
  "systemPrompt": "",
  "maxItems": 10
}' |
apify call parseforge/pdf-to-json-parser --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/pdf-to-json-parser",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "PDF To JSON Parser",
        "description": "Convert PDF documents into structured JSON using AI-powered OCR and smart data extraction. The Actor processes every page to ensure complete coverage, then identifies text, fields, tables, and key details, delivering clean, organized JSON ready for automation or analysis.",
        "version": "1.0",
        "x-build-id": "qjbaatFAFKCggs6qX"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~pdf-to-json-parser/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-pdf-to-json-parser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~pdf-to-json-parser/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-pdf-to-json-parser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~pdf-to-json-parser/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-pdf-to-json-parser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "pdfFile": {
                        "title": "PDF File",
                        "type": "array",
                        "description": "Upload one or more PDF files using file upload. The files will be automatically processed."
                    },
                    "fieldsToExtract": {
                        "title": "Fields to Extract (Optional)",
                        "type": "string",
                        "description": "Specify which fields you want extracted (e.g., \"title, author, date, description, price\"). If not provided, the AI will automatically extract all important fields it identifies."
                    },
                    "systemPrompt": {
                        "title": "System Prompt (Optional)",
                        "type": "string",
                        "description": "Optional custom system prompt to guide the AI extraction. If not provided, a smart default prompt will be used that extracts meaningful information from the PDF."
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 100. Paid users: Optional, max 1,000,000."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
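
The `runsResponseSchema` above describes the object returned when a run is started. As a minimal sketch of how a client might read it, the Python snippet below pulls out the fields most integrations need: `id`, `status`, `defaultDatasetId`, `usageTotalUsd`, and the non-zero entries of `usageUsd`. The helper name `summarize_run`, the placeholder IDs, and the set of terminal statuses are assumptions for illustration. The schema itself only shows `READY` as an example status.

```python
from typing import Any

# Assumption: statuses after which a run no longer changes. The schema above
# only lists "READY" as an example value; these names are not taken from it.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def summarize_run(response: dict[str, Any]) -> dict[str, Any]:
    """Extract commonly used fields from a run response shaped like runsResponseSchema."""
    data = response.get("data", {})
    usage_usd = data.get("usageUsd", {})
    return {
        "run_id": data.get("id"),
        "status": data.get("status"),
        "is_finished": data.get("status") in TERMINAL_STATUSES,
        # Results are read from this dataset once the run has finished.
        "dataset_id": data.get("defaultDatasetId"),
        "total_usd": data.get("usageTotalUsd", 0.0),
        # Keep only the usage categories that actually cost something.
        "billed_items": {name: usd for name, usd in usage_usd.items() if usd},
    }


# Hypothetical response built from the example values in the schema;
# the two IDs are placeholders, not real identifiers.
sample_response = {
    "data": {
        "id": "RUN_ID_PLACEHOLDER",
        "status": "READY",
        "defaultDatasetId": "DATASET_ID_PLACEHOLDER",
        "usageTotalUsd": 0.00005,
        "usageUsd": {
            "ACTOR_COMPUTE_UNITS": 0,
            "KEY_VALUE_STORE_WRITES": 0.00005,
        },
    }
}

summary = summarize_run(sample_response)
print(summary)
```

Using `dict.get` with defaults keeps the helper tolerant of responses that omit optional fields such as `usageUsd`, which is a design choice of this sketch rather than a documented guarantee of the API.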
