# Doc To Markdown MCP Server (`abotapi/doc-to-markdown-mcp`) Actor

An MCP server that converts documents to clean Markdown. Convert PDFs, Word docs, Excel spreadsheets, PowerPoints, HTML, images, and more to AI-friendly Markdown format.

- **URL:** https://apify.com/abotapi/doc-to-markdown-mcp.md
- **Developed by:** [AbotAPI](https://apify.com/abotapi) (community)
- **Categories:** MCP servers, Developer tools, Automation
- **Stats:** 3 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating:** No ratings yet

## Pricing

From $0.10 per 1,000 results.

This Actor is billed per event plus platform usage: you pay a fixed price for specific events and, in addition, for the Apify platform resources the Actor consumes.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation, available as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Document to Markdown MCP Server

An Apify MCP server that converts documents to clean Markdown. Convert PDFs, Word docs, Excel spreadsheets, PowerPoints, HTML, images, and more to AI-friendly Markdown format.

### About this MCP Server

To learn how to connect to and use this MCP server, see the Apify MCP documentation at [mcp.apify.com](https://mcp.apify.com).

#### Connection URL

MCP clients can connect to this server at:

```
https://<YOUR_USERNAME>--doc-to-markdown-mcp.apify.actor/mcp
```

### Standby Mode

This Actor uses **Standby mode**, a new, lightweight method for using Actors. Instead of starting an Actor for each input and waiting for results, the Actor remains ready in the background to handle arbitrary HTTP requests, just like any web or API server. [Learn more](https://docs.apify.com/platform/actors/running/standby).

> This is a new feature, and we'd love to hear your feedback.

#### Actor URL

Send an HTTP request to this URL, and wait for the response:

```
https://<YOUR_USERNAME>--doc-to-markdown-mcp.apify.actor?token=YOUR_APIFY_TOKEN
```

The MCP endpoint is available at `/mcp`:

```
https://<YOUR_USERNAME>--doc-to-markdown-mcp.apify.actor/mcp?token=YOUR_APIFY_TOKEN
```
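
As a minimal sketch, the URL patterns above can be assembled with a small helper. The function name `standby_urls` and its parameters are hypothetical and not part of this Actor; only the URL shapes come from this README.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def standby_urls(username: str, token: Optional[str] = None) -> Dict[str, str]:
    """Build the Actor base URL and the MCP endpoint URL for an Apify username.

    If a token is given, it is appended as the `token` query parameter,
    following the URL patterns shown above.
    """
    base = f"https://{username}--doc-to-markdown-mcp.apify.actor"
    query = "?" + urlencode({"token": token}) if token else ""
    return {"actor": base + query, "mcp": base + "/mcp" + query}


if __name__ == "__main__":
    # Placeholder values; substitute your own username and token.
    for name, url in standby_urls("YOUR_USERNAME", "YOUR_APIFY_TOKEN").items():
        print(f"{name}: {url}")
```

Calling `standby_urls("YOUR_USERNAME")` without a token yields the bare URLs, which suits clients that send the token in an `Authorization` header instead of the query string.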

### Key Features

- **Batch Processing**: Convert up to 20 documents in a single request
- **Extensive Format Support**: Convert 15+ file formats with content extraction suited to each format
- **Customizable Output**: Control metadata headers, table of contents, heading style, table format, and image handling
- **MCP Integration**: Works with MCP-compatible AI clients and chatbots
- **Metadata Preservation**: Keep document metadata during conversion
- **Table and Image Handling**: Recognize table structures and choose how images are handled

### Supported Formats (15+)

| Category | Formats |
|----------|---------|
| Documents | PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), RTF |
| Web | HTML, XML |
| Images | JPEG, PNG, GIF, BMP (with EXIF metadata extraction) |
| Audio | WAV, MP3 (with speech transcription) |
| Data | CSV, JSON, YAML |
| Archives | ZIP (recursive extraction) |

### MCP Tools

#### Single Document Conversion

##### `convert_url_to_markdown`

Convert a document from a URL to Markdown.

```json
{
  "url": "https://example.com/document.pdf",
  "options": {
    "include_metadata": true,
    "include_toc": true
  }
}
```

##### `convert_file_to_markdown`

Convert a file stored in an Apify Key-Value Store, identified by its record key.

```json
{
  "key": "my-document.pdf",
  "options": {
    "include_metadata": true
  }
}
```

##### `convert_base64_to_markdown`

Convert a base64-encoded document.

```json
{
  "content": "base64-encoded-content",
  "filename": "document.pdf"
}
```
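
The `content` field has to be produced from the file's raw bytes. A minimal sketch of doing that in Python follows; the helper name `base64_tool_arguments` is hypothetical, and it assumes the server expects standard (not URL-safe) base64 without a `data:` prefix, which this README does not state explicitly.

```python
import base64
import json
from typing import Dict


def base64_tool_arguments(data: bytes, filename: str) -> Dict[str, str]:
    """Return arguments shaped like the `convert_base64_to_markdown` example."""
    return {
        # Assumption: standard base64 alphabet, no "data:" URI prefix.
        "content": base64.b64encode(data).decode("ascii"),
        "filename": filename,
    }


if __name__ == "__main__":
    # Stand-in bytes; in practice read them with Path("document.pdf").read_bytes().
    arguments = base64_tool_arguments(b"%PDF-1.4 example", "document.pdf")
    print(json.dumps(arguments, indent=2))
```

Keeping the original file name in `filename` matters here because the example above passes it alongside the content, presumably so the server can tell which format it is converting.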

##### `convert_html_to_markdown`

Convert HTML content directly.

```json
{
  "html": "<h1>Hello World</h1><p>Content here</p>"
}
```

#### Batch Processing

##### `batch_convert_urls`

Convert multiple documents from URLs simultaneously (max 20).

```json
{
  "urls": [
    "https://example.com/doc1.pdf",
    "https://example.com/doc2.docx",
    "https://example.com/doc3.pptx"
  ],
  "options": {
    "include_metadata": true
  }
}
```

##### `batch_convert_files`

Convert multiple files from Key-Value Store (max 20).

```json
{
  "keys": ["report1.pdf", "report2.docx", "data.xlsx"]
}
```
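
Because both batch tools accept at most 20 items, longer lists have to be split on the client side. The sketch below shows one way to do that; `chunk_batches` and `batch_url_arguments` are hypothetical helper names, and only the limit of 20 and the argument shape come from this README.

```python
from typing import Any, Dict, Iterator, List

MAX_BATCH_SIZE = 20  # limit stated for batch_convert_urls and batch_convert_files


def chunk_batches(items: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_url_arguments(urls: List[str], include_metadata: bool = True) -> List[Dict[str, Any]]:
    """Build one `batch_convert_urls` argument object per batch of URLs."""
    return [
        {"urls": batch, "options": {"include_metadata": include_metadata}}
        for batch in chunk_batches(urls)
    ]


if __name__ == "__main__":
    urls = [f"https://example.com/doc{i}.pdf" for i in range(45)]
    for arguments in batch_url_arguments(urls):
        print(len(arguments["urls"]), "URLs in this call")
```

Each returned object is meant to be sent as a separate `batch_convert_urls` call; the same chunking applies to the `keys` list of `batch_convert_files`.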

#### Utility Tools

##### `extract_metadata`

Extract metadata without full conversion.

```json
{
  "url": "https://example.com/document.pdf"
}
```

##### `get_supported_formats`

List all supported file formats.

##### `get_output_options`

Get available formatting options.
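
MCP clients normally construct tool calls for you. For readers who want to see what travels over the wire, MCP is built on JSON-RPC 2.0 and invokes tools through the `tools/call` method. The sketch below wraps the argument objects shown above in such a request; the helper name `tools_call_request` is hypothetical, and a real client also performs the MCP initialization handshake first, which is omitted here.

```python
import itertools
import json
from typing import Any, Dict, Optional

_next_id = itertools.count(1)  # simple source of unique JSON-RPC request ids


def tools_call_request(
    tool_name: str,
    arguments: Optional[Dict[str, Any]] = None,
    request_id: Optional[int] = None,
) -> Dict[str, Any]:
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_next_id) if request_id is None else request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    request = tools_call_request(
        "convert_url_to_markdown",
        {"url": "https://example.com/document.pdf", "options": {"include_toc": True}},
    )
    print(json.dumps(request, indent=2))
```

Tools listed without an argument example, such as `get_supported_formats`, can be called with `tools_call_request("get_supported_formats")`, which sends an empty `arguments` object.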

### Output Formatting Options

Customize your Markdown output with these options:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `include_metadata` | boolean | true | Include file metadata header |
| `include_toc` | boolean | false | Generate table of contents |
| `heading_style` | string | "atx" | "atx" (# Heading) or "setext" (underlined) |
| `table_format` | string | "pipe" | "pipe" (\|col\|) or "simple" |
| `image_handling` | string | "reference" | "inline", "reference", or "extract" |

#### Example with Options

```json
{
  "url": "https://example.com/report.pdf",
  "options": {
    "include_metadata": true,
    "include_toc": true,
    "heading_style": "atx",
    "table_format": "pipe"
  }
}
```
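
To catch typos in option names or values before sending a request, the options table can be mirrored in a small client-side validator. This is a sketch under assumptions: `build_options` is a hypothetical helper, the defaults and allowed values are copied from the table above, and whether the server itself rejects unknown options is not stated in this README.

```python
import json
from typing import Any, Dict

# Defaults and allowed values copied from the options table above.
DEFAULTS: Dict[str, Any] = {
    "include_metadata": True,
    "include_toc": False,
    "heading_style": "atx",
    "table_format": "pipe",
    "image_handling": "reference",
}
ALLOWED_VALUES = {
    "heading_style": {"atx", "setext"},
    "table_format": {"pipe", "simple"},
    "image_handling": {"inline", "reference", "extract"},
}


def build_options(**overrides: Any) -> Dict[str, Any]:
    """Return the documented defaults with validated overrides applied."""
    options = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in DEFAULTS:
            raise ValueError(f"unknown option: {name!r}")
        if name in ALLOWED_VALUES:
            if value not in ALLOWED_VALUES[name]:
                raise ValueError(f"{name} must be one of {sorted(ALLOWED_VALUES[name])}")
        elif not isinstance(value, bool):
            raise ValueError(f"{name} must be a boolean")
        options[name] = value
    return options


if __name__ == "__main__":
    arguments = {
        "url": "https://example.com/report.pdf",
        "options": build_options(include_toc=True, heading_style="setext"),
    }
    print(json.dumps(arguments, indent=2))
```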

### Usage

#### With Claude Desktop

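Add the following to your Claude Desktop config file (`~/.config/claude/claude_desktop_config.json`; on macOS it is typically `~/Library/Application Support/Claude/claude_desktop_config.json`, on Windows `%APPDATA%\Claude\claude_desktop_config.json`):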

```json
{
  "mcpServers": {
    "doc-to-markdown": {
      "url": "https://<YOUR_USERNAME>--doc-to-markdown-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}
```
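
If the config file already lists other MCP servers, the entry above should be merged in without overwriting them. A minimal sketch of that merge follows; `add_mcp_server` is a hypothetical helper, and the entry it writes simply mirrors the JSON above.

```python
import json
from copy import deepcopy
from typing import Any, Dict


def add_mcp_server(
    config: Dict[str, Any], username: str, token: str, name: str = "doc-to-markdown"
) -> Dict[str, Any]:
    """Return a copy of `config` with this server added under `mcpServers`.

    Existing servers are kept; an entry with the same `name` is replaced.
    """
    updated = deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers[name] = {
        "url": f"https://{username}--doc-to-markdown-mcp.apify.actor/mcp",
        "headers": {"Authorization": f"Bearer {token}"},
    }
    return updated


if __name__ == "__main__":
    # A stand-in for the parsed contents of an existing config file.
    existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["some-server"]}}}
    merged = add_mcp_server(existing, "YOUR_USERNAME", "YOUR_APIFY_TOKEN")
    print(json.dumps(merged, indent=2))
```

In practice the input would come from `json.load` on the config file and the result would be written back with `json.dump`; the sketch leaves file handling out so that it cannot overwrite a real config by accident.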

#### Uploading Files to Key-Value Store

##### Via Apify Console

1. Go to the Actor's **Storage** → **Key-Value Store**
2. Click **Add record**
3. Upload your file
4. Pass the record's key as `key` to `convert_file_to_markdown`

##### Via API

```bash
curl -X PUT \
  "https://api.apify.com/v2/key-value-stores/YOUR_STORE_ID/records/document.pdf" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/pdf" \
  --data-binary @document.pdf
```
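
The same upload can be sketched in Python with only the standard library. The function names below are hypothetical; the endpoint, headers, and placeholders mirror the `curl` command above.

```python
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def build_upload_request(
    store_id: str, key: str, data: bytes, token: str, content_type: str
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that stores `data` under `key`."""
    # Percent-encode the key so spaces or slashes cannot alter the URL path.
    url = f"{API_BASE}/key-value-stores/{store_id}/records/{quote(key, safe='')}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )


def upload(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Stand-in bytes and placeholder IDs; nothing is sent here.
    req = build_upload_request(
        "YOUR_STORE_ID", "document.pdf", b"%PDF-1.4 ...", "YOUR_APIFY_TOKEN", "application/pdf"
    )
    print(req.get_method(), req.full_url)
```

Once `upload(req)` succeeds, the key (`document.pdf` here) is the value to pass to `convert_file_to_markdown`.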

# Actor input schema

## `defaultIncludeMetadata` (type: `boolean`):

Include the file metadata header in converted output by default. Default: `true`.

## `defaultIncludeToc` (type: `boolean`):

Generate a table of contents in converted output by default. Default: `false`.

## `defaultHeadingStyle` (type: `string`):

Default heading style for Markdown output, either `"atx"` or `"setext"`. Default: `"atx"`.

## Actor input object example

```json
{
  "defaultIncludeMetadata": true,
  "defaultIncludeToc": false,
  "defaultHeadingStyle": "atx"
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("abotapi/doc-to-markdown-mcp").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs
```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("abotapi/doc-to-markdown-mcp").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start
```

## CLI example

```bash
echo '{}' | apify call abotapi/doc-to-markdown-mcp --silent --output-dataset
```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=abotapi/doc-to-markdown-mcp",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}
```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Doc To Markdown MCP Server",
        "description": "An MCP server that converts documents to clean Markdown. Convert PDFs, Word docs, Excel spreadsheets, PowerPoints, HTML, images, and more to AI-friendly Markdown format.",
        "version": "1.0",
        "x-build-id": "addi7ZCsO1Su3SXEx"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/abotapi~doc-to-markdown-mcp/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-abotapi-doc-to-markdown-mcp",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/abotapi~doc-to-markdown-mcp/runs": {
            "post": {
                "operationId": "runs-sync-abotapi-doc-to-markdown-mcp",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/abotapi~doc-to-markdown-mcp/run-sync": {
            "post": {
                "operationId": "run-sync-abotapi-doc-to-markdown-mcp",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "defaultIncludeMetadata": {
                        "title": "Include Metadata by Default",
                        "type": "boolean",
                        "description": "Include file metadata header in converted output by default",
                        "default": true
                    },
                    "defaultIncludeToc": {
                        "title": "Include TOC by Default",
                        "type": "boolean",
                        "description": "Generate table of contents in converted output by default",
                        "default": false
                    },
                    "defaultHeadingStyle": {
                        "title": "Default Heading Style",
                        "enum": [
                            "atx",
                            "setext"
                        ],
                        "type": "string",
                        "description": "Default heading style for markdown output",
                        "default": "atx"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
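
For projects without an Apify client library, the `run-sync-get-dataset-items` operation from the specification above can be called directly. The following is a sketch using only the Python standard library; the function names are hypothetical, while the path, the `token` query parameter, and the input fields are taken from the specification.

```python
import json
import urllib.request
from typing import Any, Dict
from urllib.parse import urlencode

ENDPOINT = (
    "https://api.apify.com/v2/acts/abotapi~doc-to-markdown-mcp"
    "/run-sync-get-dataset-items"
)


def build_run_sync_request(token: str, actor_input: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request described by the specification."""
    return urllib.request.Request(
        f"{ENDPOINT}?{urlencode({'token': token})}",
        data=json.dumps(actor_input).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_sync(request: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Input fields and their values follow the inputSchema defaults above.
    actor_input = {
        "defaultIncludeMetadata": True,
        "defaultIncludeToc": False,
        "defaultHeadingStyle": "atx",
    }
    req = build_run_sync_request("YOUR_APIFY_TOKEN", actor_input)
    print(req.get_method(), req.full_url)  # nothing is sent until run_sync(req)
```

The specification does not describe the shape of the `200` response for this operation, so `run_sync` returns whatever JSON the API sends back without further interpretation.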
