# Markdownify MCP Server (`crawlerbros/markdownify-mcp-server`) Actor

Convert any webpage to clean, formatted Markdown perfect for AI consumption. Ideal for building knowledge bases, documentation scrapers, and content migration tools.

- **URL**: https://apify.com/crawlerbros/markdownify-mcp-server.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** MCP servers, AI, Automation
- **Stats:** 18 total users, 3 monthly users, 100.0% runs succeeded, 2 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $1.00 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Markdownify MCP Server

Convert any webpage to clean, formatted Markdown perfect for AI consumption. This Actor is ideal for building knowledge bases, documentation scrapers, and content migration tools.

### Features

✅ **Convert any webpage to Markdown** - Clean, formatted output  
✅ **CSS Selector Support** - Include/exclude specific sections  
✅ **JavaScript Rendering** - Optional Playwright support for dynamic content  
✅ **Authentication Support** - HTTP Basic Auth for restricted content  
✅ **Customizable Output** - Configure heading styles, strip tags, etc.  
✅ **Error Handling** - Graceful failures with detailed error messages  
✅ **MCP Server Ready** - Structured output for AI consumption

### How It Works

1. **Input** - Provide URL(s) and optional configuration
2. **Fetch** - Download webpage content (HTTP or Playwright)
3. **Extract** - Apply include/exclude selectors
4. **Convert** - Transform HTML to clean Markdown
5. **Output** - Save to Apify dataset with metadata

### Input Parameters

#### Required

- **`startUrls`** (array of objects) - List of webpage URLs to convert, each given as an object with a `url` property (see the Actor input schema below)

#### Optional

- **`includeSelectors`** (array of strings) - CSS selectors to include specific sections  
  Example: `["article", ".main-content", "#documentation"]`

- **`excludeSelectors`** (array of strings) - CSS selectors to exclude  
  Example: `["nav", "footer", ".advertisement", "script", "style"]`

- **`useJavaScript`** (boolean) - Enable Playwright for JavaScript-heavy pages  
  Default: `false`

- **`headingStyle`** (string) - Markdown heading style  
  Options: `"ATX"` (# Heading) or `"SETEXT"` (Heading\n=======)  
  Default: `"ATX"`

- **`stripTags`** (array of strings) - HTML tags to completely remove  
  Default: `["script", "style", "iframe", "noscript"]`

- **`auth`** (object) - HTTP Basic Authentication credentials  
  Example: `{"username": "user", "password": "pass"}`

- **`timeout`** (integer) - Request timeout in seconds  
  Default: `30`, Range: `10-120`

### Input Example

```json
{
  "startUrls": [
    { "url": "https://apify.com/docs" },
    { "url": "https://en.wikipedia.org/wiki/Markdown" }
  ],
  "excludeSelectors": ["nav", "footer", ".advertisement"],
  "useJavaScript": false,
  "headingStyle": "ATX",
  "timeout": 30
}
````

### Output Format

Each converted page is saved as a separate record in the dataset:

```json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "markdown": "# Example Domain\n\nThis domain is for use...",
  "markdown_length": 1234,
  "success": true,
  "error": null,
  "scraped_at": "2025-10-24T10:30:00.000Z",
  "meta": {
    "method": "http",
    "heading_style": "ATX",
    "stripped_tags": ["script", "style"],
    "used_include_selectors": false,
    "used_exclude_selectors": true
  }
}
```
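
Because every record carries `success` and `error`, a consumer can separate usable Markdown from failures before further processing. The following is a minimal sketch of that idea. The helper name `split_results` and the sample records are hypothetical, and the shape of a failed record (empty `markdown`, an `error` string such as `"HTTP 404"`) is an assumption rather than documented behavior.

```python
from typing import Any

Record = dict[str, Any]


def split_results(items: list[Record]) -> tuple[list[Record], list[Record]]:
    """Separate successful conversions from failed ones using the `success` flag."""
    ok = [item for item in items if item.get("success")]
    failed = [item for item in items if not item.get("success")]
    return ok, failed


# Sample records shaped like the output above (values are illustrative only).
sample_items: list[Record] = [
    {"url": "https://example.com", "title": "Example Domain",
     "markdown": "# Example Domain\n", "success": True, "error": None},
    {"url": "https://example.org/missing", "title": None,
     "markdown": "", "success": False, "error": "HTTP 404"},
]

ok, failed = split_results(sample_items)
for item in failed:
    print(f"Failed: {item['url']} ({item['error']})")
print(f"{len(ok)} converted, {len(failed)} failed")
```

In a real integration, `sample_items` would be replaced by the items read from the run's dataset.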

### Use Cases

#### 📚 Build AI-Ready Knowledge Bases

Convert documentation, wikis, and help centers into Markdown for AI training or RAG systems.

#### 📝 Content Migration

Migrate existing web content to Markdown for static site generators (Jekyll, Hugo, etc.).

#### 🤖 AI Agent Integration

Enable AI agents to consume web content in a clean, structured format.

#### 📄 Documentation Scraping

Extract and format technical documentation from multiple sources.

#### 🔄 Content Synchronization

Keep Markdown versions of web pages up-to-date automatically.
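
One possible way to approach content synchronization is to fingerprint each page's Markdown and act only on pages whose fingerprint changed between runs. The sketch below assumes records shaped like the output format above. The function names and the in-memory `previous` store are hypothetical, and a real setup would persist the fingerprints somewhere durable.

```python
import hashlib
from typing import Any


def markdown_fingerprint(markdown: str) -> str:
    """Return a stable SHA-256 hex digest of the Markdown text."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def changed_pages(items: list[dict[str, Any]], previous: dict[str, str]) -> list[str]:
    """Return URLs whose Markdown differs from the stored fingerprint.

    `previous` maps URL -> fingerprint and is updated in place.
    Failed records are skipped so a transient error does not count as a change.
    """
    changed = []
    for item in items:
        if not item.get("success"):
            continue
        digest = markdown_fingerprint(item["markdown"])
        if previous.get(item["url"]) != digest:
            changed.append(item["url"])
            previous[item["url"]] = digest
    return changed


# Hypothetical usage: the same page seen in two consecutive runs.
store: dict[str, str] = {}
run_items = [{"url": "https://example.com", "markdown": "# Example\n", "success": True}]
print(changed_pages(run_items, store))  # first run: the page is new
print(changed_pages(run_items, store))  # second run: nothing changed
```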

### API Integration

#### JavaScript/Node.js

```javascript
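// ES module import: the top-level `await` below requires ESM (for example, a `.mjs` file).
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const input = {
  // `startUrls` is an array of objects, as defined in the Actor input schema
  startUrls: [{ url: "https://example.com" }],
  excludeSelectors: ["nav", "footer"],
};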

const run = await client.actor("YOUR_ACTOR_ID").call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();

items.forEach((item) => {
  console.log(`Title: ${item.title}`);
  console.log(`Markdown length: ${item.markdown_length}`);
  console.log(item.markdown);
});
```

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

input_data = {
    # `startUrls` is an array of objects, as defined in the Actor input schema
    'startUrls': [{'url': 'https://example.com'}],
    'excludeSelectors': ['nav', 'footer']
}

run = client.actor('YOUR_ACTOR_ID').call(run_input=input_data)

for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(f"Title: {item['title']}")
    print(f"Markdown length: {item['markdown_length']}")
    print(item['markdown'])
```

#### cURL

```bash
curl -X POST https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": [{ "url": "https://example.com" }],
    "excludeSelectors": ["nav", "footer"]
  }'
```

### Tips & Best Practices

#### 🚀 Performance

- Use `useJavaScript: false` for static pages (much faster)
- Only enable `useJavaScript: true` for dynamic content
- Use `includeSelectors` to extract only what you need
- Batch multiple URLs in a single run

#### 🎯 Accuracy

- Test selectors in browser DevTools first
- Use specific `includeSelectors` for precise extraction
- Combine `includeSelectors` and `excludeSelectors` for best results
- Add common noise elements to `excludeSelectors`

#### 🔧 Troubleshooting

- **Empty markdown?** Check if selectors are correct
- **Missing content?** Try enabling `useJavaScript`
- **Timeout errors?** Increase `timeout` value
- **Authentication issues?** Verify `auth` credentials
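
The troubleshooting hints above can be combined into an automatic second pass: re-run only the failed URLs with JavaScript rendering enabled and a longer timeout. This is a minimal sketch under stated assumptions. `build_retry_input` is a hypothetical helper, the doubling policy is arbitrary, and the input uses `startUrls` as defined in the Actor input schema.

```python
from typing import Any, Optional

MAX_TIMEOUT = 120      # upper bound documented for `timeout`
DEFAULT_TIMEOUT = 30   # documented default for `timeout`


def build_retry_input(original_input: dict[str, Any],
                      items: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Build an input for re-running failed URLs, or return None if nothing failed."""
    failed_urls = [item["url"] for item in items if not item.get("success")]
    if not failed_urls:
        return None

    retry_input = dict(original_input)  # shallow copy; the original stays untouched
    retry_input["startUrls"] = [{"url": url} for url in failed_urls]
    retry_input["useJavaScript"] = True  # "Missing content?" hint
    # "Timeout errors?" hint: double the timeout, capped at the documented maximum.
    current = original_input.get("timeout", DEFAULT_TIMEOUT)
    retry_input["timeout"] = min(current * 2, MAX_TIMEOUT)
    return retry_input


# Hypothetical usage with one failed record.
first_input = {"startUrls": [{"url": "https://example.com"}], "timeout": 30}
results = [{"url": "https://example.com", "success": False, "error": "timeout"}]
print(build_retry_input(first_input, results))
```

Rendering with Playwright is slower, so limiting the retry to failed URLs keeps the extra cost proportional to the number of failures.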

### Development

#### Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

# Run locally
python -m src
```

#### Project Structure

```
markdownify-mcp/
├── .actor/
│   ├── actor.json          # Actor configuration
│   ├── input_schema.json   # Input validation
│   └── output_schema.json  # Output structure
├── src/
│   ├── __main__.py         # Main entry point
│   ├── fetcher.py          # HTTP & Playwright fetchers
│   ├── extractor.py        # Content extraction
│   └── converter.py        # HTML to Markdown
├── Dockerfile              # Docker configuration
├── requirements.txt        # Python dependencies
└── README.md              # This file
```
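
The module layout suggests a fetch, extract, convert pipeline. The Actor's source is not shown here, so the sketch below is not its implementation. It is a standard-library-only illustration of the extract and convert stages, under these assumptions: exclusion works on tag names rather than full CSS selectors, only `h1`-`h6` and `p` are converted, and excluded tags are non-void elements. All names (`MiniMarkdownConverter`, `html_to_markdown`) are hypothetical.

```python
from html.parser import HTMLParser


class MiniMarkdownConverter(HTMLParser):
    """Convert a tiny HTML subset (h1-h6, p) to ATX-style Markdown blocks."""

    HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}

    def __init__(self, skip_tags):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.skip_depth = 0   # > 0 while inside an excluded element
        self.blocks = []      # finished Markdown blocks
        self.prefix = None    # prefix of the block being collected, or None
        self.buffer = []

    def _is_block(self, tag):
        return tag in self.HEADINGS or tag == "p"

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1
        elif self.skip_depth == 0 and self._is_block(tag):
            self.prefix = self.HEADINGS[tag] + " " if tag in self.HEADINGS else ""
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in self.skip_tags:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and self.prefix is not None and self._is_block(tag):
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html, skip_tags=("script", "style", "nav", "footer")):
    """Drop excluded elements, then join the converted blocks with blank lines."""
    converter = MiniMarkdownConverter(skip_tags)
    converter.feed(html)
    converter.close()
    return "\n\n".join(converter.blocks)


page = """
<html><body>
  <nav><p>Home | About</p></nav>
  <h1>Example Domain</h1>
  <p>This domain is for use in <b>illustrative</b> examples.</p>
  <script>console.log("ignored");</script>
  <footer><p>Copyright</p></footer>
</body></html>
"""
print(html_to_markdown(page))
```

A production converter would more plausibly rely on a dedicated HTML parsing and Markdown conversion library. The point here is only the ordering of the stages: excluded elements are dropped before any text is converted.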

### License

Apache 2.0

### Support

For issues, questions, or feature requests, please contact support or open an issue in the repository.

***

**Made with ❤️ for the AI community**

# Actor input Schema

## `startUrls` (type: `array`):

List of webpage URLs to convert to Markdown

## `includeSelectors` (type: `array`):

CSS selectors to include specific sections. If empty, entire page is converted.

## `excludeSelectors` (type: `array`):

CSS selectors to exclude from conversion (e.g., nav, footer, ads)

## `useJavaScript` (type: `boolean`):

Use Playwright to render JavaScript-heavy pages. Slower but handles dynamic content.

## `headingStyle` (type: `string`):

Markdown heading style

## `stripTags` (type: `array`):

HTML tags to completely remove (e.g., script, style, iframe)

## `auth` (type: `object`):

HTTP Basic Authentication credentials for restricted content

## `timeout` (type: `integer`):

Maximum time to wait for each page to load

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "excludeSelectors": [
    "nav",
    "footer",
    ".advertisement",
    "aside"
  ],
  "useJavaScript": false,
  "headingStyle": "ATX",
  "stripTags": [
    "script",
    "style",
    "iframe",
    "noscript"
  ],
  "auth": {},
  "timeout": 30
}
```
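
Validating the input client-side can catch mistakes before a run is started. The sketch below builds an input object from plain URL strings and enforces the constraints stated in the schema (`headingStyle` is `ATX` or `SETEXT`, `timeout` is between 10 and 120 seconds). The helper `build_input` and its parameter names are hypothetical. Only the resulting keys come from the schema.

```python
from typing import Any, Iterable, Optional

HEADING_STYLES = {"ATX", "SETEXT"}
MIN_TIMEOUT, MAX_TIMEOUT = 10, 120


def build_input(urls: Iterable[str], *,
                include_selectors: Optional[Iterable[str]] = None,
                exclude_selectors: Optional[Iterable[str]] = None,
                use_javascript: bool = False,
                heading_style: str = "ATX",
                timeout: int = 30) -> dict[str, Any]:
    """Return an Actor input dict, raising ValueError on out-of-range values."""
    url_list = list(urls)
    if not url_list:
        raise ValueError("at least one URL is required")
    if heading_style not in HEADING_STYLES:
        raise ValueError(f"headingStyle must be one of {sorted(HEADING_STYLES)}")
    if not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT:
        raise ValueError(f"timeout must be between {MIN_TIMEOUT} and {MAX_TIMEOUT}")

    actor_input: dict[str, Any] = {
        # The schema expects objects with a `url` property, not bare strings.
        "startUrls": [{"url": url} for url in url_list],
        "useJavaScript": use_javascript,
        "headingStyle": heading_style,
        "timeout": timeout,
    }
    # Optional selector lists are only included when provided.
    if include_selectors:
        actor_input["includeSelectors"] = list(include_selectors)
    if exclude_selectors:
        actor_input["excludeSelectors"] = list(exclude_selectors)
    return actor_input


print(build_input(["https://example.com"], exclude_selectors=["nav", "footer"]))
```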

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://example.com"
        }
    ],
    "excludeSelectors": [
        "nav",
        "footer",
        ".advertisement",
        "aside"
    ],
    "stripTags": [
        "script",
        "style",
        "iframe",
        "noscript"
    ],
    "auth": {}
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/markdownify-mcp-server").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://example.com" }],
    "excludeSelectors": [
        "nav",
        "footer",
        ".advertisement",
        "aside",
    ],
    "stripTags": [
        "script",
        "style",
        "iframe",
        "noscript",
    ],
    "auth": {},
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/markdownify-mcp-server").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "excludeSelectors": [
    "nav",
    "footer",
    ".advertisement",
    "aside"
  ],
  "stripTags": [
    "script",
    "style",
    "iframe",
    "noscript"
  ],
  "auth": {}
}' |
apify call crawlerbros/markdownify-mcp-server --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/markdownify-mcp-server",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Markdownify MCP Server",
        "description": "Convert any webpage to clean, formatted Markdown perfect for AI consumption. Ideal for building knowledge bases, documentation scrapers, and content migration tools.",
        "version": "1.0",
        "x-build-id": "bpIIMnkCpNcBZ1FzN"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~markdownify-mcp-server/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-markdownify-mcp-server",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~markdownify-mcp-server/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-markdownify-mcp-server",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~markdownify-mcp-server/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-markdownify-mcp-server",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "URLs to Convert",
                        "type": "array",
                        "description": "List of webpage URLs to convert to Markdown",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "includeSelectors": {
                        "title": "Include CSS Selectors (Optional)",
                        "type": "array",
                        "description": "CSS selectors to include specific sections. If empty, entire page is converted.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "excludeSelectors": {
                        "title": "Exclude CSS Selectors (Optional)",
                        "type": "array",
                        "description": "CSS selectors to exclude from conversion (e.g., nav, footer, ads)",
                        "items": {
                            "type": "string"
                        }
                    },
                    "useJavaScript": {
                        "title": "Enable JavaScript Rendering",
                        "type": "boolean",
                        "description": "Use Playwright to render JavaScript-heavy pages. Slower but handles dynamic content.",
                        "default": false
                    },
                    "headingStyle": {
                        "title": "Heading Style",
                        "enum": [
                            "ATX",
                            "SETEXT"
                        ],
                        "type": "string",
                        "description": "Markdown heading style",
                        "default": "ATX"
                    },
                    "stripTags": {
                        "title": "Strip HTML Tags",
                        "type": "array",
                        "description": "HTML tags to completely remove (e.g., script, style, iframe)",
                        "items": {
                            "type": "string"
                        }
                    },
                    "auth": {
                        "title": "Authentication (Optional)",
                        "type": "object",
                        "description": "HTTP Basic Authentication credentials for restricted content",
                        "properties": {
                            "username": {
                                "title": "Username",
                                "type": "string",
                                "description": "HTTP Basic Auth username",
                                "editor": "textfield"
                            },
                            "password": {
                                "title": "Password",
                                "type": "string",
                                "description": "HTTP Basic Auth password",
                                "editor": "textfield"
                            }
                        }
                    },
                    "timeout": {
                        "title": "Request Timeout (seconds)",
                        "minimum": 10,
                        "maximum": 120,
                        "type": "integer",
                        "description": "Maximum time to wait for each page to load",
                        "default": 30
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
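
For projects that do not use a client library, the `run-sync-get-dataset-items` path from the specification above can be called with the Python standard library alone. This is a sketch under stated assumptions: the helper names are hypothetical, the response is assumed to be a JSON array of dataset items, and the 300-second client timeout is an arbitrary choice. Only the request-building step runs in the example, so no network access or token is needed to try it.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "crawlerbros~markdownify-mcp-server"  # "~" replaces "/" in API paths


def build_sync_request(token: str, actor_input: dict[str, Any]) -> urllib.request.Request:
    """Build a POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_sync(token: str, actor_input: dict[str, Any], timeout: float = 300.0) -> Any:
    """Send the request and decode the response body (assumed to be JSON)."""
    request = build_sync_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Inspect the request without sending it ("demo-token" is a placeholder).
demo = build_sync_request("demo-token", {"startUrls": [{"url": "https://example.com"}]})
print(demo.get_method(), demo.full_url)
```

Calling `run_sync` with a real token blocks until the Actor finishes, so it is best suited to small batches of URLs.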
