Pricing

Pay per usage

Document Structure Extractor — Markdown to JSON outline

Turn Markdown documents into structured JSON: nested heading tree with section text, fenced code blocks, links, parsed tables, and size statistics. Pure parsing, no LLM cost.

Developer

Shinobu Otani

Last modified

3 days ago

Document Structure Extractor

Turn Markdown documents into structured JSON — heading tree, section text, code blocks, links, and parsed tables. Pure parsing, deterministic, no LLM cost.

What it does

For each input document it extracts:

Title (first # heading) and preamble text
Nested section tree: level, heading, body text, character counts, and children. Lines starting with # inside fenced code blocks are never miscounted as headings.
Code blocks with language tags and line numbers
Links ([text](url))
Tables parsed into header + rows
Stats: lines, characters, heading and code-block counts
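
The fence-aware heading detection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actor's actual implementation: `build_outline`, `HEADING_RE`, and the temporary `lines` key are hypothetical names, only `#`-style headings and backtick fences are recognized, preamble text before the first heading is ignored, and fenced code is assumed to stay in the section body.

```python
import re

FENCE = "`" * 3  # built indirectly so this sketch can itself sit inside a fence
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def _finalize(nodes):
    """Turn each node's collected lines into a stripped 'text' field."""
    for node in nodes:
        node["text"] = "\n".join(node.pop("lines")).strip()
        _finalize(node["children"])


def build_outline(markdown):
    """Return a nested section tree, ignoring '#' lines inside code fences."""
    roots = []
    stack = []        # currently open sections, shallowest first
    in_fence = False

    for line in markdown.split("\n"):
        # A fence marker toggles code mode; headings are only matched outside it.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)

        if match:
            node = {
                "level": len(match.group(1)),
                "heading": match.group(2),
                "lines": [],
                "children": [],
            }
            # Close every open section at the same or a deeper level.
            while stack and stack[-1]["level"] >= node["level"]:
                stack.pop()
            (stack[-1]["children"] if stack else roots).append(node)
            stack.append(node)
        elif stack:
            # Body line (including fenced code) belongs to the innermost section.
            stack[-1]["lines"].append(line)

    _finalize(roots)
    return roots
```

The stack holds the chain of open sections, so a new heading attaches to the nearest shallower heading, which is what produces the nesting.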

Input

{
    "documents": ["# Guide\n\nIntro.\n\n## Setup\n\n```bash\npip install x\n```"]
}
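
Each entry in `documents` is one raw Markdown string, so newlines and code fences have to be JSON-escaped. A small sketch of building that payload with Python's standard `json` module follows; `build_input` is a hypothetical helper name, and how the payload is submitted to the actor is not covered here.

```python
import json

FENCE = "`" * 3  # built indirectly so this sketch can itself sit inside a fence


def build_input(markdown_texts):
    """Wrap raw Markdown strings in the {"documents": [...]} input shape."""
    return json.dumps({"documents": list(markdown_texts)}, indent=4)


guide = "# Guide\n\nIntro.\n\n## Setup\n\n" + FENCE + "bash\npip install x\n" + FENCE
payload = build_input([guide])
print(payload)
```

Letting `json.dumps` do the escaping avoids hand-writing `\n` sequences, which is easy to get wrong for documents containing quotes or backslashes.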

Output (one dataset item per document)

{
    "title": "Guide",
    "sections": [
        {
            "level": 1, "heading": "Guide", "text": "Intro.",
            "children": [{"level": 2, "heading": "Setup", "...": "..."}]
        }
    ],
    "code_blocks": [{"lang": "bash", "code": "pip install x", "line": 7}],
    "links": [],
    "tables": [],
    "stats": {"lines": 9, "chars": 52, "headings": 2, "code_blocks": 1}
}
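
As an illustration of consuming one output item, the sketch below walks the `sections` tree and prints an indented outline. It assumes only the `level`, `heading`, and `children` keys shown above; `outline_lines` is a hypothetical helper, and the sample `item` is trimmed to the fields the function reads.

```python
def outline_lines(sections, indent="  "):
    """Flatten a nested section tree into indented outline lines."""
    lines = []
    for section in sections:
        depth = section["level"] - 1
        lines.append(f"{indent * depth}- {section['heading']}")
        # Children may be absent on leaf sections, so default to an empty list.
        lines.extend(outline_lines(section.get("children", []), indent))
    return lines


item = {
    "title": "Guide",
    "sections": [
        {
            "level": 1,
            "heading": "Guide",
            "children": [{"level": 2, "heading": "Setup"}],
        }
    ],
}

print("\n".join(outline_lines(item["sections"])))
```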

Typical uses

Building tables of contents / outlines for documentation sites
Feeding section-level structure into RAG ingestion pipelines
Auditing docs: section sizes, code-block coverage, dead-link candidates
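
For the RAG ingestion use case, one possible approach is to flatten the section tree into chunks that carry their heading path as context. The sketch below assumes the `heading`, `text`, and `children` keys from the output example; `section_chunks`, the `path` field, and the " > " separator are hypothetical choices, and the sample text for the Setup section is invented for illustration.

```python
def section_chunks(sections, path=()):
    """Yield one chunk per non-empty section, labeled with its heading path."""
    for section in sections:
        here = path + (section["heading"],)
        text = section.get("text", "")
        if text:
            yield {"path": " > ".join(here), "text": text}
        # Recurse so nested sections inherit the parent headings in their path.
        yield from section_chunks(section.get("children", []), here)


sections = [
    {
        "heading": "Guide",
        "text": "Intro.",
        "children": [{"heading": "Setup", "text": "Run the installer."}],
    }
]

for chunk in section_chunks(sections):
    print(chunk["path"], "|", chunk["text"])
```

Keeping the heading path alongside each chunk lets a retriever show where a passage came from without re-parsing the source document.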
