Document Structure Extractor — Markdown to JSON outline

Pricing

Pay per usage

Turn Markdown documents into structured JSON: nested heading tree with section text, fenced code blocks, links, parsed tables, and size statistics. Pure parsing, no LLM cost.


Developer

Shinobu Otani

Maintained by Community

Document Structure Extractor

Turn Markdown documents into structured JSON — heading tree, section text, code blocks, links, and parsed tables. Pure parsing, deterministic, no LLM cost.

What it does

For each input document it extracts:

  • Title (first # heading) and preamble text
  • Nested section tree: level, heading, body text, character counts, and children. Lines that start with # inside fenced code blocks are never miscounted as headings
  • Code blocks with language tags and line numbers
  • Links ([text](url))
  • Tables parsed into header + rows
  • Stats: lines, characters, heading and code-block counts
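
As an illustration of the fence-aware heading handling described above, here is a minimal sketch in Python. It is not the Actor's actual implementation: the function name `outline`, the regular expressions, and the simplified fence rules are assumptions made for this example, and only the `level`, `heading`, and `children` fields are reproduced.

```python
import re

# Hypothetical patterns for this sketch; the Actor's real rules may differ.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE_OPEN = re.compile(r"^\s{0,3}(`{3,}|~{3,})")


def outline(markdown: str) -> list[dict]:
    """Build a nested heading tree, skipping '#' lines inside fenced code."""
    roots: list[dict] = []
    stack: list[dict] = []  # currently open sections, outermost first
    fence = None            # marker that opened the current fence, if any

    for line in markdown.splitlines():
        if fence is None:
            opened = FENCE_OPEN.match(line)
            if opened:
                fence = opened.group(1)
                continue
        else:
            stripped = line.strip()
            # A closing fence repeats the opening character, is at least
            # as long as the opener, and carries no info string.
            if stripped and set(stripped) == {fence[0]} and len(stripped) >= len(fence):
                fence = None
            continue  # everything inside the fence is code, not structure

        match = HEADING.match(line)
        if not match:
            continue
        node = {"level": len(match.group(1)), "heading": match.group(2), "children": []}
        # Close sections at the same or a deeper level before attaching.
        while stack and stack[-1]["level"] >= node["level"]:
            stack.pop()
        (stack[-1]["children"] if stack else roots).append(node)
        stack.append(node)
    return roots


doc = "# Guide\n\n## Setup\n\n```bash\n# not a heading\npip install x\n```\n\n## Usage\n"
tree = outline(doc)
```

The stack keeps one open section per nesting level, so a new heading attaches to the nearest shallower ancestor. The `# not a heading` comment inside the bash fence is skipped because the fence state is checked before the heading pattern.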

Input

{
  "documents": ["# Guide\n\nIntro.\n\n## Setup\n\n```bash\npip install x\n```"]
}
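
Writing Markdown by hand as an escaped JSON string is error-prone, so the input is easier to build programmatically. The sketch below assumes the documents live as `.md` files under one directory; the helper name `build_input` is hypothetical, and only the `documents` field comes from the example above.

```python
import json
from pathlib import Path


def build_input(root: str) -> dict:
    """Collect every .md file under root into the 'documents' array."""
    paths = sorted(Path(root).rglob("*.md"))  # sorted for a stable order
    return {"documents": [p.read_text(encoding="utf-8") for p in paths]}


def to_json(payload: dict) -> str:
    """Serialize the payload; json.dumps escapes newlines and quotes."""
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

Calling `to_json(build_input("docs"))` would produce a payload of the same shape as the input shown above, with one string per file.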

Output (one dataset item per document)

{
  "title": "Guide",
  "sections": [
    {
      "level": 1, "heading": "Guide", "text": "Intro.",
      "children": [{"level": 2, "heading": "Setup", "...": "..."}]
    }
  ],
  "code_blocks": [{"lang": "bash", "code": "pip install x", "line": 7}],
  "links": [],
  "tables": [],
  "stats": {"lines": 9, "chars": 52, "headings": 2, "code_blocks": 1}
}
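
To show how the nested `sections` array can be consumed, the following sketch walks one output item and renders an indented outline. The function name `toc_lines` is an assumption for this example. The item is trimmed to the fields shown above, and `children` is read with a default because the sample elides part of the child section.

```python
def toc_lines(sections: list[dict], indent: str = "  ") -> list[str]:
    """Flatten the section tree into indented outline lines, depth-first."""
    lines = []
    for section in sections:
        depth = section["level"] - 1  # level 1 headings get no indentation
        lines.append(f"{indent * depth}- {section['heading']}")
        lines.extend(toc_lines(section.get("children", []), indent))
    return lines


# One dataset item, reduced to the fields used here.
item = {
    "title": "Guide",
    "sections": [
        {
            "level": 1, "heading": "Guide", "text": "Intro.",
            "children": [{"level": 2, "heading": "Setup"}],
        }
    ],
}

print("\n".join(toc_lines(item["sections"])))
# - Guide
#   - Setup
```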

Typical uses

  • Building tables of contents / outlines for documentation sites
  • Feeding section-level structure into RAG ingestion pipelines
  • Auditing docs: section sizes, code-block coverage, dead-link candidates