Document Structure Extractor — Markdown to JSON outline
Pricing
Pay per usage
Turn Markdown documents into structured JSON: nested heading tree with section text, fenced code blocks, links, parsed tables, and size statistics. Pure parsing, no LLM cost.
Developer: Shinobu Otani
Maintained by Community
Document Structure Extractor
Turn Markdown documents into structured JSON — heading tree, section text, code blocks, links, and parsed tables. Pure parsing, deterministic, no LLM cost.
What it does
For each input document it extracts:
- Title (first `#` heading) and preamble text
- Nested section tree: level, heading, body text, character counts, children. Fenced code blocks are never miscounted as headings.
- Code blocks with language tags and line numbers
- Links (`[text](url)`)
- Tables parsed into header + rows
- Stats: lines, characters, heading and code-block counts
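To illustrate why fence tracking matters for the heading tree, here is a minimal sketch of one way to nest headings while skipping lines inside fenced code. It is not the Actor's actual implementation: the `heading_tree` function, the two regular expressions, and the simple fence toggle are assumptions made for this example, and the sketch ignores edge cases such as nested or mismatched fences.

```python
import re

# Hypothetical patterns for this sketch: ATX headings and fence delimiters.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
FENCE = re.compile(r"^\s*(```|~~~)")


def heading_tree(markdown: str) -> list[dict]:
    """Return nested sections; '#' lines inside fenced code are ignored."""
    root = {"level": 0, "children": []}
    stack = [root]
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            # Toggle on every fence line (simplified; no fence-type matching).
            in_fence = not in_fence
            continue
        match = None if in_fence else HEADING.match(line)
        if match is None:
            continue
        node = {
            "level": len(match.group(1)),
            "heading": match.group(2),
            "children": [],
        }
        # Pop back to the nearest ancestor with a shallower level.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


doc = "# Guide\n\nIntro.\n\n## Setup\n\n```bash\n# not a heading\npip install x\n```"
tree = heading_tree(doc)
```

In this sample the `# not a heading` line sits inside the `bash` fence, so `tree` holds a single `Guide` section with one `Setup` child.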
Input
{"documents": ["# Guide\n\nIntro.\n\n## Setup\n\n```bash\npip install x\n```"]}
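Writing the input by hand means escaping every newline and quote in the Markdown. A small sketch of building the same `{"documents": [...]}` shape with Python's standard `json` module follows; the `build_input` helper is a hypothetical name for this example, and how the payload is submitted to the Actor is left out.

```python
import json


def build_input(markdown_texts: list[str]) -> str:
    """Serialize Markdown strings into the {"documents": [...]} input shape."""
    return json.dumps({"documents": list(markdown_texts)}, ensure_ascii=False)


guide = "# Guide\n\nIntro.\n\n## Setup\n\n```bash\npip install x\n```"
payload = build_input([guide])
```

`json.dumps` handles the newline escaping, so the Markdown can be read from files or variables unchanged.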
Output (one dataset item per document)
{"title": "Guide","sections": [{"level": 1, "heading": "Guide", "text": "Intro.","children": [{"level": 2, "heading": "Setup", "...": "..."}]}],"code_blocks": [{"lang": "bash", "code": "pip install x", "line": 7}],"links": [],"tables": [],"stats": {"lines": 9, "chars": 52, "headings": 2, "code_blocks": 1}}
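As a rough sketch of consuming one output item, the nested `sections` list can be walked recursively to produce an indented outline. The `item` below is a trimmed copy of the sample output (fields elided there are simply left out), and the `outline` helper is a hypothetical name, not part of the Actor.

```python
# Trimmed copy of the sample output item; elided fields are omitted.
item = {
    "title": "Guide",
    "sections": [
        {
            "level": 1,
            "heading": "Guide",
            "text": "Intro.",
            "children": [{"level": 2, "heading": "Setup"}],
        }
    ],
}


def outline(sections, depth=0):
    """Return one indented bullet line per section, depth-first."""
    lines = []
    for section in sections:
        lines.append("  " * depth + "- " + section["heading"])
        # Leaf sections may lack a "children" key in this trimmed sample.
        lines.extend(outline(section.get("children", []), depth + 1))
    return lines


toc = outline(item["sections"])
print("\n".join(toc))
```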
Typical uses
- Building tables of contents / outlines for documentation sites
- Feeding section-level structure into RAG ingestion pipelines
- Auditing docs: section sizes, code-block coverage, dead-link candidates
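For the RAG and auditing uses, one possible approach is to flatten the section tree into records tagged with their heading path. The sketch below assumes only the `heading`, `text`, and `children` fields shown in the sample output; the `flatten` helper is hypothetical, the `Setup` text is made-up sample data, and sizes are computed locally with `len` rather than read from the Actor's character counts.

```python
def flatten(sections, path=()):
    """Yield one record per section, tagged with its full heading path."""
    for section in sections:
        here = path + (section["heading"],)
        text = section.get("text", "")
        yield {"path": " > ".join(here), "text": text, "chars": len(text)}
        yield from flatten(section.get("children", []), here)


# Sample data shaped like the "sections" field; the Setup text is invented.
sections = [
    {
        "level": 1,
        "heading": "Guide",
        "text": "Intro.",
        "children": [
            {"level": 2, "heading": "Setup", "text": "Install the package."}
        ],
    }
]

records = list(flatten(sections))
for record in records:
    print(record["path"], record["chars"])
```

Each record can then be embedded as a chunk with its path as metadata, or sorted by `chars` to spot oversized or empty sections.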