docTOmd
Streamlined document-to-markdown conversion using [Docling](https://github.com/docling-project/docling). Converts PDF, DOCX, XLSX, HTML, and images into markdown with page-organized external images.
name: docTOmd
description: Convert documents to markdown. USE WHEN user mentions convert to markdown OR to md OR PDF to markdown OR doc to md OR docx to markdown OR document to md OR extract text from PDF OR convert PDF OR convert document OR make markdown from OR /doctomd.
invocation: /doctomd
Features
- Converts documents to markdown using Docling natively
- Images saved as external PNG files organized by page
- Page-based folder structure (ideal for knowledge bases with hundreds of images)
- Minimal YAML frontmatter with document metadata
- No LLM processing required
Usage
bun run $PAI_DIR/skills/docTOmd/Tools/Convert.ts <file_path> [options]
Options:
- --output, -o <path> - Custom output path
- --assets-dir <path> - Custom assets directory (default: {name}-images/)
- --ocr - Force OCR processing
- --vlm - Use Vision Language Model pipeline
- --help, -h - Show help
Image Organization
Images are automatically saved in page-based subdirectories:
document-images/
  page-001/
    image-001.png
    image-002.png
  page-002/
    image-003.png
This structure:
- Makes it easy to navigate to specific pages
- Scales efficiently to hundreds of images
- Provides clear context for each image
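The layout above can be sketched as a small path-building helper. This is only an illustration: `imagePath` is a hypothetical function, not something taken from Convert.ts, and it assumes three-digit zero padding and image numbers that continue across pages, as the example tree suggests.

```typescript
// Hypothetical helper, not part of Convert.ts.
// Builds the relative path of an image from its 1-based page number and
// its 1-based image index (assumed to be numbered across the whole document).
function imagePath(assetsDir: string, page: number, imageIndex: number): string {
  // Zero-pad to three digits, matching names like page-001 and image-003.
  const pad = (n: number): string => String(n).padStart(3, "0");
  // Drop any trailing slash so the result never contains "//".
  const dir = assetsDir.replace(/\/+$/, "");
  return `${dir}/page-${pad(page)}/image-${pad(imageIndex)}.png`;
}

console.log(imagePath("document-images/", 2, 3));
// document-images/page-002/image-003.png
```

Because the page number is part of the path, a reference in the markdown already tells the reader which page the image came from.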
Examples
User: "Convert report.pdf to markdown"
-> bun run Convert.ts report.pdf
-> Output: report.md + report-images/ directory
User: "/doctomd document.docx"
-> bun run Convert.ts document.docx
-> Output: document.md + document-images/ directory
User: "Turn this PDF into md"
-> bun run Convert.ts document.pdf
-> Output: document.md + document-images/ directory
User: "Convert scanned.pdf using OCR"
-> bun run Convert.ts scanned.pdf --ocr
-> Output: scanned.md + scanned-images/ directory
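The naming pattern in these examples (input name plus `.md`, and input name plus `-images/`) could be expressed roughly as follows. `defaultOutputs` is a hypothetical helper written for illustration; it only derives names and makes no assumption about which directory the tool actually writes to.

```typescript
// Hypothetical helper illustrating the naming convention in the examples;
// the real Convert.ts may derive these names differently.
interface DefaultOutputs {
  markdown: string;  // e.g. "report.md"
  assetsDir: string; // e.g. "report-images/"
}

function defaultOutputs(filePath: string): DefaultOutputs {
  // Keep only the file name, then drop the final extension if there is one.
  const fileName = filePath.split("/").pop() ?? filePath;
  const dot = fileName.lastIndexOf(".");
  const name = dot > 0 ? fileName.slice(0, dot) : fileName;
  return { markdown: `${name}.md`, assetsDir: `${name}-images/` };
}

const out = defaultOutputs("report.pdf");
console.log(out.markdown);  // report.md
console.log(out.assetsDir); // report-images/
```

The `--output` and `--assets-dir` options would override these defaults.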
Supported Formats
PDF, DOCX, XLSX, HTML, Markdown, PNG, JPG, TIFF
Output
- YAML frontmatter (title, source, hash, page/word/image counts)
- Markdown content with preserved structure
- External images in page-organized directories with relative paths
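To make the frontmatter concrete, here is a minimal sketch of how such a block might be rendered. The key names (`title`, `source`, `hash`, `pages`, `words`, `images`) are assumptions based on the list above; the source does not give the exact keys or the hash algorithm, so the hash is taken as an already-computed string.

```typescript
// Key names are assumptions; the source only lists what is recorded,
// not the exact YAML keys or the hash algorithm.
interface DocMeta {
  title: string;
  source: string;
  hash: string;   // precomputed elsewhere; algorithm not specified by the source
  pages: number;
  words: number;
  images: number;
}

function renderFrontmatter(meta: DocMeta): string {
  // JSON.stringify produces a double-quoted string, which YAML also accepts.
  const q = (s: string): string => JSON.stringify(s);
  return [
    "---",
    `title: ${q(meta.title)}`,
    `source: ${q(meta.source)}`,
    `hash: ${q(meta.hash)}`,
    `pages: ${meta.pages}`,
    `words: ${meta.words}`,
    `images: ${meta.images}`,
    "---",
    "",
  ].join("\n");
}

// Placeholder values for illustration only.
console.log(renderFrontmatter({
  title: "Report",
  source: "report.pdf",
  hash: "abc123",
  pages: 12,
  words: 3400,
  images: 3,
}));
```

Quoting the string fields keeps titles that contain colons or other YAML-significant characters from breaking the block.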
Dependencies
Required: Python with docling>=2.67.0
pip install "docling>=2.67.0"