Built on PaddleOCR's PP-StructureV3 and VL models, this skill extracts structured Markdown and JSON from complex PDFs and document images. It handles the cases that trip up plain OCR: multi-column layouts with correct reading order, tables recognized at cell level, formulas as LaTeX, and even seals. Use it for invoices, financial reports, academic papers, or any document where layout matters. The CLI is straightforward, with options for page ranges and resource saving. Note that it requires an access token and is subject to API rate limits, so factor both into any production use. Output quality is solid on documents that typically break simpler OCR tools.
npx skills add https://github.com/paddlepaddle/paddleocr --skill paddleocr-doc-parsing
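
Because the skill needs an access token and is rate limited, production callers will likely want to fail fast on a missing token and retry politely when throttled. The sketch below shows one generic way to do that. It does not use the skill's real interface: the environment variable name `PADDLEOCR_ACCESS_TOKEN`, the `RateLimitError` class, and the helper names are all assumptions made for illustration, and `call` stands in for whatever function actually invokes the parser.

```python
import os
import random
import time


class RateLimitError(Exception):
    """Hypothetical error a caller raises when the API reports a rate limit."""


def load_token(env_var="PADDLEOCR_ACCESS_TOKEN"):
    # The variable name is an assumption, not taken from the skill's docs.
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} before parsing documents")
    return token


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying with exponential backoff on RateLimitError."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # The delay doubles on each attempt; a little jitter keeps
            # concurrent workers from retrying in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Wrapping each document request in `with_backoff` keeps a batch job running through short throttling windows, while the attempt cap stops it from retrying forever if the quota is exhausted.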