LiteParse is a local document parser that extracts text and layout data from PDFs, Office files, and images without sending anything to the cloud. It's built by the LlamaIndex team and can be used through a CLI or a Node API. The default OCR engine is Tesseract.js, which needs no setup but offers only moderate accuracy; if you need better results, you can plug in a custom HTTP OCR server. LiteParse supports batch processing, page ranges, and bounding boxes, and it can generate screenshots for visual layout tasks. If you're building agents that need to read documents, or running local ETL on unstructured files, it covers the basics without adding API costs or privacy concerns.
npx skills add https://github.com/run-llama/llamaparse-agent-skills --skill liteparse
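
The description mentions page ranges, but not the syntax LiteParse accepts for them. As a rough illustration only, here is a minimal TypeScript sketch of how a range string such as "1-3,7" could be normalized into a list of page numbers before handing work to a parser. The function name `expandPageRanges`, the comma-and-dash syntax, and the clamping behavior are all assumptions made for this example; none of it is LiteParse's API.

```typescript
// Hypothetical helper, NOT part of LiteParse: expands a page-range
// string such as "1-3,7" into a sorted list of unique 1-based pages.
// The "N" / "N-M" comma-separated syntax is an assumption.
function expandPageRanges(spec: string, pageCount: number): number[] {
  const pages = new Set<number>();

  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas

    // Accept either a single page ("7") or an inclusive range ("1-3").
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) {
      throw new Error(`Invalid page range: "${token}"`);
    }

    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }

    // Clamp to the document length so "5-999" stops at the last page.
    for (let p = start; p <= Math.min(end, pageCount); p++) {
      pages.add(p);
    }
  }

  return Array.from(pages).sort((a, b) => a - b);
}
```

For a 10-page document, `expandPageRanges("1-3,7", 10)` yields pages 1, 2, 3, and 7. Overlapping entries are deduplicated, and ranges that run past the end of the document are clamped rather than rejected, which is a design choice for this sketch and may not match how any particular parser behaves.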