A Python script that wraps SiliconFlow's PaddleOCR API to extract text from images. It accepts single files or glob patterns and outputs either plain text or JSON with bounding box coordinates for each detected text region. It supports the usual image formats (JPG, PNG, WebP, BMP, GIF) and lets you customize the recognition prompt when you need specific formatting, such as markdown tables. Coordinates are reported as normalized LOC values that have to be converted to pixels, which is a bit quirky but documented. It is a straightforward CLI wrapper around an existing OCR API, nothing fancy, but it handles batch processing cleanly, and the JSON output is well structured if you need programmatic access to text positions.
npx skills add https://github.com/aotenjou/silicon-paddleocr --skill silicon-paddle-ocr
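
The LOC-to-pixel conversion mentioned in the description could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `loc_to_pixels`, the `(x1, y1, x2, y2)` box layout, and the normalization scale of 1000 are all hypothetical and not taken from the skill itself, so check the skill's own documentation for the actual scale and field order.

```python
from typing import Sequence, Tuple

# ASSUMPTION: LOC values are normalized to the range 0..LOC_SCALE.
# 1000 is a placeholder; the real scale is whatever the skill documents.
LOC_SCALE = 1000


def loc_to_pixels(
    loc_box: Sequence[float],
    image_width: int,
    image_height: int,
    scale: float = LOC_SCALE,
) -> Tuple[int, int, int, int]:
    """Convert a normalized box to pixel coordinates.

    ASSUMPTION: loc_box is ordered (x1, y1, x2, y2), i.e. the top-left
    corner followed by the bottom-right corner.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    x1, y1, x2, y2 = loc_box
    # Horizontal values scale with the image width, vertical with the height.
    return (
        round(x1 * image_width / scale),
        round(y1 * image_height / scale),
        round(x2 * image_width / scale),
        round(y2 * image_height / scale),
    )


# Example: a box on a 1920x1080 image, under the assumptions above.
print(loc_to_pixels((100, 200, 500, 800), 1920, 1080))  # (192, 216, 960, 864)
```

Keeping the scale as a parameter means the same helper still works if the documented normalization turns out to differ from the placeholder used here.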