This adds Claude's vision capabilities to your code environment so you can process images, PDFs, and screenshots. You get base64 encoding patterns, multi-image comparison, and document extraction out of the box. The skill includes optimization helpers that resize images to around 1568px to cut token usage by 30-50%, which matters when you're processing multiple files. Works well for OCR-like text extraction, chart analysis, and structured data pulls from receipts or forms. One thing to note: it won't identify specific people and can struggle with handwriting, but for technical diagrams, UI screenshots, and printed documents it's solid.
npx skills add https://github.com/lobbi-docs/claude --skill vision-multimodal