This skill wraps Gemini Pro's vision API so you can analyze images from the command line: extract text from screenshots, debug error messages, compare UI states, or pull code snippets out of images. The prompting examples are useful, especially the structured templates for OCR, UI analysis, and turning screenshots into bug reports. It's essentially OCR plus understanding, so you get both the text and context about what you're looking at. It works well when you need to grab text from an image or quickly analyze a screenshot without retyping everything by hand. Output quality depends on image clarity, and you should verify any extracted text, since no OCR is perfect.
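To give a feel for what a structured screenshot-to-bug-report template might look like, here is a minimal sketch in Python. It is not taken from the skill itself: the function name `build_bug_report_prompt`, its parameters, and the section headings are all hypothetical, and the code only assembles prompt text without calling Gemini.

```python
def build_bug_report_prompt(app_name: str, expected_behavior: str) -> str:
    """Assemble a structured prompt for turning a screenshot into a bug report.

    Hypothetical template for illustration; the skill's real templates may differ.
    """
    lines = [
        f"You are looking at a screenshot of {app_name}.",
        "1. Transcribe any visible error text exactly as shown.",
        "2. Describe the UI state: current screen, open dialogs, highlighted elements.",
        "3. Write a bug report with the sections Summary, Steps to Reproduce, Expected, Actual.",
        f"Expected behavior, for reference: {expected_behavior}",
        # Asking for an explicit marker makes unreadable regions easy to spot on review.
        "If any text is unreadable, write [unreadable] instead of guessing.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_bug_report_prompt(
        "the checkout page",
        "The order total updates after a coupon is applied.",
    ))
```

Numbered steps and an explicit marker for unreadable text make the output easier to check against the original image, which matters given that extracted text still needs verifying.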
npx skills add https://github.com/johnlindquist/claude --skill gemini-image