This routes image analysis requests to either Zhipu's GLM-4V or Qwen's VL models instead of using local libraries like PIL or pytesseract. You point it at local files, URLs, or even videos, and it handles the API calls and base64 encoding. The thinking mode flag on Zhipu is worth knowing about for complex tasks like object localization where you need better reasoning. It's clearly built for a Chinese-language bot framework (CountBot) but the command structure is straightforward. Zhipu offers a free tier which makes it practical for prototyping, while Qwen apparently handles document parsing better. The multi-image comparison feature is handy for spot-the-difference tasks.
npx skills add https://github.com/countbot-ai/countbot --skill image-analysis