This lets Claude analyze images and videos using Qwen's vision models through QwenCloud's API. It handles OCR, multi-image comparison, screenshot understanding, chart reading, and video comprehension. The skill comes with three Python scripts: analyze.py for general vision tasks, reason.py for chain-of-thought visual reasoning with QVQ models, and ocr.py for text extraction. The default model is qwen3.6-plus, though you can swap in specialized ones like qwen3-vl-plus for precise object localization or qvq-max for mathematical reasoning over visual data. Everything runs locally with just Python 3.9 and your QwenCloud API key. The documentation is thorough about model selection and includes curl fallbacks if you need them.
npx skills add https://github.com/qwencloud/qwencloud-ai --skill qwencloud-vision