This skill handles vision-based AI chat through the z-ai-web-dev-sdk, letting you analyze images, videos, and documents and then ask conversational follow-up questions about them. You can use it from the CLI for quick one-off analysis or integrate it into backend code for multi-turn conversations. The CLI is handy for testing: pass an image URL or a local file along with a prompt and get the result back as JSON. The SDK route gives you more control when building apps that need to keep context across multiple images or ask follow-up questions about the same visual content. Both base64-encoded data and direct URLs are accepted, though base64 is recommended for reliability.
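As a rough illustration of the SDK route, here is a minimal sketch of the two pieces described above: turning a local image into a base64 data URL, and keeping a message history so follow-up questions stay tied to the same image. It does not call the SDK itself. The OpenAI-style message shape (`text` and `image_url` content parts) is an assumption, not confirmed z-ai-web-dev-sdk API, and the helper names (`toDataUrl`, `fileToDataUrl`, `VisionConversation`) are hypothetical.

```typescript
import { readFileSync } from "node:fs";
import { extname } from "node:path";

// Hypothetical lookup table: file extension -> MIME type for the data URL.
const MIME_BY_EXT: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
  ".gif": "image/gif",
};

// Encode raw bytes as a base64 data URL (the recommended, more reliable input form).
function toDataUrl(bytes: Buffer, mime: string): string {
  return `data:${mime};base64,${bytes.toString("base64")}`;
}

// Read a local image file and return it as a base64 data URL.
function fileToDataUrl(path: string): string {
  const mime = MIME_BY_EXT[extname(path).toLowerCase()];
  if (!mime) throw new Error(`Unsupported image type: ${path}`);
  return toDataUrl(readFileSync(path), mime);
}

// ASSUMED OpenAI-style multimodal message shape; check the SDK docs for the real one.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "user" | "assistant";
  content: string | ContentPart[];
}

// Keeps the full history so later questions can refer back to earlier images.
class VisionConversation {
  readonly messages: ChatMessage[] = [];

  // First turn: a prompt plus an image (base64 data URL or direct URL).
  askAboutImage(imageUrl: string, prompt: string): ChatMessage[] {
    this.messages.push({
      role: "user",
      content: [
        { type: "text", text: prompt },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    });
    return this.messages;
  }

  // Follow-up turn: text only; the image stays in context via the history.
  followUp(prompt: string): ChatMessage[] {
    this.messages.push({ role: "user", content: prompt });
    return this.messages;
  }

  // Store the model's answer so the next request includes it.
  recordReply(text: string): void {
    this.messages.push({ role: "assistant", content: text });
  }
}
```

In practice you would pass `conversation.messages` to whichever vision chat method the SDK exposes after each `askAboutImage` or `followUp` call, then store the answer with `recordReply`. Resending the whole history on every turn is what lets a follow-up like "What color is it?" resolve against the earlier image.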
npx skills add https://github.com/answerzhao/agent-skills --skill vlm