This skill gives you access to Alibaba Cloud's Qwen realtime TTS models, which are built for low-latency streaming audio. There are five model variants to pick from, including flash and voice conversion options; the newest is dated January 2026. Setup requires a virtual environment and the AliCloud SDK. The skill has passed audits from Gen Agent Trust Hub, Socket, and Snyk, and it has 391 GitHub stars and 272 installs. If you're building something that needs Chinese-language TTS with minimal delay, or you're already in the Alibaba ecosystem, this is a straightforward integration. Streaming matters most when you need audio feedback without waiting for the full sentence to be generated.
npx skills add https://github.com/cinience/alicloud-skills --skill alicloud-ai-audio-tts-realtime
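
To make the streaming point concrete, here is a minimal sketch of why chunked delivery lowers the delay before the first audio is heard. It does not call the AliCloud SDK or any Qwen model: `fake_tts_stream`, `play_streaming`, and `play_buffered` are hypothetical names invented for this illustration, and the "audio" is just placeholder bytes with a simulated per-chunk delay.

```python
import time
from typing import Callable, Iterator


def fake_tts_stream(text: str, chunk_delay: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a realtime TTS session.

    Yields one placeholder "audio" chunk per word, sleeping briefly
    before each one to simulate synthesis latency.
    """
    for word in text.split():
        time.sleep(chunk_delay)
        yield word.encode("utf-8")  # placeholder bytes, not real audio


def play_streaming(chunks: Iterator[bytes], sink: Callable[[bytes], None]) -> float:
    """Send each chunk to the sink as it arrives.

    Returns the seconds elapsed before the first chunk reached the sink.
    """
    start = time.perf_counter()
    first_chunk_at = None
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        sink(chunk)
    return first_chunk_at if first_chunk_at is not None else 0.0


def play_buffered(chunks: Iterator[bytes], sink: Callable[[bytes], None]) -> float:
    """Wait for the whole utterance, then send it to the sink in one piece.

    Returns the seconds elapsed before any audio reached the sink.
    """
    start = time.perf_counter()
    audio = b"".join(chunks)  # blocks until the stream is exhausted
    elapsed = time.perf_counter() - start
    sink(audio)
    return elapsed


if __name__ == "__main__":
    sentence = "streaming lets playback start before synthesis ends"
    t_stream = play_streaming(fake_tts_stream(sentence), lambda chunk: None)
    t_buffer = play_buffered(fake_tts_stream(sentence), lambda audio: None)
    print(f"first audio (streaming): {t_stream:.3f}s")
    print(f"first audio (buffered):  {t_buffer:.3f}s")
```

In a real integration the generator would be replaced by whatever chunk callback or iterator the SDK provides, and the sink would be an audio output device or a network socket; the shape of the loop is the part this sketch is meant to show.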