This is your gateway to Alibaba's DashScope speech APIs when local Whisper or local TTS won't cut it. You get cloud ASR with optional coarse timestamps, cloud TTS with preset voices like Cherry, and the ability to create reusable voice clone profiles from sample audio. Synthesized speech comes out as OGG files, ready to send as Telegram voice notes. It expects DASHSCOPE_API_KEY in a dotfile and won't guess if the key is missing. The scripts run in their own venv under the skill's work directory, so their dependencies stay out of your project. Honest take: reach for this when you need Chinese voice cloning or when local models are too slow, but remember that every call is billed and depends on network access.
npx skills add https://github.com/ada20204/qwen-voice --skill qwen-voice
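The "won't guess if the key is missing" behavior can be sketched in Python, the language the skill's venv suggests. This is a minimal sketch under stated assumptions, not the skill's actual code: the dotfile location ~/.dashscope.env, the KEY=VALUE file format, and the names load_dashscope_key and MissingApiKeyError are all hypothetical. A missing file or an empty value raises an error instead of falling back to anything.

```python
from pathlib import Path

# Hypothetical dotfile location; the skill's real path is not documented here.
DEFAULT_DOTFILE = Path.home() / ".dashscope.env"


class MissingApiKeyError(RuntimeError):
    """Raised when DASHSCOPE_API_KEY cannot be found; no fallback is guessed."""


def load_dashscope_key(dotfile: Path = DEFAULT_DOTFILE) -> str:
    """Return DASHSCOPE_API_KEY from a KEY=VALUE dotfile, or fail loudly."""
    if not dotfile.is_file():
        raise MissingApiKeyError(
            f"{dotfile} not found; create it with DASHSCOPE_API_KEY=..."
        )
    for raw in dotfile.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Accept shell-style "export KEY=VALUE" lines as well (an assumption).
        if key.startswith("export "):
            key = key[len("export "):].strip()
        if key == "DASHSCOPE_API_KEY":
            # Drop surrounding whitespace and optional quotes.
            value = value.strip().strip('"').strip("'")
            if value:
                return value
            break  # Key present but empty: treat as missing.
    raise MissingApiKeyError(f"DASHSCOPE_API_KEY is missing or empty in {dotfile}")
```

Failing early like this surfaces a configuration problem before any billed DashScope request is attempted.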