This skill covers OmniVoice, a zero-shot TTS model that supports 600+ languages. It offers three ways to choose a voice: clone one from a short audio clip, design one with a text prompt such as "female, British accent, low pitch", or generate speech with no reference at all. It is built on diffusion models and runs at a real-time factor (RTF) of around 0.025, meaning synthesis takes roughly 2.5% of the duration of the audio it produces. Through the Python API you can provide reference audio with its transcription for cloning, describe speaker attributes for voice design, or let the model pick a random voice. It also supports non-verbal tags such as [laughter] and pronunciation hints in both English and Chinese. If you need production-grade multilingual TTS with flexible voice control, this skill covers the full pipeline from installation to batch inference.
npx skills add https://github.com/aradotso/trending-skills --skill omnivoice-tts
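
To put the RTF figure quoted above in concrete terms, here is a minimal sketch of the arithmetic. It does not call OmniVoice; the function names are hypothetical helpers introduced for illustration, and the 0.025 default is the approximate figure from the description, which will vary with hardware and settings.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float = 0.025) -> float:
    """Estimate compute time for a clip, assuming the given RTF holds on your hardware."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # A 60-second clip at RTF 0.025 would take about 1.5 seconds to synthesize.
    print(f"{estimated_synthesis_seconds(60.0):.2f} s")
```

An RTF below 1.0 means the model generates audio faster than it plays back, which is what makes batch inference over large text sets practical.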