Converts text to speech using Xiaomi's MiMo V2.5 models in three modes: preset voices, voice design from text descriptions, and voice cloning from audio samples. Preset mode supports singing and ships with eight built-in voices, such as 冰糖 (lively girl) and 白桦 (mature male). Emotion and style can be controlled either with natural-language prompts or with inline tags such as (紧张,深呼吸), meaning "nervous, deep breath". Voice design mode stands out: you write a character description (for example, "middle-aged male, auctioneer style, rapid rhythm") and it generates a matching voice. The skill also supports Chinese dialects, mixed emotions, and a director mode for fine-grained performance control. It requires a MiMo API key and Python with the openai package installed. The documentation is unusually thorough on writing effective voice descriptions and on when to add emotion tags versus supplying fuller context.
npx skills add https://github.com/xiaomimimo/mimo-skills --skill mimo-v2-5-tts
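A minimal sketch of how the three modes and the inline tags fit together. The field names (mode, voice, description, sample_path), the mode strings, and the tag_text helper are hypothetical and are not the skill's actual API. The sketch also assumes a tag goes directly before the line it affects. It only builds and validates a request payload locally. It does not call the MiMo service, whose endpoint and parameters are not described here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical mode names; the skill's real identifiers may differ.
MODES = ("preset", "design", "clone")

# Hypothetical mapping: the input each mode cannot do without.
REQUIRED_FIELD = {"preset": "voice", "design": "description", "clone": "sample_path"}


def tag_text(text: str, tags: Sequence[str]) -> str:
    """Prefix text with an inline style tag such as (紧张,深呼吸).

    Assumption: tags are comma-separated inside ASCII parentheses and placed
    directly before the text they affect, mirroring the example in the listing.
    """
    if not tags:
        return text
    return "(" + ",".join(tags) + ")" + text


@dataclass
class TTSRequest:
    """Hypothetical request shape; field names are illustrative only."""
    mode: str
    text: str
    voice: Optional[str] = None        # preset mode: built-in voice name, e.g. "冰糖"
    description: Optional[str] = None  # design mode: character description
    sample_path: Optional[str] = None  # clone mode: path to a reference audio clip

    def validate(self) -> None:
        # Reject unknown modes and modes missing their required input.
        if self.mode not in MODES:
            raise ValueError(f"unknown mode {self.mode!r}; expected one of {MODES}")
        field = REQUIRED_FIELD[self.mode]
        if not getattr(self, field):
            raise ValueError(f"{self.mode} mode requires {field}")

    def to_payload(self) -> dict:
        # Build a plain dict, omitting fields that were not set.
        self.validate()
        payload = {"mode": self.mode, "text": self.text}
        for key in ("voice", "description", "sample_path"):
            value = getattr(self, key)
            if value is not None:
                payload[key] = value
        return payload


if __name__ == "__main__":
    # Preset voice with an inline "nervous, deep breath" tag.
    preset = TTSRequest(mode="preset", voice="冰糖",
                        text=tag_text("我有点害怕。", ["紧张", "深呼吸"]))
    print(preset.to_payload())

    # Voice design from a character description.
    design = TTSRequest(mode="design", text="起拍价一百万!",
                        description="middle-aged male, auctioneer style, rapid rhythm")
    print(design.to_payload())
```

Keeping the per-mode requirement in one mapping means that a request missing its voice, description, or audio sample fails locally before any API call is attempted.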