This is proper local TTS for Apple Silicon: it runs entirely on-device through MLX, with no API keys or cloud services. You can clone voices from 15- to 25-second samples, stream audio in real time, or batch-process entire books. The voice cloning won't fool anyone, but it captures general vocal characteristics well enough for personal use. It also supports emotion tags that produce actual sounds, such as laughs and sighs, rather than speaking the words aloud. The documentation is unusually thorough, covering warnings about zero-padded filenames, PDF extraction workflows, and realistic time estimates. If you have an M-series Mac and want to generate hours of audio without sending text to external services, this does the job.
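The zero-padding warning is worth heeding for long batch jobs. Anything that orders files by comparing names as plain strings, such as Python's `sorted()`, puts `10` before `2`, so numbered chunks get stitched together out of order. The sketch below only illustrates that general sorting behavior; the `chapter_N.wav` names and the `chunk_names` helper are hypothetical and are not the skill's actual output naming.

```python
# Illustration of why chunk filenames should be zero-padded.
# The "chapter_N.wav" naming and chunk_names() helper are hypothetical,
# not the speak-tts skill's real file names.

def chunk_names(count, pad=0):
    """Return chunk filenames numbered 1..count, zero-padded to `pad` digits."""
    return [f"chapter_{str(i).zfill(pad)}.wav" for i in range(1, count + 1)]

# Plain string sorting: chapter_10..12 land before chapter_2.
unpadded = sorted(chunk_names(12))

# With three-digit padding, string order matches numeric order.
padded = sorted(chunk_names(12, pad=3))

print(unpadded[:5])
print(padded[:5])
```

Padding to a fixed width (three digits covers up to 999 chunks) keeps any name-sorted concatenation in the intended order without needing a custom numeric sort key.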
npx skills add https://github.com/emzod/speak --skill speak-tts