If you want voice cloning and TTS running locally instead of paying ElevenLabs per character, this is a solid open-source option. It's a Tauri desktop app with a FastAPI backend that runs entirely on your machine. It supports five TTS engines, including Qwen3 and Chatterbox Turbo, and handles 23 languages. A REST API on localhost:17493 lets you integrate it into your own apps, and a multi-track Stories editor lets you mix different voices in one project. It runs on Apple Silicon via MLX or on CUDA GPUs, and falls back to CPU otherwise. The skill's code samples are comprehensive, and the project ships pre-built binaries for macOS and Windows, so only Linux users need to build from source.
npx skills add https://github.com/aradotso/trending-skills --skill voicebox-voice-synthesis
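
Before wiring the REST API into your own app, it helps to confirm that the local backend is actually up. The following is a minimal sketch, not part of the project itself: it only checks whether anything accepts TCP connections on the port quoted above (17493). The function and constant names are hypothetical, and no endpoint paths are assumed, since the description does not list any.

```python
import socket

# Host and port taken from the description above; the names are illustrative.
API_HOST = "localhost"
API_PORT = 17493


def backend_is_listening(host: str = API_HOST,
                         port: int = API_PORT,
                         timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


if __name__ == "__main__":
    if backend_is_listening():
        print(f"Backend reachable on {API_HOST}:{API_PORT}")
    else:
        print(f"Nothing listening on {API_HOST}:{API_PORT}; start the desktop app first")
```

A check like this only proves that the port is open, not that the service behind it is the TTS backend. Because the backend is built on FastAPI, it may also serve interactive API docs at /docs (FastAPI's default, unless the app disables it), which would be the place to look up the real endpoint names.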