This handles the full voice AI pipeline: speech-to-text, natural language processing, and text-to-speech with support for multiple providers like OpenAI Whisper, Google Cloud, Azure, and Eleven Labs. You get a complete VoiceAssistant class that manages conversation history and async processing, plus real-time audio streaming with voice activity detection. The examples cover smart home voice control and meeting transcription, which gives you a solid starting point. The privacy and latency guidance is actually useful, especially the recommendation to delete audio after processing and use streaming for lower latency. If you're building anything voice-first, this covers the architectural decisions you'll need to make early, and the multi-provider approach means you're not locked into one vendor's pricing or capabilities.
npx skills add https://github.com/qodex-ai/ai-agent-skills --skill voice-ai-integration