This is for building voice AI systems where humans talk naturally with AI and get responses fast enough that it doesn't feel broken. The core tradeoff is laid out clearly: speech-to-speech models like OpenAI's Realtime API give you the lowest latency and preserve emotion, but you sacrifice control. Pipeline architectures where you chain STT, LLM, and TTS let you debug and tune each piece, but every component adds milliseconds. The skill hammers on latency budgets because that's what separates demos from production. It covers the practical stuff like handling interruptions, dealing with background noise, and detecting when users start talking so the AI can shut up. If you're shipping something where 800ms response time is the difference between natural and awkward, this maps the territory.
npx skills add https://github.com/davila7/claude-code-templates --skill voice-agents