Getting voice agents right comes down to hitting a sub-800ms voice-to-voice latency target while handling interruptions gracefully. You've got two paths: speech-to-speech (OpenAI Realtime API; most natural, hardest to debug) or the classic cascaded pipeline (Deepgram STT + your LLM + ElevenLabs TTS; slower, but you control every step). The skill walks through both architectures with production code, plus the VAD patterns that make or break turn-taking. Honestly, if you're building anything customer-facing, the pipeline approach wins: you need the text of what the AI actually said for compliance and debugging, even if it costs you an extra ~200ms of latency. The Pipecat examples are solid for getting started without building WebSocket handlers from scratch (rough sketch below).
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill voice-agents
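To give a feel for the pipeline route, here's a minimal Pipecat-style sketch of the Deepgram → LLM → ElevenLabs cascade with Silero VAD handling turn-taking. Treat it as a sketch, not gospel: Pipecat's import paths and parameter names shift between releases, and the env var names, model, and voice IDs here are placeholders — check the docs for the version you install.

```python
# Sketch of a cascaded Pipecat pipeline: Deepgram STT -> OpenAI LLM -> ElevenLabs TTS.
# Import paths below match recent Pipecat releases but move around; verify locally.
import asyncio
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport


async def main():
    # WebRTC transport; Silero VAD detects when the user starts/stops speaking.
    transport = DailyTransport(
        os.environ["DAILY_ROOM_URL"],  # placeholder env var
        None,  # room token; not needed for public rooms
        "Voice Agent",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o")
    tts = ElevenLabsTTSService(
        api_key=os.environ["ELEVENLABS_API_KEY"],
        voice_id=os.environ["ELEVENLABS_VOICE_ID"],
    )

    # A single context object holds the running transcript -- this is the
    # "see what the AI actually said" win of the pipeline approach.
    context = OpenAILLMContext(
        [{"role": "system", "content": "You are a concise voice assistant."}]
    )
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline([
        transport.input(),               # mic audio in
        stt,                             # Deepgram: audio -> text
        context_aggregator.user(),       # append user turn to context
        llm,                             # LLM: context -> streamed response
        tts,                             # ElevenLabs: text -> audio
        transport.output(),              # speaker audio out
        context_aggregator.assistant(),  # append assistant turn to context
    ])

    # allow_interruptions lets a VAD-detected user utterance cancel TTS mid-sentence,
    # which is the turn-taking behavior the skill's VAD patterns are about.
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)


if __name__ == "__main__":
    asyncio.run(main())
```

Each stage streams frames to the next, so TTS can start speaking before the LLM finishes its sentence — that overlap is where most of the latency budget gets won back.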