This skill adds real-time voice to a custom agent runtime by connecting ElevenLabs speech-to-text and text-to-speech to your own server logic. You expose a WebSocket endpoint, ElevenLabs handles the browser audio pipeline and turn-taking, and your server validates user intent before streaming response text back. Agent logic stays on your server instead of in ElevenLabs hosted agents, which matters when you need full control over how recognized speech maps to actions or tools. The skill includes server patterns for Python and TypeScript, browser token endpoints, and interruption-aware streaming. Treat transcribed speech as untrusted input and validate it before passing anything to downstream logic or tool calls.
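The advice to treat speech text as untrusted input can be illustrated with a minimal sketch. This is not code from the skill or from any ElevenLabs SDK: the names `sanitize_transcript`, `parse_intent`, `ALLOWED_INTENTS`, the 500-character limit, and the order-related intents are all hypothetical, chosen only to show one way a server might gate transcripts behind an allowlist before any tool call.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional

# Assumed upper bound on transcript length; tune for your own runtime.
MAX_TRANSCRIPT_CHARS = 500

# Hypothetical allowlist: only transcripts matching one of these patterns
# are ever turned into an action. Everything else is rejected.
ALLOWED_INTENTS = {
    "check_order": re.compile(r"^(?:check|track) (?:my )?order (?P<order_id>\d{4,10})$"),
    "cancel_order": re.compile(r"^cancel (?:my )?order (?P<order_id>\d{4,10})$"),
}


@dataclass(frozen=True)
class Intent:
    name: str
    args: dict


def sanitize_transcript(text: str) -> str:
    """Normalize a raw transcript before any matching is attempted."""
    text = unicodedata.normalize("NFKC", text)
    # Collapse all whitespace first so newlines and tabs become single spaces.
    text = re.sub(r"\s+", " ", text)
    # Drop remaining control, format, and unassigned characters.
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
    text = text.strip().lower()
    return text[:MAX_TRANSCRIPT_CHARS]


def parse_intent(transcript: str) -> Optional[Intent]:
    """Return an allowlisted Intent, or None if the transcript matches nothing."""
    cleaned = sanitize_transcript(transcript).rstrip(".!?")
    for name, pattern in ALLOWED_INTENTS.items():
        match = pattern.match(cleaned)
        if match:
            return Intent(name, match.groupdict())
    return None
```

In a server handler, a `None` result would typically lead to a clarifying spoken reply rather than a tool call, so free-form speech never reaches downstream logic directly.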
npx skills add https://github.com/elevenlabs/skills --skill speech-engine