Speech Engine

613 installs, 319 stars

Summary

This skill adds real-time voice to a custom agent runtime by connecting ElevenLabs speech-to-text and text-to-speech to your own server logic. You expose a WebSocket endpoint, ElevenLabs handles the browser audio pipeline and turn-taking, and your server streams response text back after validating user intent. Agent logic stays on your server instead of in ElevenLabs hosted agents, which matters when you need full control over how recognized speech maps to actions or tools. The skill includes server patterns for Python and TypeScript, browser token endpoints, and interruption-aware streaming. Treat transcribed speech as untrusted input and validate it before passing anything to downstream logic or tool calls.
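
As a rough illustration of the last two points, here is a minimal Python sketch of validating a transcript against an allowlist and streaming a reply in chunks that stop on interruption. It uses only the standard library and does not call any ElevenLabs API. The intent names, regex patterns, length limit, and function names (`validate_intent`, `stream_reply`) are all hypothetical and not taken from the skill itself; the WebSocket transport is left out.

```python
import re
import unicodedata
from typing import Callable, Iterator, Optional

# Hypothetical allowlist: the only actions this sketch lets speech trigger.
ALLOWED_INTENTS = {
    "check_order": re.compile(r"\b(where is|track|status of) my order\b"),
    "cancel_order": re.compile(r"\bcancel my order\b"),
}
MAX_TRANSCRIPT_CHARS = 500  # assumed limit; tune for your own use case


def normalize_transcript(text: str) -> str:
    """Normalize Unicode, collapse whitespace, drop control chars, lowercase."""
    text = unicodedata.normalize("NFKC", text)
    text = " ".join(text.split())  # newlines and tabs become single spaces
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
    return text.lower()


def validate_intent(transcript: str) -> Optional[str]:
    """Return an allowlisted intent name, or None if nothing matches."""
    if len(transcript) > MAX_TRANSCRIPT_CHARS:
        return None  # reject oversized input before doing any matching
    cleaned = normalize_transcript(transcript)
    for intent, pattern in ALLOWED_INTENTS.items():
        if pattern.search(cleaned):
            return intent
    return None  # unknown speech never reaches downstream tools


def stream_reply(
    text: str, interrupted: Callable[[], bool], chunk_words: int = 3
) -> Iterator[str]:
    """Yield the reply in small chunks, stopping once the user interrupts."""
    words = text.split()
    for i in range(0, len(words), chunk_words):
        if interrupted():  # checked before every chunk is sent
            return
        yield " ".join(words[i:i + chunk_words])
```

In a real server, `interrupted` would presumably read a flag set when the client signals that the user started speaking, and each yielded chunk would be sent over the WebSocket connection. Mapping speech to a fixed set of intents, rather than passing raw text to tools, keeps unexpected or adversarial phrases from triggering actions.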

Install

npx skills add https://github.com/elevenlabs/skills --skill speech-engine