This skill hooks Claude up to ElevenLabs Scribe and Whisper models through the inference.sh CLI. You get three transcription options: ElevenLabs Scribe v2, with 98%+ accuracy and speaker diarization; Fast Whisper Large V3, for speed; or Whisper V3 Large, for maximum accuracy. It handles timestamps, multiple languages, and translation to English. The workflow examples are solid, especially the video subtitling pipeline that chains transcription with caption generation. It works well for meeting recordings, podcasts, and voice notes. You'll need the belt CLI installed, but once that's done the commands are straightforward: you pass JSON input on the command line.
npx skills add https://github.com/inference-sh/skills --skill speech-to-text
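To illustrate the caption-generation half of a subtitling pipeline, here is a minimal sketch that turns timestamped transcript segments into SubRip (SRT) captions. It assumes each segment is a dict with `start` and `end` in seconds plus `text`. That shape, the `Segment` type, and the `srt_timestamp` / `segments_to_srt` helpers are hypothetical and are not the skill's documented output format or its own caption step, so you would need to map the actual transcription JSON onto them.

```python
from typing import TypedDict


class Segment(TypedDict):
    # Hypothetical segment shape (assumption): times in seconds plus text.
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Render one numbered SRT cue per segment, separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    # Made-up demo segments standing in for real transcription output.
    demo: list[Segment] = [
        {"start": 0.0, "end": 2.5, "text": "Welcome to the meeting."},
        {"start": 2.5, "end": 5.04, "text": "Let's review the agenda."},
    ]
    print(segments_to_srt(demo))
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the `,mmm` field; the resulting `.srt` file can then be handed to whatever tool burns or muxes subtitles into the video.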