OpenAI's Whisper, wrapped for Claude Code, gives you speech recognition across 99 languages from a model trained on 680,000 hours of audio. You get six model sizes to pick from (tiny at 39M parameters up to large at 1550M), so you can trade speed for accuracy depending on whether you're transcribing a quick voice memo or a three-hour podcast. It handles transcription and translation into English. It spits out timestamps per segment, or per word if you ask for them, in formats like SRT or WebVTT. GPU support makes it roughly 10 to 20 times faster if you have the hardware, but it runs on CPU too. The initial prompt feature lets you prime the model with technical jargon, which honestly makes a difference if you're working with domain-specific audio.
npx skills add https://github.com/davila7/claude-code-templates --skill whisper
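The skill's own interface isn't documented here, so the following is only a minimal sketch of the underlying workflow, assuming the `openai-whisper` Python package (`whisper.load_model` and `model.transcribe`). It shows model-size selection, the `task="translate"` switch, the `initial_prompt` option for jargon, and how Whisper's segment dicts (`start`, `end`, `text`) map to SRT and WebVTT. The helpers `transcribe_to_srt`, `segments_to_srt`, `segments_to_vtt`, and `format_timestamp` are hypothetical names for illustration, not part of the skill or the Whisper API.

```python
"""Sketch (assumes the openai-whisper package): Whisper segments -> SRT/WebVTT."""
from typing import Iterable, Mapping, Optional


def format_timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build numbered SRT cues separated by blank lines (hypothetical helper)."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def segments_to_vtt(segments: Iterable[Mapping]) -> str:
    """Build a WebVTT document: header, blank line, then one cue per segment."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)


def transcribe_to_srt(
    audio_path: str,
    model_name: str = "base",       # e.g. "tiny" for speed, "large" for accuracy
    prompt: Optional[str] = None,   # initial_prompt: prime the model with jargon
    translate: bool = False,        # True -> translate speech into English
) -> str:
    """Hypothetical wrapper: transcribe a file and return SRT text."""
    import whisper  # imported lazily so the formatters work without it

    # load_model uses CUDA when available, otherwise falls back to CPU.
    model = whisper.load_model(model_name)
    result = model.transcribe(
        audio_path,
        task="translate" if translate else "transcribe",
        initial_prompt=prompt,
    )
    return segments_to_srt(result["segments"])
```

For example, `transcribe_to_srt("talk.mp3", "small", prompt="Kubernetes, kubectl, etcd")` would return subtitle text with the prompt nudging the spelling of those terms (the file name and prompt are made up). Keeping the formatters separate from the model call means you can reuse them on segments you already have, or swap in a different output format without reloading a model.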