Whisper is OpenAI's speech recognition model. It handles 99 languages and was trained on 680,000 hours of audio. You get six model sizes: tiny (39M params), base, small, medium, large (1550M params), and turbo (809M params), with turbo being the sweet spot for speed and quality. It's straightforward for transcribing podcasts, generating subtitles, or translating foreign audio to English text. The main gotchas are hallucinations on long audio and no speaker diarization, so if you need to identify who's talking when, look at AssemblyAI instead. An initial prompt helps it with technical terms, and it runs 10-20x faster on GPU than on CPU. MIT licensed with 72,900+ GitHub stars.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill whisper
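For a sense of how the initial prompt and the translate mode fit together, here is a minimal sketch that assumes the open-source `openai-whisper` Python package and its `load_model` / `transcribe` calls. The helper names `build_transcribe_options` and `transcribe_file`, the "Glossary:" prompt wording, and the file name in the usage note are hypothetical and only for illustration. Turning off `condition_on_previous_text` is a commonly suggested way to limit runaway hallucinations on long audio, not a guaranteed fix.

```python
from typing import Any, Dict, List, Optional


def build_transcribe_options(
    terms: Optional[List[str]] = None,
    translate_to_english: bool = False,
    long_audio: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for Whisper's model.transcribe() (hypothetical helper)."""
    options: Dict[str, Any] = {
        # "translate" produces English text; "transcribe" keeps the spoken language.
        "task": "translate" if translate_to_english else "transcribe",
    }
    if terms:
        # The initial prompt nudges decoding toward the spelling of these terms.
        # The "Glossary:" phrasing is an assumption, not a required format.
        options["initial_prompt"] = "Glossary: " + ", ".join(terms) + "."
    if long_audio:
        # Assumption: not feeding earlier output back in as context reduces
        # the chance of one hallucinated segment repeating through a long file.
        options["condition_on_previous_text"] = False
    return options


def transcribe_file(path: str, model_name: str = "turbo", **kwargs: Any) -> str:
    """Load a Whisper model and return the transcript text for one audio file."""
    # Imported lazily so the option helper works without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **build_transcribe_options(**kwargs))
    return result["text"].strip()
```

With the package and ffmpeg installed, `transcribe_file("episode.mp3", terms=["PyTorch", "CUDA"], long_audio=True)` would return the transcript as a string. For translation, it may be worth passing `model_name="medium"` or `"large"`, since the upstream README notes that turbo was not trained for the translate task.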