This skill converts audio and video files into text transcripts using WhisperX, with optional word-level timestamps if you need precise timing. You can pick a model size from tiny to large-v2 depending on whether you care more about speed or accuracy, write output as TXT, SRT, VTT, or JSON, and transcribe multiple languages with automatic language detection. The skill walks you through a set of configuration questions before running, which helps if you're not sure what settings to use. The first run is slow because it has to download the models, but later transcriptions are faster. It's most useful when you need subtitles or searchable transcripts from recordings and don't want to mess with Whisper configuration yourself.
npx skills add https://github.com/infquest/vibe-ops-plugin --skill audio-transcribe
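
To make the SRT output option more concrete, here is a minimal Python sketch of how timestamped segments can be turned into SRT cues. It is an illustration, not the skill's actual code: the function names `srt_timestamp` and `segments_to_srt` and the sample data are hypothetical, and the segment shape (a dict with `start`, `end`, and `text`) is an assumption modeled on the kind of output Whisper-style transcribers produce.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as SRT: numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each cue is: sequence number, time range, then the caption text.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


# Hypothetical sample segments (assumed shape, not real transcription output).
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.04, "text": " Let's get started."},
]

print(segments_to_srt(segments))
```

The same segment list could feed a VTT or JSON writer, which is why keeping timestamps as plain seconds until the final formatting step is a convenient design choice.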