This is a local speech-to-text skill that uses FunASR to transcribe audio and video files into timestamped Markdown. It supports common formats such as mp4, mp3, wav, and m4a, and speaker diarization is enabled by default. An ONNX acceleration mode runs Paraformer models for faster processing, and a video keyframe extraction feature automatically detects and captures PPT slides from recordings. After transcription, it outputs a summary prompt that you can use right away to generate an AI summary of the content. It was made by a lawyer who, judging by the detailed documentation around meeting transcription workflows, probably transcribes a lot of depositions and client meetings.
npx skills add https://github.com/cat-xierluo/legal-skills --skill funasr-transcribe
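
To give a rough idea of what "timestamped Markdown" with speaker labels can look like, here is a minimal sketch in plain Python. It does not call FunASR and is not the skill's actual code: the `Segment` shape, the function names `format_timestamp` and `segments_to_markdown`, and the output layout are all assumptions made for illustration. The skill's real output format may differ.

```python
from typing import Iterable, TypedDict


class Segment(TypedDict):
    # Hypothetical segment shape, assumed for this sketch only.
    start_ms: int   # segment start time in milliseconds
    end_ms: int     # segment end time in milliseconds
    speaker: int    # speaker index assigned by diarization
    text: str       # recognized text for the segment


def format_timestamp(ms: int) -> str:
    """Render a millisecond offset as HH:MM:SS (sub-second part is dropped)."""
    total_seconds = ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def segments_to_markdown(segments: Iterable[Segment]) -> str:
    """Turn segments into one Markdown paragraph per segment.

    Each paragraph starts with a bold timestamp and speaker label.
    This layout is an assumption, not the skill's documented format.
    """
    paragraphs = []
    for seg in segments:
        stamp = format_timestamp(seg["start_ms"])
        paragraphs.append(f"**[{stamp}] Speaker {seg['speaker']}:** {seg['text']}")
    # A blank line between entries keeps them as separate Markdown paragraphs.
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    demo: list[Segment] = [
        {"start_ms": 0, "end_ms": 4200, "speaker": 0, "text": "Let's get started."},
        {"start_ms": 4200, "end_ms": 9800, "speaker": 1, "text": "Thanks for joining."},
    ]
    print(segments_to_markdown(demo))
```

Keeping the formatting step separate from recognition means the same function can be reused whichever model or acceleration mode produced the segments.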