This is a full video editor driven by conversation. You describe what you want; it transcribes your footage, reasons about cuts from the transcript, proposes a strategy in plain English, waits for your confirmation, then executes with ffmpeg. It enforces the production correctness rules (30 ms audio fades at every cut, subtitles applied last in the filter chain, word-boundary alignment) so you don't get audio pops or misaligned captions. You can ask it to cut filler, generate overlay animations with Remotion or Manim, burn in subtitles, color grade, or do anything else the format supports. The core insight is that it treats audio as primary and only drills into the visuals at decision points, which keeps the iteration loop fast. No presets, no menu diving.
npx skills add https://github.com/browser-use/video-use --skill video-use
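
To make the correctness rules above concrete, here is a minimal Python sketch of how an ffmpeg filter graph could apply 30 ms audio fades at each cut and place the subtitle filter last. This is not the skill's actual code: the function name `build_filter_graph`, the `(start, end)` segment list, and the stream labels are hypothetical, and it assumes a single input with one video and one audio stream. Only standard ffmpeg filters are used (`trim`, `atrim`, `setpts`, `asetpts`, `afade`, `concat`, `subtitles`).

```python
FADE_S = 0.030  # 30 ms audio fade on both sides of every cut


def build_filter_graph(segments, subtitle_file=None):
    """Build an ffmpeg -filter_complex string for the kept segments.

    segments: list of (start, end) times in seconds to keep (hypothetical
    input, e.g. derived from word boundaries in a transcript).
    subtitle_file: optional subtitle path; assumed to be a simple filename
    that needs no filter-graph escaping.
    """
    chains = []
    labels = []
    for i, (start, end) in enumerate(segments):
        dur = end - start
        if dur <= 2 * FADE_S:
            raise ValueError(f"segment {i} is too short for two fades")
        fade_out_at = dur - FADE_S
        # Video: cut the segment and reset timestamps to start at zero.
        chains.append(
            f"[0:v]trim=start={start:.3f}:end={end:.3f},"
            f"setpts=PTS-STARTPTS[v{i}]"
        )
        # Audio: same cut, then fade in and out to avoid pops at the joins.
        chains.append(
            f"[0:a]atrim=start={start:.3f}:end={end:.3f},"
            f"asetpts=PTS-STARTPTS,"
            f"afade=t=in:st=0:d={FADE_S:.3f},"
            f"afade=t=out:st={fade_out_at:.3f}:d={FADE_S:.3f}[a{i}]"
        )
        labels.append(f"[v{i}][a{i}]")

    # Join all segments; route video through an extra label if subtitles follow.
    video_label = "[vcat]" if subtitle_file else "[vout]"
    chains.append(
        f"{''.join(labels)}concat=n={len(segments)}:v=1:a=1{video_label}[aout]"
    )
    # Subtitles go last so captions are drawn on the final, re-timed video.
    if subtitle_file:
        chains.append(f"[vcat]subtitles={subtitle_file}[vout]")
    return ";".join(chains)


if __name__ == "__main__":
    graph = build_filter_graph([(0.0, 4.2), (5.1, 9.0)], "captions.srt")
    print(graph)
```

The resulting string could be passed to ffmpeg along the lines of `ffmpeg -i input.mp4 -filter_complex "<graph>" -map "[vout]" -map "[aout]" output.mp4`. Note that the subtitle file would need timings that match the edited timeline, not the original footage.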