A complete short-form video pipeline for Chinese platforms: Xiaohongshu, Douyin, and Weixin Shipin Hao (WeChat Channels). It takes raw voiceover audio and b-roll and transcribes the audio with mlx-whisper or faster-whisper. You then review and correct the transcript, and word timings are redistributed to match the corrected text. From there it handles rough cuts, scene boundary detection, highlight selection, and LLM-driven rewriting of the script into a five-act story structure. The auto-enrich pass schedules b-roll cutaways, chapter cards, and emoji stickers, syncs BGM to beats detected with librosa, and generates abstract concept visuals through Codex's built-in imagegen tool. A content linter with more than 80 rules checks platform compliance. Rendering uses heavy-weight CJK fonts and loudness normalization, followed by post-render QA and export in the aspect ratios of all three platforms, with titles, captions, and tags. The storyboard and provider decision modules are especially detailed. You'll need FFmpeg and Python 3.
npx skills add https://github.com/maxazure/video-editing-skill --skill video-editing
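
To make the "redistributed word timings" step in the description concrete, here is a minimal sketch of one way it could work. It assumes that after a transcript correction the segment's start and end times are kept, and the span is divided among the corrected words in proportion to their character counts. The `Word` class and `redistribute_timings` function are hypothetical names for illustration; the skill's actual algorithm is not described here and may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def redistribute_timings(corrected_words, seg_start, seg_end):
    """Spread [seg_start, seg_end] across the corrected words.

    Assumption: each word gets a share of the segment proportional to its
    character count, a rough proxy for spoken length in CJK text.
    """
    if seg_end < seg_start:
        raise ValueError("segment end precedes start")
    words = [w for w in corrected_words if w]  # drop empty tokens
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    duration = seg_end - seg_start
    result = []
    consumed = 0  # characters assigned so far
    for w in words:
        start = seg_start + duration * consumed / total_chars
        consumed += len(w)
        end = seg_start + duration * consumed / total_chars
        result.append(Word(w, start, end))
    return result


if __name__ == "__main__":
    for word in redistribute_timings(["短视频", "剪辑", "流程"], 1.0, 4.5):
        print(f"{word.start:.2f}-{word.end:.2f} {word.text}")
```

Because each word starts exactly where the previous one ends, the corrected words tile the original segment with no gaps or overlaps, so subtitle cues stay aligned with the cut points even when the edited text has a different word count.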