This wraps Wan-AI's flagship text-to-video model through RunComfy's CLI, letting you generate 2-15 second clips at 720p or 1080p, with audio-driven lip sync when you supply a track. The main draw is the audio_url parameter, which syncs mouth movement to your own voiceover, plus multi-reference conditioning and solid motion physics. Prompt expansion is on by default and rewrites short prompts for better results; disable it when you want literal control. Best for lip-synced ads or multi-language dub variants where you need the same visual with different audio. Duration caps at 15 seconds and there's no 4K output, so plan to stitch longer narratives from multiple calls.
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill wan-2-7
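The constraints above (2-15 second duration, 720p or 1080p, optional audio_url, prompt expansion on by default) can be sketched as a request-payload builder. This is a hypothetical illustration of the parameter shape, not the skill's actual API: only audio_url is named in the description, and every other field name here is an assumption.

```python
def build_wan_request(prompt, duration_s, resolution="720p",
                      audio_url=None, expand_prompt=True):
    """Build a hypothetical payload for a wan-2-7 generation call.

    Only audio_url appears in the skill description; the other field
    names are illustrative. The documented limits (2-15 s clips,
    720p/1080p only) are enforced here.
    """
    if not 2 <= duration_s <= 15:
        raise ValueError("duration must be between 2 and 15 seconds")
    if resolution not in ("720p", "1080p"):
        raise ValueError("resolution must be 720p or 1080p")
    payload = {
        "prompt": prompt,
        "duration": duration_s,
        "resolution": resolution,
        # On by default; set False for literal prompt control.
        "prompt_expansion": expand_prompt,
    }
    if audio_url is not None:
        # Drives lip sync to your own voiceover track.
        payload["audio_url"] = audio_url
    return payload
```

For the multi-language dub use case, you would call this repeatedly with the same prompt and a different audio_url per language, keeping the visual identical across variants.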