ByteDance's Seedance 2.0 Pro generates 4–15 second cinematic video clips with natively lip-synced audio. Its main strength is multimodal referencing: a single call can take up to 9 images, 3 videos, and 3 audio files. The prompting model divides the work sensibly: stable identity goes in image_url, while the evolving narrative goes in the text prompt. It's the right pick when you need a spokesperson ad or dialogue piece with consistent branding across languages, or when you want camera-shot grammar without manual compositing. Two limits to keep in mind: resolution is capped at 720p on the playground tier, and reference videos or audio files that fall outside the 2–15 second window trigger schema errors.
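Because the reference limits and duration windows are easy to trip over, it can help to check a request locally before sending it. The following is a minimal Python sketch under stated assumptions. Only the image_url field name comes from the text, and treating it as a list is an assumption. The prompt, video_refs, audio_refs, duration_s, and resolution fields, along with the Reference and SeedanceRequest types and the validate helper, are hypothetical names for illustration. The actual Seedance 2.0 Pro / RunComfy request schema may differ, and the checks only mirror the numbers stated above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits taken from the description above; the service's exact enforcement may differ.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3
REF_MIN_S, REF_MAX_S = 2.0, 15.0    # allowed duration window for reference video/audio
CLIP_MIN_S, CLIP_MAX_S = 4.0, 15.0  # allowed length of the generated clip
PLAYGROUND_MAX_P = 720              # 720p cap on the playground tier


@dataclass
class Reference:
    """Hypothetical reference media item: a URL plus its duration in seconds."""
    url: str
    duration_s: Optional[float] = None


@dataclass
class SeedanceRequest:
    """Hypothetical request shape; only image_url is named in the source text."""
    prompt: str
    image_url: List[str] = field(default_factory=list)
    video_refs: List[Reference] = field(default_factory=list)
    audio_refs: List[Reference] = field(default_factory=list)
    duration_s: float = 5.0
    resolution: int = 720  # the "p" value, e.g. 480 or 720 (assumed field)


def validate(req: SeedanceRequest, max_resolution: int = PLAYGROUND_MAX_P) -> List[str]:
    """Return a list of problems; an empty list means the request passes these local checks."""
    errors: List[str] = []

    # Per-call reference counts.
    if len(req.image_url) > MAX_IMAGES:
        errors.append(f"too many images: {len(req.image_url)} > {MAX_IMAGES}")
    if len(req.video_refs) > MAX_VIDEOS:
        errors.append(f"too many videos: {len(req.video_refs)} > {MAX_VIDEOS}")
    if len(req.audio_refs) > MAX_AUDIO:
        errors.append(f"too many audio files: {len(req.audio_refs)} > {MAX_AUDIO}")

    # Reference video/audio must fall inside the 2 to 15 second window.
    for kind, refs in (("video", req.video_refs), ("audio", req.audio_refs)):
        for ref in refs:
            if ref.duration_s is None or not (REF_MIN_S <= ref.duration_s <= REF_MAX_S):
                errors.append(
                    f"{kind} reference {ref.url} must be {REF_MIN_S:g} to {REF_MAX_S:g} s, "
                    f"got {ref.duration_s}"
                )

    # Output clip length and resolution cap.
    if not (CLIP_MIN_S <= req.duration_s <= CLIP_MAX_S):
        errors.append(f"clip duration {req.duration_s} s is outside {CLIP_MIN_S:g} to {CLIP_MAX_S:g} s")
    if req.resolution > max_resolution:
        errors.append(f"resolution {req.resolution}p exceeds the {max_resolution}p cap")

    return errors
```

For example, a request with one brand image, a 5-second reference video, and an 8-second clip returns an empty list, while a 1-second audio reference is reported before the call ever reaches the API. The max_resolution parameter is left adjustable because the 720p cap is described only for the playground tier.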
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill seedance-v2