Turns text prompts into video clips using a structured JSON workflow that lets you specify everything from camera angles and character dialogue to audio cues and era-specific settings. You can optionally supply a reference image to guide the visual style or anchor the first frame. The workflow is pretty hands-on: write a detailed JSON prompt, optionally generate a reference image with the image-generation skill, then run a Python script with your parameters. It is best suited to narrative scenes where you want tight control over composition and storytelling, rather than quick throwaway clips. The JSON structure is verbose, but it pays off when you need reproducible results or want to iterate on one aspect at a time, such as camera movement or audio layering.
npx skills add https://github.com/bytedance/deer-flow --skill video-generation
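The following minimal Python sketch shows what the JSON-prompt step can look like: a scene is assembled as a typed structure and serialized to a JSON file. The field names (`description`, `era`, `camera`, `dialogue`, `audio`, `reference_image`) and the `ScenePrompt` / `write_prompt` helpers are hypothetical illustrations, not the skill's actual schema. Check the skill's own instructions for the keys its Python script really reads.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Optional

# NOTE: every field name here is a hypothetical example, not the
# schema the video-generation skill's script is documented to expect.


@dataclass
class DialogueLine:
    character: str
    line: str


@dataclass
class ScenePrompt:
    description: str
    era: str
    camera: Dict[str, str]
    dialogue: List[DialogueLine] = field(default_factory=list)
    audio: List[str] = field(default_factory=list)
    # Optional path to a reference image (e.g. one made with the
    # image-generation skill) to guide style or anchor the first frame.
    reference_image: Optional[str] = None

    def to_json(self) -> str:
        # asdict() recursively converts nested DialogueLine objects to dicts.
        data = asdict(self)
        if data["reference_image"] is None:
            # Leave the key out entirely when no reference image is used.
            del data["reference_image"]
        # sort_keys gives byte-identical output for identical inputs.
        return json.dumps(data, indent=2, sort_keys=True)


def write_prompt(prompt: ScenePrompt, path: Path) -> Path:
    """Write the serialized prompt to disk and return the path."""
    path.write_text(prompt.to_json(), encoding="utf-8")
    return path


if __name__ == "__main__":
    scene = ScenePrompt(
        description="A telegraph operator receives urgent news at night.",
        era="1920s rural train station",
        camera={"shot": "medium close-up", "movement": "slow push-in"},
        dialogue=[DialogueLine("Operator", "Get the stationmaster, now.")],
        audio=["rain on a tin roof", "telegraph clicks"],
    )
    out = write_prompt(scene, Path("scene_prompt.json"))
    print(out.read_text(encoding="utf-8"))
```

Keeping the prompt in a typed structure and serializing with sorted keys means that two runs with the same inputs produce identical JSON. Diffs between iterations then show only what you changed, such as the camera movement or one audio layer.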