This is an audio-driven video synthesis tool that generates talking-head videos from a single image or an existing video, syncing lip movements, head poses, and body gestures to an audio track. A streaming mode removes any limit on video length, and built-in TTS handles Chinese text. The setup is heavyweight: about 30GB of model weights and 16GB or more of GPU memory, though quantization options are available if GPU memory is tight. There are two main workflows: image-to-video for creating digital avatars from photos, and video-to-video for redubbing existing footage. Generation takes roughly 5-10 seconds of compute per second of 480p output. Quality depends heavily on clear facial input and on audio that has been normalized and sampled at 16kHz. Worth the setup if you're building virtual presenters or need automated video dubbing at scale.
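The audio requirement and the speed figure above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own preprocessing: the function names are hypothetical, "normalization" is assumed here to mean peak normalization, and the linear-interpolation resampler is a crude stand-in for a proper resampler (it applies no anti-aliasing filter). Only the 16kHz target and the 5-10 seconds-per-second range come from the description.

```python
import math

TARGET_RATE = 16_000  # sample rate the description says the audio should use


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample a mono float signal by linear interpolation.

    Crude on purpose: no anti-aliasing filter, so it is only a sketch.
    """
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def peak_normalize(samples, peak=0.95):
    """Scale the signal so its loudest sample has magnitude `peak` (assumed meaning of 'normalized')."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = peak / loudest
    return [s * gain for s in samples]


def estimate_generation_seconds(audio_seconds, low=5.0, high=10.0):
    """Return a (best, worst) compute-time estimate for 480p output,
    using the 5-10 seconds-per-second range quoted in the description."""
    return audio_seconds * low, audio_seconds * high


if __name__ == "__main__":
    src_rate = 48_000
    # One second of a quiet 220 Hz tone as stand-in input audio.
    tone = [0.3 * math.sin(2 * math.pi * 220 * t / src_rate) for t in range(src_rate)]
    prepared = peak_normalize(resample_linear(tone, src_rate))
    print(len(prepared), estimate_generation_seconds(len(prepared) / TARGET_RATE))
```

For real footage, a dedicated resampler (for example one provided by an audio library or ffmpeg) would be the safer choice; the estimate helper is mainly useful for budgeting GPU time before a long batch job.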
npx -y skills add anbeime/skill --skill infinitetalk --agent claude-code
Installs into .claude/skills of the current project.
davila7/claude-code-templates
orchestra-research/ai-research-skills
agentspace-so/runcomfy-agent-skills
inferen-sh/skills