Generates talking-head videos from a single portrait image using Pruna's P-Video-Avatar model through the inference.sh CLI. You provide a photo plus either text (rendered with built-in TTS offering 30 voices across 10 languages) or your own audio file, and it returns a lip-synced video at 720p or 1080p. Pricing is $0.025 per second of 720p output, noticeably cheaper than HeyGen or Synthesia. It works well for product demos, multilingual content, or any situation that calls for a consistent AI presenter without filming, and speaking style and background behavior can be controlled through prompts. The workflow is simple: run the command with an image URL and a script, get back a video. It pairs nicely with Pruna's P-Image model if you need to generate the portrait first.
npx skills add https://github.com/inference-sh-skills/skills --skill p-video-avatar
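The one concrete number above is the 720p rate of $0.025 per second of output, which makes per-clip budgeting easy to compute. A minimal sketch, assuming only that published rate (the function name and structure are illustrative, not part of the skill; the 1080p rate is not stated here, so it is omitted rather than guessed):

```python
# Cost estimator for P-Video-Avatar output, based solely on the published
# 720p rate of $0.025 per second. The 1080p rate is not given in this
# description, so it is intentionally left out.

RATE_720P_USD_PER_SEC = 0.025  # from the skill description

def estimate_cost(duration_sec: float) -> float:
    """Estimated 720p generation cost in USD for a clip of the given length."""
    if duration_sec < 0:
        raise ValueError("duration must be non-negative")
    return round(duration_sec * RATE_720P_USD_PER_SEC, 4)

if __name__ == "__main__":
    # A 60-second product demo at 720p:
    print(f"60s demo: ${estimate_cost(60):.2f}")  # → $1.50
```

For scale: a minute of output costs $1.50, so even an hour of generated presenter footage stays under $100 at the 720p rate.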