Turns two-speaker conversations into natural-sounding audio using Dia TTS through the inference.sh CLI. You write dialogue with speaker tags like [S1] and [S2], and the model handles voice separation automatically. Punctuation shapes the delivery: exclamation points add energy, ellipses create hesitation, and parenthetical cues like (laughs) or (sighs) add emotional texture. Best for podcast scripts, explainer videos, or any scenario where you need back-and-forth dialogue without hiring voice actors. The guide covers pacing and script-writing patterns, such as keeping sentences under 15 words and using contractions. For longer projects, generate the audio in segments and merge them afterward.
npx skills add https://github.com/inference-sh/skills --skill dialogue-audio
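
The script conventions described above (speaker tags, sentences under 15 words, segment-by-segment generation) can be checked before any audio is produced. The following is a minimal Python sketch, not part of the skill itself: the helper names, the one-turn-per-line layout, and the segment size of four turns are all illustrative assumptions, and it deliberately makes no inference.sh CLI calls.

```python
import re

MAX_WORDS = 15  # sentence-length guideline from the description above


def format_script(turns):
    """Join (speaker, text) pairs into tagged lines such as '[S1] Hello.'.

    One turn per line is an assumed layout, not a documented requirement.
    """
    return "\n".join(f"[{speaker}] {text}" for speaker, text in turns)


def long_sentences(text, max_words=MAX_WORDS):
    """Return sentences longer than max_words, ignoring cues like (laughs)."""
    spoken = re.sub(r"\([^)]*\)", " ", text)        # drop parenthetical cues
    sentences = re.split(r"(?<=[.!?])\s+", spoken)  # split after . ! or ?
    return [s.strip() for s in sentences if len(s.split()) > max_words]


def split_segments(turns, turns_per_segment=4):
    """Chunk turns into segments to generate separately and merge later.

    The default of 4 turns per segment is an arbitrary illustrative value.
    """
    return [
        turns[i:i + turns_per_segment]
        for i in range(0, len(turns), turns_per_segment)
    ]


# Hypothetical example dialogue using the [S1]/[S2] tag convention.
turns = [
    ("S1", "Have you tried the new release yet?"),
    ("S2", "Not yet... I'm still on the old one. (sighs)"),
    ("S1", "You're missing out! It's so much faster."),
]

for speaker, text in turns:
    for sentence in long_sentences(text):
        print(f"{speaker}: consider shortening: {sentence}")

for segment in split_segments(turns):
    print(format_script(segment))
    print("---")
```

Each segment printed here would be passed to the TTS step on its own, and the resulting audio files merged afterward with whatever audio tool you prefer.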