TTS

marswaveai/skills
1.1k installs · 59 stars
Summary

Turns text into audio with two modes: quick single-voice generation for casual reading, or script mode for multi-speaker dialogue where you can assign different voices to different characters. The skill auto-detects which mode you want based on whether your input has multiple speakers or character markers. It follows a step-by-step flow with explicit confirmation before generating anything, saves your voice preferences per language, and handles both inline playback and file downloads. Built on the listenhub CLI. Authentication and config setup happen silently in the background, which is nice since you don't have to think about them until you want to change settings.

Install to Claude Code

npx -y skills add marswaveai/skills --skill tts --agent claude-code

Installs into .claude/skills of the current project.

Categories: Generative Media
First Seen: May 16, 2026

When to Use

  • User wants to convert text to spoken audio
  • User asks for "read aloud", "TTS", "text to speech", "voice narration"
  • User says "朗读" (read aloud), "配音" (voice-over), "语音合成" (speech synthesis)
  • User wants multi-speaker scripted audio or dialogue

When NOT to Use

  • User wants a podcast-style discussion with topic exploration (use /podcast)
  • User wants an explainer video with visuals (use /explainer)
  • User wants to generate an image (use /image-gen)

Purpose

Convert text into natural-sounding speech audio. Two paths:

  1. Quick mode (--mode direct): Single voice, low-latency, sync. For casual chat, reading snippets, instant audio.
  2. Script mode (--mode smart): Multi-speaker, per-segment voice assignment. For dialogue, audiobooks, scripted content.

Hard Constraints

  • Always check CLI auth following shared/cli-authentication.md
  • Follow shared/cli-patterns.md for CLI execution, errors, and interaction patterns
  • Never hardcode speaker IDs in CLI calls — use built-in defaults from shared/speaker-selection.md as fallback only; fetch from the speakers CLI when the user wants to change voice
  • Always read config following shared/config-pattern.md before any interaction
  • Always follow shared/speaker-selection.md for speaker selection (text table + free-text input)
  • Never save files to ~/Downloads/ or /tmp/ as primary output — save artifacts to the current working directory with friendly topic-based names (see shared/config-pattern.md § Artifact Naming)

Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation CLI command until the user has explicitly confirmed.

Mode Detection

Determine the mode from the user's input automatically before asking any questions:

Signal | Mode
"多角色", "脚本", "对话", "script", "dialogue", "multi-speaker" | Script
Multiple characters mentioned by name or role | Script
Input contains structured segments (A: ..., B: ...) | Script
Single paragraph of text, no character markers | Quick
"读一下", "read this", "TTS", "朗读" with plain text | Quick
Ambiguous | Quick (default)
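
The keyword and structure signals in the table can be approximated mechanically. The following is a minimal sketch, not part of the skill: the function name detect_mode and the exact patterns are assumptions made for illustration. The "multiple characters mentioned by name or role" signal needs language understanding and is deliberately left out, and only the ASCII colon is recognized as a segment marker.

```shell
# Hypothetical helper (not part of the skill): prints "script" or "quick".
detect_mode() {
  input="$1"

  # Chinese keyword signals from the table.
  case "$input" in
    *多角色*|*脚本*|*对话*) echo "script"; return ;;
  esac

  # English keyword signals, matched as whole words so that
  # "description" does not trigger on "script".
  if printf '%s\n' "$input" | grep -qiwE 'script|dialogue|multi-speaker'; then
    echo "script"; return
  fi

  # Structured segments: two or more lines shaped like "Name: text".
  segments=$(printf '%s\n' "$input" | grep -cE '^[[:space:]]*[^:[:space:]][^:]*:[[:space:]]*[^[:space:]]')
  if [ "$segments" -ge 2 ]; then
    echo "script"; return
  fi

  # Plain text and ambiguous input fall back to Quick mode.
  echo "quick"
}
```

A single "TTS this: ..." line contains only one colon-marked segment, so it stays in Quick mode, which matches the table's default for ambiguous input.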

Interaction Flow

Step -1: CLI Auth Check

Follow shared/cli-authentication.md. If the CLI is not installed or the user is not logged in, auto-install and auto-login — never ask the user to run commands manually.

Then follow shared/cli-authentication.md § Auth Mode Detection to determine AUTH_MODE and set:

if [ "$AUTH_MODE" = "openapi" ]; then
  CMD_PREFIX="listenhub openapi tts"
else
  CMD_PREFIX="listenhub tts"
fi

All subsequent CLI calls use $CMD_PREFIX instead of hardcoded listenhub tts.

Step 0: Config Setup

Follow shared/config-pattern.md Step 0 (Zero-Question Boot).

If file doesn't exist — silently create with defaults and proceed:

mkdir -p ".listenhub/tts"
echo '{"outputMode":"inline","language":null,"defaultSpeakers":{}}' > ".listenhub/tts/config.json"
CONFIG_PATH=".listenhub/tts/config.json"
CONFIG=$(cat "$CONFIG_PATH")

Do NOT ask any setup questions. Proceed directly to the Interaction Flow.

If file exists — read config silently and proceed:

CONFIG_PATH=".listenhub/tts/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/tts/config.json"
CONFIG=$(cat "$CONFIG_PATH")

Setup Flow (user-initiated reconfigure only)

Only run when the user explicitly asks to reconfigure. Display current settings:

Current configuration (tts):
  Output mode: {inline / download / both}
  Language preference: {zh / en / not set}
  Default speaker: {speakerName / built-in default}

Then ask:

  1. outputMode: Follow shared/output-mode.md § Setup Flow Question.

  2. Language (optional): "默认语言?" ("Default language?")

    • "中文 (zh)" (Chinese)
    • "English (en)"
    • "每次手动选择" (choose manually each time) → keep null

After collecting answers, save immediately:

# Merge the chosen output mode into the existing config
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
# Save language only if the user chose one (not "每次手动选择", i.e. choose manually each time)
if [ -n "$LANGUAGE" ] && [ "$LANGUAGE" != "null" ]; then
  NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
fi
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")

Quick Mode — $CMD_PREFIX create --mode direct

Step 1: Extract text

Get the text to convert. If the user hasn't provided it, ask:

"What text would you like me to read aloud?"

Step 2: Determine voice

  • If config.defaultSpeakers.{language}[0] is set → use it silently (skip to Step 4)
  • If not set → use the built-in default from shared/speaker-selection.md for the detected language (skip to Step 4)
  • Only show speaker selection if the user explicitly asks to change voice
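
The lookup order above (saved voice first, then built-in default) could be sketched as follows. The function name resolve_voice is hypothetical, and the two fallback names are taken from the Examples section of this document; the authoritative defaults live in shared/speaker-selection.md, which is not reproduced here.

```shell
# Hypothetical helper: resolve the Quick-mode voice for a language.
# usage: resolve_voice "$CONFIG" en
resolve_voice() {
  config_json="$1"; voice_lang="$2"

  # Saved preference: defaultSpeakers.{language}[0]; prints nothing if unset.
  saved=$(printf '%s' "$config_json" \
    | jq -r --arg l "$voice_lang" '.defaultSpeakers[$l][0] // empty')
  if [ -n "$saved" ]; then
    echo "$saved"; return 0
  fi

  # Assumed built-in defaults, copied from the Examples section only.
  case "$voice_lang" in
    en) echo "Mars" ;;
    zh) echo "原野" ;;
    *)  return 1 ;;   # unknown language: let the caller decide
  esac
}
```

Returning a non-zero status for an unknown language is a design choice of this sketch, so the caller can fall through to explicit speaker selection rather than guess.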

Step 3: Save preference

After the user explicitly selects a new voice (not when using defaults):

Question: "Save {voice name} as your default voice for {language}?"
Options:
  - "Yes" — update .listenhub/tts/config.json
  - "No" — use for this session only

Step 4: Confirm

Ready to generate:

  Text: "{first 80 chars}..."
  Voice: {voice name}

Proceed?

Step 5: Generate

For short text, pass inline:

# Capture stderr in a temp file so it can be reported on failure
RESULT=$($CMD_PREFIX create --text "{text}" --mode direct --speaker "{name}" --lang {lang} --json 2>/tmp/lh-err)
EXIT_CODE=$?

if [ $EXIT_CODE -ne 0 ]; then
  ERROR=$(cat /tmp/lh-err)
  rm -f /tmp/lh-err
  case $EXIT_CODE in
    2) echo "Auth error: run 'listenhub auth login'" >&2 ;;
    3) echo "Timeout: try --no-wait" >&2 ;;
    *) echo "Error: $ERROR" >&2 ;;
  esac
  # Stop here: on failure there is no audioUrl to parse
  exit "$EXIT_CODE"
fi
rm -f /tmp/lh-err

AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')

For long text, write to a temp file first (see shared/cli-patterns.md § Long Text Input):

cat > /tmp/lh-content.txt << 'ENDCONTENT'
Long text content goes here...
ENDCONTENT

RESULT=$($CMD_PREFIX create --text "$(cat /tmp/lh-content.txt)" --mode direct --speaker "{name}" --lang {lang} --json)
AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')

rm -f /tmp/lh-content.txt

Step 6: Present result

Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.

inline or both: Display the audioUrl as a clickable link.

Present:

Audio generated!

Listen online: {audioUrl}

download or both: Also download the file. Generate a topic slug from the text content following shared/config-pattern.md § Artifact Naming.

SLUG="{topic-slug}"  # e.g. "server-maintenance-notice"
NAME="${SLUG}.mp3"
# Dedup: if file exists, append -2, -3, etc.
BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
# -f fails on HTTP errors instead of saving an error page; -L follows redirects
curl -fsSL -o "$NAME" "$AUDIO_URL"
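
The naming rules themselves are defined in shared/config-pattern.md § Artifact Naming, which is not shown here. As a rough illustration only, a short topic title could be normalized into a slug and then passed through the same dedup rule as the loop above. Both function names and the "audio" fallback are assumptions of this sketch.

```shell
# Hypothetical helper: turn a short topic title into a file-name slug.
slugify() {
  slug=$(printf '%s' "$1" \
    | LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  # Assumed fallback when nothing usable is left (e.g. non-Latin titles).
  [ -n "$slug" ] || slug="audio"
  printf '%s\n' "$slug"
}

# Same dedup rule as the loop above, wrapped as a function:
# append -2, -3, ... until the name is free in the current directory.
next_free_name() {
  name="$1"
  base="${name%.*}"; ext="${name##*.}"; i=2
  while [ -e "$name" ]; do name="${base}-${i}.${ext}"; i=$((i+1)); done
  printf '%s\n' "$name"
}
```

Usage might look like NAME=$(next_free_name "$(slugify "Server Maintenance Notice").mp3") before the download step.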

Present:

Audio generated!

Saved to the current directory:
  {NAME}

Script Mode — $CMD_PREFIX create --mode smart

Step 1: Get scripts

Determine whether the user already has a scripts array:

  • Already provided (JSON or clear segments): parse and display for confirmation

  • Not yet provided: help the user structure segments. Ask:

    "Please provide the script with speaker assignments. Format: each line as SpeakerName: text content. I'll convert it."

    Once the user provides the script, parse it into speaker-annotated text.
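
As a minimal sketch of the parsing step, the unique speaker names can be pulled out of "SpeakerName: text" lines before voices are assigned. The function name list_speakers is hypothetical, and the sketch assumes an ASCII colon separates the name from the text; lines without a colon are ignored.

```shell
# Hypothetical helper: print each unique speaker name once,
# in order of first appearance. Reads the script on stdin.
list_speakers() {
  awk '
    {
      i = index($0, ":")                      # position of the first colon
      if (i > 1) {
        name = substr($0, 1, i - 1)
        gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding whitespace
        if (name != "" && !(name in seen)) { seen[name] = 1; print name }
      }
    }
  '
}
```

Only the first colon on a line is treated as the separator, so dialogue text that itself contains colons is not split further.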

Step 2: Assign voices per character

For each unique character in the script:

  • If config.defaultSpeakers.{language} has saved voices → auto-assign silently (one per character in order)
  • If not set → use built-in defaults from shared/speaker-selection.md (Primary for first character, Secondary for second)
  • Only show speaker selection if the user explicitly asks to change voices
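
The "one per character in order" rule could be sketched as a positional pairing. The function name assign_voices is hypothetical. The source does not say what happens when there are more characters than voices, so this sketch marks the extras as "(unassigned)" rather than guessing; that marker is an assumption of the example.

```shell
# Hypothetical helper: pair each character with a voice, in order.
# usage: assign_voices "voice1,voice2" < names.txt
# Output: one "name<TAB>voice" line per character.
assign_voices() {
  awk -v voices="$1" '
    BEGIN { n = split(voices, v, ",") }
    { printf "%s\t%s\n", $0, (NR <= n ? v[NR] : "(unassigned)") }
  '
}
```

The comma-separated list would come from config.defaultSpeakers.{language} when saved voices exist, or from the built-in Primary and Secondary defaults otherwise.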

Step 3: Save preferences

After all voices are assigned (if any were new):

Question: "Save these voice assignments for future sessions?"
Options:
  - "Yes" — update defaultSpeakers in .listenhub/tts/config.json
  - "No" — use for this session only

Step 4: Confirm

Ready to generate:

  Characters:
    {name}: {voice}
    {name}: {voice}
  Segments: {count}
  Title: (auto-generated)

Proceed?

Step 5: Generate

Format the script text with speaker markers and submit. For multi-speaker scripts, include speaker names inline in the text. Because script mode may take longer, submit with --no-wait and run the polling step with run_in_background: true.

Submit (foreground) with --no-wait:

RESULT=$($CMD_PREFIX create --text "{formatted script with speaker markers}" --mode smart --speaker "{name1}" --speaker "{name2}" --lang {lang} --no-wait --json)
ID=$(echo "$RESULT" | jq -r '.id')
echo "Submitted: $ID"

For long scripts, write to a temp file first:

cat > /tmp/lh-content.txt << 'ENDCONTENT'
SpeakerA: First line of dialogue
SpeakerB: Second line of dialogue
...
ENDCONTENT

RESULT=$($CMD_PREFIX create --text "$(cat /tmp/lh-content.txt)" --mode smart --speaker "{name1}" --speaker "{name2}" --lang {lang} --no-wait --json)
ID=$(echo "$RESULT" | jq -r '.id')

rm -f /tmp/lh-content.txt

Poll (background) with run_in_background: true and timeout: 600000:

ID="<id-from-above>"
for i in $(seq 1 60); do
  RESULT=$(listenhub creation get "$ID" --json 2>/dev/null)
  STATUS=$(echo "$RESULT" | jq -r '.status // "processing"')

  case "$STATUS" in
    completed) echo "$RESULT"; exit 0 ;;
    failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 10 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2

Step 6: Present result

When the background task completes, parse the result:

AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')
SUBTITLES_URL=$(echo "$RESULT" | jq -r '.subtitlesUrl // empty')
DURATION=$(echo "$RESULT" | jq -r '.audioDuration // empty')
CREDITS=$(echo "$RESULT" | jq -r '.credits // empty')

Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.

inline or both: Display the audioUrl and subtitlesUrl as clickable links.

Present:

Audio generated!

Listen online: {audioUrl}
Subtitles: {subtitlesUrl}
Duration: {audioDuration / 1000}s
Credits used: {credits}
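
The duration line divides audioDuration by 1000, which suggests the field is reported in milliseconds; that unit is inferred from the template, not confirmed by the source. Under that assumption, a small hypothetical helper for the conversion might look like this:

```shell
# Hypothetical helper: format a millisecond duration as seconds.
format_duration() {
  ms="$1"
  # Print nothing when the field was absent (jq's "// empty" yields an empty string).
  [ -n "$ms" ] || return 0
  LC_ALL=C awk -v ms="$ms" 'BEGIN { printf "%.1fs\n", ms / 1000 }'
}
```

Calling format_duration "$DURATION" after the parsing step above would then produce the value for the duration line, or nothing if the field is missing.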

download or both: Also download the file. Generate a topic slug following shared/config-pattern.md § Artifact Naming.

SLUG="{topic-slug}"  # e.g. "welcome-dialogue"
NAME="${SLUG}.mp3"
# Dedup: if file exists, append -2, -3, etc.
BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
# -f fails on HTTP errors instead of saving an error page; -L follows redirects
curl -fsSL -o "$NAME" "$AUDIO_URL"

Present:

Saved to the current directory:
  {NAME}

Updating Config

When saving preferences, merge into .listenhub/tts/config.json — do not overwrite unchanged keys.

  • Quick voice: set defaultSpeakers.{language}[0] to the selected speakerId
  • Script voices: set defaultSpeakers.{language} to the full array assigned this session
  • Language: set language if the user explicitly specifies it
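
A minimal sketch of the merge for the Quick voice case, assuming jq is available as in the other snippets. The function name save_quick_voice is hypothetical. jq's path assignment changes only the targeted entry and creates the language array if it does not exist yet, so unrelated keys such as outputMode are left untouched.

```shell
# Hypothetical helper: merge a Quick-mode default voice into the config file.
# usage: save_quick_voice .listenhub/tts/config.json en "Mars"
save_quick_voice() {
  config_path="$1"; voice_lang="$2"; speaker="$3"
  tmp=$(mktemp) || return 1
  # Write to a temp file first so a jq failure never truncates the config.
  if jq --arg l "$voice_lang" --arg s "$speaker" \
       '.defaultSpeakers[$l][0] = $s' "$config_path" > "$tmp"; then
    mv "$tmp" "$config_path"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Writing through a temporary file is a choice of this sketch; redirecting jq's output straight onto the file it is reading would empty the config before jq could parse it.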

API Reference

  • CLI execution patterns: shared/cli-patterns.md
  • CLI authentication: shared/cli-authentication.md
  • Speaker list: shared/cli-speakers.md
  • Speaker selection guide: shared/speaker-selection.md
  • Config pattern: shared/config-pattern.md
  • Output mode: shared/output-mode.md

Composability

  • Invokes: speakers CLI (for speaker selection)
  • Invoked by: explainer (for voiceover)

Examples

Quick mode:

"TTS this: The server will be down for maintenance at midnight."

  1. Detect: Quick mode (plain text, "TTS this")
  2. Read config: defaultSpeakers.en is empty
  3. Use built-in default: Mars (cozy-man-english)
  4. Confirm → user approves
  5. Generate:
    RESULT=$($CMD_PREFIX create --text "The server will be down for maintenance at midnight." --mode direct --speaker "Mars" --lang en --json)
    AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')
    
  6. Present: display audioUrl as link (inline mode)

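Script mode:

"帮我做一段双人对话配音,A说:欢迎大家,B说:谢谢邀请" (in English: "Make me a two-person dialogue voice-over. A says: Welcome, everyone. B says: Thanks for the invitation.")

  1. Detect: Script mode ("双人对话", i.e. two-person dialogue)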
  2. Parse segments: A -> "欢迎大家", B -> "谢谢邀请"
  3. Read config: defaultSpeakers.zh empty
  4. Use built-in defaults: 原野 (Primary) + 高晴 (Secondary)
  5. Confirm → user approves
  6. Generate:
    RESULT=$($CMD_PREFIX create --text "A: 欢迎大家
    B: 谢谢邀请" --mode smart --speaker "原野" --speaker "高晴" --lang zh --no-wait --json)
    ID=$(echo "$RESULT" | jq -r '.id')
    
  7. Poll in background until complete
  8. Present: audioUrl, subtitlesUrl, duration
