Four endpoints for generating and editing music with StepFun-AI's open-weights ACE Step model via RunComfy's CLI. The base model costs $0.0002 per second of audio (about 27× cheaper than ElevenLabs Music), which suits drafts and high-volume work. ACE Step 1.5 adds vocal support for 50+ languages and better structured-lyric handling at a 50% higher price. Text-to-audio takes comma-separated tags and optional section-marked lyrics and returns 5 seconds to 4 minutes of stereo audio. Audio-inpaint regenerates a time range inside an existing track. Audio-outpaint extends a track before its start or after its end. The tag-driven approach means you write "lo-fi hip-hop, mellow, rhodes piano, 75 BPM" instead of hoping a free-form prompt does the right thing. Most useful when you need cheap iteration or want to fix one section without re-rendering the whole song.
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill ace-step
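
To make the pricing and the tag format concrete, here is a minimal Python sketch. It is not part of the skill or of RunComfy's CLI: `estimate_cost` and `build_tags` are hypothetical helpers, the rate and duration limits are copied from the description above, and the 1.5 rate assumes the "extra fifty percent" means 1.5× the base per-second price.

```python
from decimal import Decimal

# Figures copied from the description; V15_RATE is an assumption
# (1.5x the base per-second price), not a published rate.
BASE_RATE = Decimal("0.0002")            # USD per second of audio
V15_RATE = BASE_RATE * Decimal("1.5")

MIN_SECONDS = 5                          # shortest text-to-audio output
MAX_SECONDS = 4 * 60                     # longest text-to-audio output


def estimate_cost(seconds: int, v15: bool = False) -> Decimal:
    """Estimate the USD cost of one render (hypothetical helper)."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"duration must be {MIN_SECONDS}-{MAX_SECONDS} s, got {seconds}"
        )
    rate = V15_RATE if v15 else BASE_RATE
    return rate * seconds


def build_tags(*tags: str) -> str:
    """Join style descriptors into one comma-separated tag string."""
    cleaned = [t.strip() for t in tags if t.strip()]  # drop blanks
    return ", ".join(cleaned)


if __name__ == "__main__":
    print(build_tags("lo-fi hip-hop", "mellow", "rhodes piano", "75 BPM"))
    print(estimate_cost(240))             # 4-minute track, base model
    print(estimate_cost(240, v15=True))   # same length, assumed 1.5 rate
```

Under these assumptions a full 4-minute render comes to $0.048 on the base model and $0.072 on ACE Step 1.5, which is why re-rendering many drafts stays cheap.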