This is the image edit endpoint of OpenAI's GPT Image 2, wrapped with the prompting patterns that actually work. Use it when you need to preserve a face or brand mark while swapping out multilingual text, moving headlines around with layout precision, or composing subjects across up to 10 reference images. The documented anti-patterns are worth reading: lead with what stays unchanged, quote in-image text character-for-character instead of paraphrasing, and use numbered refs when you pass multiple images. Routes through the local RunComfy CLI. If you need batch consistency across 20 SKU variants or photorealistic portrait fidelity, the skill doc steers you to Nano Banana Edit or Flux Kontext instead.
npx -y skills add doany-ai/skills --skill gpt-image-edit --agent claude-codeInstalls into .claude/skills of the current project.
runcomfy.com · Edit endpoint · Text-to-image sibling · GitHub
OpenAI GPT Image 2 — /edit endpoint (ChatGPT Images 2.0 image-to-image) on the RunComfy Model API. Strongest in its class at preserving identity through targeted edits and rewriting embedded text in any script (Latin, kana, CJK, Cyrillic, Arabic).
npx skills add agentspace-so/runcomfy-skills --skill gpt-image-edit -g
| You want | Use |
|---|---|
| Edit multilingual / embedded text in image | GPT Image Edit |
| Identity preservation through translated headline variants | GPT Image Edit |
| Layout-precise edit (move headline, swap CTA, etc.) | GPT Image Edit |
| Up to 10 reference images | GPT Image Edit |
| Batch up to 20 images consistently | Nano Banana Edit |
| Single-shot precise local edit, source-fidelity-first | Flux Kontext |
| Generate from scratch with GPT Image 2 | sibling gpt-image-2 skill |
| Batch SKU galleries with stable identity | Nano Banana Edit |
npm i -g @runcomfy/cliruncomfy login opens a browser device-code flow.RUNCOMFY_TOKEN=<token> instead of runcomfy login.openai/gpt-image-2/edit| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
prompt | string | yes | — | Edit instruction. Lead with preservation, end with the change. |
images | string[] | yes | — | Up to 10 publicly-fetchable HTTPS URLs. First is primary; rest are auxiliary. |
size | enum | no | auto | auto (preserve input), 1024_1024 (1:1), 1024_1536 (2:3 portrait), 1536_1024 (3:2 landscape). |
size=auto preserves the input ratio — strongly recommended unless the edit explicitly changes framing.
Single-ref preservation edit:
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "Keep the person'\''s face, pose, and brand mark unchanged. Replace the background with a soft warm-grey studio sweep and a gentle floor shadow.",
"images": ["https://.../portrait.jpg"]
}' \
--output-dir <absolute/path>
Multilingual text rewrite (preserve everything except the headline):
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "Keep the photograph, layout, and brand mark exactly as in the input. Replace only the in-image headline. The new headline reads \"今日のおすすめ\" in bold Japanese kana, same position and font weight as before.",
"images": ["https://.../poster-en.jpg"]
}' \
--output-dir <absolute/path>
Multi-ref composition:
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "Compose subject from image 1 into the room from image 2. Match the lighting and color palette of image 2. Keep image 1 subject identity (face, pose, clothing) unchanged.",
"images": ["https://.../subject.jpg", "https://.../room.jpg"]
}' \
--output-dir <absolute/path>
Lead with preservation goals. Always: "Keep [face / pose / clothing / brand / framing] unchanged." Then state the change. The model honors what's stated up front.
Multilingual text — quote the characters, name the script. "the headline reads \"コーヒー\" in bold Japanese kana", "the label says \"АРОМА\" in Cyrillic, white on black", "the right-margin caption reads \"تخفيض\" in Arabic right-to-left". Don't paraphrase — quote.
Directional language for spatial edits. Concrete spatial scopes work: "move the headline from top-right to bottom-center", "remove the leftmost object only", "replace the watermark in the bottom-right corner".
Multi-ref numbering. When passing multiple images, refer to them by number: "subject from image 1, lighting from image 2, color palette from image 3". The model routes cues correctly.
Use size: "auto" to preserve input ratio. Only override when the edit explicitly changes framing (e.g. cropping a 16:9 to 1:1).
Anti-patterns:
size outside the 3 fixed values + auto → 422.| Use case | Why GPT Image Edit |
|---|---|
| Multilingual ad localization | One source asset → many language variants of the same headline |
| Brand-safe headline / CTA swaps | Layout precision + preservation language hold the rest stable |
| Multi-ref composition (subject from one, scene from another) | Numbered refs route cues correctly |
| Layout-precise repositioning | Directional language ("top-right to bottom-center") honored |
| Identity preservation across signage edits | Strongest in class for face / brand preservation through targeted edits |
Background swap with full preservation (page example):
Turn the background into a bright minimal white-to-soft-gray studio
sweep with gentle floor shadow; add a large headline in-image that
reads "OPEN STUDIO" in a bold clean sans-serif, high contrast, centered;
keep the main person or product, pose, and face identity unchanged
Multilingual variant:
Keep the photograph, layout, lighting, and brand mark exactly as in the
input. Replace only the in-image headline.
The new headline reads "コーヒー" in bold Japanese kana, same position
and font weight as before.
Multi-ref composition:
Compose subject from image 1 into the kitchen from image 2.
Match the warm window light and color palette of image 2.
Keep subject identity (face, pose, clothing) from image 1 unchanged.
size: 3 fixed values + auto — anything else 422s.images: up to 10 — first is primary, rest are auxiliary cues.| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
The skill invokes runcomfy run openai/gpt-image-2/edit with a JSON body matching the schema. The CLI POSTs to https://model-api.runcomfy.net/v1/models/openai/gpt-image-2/edit, polls the request, fetches the result, and downloads any .runcomfy.net/.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.
runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.--input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.davila7/claude-code-templates
orchestra-research/ai-research-skills
agentspace-so/runcomfy-agent-skills
inferen-sh/skills