How to use Wan 2.7 Image (text-to-image, editing, and image sets)
Wan 2.7 Image is a still-image model that does more than generate a picture from text. The same model handles text-to-image, editing an existing image with a prompt, editing just one region of an image, blending several reference images into one new scene, and producing a whole set of consistent images from a single request. Two versions are offered: wan2.7-image-pro, which supports 4K output on text-to-image and is the stronger editor, and wan2.7-image, which is faster and cheaper but tops out at 2K.
This guide covers the five things people actually build with it — plain generation, editing, region edits, multi-image reference, and image sets — plus the settings and prompt habits that separate a clean result from a wasted render. If you've used an image-to-video model like Wan 2.2 before, the prompting instincts carry over: be specific, describe what you want rather than what you don't, and iterate on one variable at a time.
The five things Wan 2.7 Image can do
Text-to-image. No input image — just a prompt. This is the only mode that supports 4K on the Pro model, and the only one where thinking mode (an extra reasoning pass before generation) applies.
Image editing. Give it one image and a prompt describing the change — swap a background, change an outfit, restyle the lighting — and it edits the image rather than starting from scratch.
Interactive (region) editing. Same as editing, but you also pass a bounding box so the model only touches that part of the image and leaves the rest untouched. Useful when a full-image edit keeps changing things you didn't ask about.
Multi-image reference. Feed it up to nine images and one prompt that describes how to combine them — a person from one, an outfit from another, a background from a third. The model composites them into a single new image.
Image set generation. One request, multiple images that share a consistent subject or style — a four-season set of the same character, or a small product series. You set a maximum count and the model decides how many it actually needs.
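The choice among these five modes can be sketched as a small decision function. This is only an illustration of the descriptions above: the function name, its arguments, and the mode labels it returns are hypothetical and are not identifiers from the Wan API.

```python
def pick_mode(num_images: int, has_region: bool = False, wants_set: bool = False) -> str:
    """Return a label for the mode that matches the inputs.

    The labels are illustrative names for the five modes described above,
    not values the API is known to accept.
    """
    if not 0 <= num_images <= 9:
        raise ValueError("the model accepts between 0 and 9 input images")
    if wants_set:
        return "image-set"        # several consistent images from one request
    if num_images == 0:
        return "text-to-image"    # prompt only, no input image
    if has_region:
        return "region-edit"      # edit limited to a bounding box
    if num_images >= 2:
        return "multi-image"      # combine elements from several sources
    return "edit"                 # one image plus a prompt describing the change
```

For example, `pick_mode(3)` returns `"multi-image"` and `pick_mode(1, has_region=True)` returns `"region-edit"`.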
Step-by-step
The flow is the same shape regardless of which of the five modes you're using — only the inputs and a couple of parameters change.
1. Pick the mode you actually need
Plain generation for a new image from text. Editing for changing an existing image. Interactive editing when you only want one area touched. Multi-image reference when the result needs to combine elements from more than one source image. Image sets when you need several consistent images from one request rather than a single image.
2. Prepare your input images, if any
Accepted formats are JPEG, PNG (no alpha channel), BMP, and WEBP. Each side must be 240 to 8,000 pixels, the file must be under 20 MB, and the aspect ratio must fall between 1:8 and 8:1. For multi-image reference, the model treats the images in the order you list them, so put your primary subject first.
3. Write the prompt
Describe the finished image, not the transformation history. For edits, name the specific change ("replace the background with a rainy city street at night") rather than a vague instruction. For multi-image reference, refer to each image plainly ("the car from the first image", "the graffiti style from the second").
4. Set resolution and count
Choose 1K, 2K, or (Pro, text-to-image only) 4K, or specify exact pixel dimensions if you need a non-standard aspect ratio. Set n to how many images you want (1-4 normally, up to 12 for an image set). Every successful image is billed, so don't set n higher than you'll actually use.
5. For a region edit, add the bounding box
Pass the box as top-left and bottom-right pixel coordinates on the original image. If an input image needs no edit, pass an empty box for it rather than omitting it: the list of boxes has to line up with the list of images.
6. Generate, then download promptly
Results are hosted for 24 hours only. Save the images you're keeping immediately; don't rely on being able to re-fetch a result URL later.
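The input limits from step 2 and the box alignment rule from step 5 can be checked locally before sending a request. The sketch below is a minimal pre-flight check, not part of any official SDK. The function names are hypothetical, the representation of an "empty box" as an empty list is an assumption about the request format, and "20 MB" is assumed to mean 20 * 1024 * 1024 bytes.

```python
from typing import Dict, List, Tuple

ALLOWED_FORMATS = {"JPEG", "PNG", "BMP", "WEBP"}
MIN_SIDE, MAX_SIDE = 240, 8000
MAX_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_RATIO = 8                 # aspect ratio must stay between 1:8 and 8:1

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2): top-left, bottom-right


def check_input_image(fmt: str, width: int, height: int,
                      size_bytes: int, has_alpha: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    fmt = fmt.upper()
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if fmt == "PNG" and has_alpha:
        problems.append("PNG must not have an alpha channel")
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name} {side}px is outside {MIN_SIDE}-{MAX_SIDE}px")
    if size_bytes >= MAX_BYTES:
        problems.append("file is not under 20 MB")
    if width > MAX_RATIO * height or height > MAX_RATIO * width:
        problems.append("aspect ratio is outside 1:8 to 8:1")
    return problems


def align_boxes(num_images: int, boxes_by_index: Dict[int, Box]) -> List[List[int]]:
    """Build one box entry per image, in image order.

    Images without a box get an empty list (assumed representation of an
    "empty box"), so the result always has num_images entries.
    """
    for index in boxes_by_index:
        if not 0 <= index < num_images:
            raise ValueError(f"box given for image {index}, but there are {num_images} images")
    aligned = []
    for i in range(num_images):
        box = boxes_by_index.get(i)
        if box is None:
            aligned.append([])
            continue
        x1, y1, x2, y2 = box
        if not (0 <= x1 < x2 and 0 <= y1 < y2):
            raise ValueError(f"box for image {i} is not top-left then bottom-right")
        aligned.append([x1, y1, x2, y2])
    return aligned
```

With three input images and a box only on the second, `align_boxes(3, {1: (10, 20, 200, 300)})` yields `[[], [10, 20, 200, 300], []]`, which keeps the box list the same length as the image list.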
Prompting edits and multi-image composites
For a single-image edit, say what changes and, if it matters, what stays the same: "keep the person and pose, replace only the background with a sunlit beach." The model tends to preserve anything you don't mention, so an unwanted change usually means the prompt was vague about that part of the image, not that the model ignored you.
For multi-image reference, treat the prompt like a set of instructions for a compositor: name which image contributes which element, in the order you listed them. "Put the outfit from image 2 on the person in image 1, in the setting from image 3" is far more reliable than a single sentence describing the end result with no reference to the sources.
For an image set, describe the through-line that has to stay constant — usually a character or a style — and then the one thing that changes per image. The example the docs use is a stray orange cat whose "features must be consistent across all images," with one season different in each shot. Name the constant explicitly; the model will otherwise treat each image as a fresh roll of the dice.
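The two prompt patterns above (naming each source image by position, and stating the constant before the varying element) can be sketched as small string helpers. This is an illustrative approach, not a format the model requires: the function names, the template placeholders, and the "Image N:" phrasing in the set prompt are all assumptions.

```python
from typing import Sequence


def build_composite_prompt(template: str, elements: Sequence[str]) -> str:
    """Fill a template so each element is tied to its image position.

    `elements` must be in the same order as the input images; the first
    entry refers to image 1. Each placeholder in `template` is replaced
    with "the <element> from image N".
    """
    if not 1 <= len(elements) <= 9:
        raise ValueError("expected between 1 and 9 reference images")
    if len(set(elements)) != len(elements):
        raise ValueError("element names must be unique")
    phrases = {
        name: f"the {name} from image {position}"
        for position, name in enumerate(elements, start=1)
    }
    return template.format(**phrases)


def build_set_prompt(constant: str, varying: Sequence[str]) -> str:
    """State the constant subject first, then one varying detail per image."""
    if not 1 <= len(varying) <= 12:
        raise ValueError("an image set holds between 1 and 12 images")
    parts = [f"{constant}. Its features must be consistent across all images."]
    parts += [f"Image {i}: {detail}." for i, detail in enumerate(varying, start=1)]
    return " ".join(parts)


composite = build_composite_prompt(
    "Put {outfit} on {person}, in {setting}.",
    ["person", "outfit", "setting"],
)
seasons = build_set_prompt("A stray orange cat", ["spring", "summer", "autumn", "winter"])
```

Here `composite` reads "Put the outfit from image 2 on the person from image 1, in the setting from image 3.", so every instruction is tied to a specific source image.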
Recommended settings (baseline)
Start here, then adjust one variable at a time. Resolution limits differ by mode and model, so check which row applies before you set size.
| Setting | Recommended baseline |
|---|---|
| Model | wan2.7-image-pro for 4K and stronger editing; wan2.7-image for faster, cheaper runs |
| Resolution — text-to-image | 1K, 2K, or 4K (Pro only); square output when no image is input |
| Resolution — editing, region edits, multi-image, sets | 1K or 2K only, even on Pro; output aspect ratio follows the input image |
| Reference images | 0-9 images; order matters, list your primary subject first |
| n (image count) | 1-4 for a single generation; up to 12 for an image set — set it to what you'll use, not the max |
| Thinking mode | On by default for plain text-to-image; it adds generation time in exchange for quality, so turn it off for fast iteration |
| Watermark | Off by default; enable if you need the "AI Generated" label for a given surface |
| Seed | Fixed while comparing prompt changes; randomize once you're exploring variations |
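
The limits in this table can be encoded as a quick check that runs before a request is sent. The sketch below is a minimal version under stated assumptions: the mode labels and function names are hypothetical, only the 1K/2K/4K tiers are handled (exact pixel dimensions are not), and nothing here reflects the actual request schema.

```python
from typing import List

MODELS = {"wan2.7-image-pro", "wan2.7-image"}
# Illustrative labels for the five modes, not API values.
MODES = {"text-to-image", "edit", "region-edit", "multi-image", "image-set"}
SIZE_TIERS = ("1K", "2K", "4K")  # ordered from smallest to largest


def max_size_tier(model: str, mode: str) -> str:
    """4K is only available on the Pro model for plain text-to-image."""
    if model == "wan2.7-image-pro" and mode == "text-to-image":
        return "4K"
    return "2K"


def check_settings(model: str, mode: str, size: str, n: int,
                   num_images: int = 0, thinking: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the settings fit the table."""
    problems = []
    if model not in MODELS:
        problems.append(f"unknown model: {model}")
    if mode not in MODES:
        problems.append(f"unknown mode: {mode}")
    if size not in SIZE_TIERS:
        problems.append(f"unknown size tier: {size}")
    elif SIZE_TIERS.index(size) > SIZE_TIERS.index(max_size_tier(model, mode)):
        problems.append(f"{size} is not available for {model} in {mode} mode")
    max_n = 12 if mode == "image-set" else 4
    if not 1 <= n <= max_n:
        problems.append(f"n must be between 1 and {max_n} for {mode}")
    if not 0 <= num_images <= 9:
        problems.append("reference images must number between 0 and 9")
    if mode == "text-to-image" and num_images:
        problems.append("text-to-image takes no input image")
    if thinking and mode != "text-to-image":
        problems.append("thinking mode only applies to text-to-image")
    return problems
```

For instance, `check_settings("wan2.7-image-pro", "edit", "4K", 5, num_images=1)` reports two problems: 4K is unavailable for edits, and n exceeds 4 for a single generation.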
Common problems and fixes
An edit changed parts of the image you didn't ask about: the prompt didn't say what should stay the same. Add an explicit "keep X unchanged" clause, or switch to interactive editing with a bounding box so only the intended region can move.
A multi-image composite blends the wrong elements: the prompt didn't tie an instruction to a specific image. Reference each source image explicitly by its position ("image 1", "image 2") rather than describing the result in the abstract.
An image set doesn't stay consistent: the constant (character, style) wasn't stated as a hard requirement. Say directly that the subject's features must match across every image, and keep the varying element to one clear axis, like season or angle.
4K request failed or was ignored: 4K is only available on wan2.7-image-pro, and only for text-to-image with no input image. Any edit, region edit, multi-image, or image-set request tops out at 2K regardless of model.
