How to use Wan 2.7 Image (text-to-image, editing, and image sets)

Last updated: 2026-07-05 | 9 min read | Difficulty: Beginner-friendly

Wan 2.7 Image is a still-image model that does more than generate a picture from text. The same model handles text-to-image, editing an existing image with a prompt, editing just one region of an image, blending several reference images into one new scene, and producing a whole set of consistent images from a single request. Two versions are offered: wan2.7-image-pro, which supports 4K output on text-to-image and is the stronger editor, and wan2.7-image, which is faster and cheaper but tops out at 2K.

This guide covers the five things people actually build with it — plain generation, editing, region edits, multi-image reference, and image sets — plus the settings and prompt habits that separate a clean result from a wasted render. If you've used an image-to-video model like Wan 2.2 before, the prompting instincts carry over: be specific, describe what you want rather than what you don't, and iterate on one variable at a time.

The five things Wan 2.7 Image can do

Text-to-image. No input image — just a prompt. This is the only mode that supports 4K on the Pro model, and the only one where thinking mode (an extra reasoning pass before generation) applies.

Image editing. Give it one image and a prompt describing the change — swap a background, change an outfit, restyle the lighting — and it edits the image rather than starting from scratch.

Interactive (region) editing. Same as editing, but you also pass a bounding box so the model only touches that part of the image and leaves the rest untouched. Useful when a full-image edit keeps changing things you didn't ask about.

Multi-image reference. Feed it up to nine images and one prompt that describes how to combine them — a person from one, an outfit from another, a background from a third. The model composites them into a single new image.

Image set generation. One request, multiple images that share a consistent subject or style — a four-season set of the same character, or a small product series. You set a maximum count and the model decides how many it actually needs.

Step-by-step

The flow is the same shape regardless of which of the five modes you're using — only the inputs and a couple of parameters change.

1. Pick the mode you actually need
Use plain generation for a new image from text, editing to change an existing image, and interactive editing when you only want one area touched. Use multi-image reference when the result needs to combine elements from more than one source image, and image sets when you need several consistent images from one request rather than a single image.

2. Prepare your input images, if any
Input images must be JPEG, PNG (no alpha channel), BMP, or WEBP, 240 to 8,000 pixels on each side, under 20 MB, with an aspect ratio between 1:8 and 8:1. For multi-image reference, the order you list the images in is the order the model treats them, so put your primary subject first.

3. Write the prompt
Describe the finished image, not the transformation history. For edits, name the specific change ("replace the background with a rainy city street at night") rather than a vague instruction. For multi-image reference, refer to each image plainly ("the car from the first image", "the graffiti style from the second").

4. Set resolution and count
Choose 1K, 2K, or (Pro, text-to-image only) 4K, or specify exact pixel dimensions if you need a non-standard aspect ratio. Set n to how many images you want (1-4 normally, up to 12 for an image set). Every successful image is billed, so don't set n higher than you'll actually use.

5. For a region edit, add the bounding box
Pass the box as top-left and bottom-right pixel coordinates on the original image. If an input image needs no edit, pass an empty box for it rather than omitting it, because the list of boxes has to line up with the list of images.

6. Generate, then download promptly
Results are hosted for 24 hours only. Save the images you're keeping immediately, and don't rely on being able to re-fetch a result URL later.
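
Steps 2 and 5 are the ones that most often fail before a render even starts, so it can help to check them locally. The sketch below is one way to do that, not the service's own validation: ImageInfo, input_problems, and aligned_boxes are hypothetical names, the [x1, y1, x2, y2] box layout is an assumed representation of "top-left and bottom-right", and "20 MB" is read as 20 MiB.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

ALLOWED_FORMATS = {"JPEG", "PNG", "BMP", "WEBP"}
MIN_SIDE, MAX_SIDE = 240, 8000
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" interpreted as 20 MiB
MAX_ASPECT = 8.0              # allowed range is 1:8 to 8:1


@dataclass
class ImageInfo:
    """Hypothetical container; fill it from whatever image library you use."""
    fmt: str
    width: int
    height: int
    size_bytes: int
    has_alpha: bool = False


def input_problems(img: ImageInfo) -> List[str]:
    """Return every reason the image breaks the input limits from step 2."""
    problems = []
    fmt = img.fmt.upper()
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported format {img.fmt}")
    if fmt == "PNG" and img.has_alpha:
        problems.append("PNG has an alpha channel")
    for name, side in (("width", img.width), ("height", img.height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name} {side}px is outside {MIN_SIDE}-{MAX_SIDE}px")
    if img.size_bytes >= MAX_BYTES:
        problems.append("file is not under 20 MB")
    if img.width > 0 and img.height > 0:
        ratio = img.width / img.height
        if not 1 / MAX_ASPECT <= ratio <= MAX_ASPECT:
            problems.append("aspect ratio is outside 1:8 to 8:1")
    return problems


def aligned_boxes(image_count: int, boxes: Dict[int, Sequence[int]]) -> List[List[int]]:
    """Build one box per image, in image order; [] marks 'no edit here'.

    `boxes` maps a 0-based image index to (x1, y1, x2, y2) in pixels.
    """
    unknown = [i for i in boxes if not 0 <= i < image_count]
    if unknown:
        raise ValueError(f"boxes given for images that do not exist: {unknown}")
    result = []
    for i in range(image_count):
        box = list(boxes.get(i, []))
        if box:
            x1, y1, x2, y2 = box
            if not (0 <= x1 < x2 and 0 <= y1 < y2):
                raise ValueError(f"box for image {i} is not top-left/bottom-right")
        result.append(box)
    return result
```

For three input images where only the second needs a region edit, aligned_boxes(3, {1: (10, 20, 110, 220)}) gives a list of three boxes with empty ones in the first and last slots, which keeps the box list lined up with the image list.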

Prompting edits and multi-image composites

For a single-image edit, say what changes and, if it matters, what stays the same: "keep the person and pose, replace only the background with a sunlit beach." The model tends to preserve anything you don't mention, so an unwanted change usually means the prompt was vague about that part of the image, not that the model ignored you.

For multi-image reference, treat the prompt like a set of instructions for a compositor: name which image contributes which element, in the order you listed them. "Put the outfit from image 2 on the person in image 1, in the setting from image 3" is far more reliable than a single sentence describing the end result with no reference to the sources.

For an image set, describe the through-line that has to stay constant — usually a character or a style — and then the one thing that changes per image. The example the docs use is a stray orange cat whose "features must be consistent across all images," with one season different in each shot. Name the constant explicitly; the model will otherwise treat each image as a fresh roll of the dice.
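
If you build prompts in code, the three habits above can be turned into small templates. This is only a sketch of the wording pattern: the function names are made up for illustration, and the exact phrasing is a starting point to adjust, not a format the model requires.

```python
from typing import List


def edit_prompt(change: str, keep: List[str]) -> str:
    """Single-image edit: state what stays, then the one change."""
    kept = f"Keep the {' and '.join(keep)} unchanged. " if keep else ""
    return f"{kept}{change}"


def composite_prompt(elements: List[str]) -> str:
    """Multi-image reference: elements[i] is what to take from image i+1."""
    if not 1 <= len(elements) <= 9:
        raise ValueError("multi-image reference takes up to nine images")
    clauses = [f"the {e} from image {i}" for i, e in enumerate(elements, start=1)]
    return "Combine " + ", ".join(clauses) + " into a single image."


def image_set_prompt(subject: str, axis: str, values: List[str]) -> str:
    """Image set: name the constant explicitly, then the one varying axis."""
    shots = "; ".join(f"image {i}: {v}" for i, v in enumerate(values, start=1))
    return (
        f"A set of {len(values)} images of {subject}. "
        "The subject's features must be consistent across all images. "
        f"Only the {axis} changes: {shots}."
    )


if __name__ == "__main__":
    print(edit_prompt("Replace only the background with a sunlit beach.",
                      ["person", "pose"]))
    print(composite_prompt(["person", "outfit", "setting"]))
    print(image_set_prompt("a stray orange cat", "season",
                           ["spring", "summer", "autumn", "winter"]))
```

Keeping the constant and the varying axis as separate arguments makes it harder to forget one of them when you iterate.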

Recommended settings (baseline)

Start here, then adjust one variable at a time. Resolution limits differ by mode and model, so check which row applies before you set size.

Setting	Recommendation
Model	wan2.7-image-pro for 4K and stronger editing; wan2.7-image for faster, cheaper runs
Resolution (text-to-image)	1K, 2K, or 4K (Pro only); square output when no image is input
Resolution (editing, region edits, multi-image, sets)	1K or 2K only, even on Pro; output aspect ratio follows the input image
Reference images	0-9 images; order matters, list your primary subject first
n (image count)	1-4 for a single generation; up to 12 for an image set. Set it to what you'll use, not the max
Thinking mode	On by default for plain text-to-image; it adds time in exchange for quality, so turn it off for fast iteration
Watermark	Off by default; enable if you need the "AI Generated" label for a given surface
Seed	Fixed while comparing prompt changes; randomize once you're exploring variations
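
One way to keep this baseline in a script is a small settings object with a sanity check on the count limits. Treat it as a sketch: apart from n and the two model names, the field names (image_set, reference_images, thinking, watermark, seed) are illustrative and may not match the real request parameters, and the seed value is arbitrary.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

MAX_REFERENCE_IMAGES = 9


@dataclass(frozen=True)
class BaselineSettings:
    # Field names other than `n` are illustrative, not confirmed API keys.
    model: str = "wan2.7-image"   # or "wan2.7-image-pro"
    n: int = 1                    # every successful image is billed
    image_set: bool = False
    reference_images: int = 0
    thinking: bool = True         # on by default for plain text-to-image
    watermark: bool = False       # off by default
    seed: Optional[int] = 1234    # fixed while comparing prompt changes


def settings_problems(s: BaselineSettings) -> List[str]:
    """Check the count limits from the table above."""
    problems = []
    max_n = 12 if s.image_set else 4
    if not 1 <= s.n <= max_n:
        problems.append(f"n={s.n} is outside 1-{max_n}")
    if not 0 <= s.reference_images <= MAX_REFERENCE_IMAGES:
        problems.append(
            f"{s.reference_images} reference images is outside 0-{MAX_REFERENCE_IMAGES}"
        )
    return problems


def exploring(s: BaselineSettings) -> BaselineSettings:
    """Same settings with the seed unset, for when you want variations."""
    return replace(s, seed=None)
```

Starting from BaselineSettings() and changing one field at a time with dataclasses.replace mirrors the "adjust one variable at a time" advice.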

Common problems and fixes

An edit changed parts of the image you didn't ask about: the prompt didn't say what should stay the same. Add an explicit "keep X unchanged" clause, or switch to interactive editing with a bounding box so only the intended region can move.

A multi-image composite blends the wrong elements: the prompt didn't tie an instruction to a specific image. Reference each source image explicitly by its position ("image 1", "image 2") rather than describing the result in the abstract.

An image set doesn't stay consistent: the constant (character, style) wasn't stated as a hard requirement. Say directly that the subject's features must match across every image, and keep the varying element to one clear axis, like season or angle.

4K request failed or was ignored: 4K is only available on wan2.7-image-pro, and only for text-to-image with no input image. Any edit, region edit, multi-image, or image-set request tops out at 2K regardless of model.
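
The 4K rule is easy to encode as a pre-flight check so a request never asks for more than its mode allows. The sketch below assumes the mode labels shown in MODES, which are names chosen for this example rather than API values; only the two model names come from this guide.

```python
TIERS = ["1K", "2K", "4K"]
MODES = {"text-to-image", "edit", "region-edit", "multi-image", "image-set"}


def max_size(model: str, mode: str) -> str:
    """Highest tier allowed: 4K only on Pro, and only for text-to-image."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if model == "wan2.7-image-pro" and mode == "text-to-image":
        return "4K"
    return "2K"


def clamp_size(requested: str, model: str, mode: str) -> str:
    """Return the requested tier, lowered to the cap if it exceeds it."""
    if requested not in TIERS:
        raise ValueError(f"unknown size tier: {requested}")
    cap = max_size(model, mode)
    if TIERS.index(requested) > TIERS.index(cap):
        return cap
    return requested
```

With this in place, clamp_size("4K", "wan2.7-image-pro", "edit") falls back to "2K" instead of sending a request that would fail or be ignored.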
