How to turn an image into a video with AI

Last updated: 2026-07-04 | 8 min read | Difficulty: Beginner-friendly

Image-to-video (often shortened to I2V) is the workflow where you hand an AI model a single still picture and a short description of movement, and it produces a few seconds of video built from that frame. It is the most reliable way to get controllable AI video, because the model isn't inventing a whole scene — it is animating one you already chose.

This guide is deliberately tool-agnostic. Whether you use Wan, a hosted service, or something else, the decisions that make or break an image-to-video result are the same: the source image, the motion description, and a handful of settings. Get those right and almost any current model gives you a usable clip.

How image-to-video differs from text-to-video

With text-to-video you describe everything and the model invents the scene, the subject, and the motion at once. You get variety, but little control — the face, the setting, and the framing are all up to the model.

Image-to-video fixes the scene in place. You supply the first frame, so the subject's appearance and composition are locked. The model's only job is to add believable movement. That constraint is exactly why I2V is the workflow to reach for when you care what the result looks like.

Choosing a source image that animates well

Sharp and well-lit beats artistic every time. Motion amplifies whatever is already in the frame, so a soft or noisy image comes out soft and noisy in motion.

One clear subject. A single person or object with room around them animates cleanly. A crowded frame gives the model too many things to move and usually produces warping somewhere.

Leave space for the motion you want. If you plan to have someone walk or turn, don't crop them tight to the edges — the model needs room to move them into, or the motion looks cramped and stretched.
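The "leave space" rule can be checked roughly before you generate. The sketch below assumes you already know the subject's bounding box in pixels (from a detector, or estimated by eye). The function names `motion_margins` and `cramped_sides` and the 10% threshold are illustrative assumptions, not values taken from any particular model.

```python
def motion_margins(image_size, subject_box):
    """Return the free space on each side of the subject as a fraction of the frame.

    image_size is (width, height); subject_box is (left, top, right, bottom) in pixels.
    """
    width, height = image_size
    left, top, right, bottom = subject_box
    return {
        "left": left / width,
        "right": (width - right) / width,
        "top": top / height,
        "bottom": (height - bottom) / height,
    }


def cramped_sides(image_size, subject_box, min_margin=0.10):
    """List the sides where the subject sits closer to the edge than min_margin.

    min_margin=0.10 is an arbitrary starting point (assumption), not a model requirement.
    """
    margins = motion_margins(image_size, subject_box)
    return [side for side, frac in margins.items() if frac < min_margin]
```

If the side you want the subject to move toward shows up in the returned list, re-crop or pick a wider shot before generating.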

Step-by-step

The flow is short and the same across tools. The quality lives in steps 1 and 3.

1. Prepare the source image
Pick a sharp image with one clear subject, and crop it to the aspect ratio you want the final video in: portrait for phones and social, landscape for wide. Whatever is in the frame is what the model will animate.

2. Set the output resolution
Use a resolution the model officially supports for your aspect ratio. Feeding an arbitrary size is the most common cause of stretched or failed output, because most video models expect dimensions that are multiples of a fixed block size.

3. Describe only the motion
Write one or two short sentences about what should move, using present-progressive verbs: "she is turning toward the camera, hair drifting in the wind." Do not re-describe the scene; the image already is the scene.

4. Set clip length and guidance
Keep the first clip short, around five seconds, and use a moderate guidance value. Short and moderate is the reliable zone across every model; long and extreme is where distortion lives.

5. Generate, then re-roll the seed
Run it. If the motion is nearly right, change only the seed and try again before touching anything else. These models are random, and a new seed often turns a near-miss into a keeper.
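Steps 1, 2, and 5 can be sketched in a few lines of tool-agnostic Python. Everything named here is an assumption for illustration: `SUPPORTED_SIZES` is a hypothetical list that you would replace with the sizes from your model's documentation, and `generate` stands in for whatever function or API call your tool provides. The crop box uses (left, top, right, bottom) ordering, which should be directly usable with common image libraries such as Pillow's `Image.crop`.

```python
import math

# Hypothetical list: copy the real sizes from your model's documentation.
SUPPORTED_SIZES = [(832, 480), (480, 832), (1280, 720), (720, 1280)]


def center_crop_box(width, height, aspect_w, aspect_h):
    """Step 1: largest centered crop with the target aspect ratio, as (left, top, right, bottom)."""
    # Integer cross-multiplication avoids float rounding when comparing shapes.
    if width * aspect_h > height * aspect_w:
        # Image is too wide: keep the full height and trim the sides.
        new_w, new_h = height * aspect_w // aspect_h, height
    else:
        # Image is too tall (or already matches): keep the full width.
        new_w, new_h = width, width * aspect_h // aspect_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


def pick_supported_size(aspect_w, aspect_h, supported=SUPPORTED_SIZES):
    """Step 2: choose the supported size whose shape is closest to the target aspect ratio."""
    target = aspect_w / aspect_h
    # Compare on a log scale so "too wide" and "too tall" are penalized equally.
    return min(supported, key=lambda wh: abs(math.log((wh[0] / wh[1]) / target)))


def reroll_seeds(generate, settings, seeds):
    """Step 5: call generate once per seed, changing nothing else between runs."""
    clips = {}
    for seed in seeds:
        # Rebuild the arguments each time so only the seed differs.
        clips[seed] = generate(**{**settings, "seed": seed})
    return clips
```

For a 1920x1080 photo headed for a portrait clip, `center_crop_box(1920, 1080, 9, 16)` gives the crop region and `pick_supported_size(9, 16)` picks the portrait entry from the list. Choosing from a list, rather than computing a size, keeps you on dimensions the model is known to accept.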

Settings that matter (any model)

The labels vary between tools, but these five levers exist almost everywhere and decide the outcome.

Source image	Sharp, one subject, cropped to target aspect ratio
Resolution	A size the model officially supports; don't improvise dimensions
Clip length	~5 seconds to start; drift and distortion grow with length
Motion prompt	Present-progressive, describes movement only, physically plausible
Guidance strength	Moderate — too high over-bakes, too low ignores your prompt
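The five levers above can be written down as a small pre-flight check. This is a minimal sketch, not any tool's real API: the `ClipSettings` fields and the `check_settings` function are hypothetical names, the 2% aspect tolerance is an arbitrary allowance, and the supported sizes and "moderate" guidance range must come from your own model's documentation.

```python
from dataclasses import dataclass


@dataclass
class ClipSettings:
    image_size: tuple    # (width, height) of the cropped source image
    output_size: tuple   # (width, height) requested from the model
    seconds: float       # clip length
    motion_prompt: str
    guidance: float


def check_settings(s, supported_sizes, guidance_range, max_seconds=5.0):
    """Return a list of warnings; an empty list means nothing obvious is wrong.

    supported_sizes and guidance_range are inputs you look up for your model.
    max_seconds=5.0 follows this guide's "about five seconds" starting point.
    """
    warnings = []
    if tuple(s.output_size) not in supported_sizes:
        warnings.append("resolution is not a supported size")
    image_ratio = s.image_size[0] / s.image_size[1]
    output_ratio = s.output_size[0] / s.output_size[1]
    # 2% tolerance allows for rounding when cropping (assumption).
    if abs(image_ratio - output_ratio) / output_ratio > 0.02:
        warnings.append("source image is not cropped to the output aspect ratio")
    if s.seconds > max_seconds:
        warnings.append("clip is longer than the suggested starting length")
    low, high = guidance_range
    if not low <= s.guidance <= high:
        warnings.append("guidance is outside the moderate range")
    # Rough sentence count: the guide suggests one or two short sentences.
    text = s.motion_prompt.replace("!", ".").replace("?", ".")
    if len([part for part in text.split(".") if part.strip()]) > 2:
        warnings.append("motion prompt is longer than two sentences")
    return warnings
```

A call such as `check_settings(settings, [(720, 1280), (1280, 720)], (3.0, 7.0))` uses placeholder values for the size list and guidance range; substitute the ones your model documents.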

Common problems and fixes

Subject warps or melts: the requested motion is too big or guidance is too high. Ask for smaller, believable movement and lower guidance.

Almost no movement: the prompt is too vague or guidance too low. Use a concrete action verb and nudge guidance up.

Stretched or wrong shape: your dimensions aren't a supported size for the aspect ratio. Fix the resolution.

Flicker and grain: usually a soft source image. Start from a sharper picture — motion won't hide input noise, it multiplies it.
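The fixes above can be kept as a small lookup so that each retry changes one thing at a time. This is a sketch under stated assumptions: the symptom keys, the `next_attempt` function, and the step size of 1.0 are all illustrative, and guidance scales differ between tools.

```python
# Symptom -> (guidance direction, other change), transcribed from the list above.
# Direction is -1 (lower), +1 (raise), or 0 (leave guidance alone).
FIXES = {
    "warping": (-1, "ask for smaller, believable movement"),
    "no_movement": (+1, "use a concrete action verb"),
    "stretched": (0, "switch to a supported resolution"),
    "flicker": (0, "start from a sharper source image"),
}


def next_attempt(guidance, symptom, step=1.0):
    """Return (new_guidance, advice) for the next try.

    step=1.0 is an arbitrary nudge size (assumption); scale it to your tool's range.
    """
    direction, advice = FIXES[symptom]
    return guidance + direction * step, advice
```

For example, `next_attempt(6.0, "warping")` lowers guidance by one step and reminds you to ask for smaller movement, while `"stretched"` and `"flicker"` leave guidance untouched because the cause lies elsewhere.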

Where to go next

Once you can reliably animate a single image, the natural next steps are writing stronger motion prompts and stitching clips into longer sequences. We cover both in dedicated guides.
