How to use HappyHorse image-to-video (first-frame AI video)
HappyHorse image-to-video (I2V) takes one image — the first frame of your clip — plus a short prompt describing the motion, and animates it into a few seconds of video. It's the most direct HappyHorse workflow: no juggling multiple references like reference-to-video, no inventing the scene from scratch like text-to-video. You already have the frame; HappyHorse's job is to bring it to life.
This guide covers the full request-and-poll workflow, the image requirements that decide whether a render even starts, the settings worth knowing, and the mistakes that most often turn a clean photo into a warped clip.
What image-to-video is for
Reference-to-video composes several images into one scene. Text-to-video invents the scene from words alone. Image-to-video is the simplest of the three: you supply exactly one image (the first frame) and a prompt describing what should move, and HappyHorse animates that single frame forward.
That makes it the right tool whenever you already have the exact photo you want animated. If your shot needs elements from more than one photo, use reference-to-video instead; if you don't have a source image at all, use text-to-video.
Step-by-step
The workflow is a request-and-poll cycle: submit the image and prompt, then check back for the finished video.
1. Prepare your first-frame image
Pick a sharp, well-lit image with one clear subject. It must be JPEG, JPG, PNG, or WEBP, at least 300 pixels in both width and height, with an aspect ratio between 1:2.5 and 2.5:1, and no larger than 20 MB. Host it at a public URL or encode it as a base64 data string; both are accepted.
2. Write a motion-only prompt
Describe what should move, not what's already in the image. HappyHorse already has the scene from your picture; a prompt that re-describes the subject or setting fights the image instead of animating it.
3. Set resolution, duration, and watermark
Choose 720P or 1080P (1080P is the default), a duration between 3 and 15 seconds (5 is the default and the reliable starting point), and decide whether to keep the watermark (on by default).
4. Submit the task and save the task ID
Send the request with the X-DashScope-Async header set to enable. HTTP calls support only asynchronous processing, so a missing header returns an error. The response returns a task_id that stays valid for 24 hours. Save it, and don't submit the same job twice while waiting.
5. Poll for the result on an interval
Video generation typically takes one to five minutes. Query the task endpoint with your task_id every 10 to 15 seconds rather than holding a connection open. The task moves through PENDING, then RUNNING, and ends in SUCCEEDED or FAILED.
6. Download the finished video promptly
Once the status is SUCCEEDED, the response includes a video_url. That link is valid for only 24 hours before the file is purged, so download the video and store it in your own storage as soon as it's ready.
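The request-and-poll cycle above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the official client: the endpoint URL, the model name, the Bearer auth scheme, and the request field names (img_url, prompt) are placeholders that this guide does not specify, so take the real values from your API documentation. Only the X-DashScope-Async header and the status names come from the steps above.

```python
import json
import time
import urllib.request

# Placeholders, NOT from this guide: take the real values from your API docs.
SUBMIT_URL = "https://api.example.invalid/video-synthesis"  # hypothetical endpoint
MODEL_NAME = "happyhorse-i2v"                               # hypothetical model id


def build_submit_request(api_key, image, prompt):
    """Return (url, headers, body) for the task-creation call."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
        "X-DashScope-Async": "enable",         # required: HTTP calls are async-only
    }
    # Field names are assumed for illustration only.
    body = {"model": MODEL_NAME, "input": {"img_url": image, "prompt": prompt}}
    return SUBMIT_URL, headers, body


def send_json(url, headers, body=None):
    """POST body as JSON (or GET when body is None) and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers=headers,
        method="GET" if body is None else "POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_until_done(fetch_status, interval=10.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_status() until the task finishes, then return the video URL.

    fetch_status must return a (status, video_url) tuple.
    """
    deadline = time.monotonic() + timeout
    while True:
        status, video_url = fetch_status()
        if status == "SUCCEEDED":
            return video_url
        if status not in ("PENDING", "RUNNING"):
            raise RuntimeError(f"task ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError("gave up waiting for the task")
        sleep(interval)
```

In real use, fetch_status would wrap a send_json call against the task endpoint with your saved task_id and pull the status and video_url out of the reply; where those fields sit in the response depends on the API, so that part is left to the caller.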
Image requirements that decide whether a render even starts
HappyHorse validates the first-frame image before it will generate anything, so a rejected image fails fast rather than producing a bad video. Both dimensions must be at least 300 pixels — a tiny thumbnail will bounce.
Aspect ratio has to fall between 1:2.5 and 2.5:1. Very tall or very wide crops outside that range are rejected outright, so crop toward a more standard shape before submitting rather than after a failed call.
The output video inherits your source image's aspect ratio automatically — unlike HappyHorse text-to-video, image-to-video has no separate ratio parameter to set, because the picture already decides it.
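Because a rejected image fails before any generation happens, it can be worth running the same checks locally first. The sketch below mirrors the limits stated in this section; the function name is made up for illustration, and treating "20MB" as 20 × 1024 × 1024 bytes is an assumption. You would obtain the width, height, and format from whatever image library you already use.

```python
ALLOWED_FORMATS = {"JPEG", "JPG", "PNG", "WEBP"}
MIN_SIDE_PX = 300
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20MB" read as 20 MiB


def check_first_frame(width, height, fmt, size_bytes):
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append(f"both sides must be at least {MIN_SIDE_PX}px")
    # Ratio between 1:2.5 and 2.5:1 means longer side <= 2.5 * shorter side.
    # Integer form (2 * long <= 5 * short) avoids floating-point edge cases.
    long_side, short_side = max(width, height), min(width, height)
    if 2 * long_side > 5 * short_side:
        problems.append("aspect ratio outside 1:2.5 to 2.5:1")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 20MB")
    return problems
```

For example, check_first_frame(1280, 720, "png", 1_000_000) returns an empty list, while a 200 × 720 thumbnail fails on both the minimum side and the aspect ratio.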
Recommended settings (baseline)
Start here, then adjust one variable at a time.
| Setting | Recommended value |
|---|---|
| First-frame image | JPEG/JPG/PNG/WEBP, ≥300px on both sides, aspect ratio between 1:2.5 and 2.5:1, up to 20 MB; public URL or base64 |
| Resolution | 1080P (default) or 720P — 720P is the cheaper, faster option for drafts and iteration |
| Duration | 3-15 seconds; 5 seconds is the default and the most reliable length to start from |
| Watermark | On by default (bottom-right "Happy Horse" mark); can be turned off |
| Seed | Fixed while tuning so you compare like-for-like; randomize once you're happy with the motion to explore variations |
| Prompt | Any language; up to 5,000 non-Chinese characters or 2,500 Chinese characters — longer input is truncated |
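One way to keep these baseline settings honest in code is a small validated settings object. The sketch below is illustrative only: the class and function names are invented here, whole-second durations are an assumption, and so is the prompt-length rule (each Chinese character counted as two units against a 5,000-unit budget, which is one reading of the two limits in the table).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class I2VSettings:
    """Baseline settings from the table; defaults match the documented ones."""
    resolution: str = "1080P"
    duration: int = 5           # seconds; whole seconds assumed
    watermark: bool = True
    seed: Optional[int] = None  # fix a value while tuning, None to randomize

    def __post_init__(self):
        if self.resolution not in ("720P", "1080P"):
            raise ValueError("resolution must be 720P or 1080P")
        if not 3 <= self.duration <= 15:
            raise ValueError("duration must be between 3 and 15 seconds")


def prompt_units(prompt):
    """Count length with Chinese characters weighted double (assumed rule)."""
    return sum(2 if "\u4e00" <= ch <= "\u9fff" else 1 for ch in prompt)


def prompt_fits(prompt, budget=5000):
    """True when the prompt should not be truncated under the assumed rule."""
    return prompt_units(prompt) <= budget
```

A draft pass might use I2VSettings(resolution="720P", seed=42), then switch back to the defaults once the motion looks right.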
Writing a motion prompt that works
Lead with the subject's motion in present-progressive form: "is walking," "is turning," "is smiling." A continuous action gives HappyHorse something to animate; a described static pose does not.
Add one layer of secondary motion for realism — wind in hair, rising steam, a flickering light — to make the clip read as alive rather than a warped photo. One or two of these is plenty; more competes with the main motion.
Keep it plausible for the duration you've set. A 5-second clip can hold a small, believable action; asking for a large or physically improbable motion in that window is the fastest way to get distortion.
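If you generate prompts programmatically, the structure described here (one primary action in present-progressive form, plus at most two secondary motions) is easy to enforce. The helper below is a hypothetical convenience, not part of any HappyHorse API.

```python
def motion_prompt(subject, action, secondary=()):
    """Build a motion-only prompt: '<subject> is <action>, <secondary...>.'

    `action` should be a present-progressive phrase without 'is',
    for example 'turning toward the camera'.
    """
    secondary = list(secondary)
    if len(secondary) > 2:
        raise ValueError("use at most two secondary motions")
    parts = [f"{subject} is {action}"] + secondary
    return ", ".join(parts) + "."
```

Calling motion_prompt("The woman", "turning toward the camera", ["wind moving through her hair"]) yields a single sentence that leads with the main action and adds one layer of secondary motion.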
Common problems and fixes
Task creation fails with a synchronous-call error: the X-DashScope-Async header is missing or not set to enable. HTTP calls to this endpoint only support asynchronous processing.
Image is rejected before generation starts: check dimensions (≥300px both sides), aspect ratio (between 1:2.5 and 2.5:1), format (JPEG/JPG/PNG/WEBP), and file size (≤20MB).
Subject warps or melts: the motion prompt re-describes the scene, or is asking for too much movement for the duration. Strip the prompt back to motion only and simplify the action.
Task status comes back UNKNOWN: the task_id is more than 24 hours old, or doesn't exist. Create a new task — there's no way to recover an expired one.
Video link no longer works: the video_url is only valid for 24 hours after the task succeeds. Always download and store the file promptly rather than relying on the link later.
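Several of these fixes come down to reacting correctly to a task status and respecting the 24-hour windows. A minimal sketch of that decision logic follows; the function names and the returned messages are illustrative, and the code assumes you record the time a task succeeded yourself.

```python
from datetime import datetime, timedelta, timezone

VALIDITY = timedelta(hours=24)  # stated lifetime of both task_id and video_url


def is_still_valid(issued_at, now=None):
    """True while fewer than 24 hours have passed since issued_at."""
    now = now or datetime.now(timezone.utc)
    return now - issued_at < VALIDITY


def next_step(status, succeeded_at=None, now=None):
    """Map a task status to the action suggested in the list above."""
    if status in ("PENDING", "RUNNING"):
        return "keep polling"
    if status == "SUCCEEDED":
        if succeeded_at is not None and not is_still_valid(succeeded_at, now):
            return "video_url expired: regenerate the clip"
        return "download and store the video now"
    if status == "UNKNOWN":
        return "task_id expired or invalid: create a new task"
    return "check the image and prompt, then resubmit"  # FAILED and anything else
```

Passing timezone-aware datetimes for both succeeded_at and now keeps the comparison unambiguous.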
Where I2V fits versus reference-to-video and text-to-video
Reach for image-to-video when you already have the one photo that shows exactly what you want animated — it's the simplest and most direct of the three HappyHorse workflows. If the shot needs elements pulled from several separate photos, reference-to-video composes them into one scene instead. If you don't have a source image at all, text-to-video generates the scene and the motion together from a written description.
The techniques also chain together: a still frame taken from a text-to-video or reference-to-video result can become the source image for a further image-to-video pass, which is a common way to extend or refine a clip you've already generated.
