How to use HappyHorse image-to-video (first-frame AI video)

Last updated: 2026-07-05 | 8 min read | Difficulty: Beginner-friendly

HappyHorse image-to-video (I2V) takes one image — the first frame of your clip — plus a short prompt describing the motion, and animates it into a few seconds of video. It's the most direct HappyHorse workflow: no juggling multiple references like reference-to-video, no inventing the scene from scratch like text-to-video. You already have the frame; HappyHorse's job is to bring it to life.

This guide covers the full request-and-poll workflow, the image requirements that decide whether a render even starts, the settings worth knowing, and the mistakes that most often turn a clean photo into a warped clip.

What image-to-video is for

Reference-to-video composes several images into one scene. Text-to-video invents the scene from words alone. Image-to-video is the most constrained of the three: you supply exactly one image, the first frame, and a prompt describing what should move, and HappyHorse animates that single frame forward.

That makes it the right tool whenever you already have the exact photo you want animated. If your shot needs elements from more than one photo, use reference-to-video instead; if you don't have a source image at all, use text-to-video.

Step-by-step

The workflow is a request-and-poll cycle: submit the image and prompt, then check back for the finished video.

1. Prepare your first-frame image
   Pick a sharp, well-lit image with one clear subject. It must be JPEG, JPG, PNG, or WEBP, at least 300 pixels in both width and height, with an aspect ratio between 1:2.5 and 2.5:1, and no larger than 20MB. Host it at a public URL or encode it as a base64 data string; both are accepted.
2. Write a motion-only prompt
   Describe what should move, not what's already in the image. HappyHorse already has the scene from your picture; a prompt that re-describes the subject or setting fights the image instead of animating it.
3. Set resolution, duration, and watermark
   Choose 720P or 1080P (1080P is the default), a duration between 3 and 15 seconds (5 is the default and the reliable starting point), and decide whether to keep the watermark (on by default).
4. Submit the task and save the task ID
   Send the request with the X-DashScope-Async header set to enable. HTTP calls only support asynchronous processing, and a missing header returns an error. The response hands back a task_id, valid for 24 hours. Save it; don't submit the same job twice while waiting.
5. Poll for the result on an interval
   Video generation takes one to five minutes. Query the task endpoint with your task_id every 10-15 seconds rather than holding an open connection. The task moves through PENDING, then RUNNING, then SUCCEEDED or FAILED.
6. Download the finished video promptly
   Once the status is SUCCEEDED, the response includes a video_url. That link is only valid for 24 hours before it's purged, so download and store the video in your own storage as soon as it's ready.
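As a rough sketch of the submit step, the request could be assembled like this with Python's standard library. Only the X-DashScope-Async: enable header comes from this guide. The endpoint URL, the model name, the bearer-token authorization, and the JSON field names (img_url, resolution, duration, watermark) are placeholders chosen for illustration, so check them against the official API reference before relying on them.

```python
import json
import urllib.request

# Hypothetical placeholders: swap in the real endpoint and model name.
SUBMIT_URL = "https://example.invalid/video-synthesis"
MODEL_NAME = "happyhorse-i2v"


def build_submit_request(api_key, image, prompt, resolution="1080P",
                         duration=5, watermark=True, submit_url=SUBMIT_URL):
    """Build (but do not send) an asynchronous task-creation request.

    `image` is a public URL or a base64 data string.
    """
    payload = {
        "model": MODEL_NAME,
        # Field names below are assumptions, not confirmed by the guide.
        "input": {"img_url": image, "prompt": prompt},
        "parameters": {
            "resolution": resolution,
            "duration": duration,
            "watermark": watermark,
        },
    }
    return urllib.request.Request(
        submit_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
            # Required: HTTP calls only support asynchronous processing.
            "X-DashScope-Async": "enable",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. Where the task_id sits in the JSON response depends on the real API, so that part is left out here.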

Image requirements that decide whether a render even starts

HappyHorse validates the first-frame image before it will generate anything, so a rejected image fails fast rather than producing a bad video. Both dimensions must be at least 300 pixels — a tiny thumbnail will bounce.

Aspect ratio has to fall between 1:2.5 and 2.5:1. Very tall or very wide crops outside that range are rejected outright, so crop toward a more standard shape before submitting rather than after a failed call.

The output video inherits your source image's aspect ratio automatically — unlike HappyHorse text-to-video, image-to-video has no separate ratio parameter to set, because the picture already decides it.
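Because a rejected image fails before anything renders, it can be worth checking the file locally first. The sketch below mirrors the limits listed in this guide. It assumes the ratio is width:height and that 20MB means 20 × 1024 × 1024 bytes, neither of which the guide spells out, and the function name is made up for illustration.

```python
ALLOWED_FORMATS = {"JPEG", "JPG", "PNG", "WEBP"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20MB" read as 20 MiB


def first_frame_problems(width, height, fmt, size_bytes):
    """Return the reasons an image would be rejected (empty list if none)."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if width < 300 or height < 300:
        problems.append("both sides must be at least 300 px")
    # width/height must lie between 1/2.5 and 2.5.
    # Written with integers (x5 and x2) to avoid float rounding at the edges.
    if 5 * width < 2 * height or 2 * width > 5 * height:
        problems.append("aspect ratio outside 1:2.5 to 2.5:1")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 20MB")
    return problems
```

A 1920×1080 PNG of a couple of megabytes passes with an empty list, while a 3000×1000 banner is flagged for its 3:1 ratio.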

Recommended settings (baseline)

Start here, then adjust one variable at a time.

Setting	Baseline
First-frame image	JPEG/JPG/PNG/WEBP, ≥300px on both sides, aspect ratio between 1:2.5 and 2.5:1, up to 20MB; public URL or base64
Resolution	1080P (default) or 720P — 720P is the cheaper, faster option for drafts and iteration
Duration	3-15 seconds; 5 seconds is the default and the most reliable length to start from
Watermark	On by default (bottom-right "Happy Horse" mark); can be turned off
Seed	Fixed while tuning so you compare like-for-like; randomize once you're happy with the motion to explore variations
Prompt	Any language; up to 5,000 non-Chinese characters or 2,500 Chinese characters — longer input is truncated
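The baseline above can be checked before submitting with a small helper like the one below. It is a sketch under assumptions: the function names are invented, duration is treated as a whole number of seconds, and mixed-language prompts are budgeted by counting each Chinese character as two units against a 5,000-unit limit, which is one reading of the two limits in the table rather than a documented rule.

```python
def is_chinese(ch):
    # Approximation: only the basic CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"


def prompt_units(prompt):
    """Count a Chinese character as 2 units and anything else as 1."""
    return sum(2 if is_chinese(ch) else 1 for ch in prompt)


def settings_problems(resolution, duration, prompt):
    """Return a list of settings that fall outside the guide's limits."""
    problems = []
    if resolution not in ("720P", "1080P"):
        problems.append("resolution must be 720P or 1080P")
    if not 3 <= duration <= 15:
        problems.append("duration must be between 3 and 15 seconds")
    if prompt_units(prompt) > 5000:
        problems.append("prompt may be truncated")
    return problems
```

Keeping the check separate from the request code makes it easy to hold the seed and every other setting fixed while changing one variable at a time.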

Writing a motion prompt that works

Lead with the subject's motion in present-progressive form: "is walking," "is turning," "is smiling." HappyHorse responds to described continuous action, not a static pose.

Add one layer of secondary motion for realism — wind in hair, rising steam, a flickering light — to make the clip read as alive rather than a warped photo. One or two of these is plenty; more competes with the main motion.

Keep it plausible for the duration you've set. A 5-second clip can hold a small, believable action; asking for a large or physically improbable motion in that window is the fastest way to get distortion.
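If you generate prompts in bulk, the pattern above (one main action in present-progressive form, plus at most two secondary motions) can be enforced with a tiny template. This is purely an illustrative helper, not part of any HappyHorse tooling.

```python
def motion_prompt(subject, action, secondary=()):
    """Compose '<subject> is <action>, with <secondary motions>.'

    `action` should already be in -ing form, e.g. 'turning toward the camera'.
    """
    if len(secondary) > 2:
        raise ValueError("use at most two secondary motions")
    sentence = f"{subject} is {action}"
    if secondary:
        sentence += ", with " + " and ".join(secondary)
    return sentence + "."
```

For example, motion_prompt("The rider", "turning toward the camera", ["wind moving through her hair"]) yields "The rider is turning toward the camera, with wind moving through her hair."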

Common problems and fixes

Task creation fails with a synchronous-call error: the X-DashScope-Async header is missing or not set to enable. HTTP calls to this endpoint only support asynchronous processing.

Image is rejected before generation starts: check dimensions (≥300px both sides), aspect ratio (between 1:2.5 and 2.5:1), format (JPEG/JPG/PNG/WEBP), and file size (≤20MB).

Subject warps or melts: the motion prompt re-describes the scene, or is asking for too much movement for the duration. Strip the prompt back to motion only and simplify the action.

Task status comes back UNKNOWN: the task_id is more than 24 hours old, or doesn't exist. Create a new task — there's no way to recover an expired one.
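The task states described in this guide can be handled in a polling loop along these lines. The actual status lookup is passed in as fetch_status, a function you write against the real task endpoint, so nothing here assumes its URL or response shape. The 10-second interval follows the guide; the 10-minute ceiling is an arbitrary safety margin on top of the one-to-five-minute generation time.

```python
import time

# UNKNOWN is treated as terminal: an expired or nonexistent task never recovers.
TERMINAL = {"SUCCEEDED", "FAILED", "UNKNOWN"}


def wait_for_task(fetch_status, task_id, interval=10, timeout=600,
                  sleep=time.sleep):
    """Call fetch_status(task_id) until it returns a terminal status.

    Raises TimeoutError if the task is still PENDING or RUNNING after
    roughly `timeout` seconds of waiting.
    """
    waited = 0
    while True:
        status = fetch_status(task_id)
        if status in TERMINAL:
            return status
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still {status} after {waited}s")
        sleep(interval)
        waited += interval
```

On UNKNOWN the right move is to create a new task, and on FAILED to inspect the error before resubmitting, so the caller should branch on the returned status rather than retry blindly.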

Video link no longer works: the video_url is only valid for 24 hours after the task succeeds. Always download and store the file promptly rather than relying on the link later.
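Since the link expires, a download step usually belongs in the same script that polls. A minimal sketch using only the standard library follows; the function name and destination path are illustrative, and real code would likely add retries and upload the file to durable storage.

```python
import shutil
import urllib.request
from pathlib import Path


def save_video(video_url, dest):
    """Stream video_url to dest and return the number of bytes written."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream in chunks so a large video is never held fully in memory.
    with urllib.request.urlopen(video_url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest.stat().st_size
```

Call it as soon as the task reports SUCCEEDED, for example save_video(video_url, "renders/clip.mp4").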

Where I2V fits versus reference-to-video and text-to-video

Reach for image-to-video when you already have the one photo that shows exactly what you want animated — it's the simplest and most direct of the three HappyHorse workflows. If the shot needs elements pulled from several separate photos, reference-to-video composes them into one scene instead. If you don't have a source image at all, text-to-video generates the scene and the motion together from a written description.

The techniques also chain together: a frame taken from a text-to-video or reference-to-video result can become the first-frame image for a further image-to-video pass, which is a common way to extend or refine a clip you've already generated.
