Wan AI: Alibaba's video models, from open weights to Wan 2.7
Wan is Alibaba's family of video generation models, and it plays a role nobody else in the top tier does: the open one. Wan's open-weight releases (the 2.1/2.2 line) can be downloaded and run on your own GPU — free, unlimited, private — while its newer hosted versions (2.5 and 2.7) compete with the commercial flagships, adding native audio and clips up to around fifteen seconds.
That split personality is the point of this page: Wan is simultaneously the best self-hosted option for tinkerers and a serious hosted engine for production. We use it heavily and keep detailed version-by-version guides; this page is the map of the family and how to pick your entry point.
Quick facts
| Made by | Alibaba (Tongyi lab) |
|---|---|
| What it does | Text-to-video and image-to-video; newer versions add native audio and a strong image model (Wan 2.7 Image) |
| The open side | Wan 2.1/2.2 open weights — run locally in ComfyUI on a consumer GPU, free and unlimited |
| The hosted side | Wan 2.5 (quality tier) and Wan 2.7 (audio + up to ~15s clips), via API and hosted platforms |
| Pricing | Local: free (your hardware). Hosted: pay-per-second API pricing, cheaper than Western flagships |
| Best for | Self-hosters, high-volume generation on a budget, and audio-enabled clips without flagship prices |
What Wan does best
The open-weight releases changed who gets to generate video. Wan 2.2 on a consumer GPU produces genuinely good image-to-video — subject consistency, real motion — with zero per-clip cost, no queue, no content gatekeeping, and full pipeline control in ComfyUI. Nothing commercial matches that combination for tinkerers and volume workflows.
The hosted line is quietly competitive: Wan 2.5 stepped up detail and motion stability, and Wan 2.7 added native audio and much longer clips — the two features that otherwise force you onto Veo or Sora — at aggressive API pricing.
Because the family spans free-local to premium-hosted, you can prototype locally and move the same prompting style to the hosted tiers when a project needs audio, length, or peak quality. Our Wan 2.2, 2.5, and 2.7 guides cover each rung in depth.
Limitations to know before you commit
Local Wan has a hardware price of admission: a modern GPU with substantial VRAM, plus the willingness to set up ComfyUI. It's the cheapest per clip and the most expensive in setup effort.
Ecosystem polish trails the Western flagships — the consumer-facing apps and editing tools around Veo or Runway don't exist in the same form; Wan is a model you access through platforms and APIs more than a packaged creative suite.
Ceiling-level photorealism and prompt adherence at the very top end still belong to Veo 3 and the newest closed models; Wan's pitch is 90% of the result at a fraction of the cost, not outright leadership.
How to get access
Local route: download the open Wan weights and run them in ComfyUI. Budget an afternoon for setup; after that every generation is free. Our Wan 2.2 guide is the walkthrough.
Hosted route: Alibaba's model API exposes the full line including 2.5 and 2.7 with per-second pricing, and a growing set of third-party generation platforms offer Wan models with credit systems — often the easiest way to try 2.7's audio without an API account.
How Wan compares
Against Veo 3: Veo wins on peak fidelity, ecosystem, and audio polish; Wan 2.7 answers with native audio and longer clips at a much lower price, plus the open escape hatch no Google product offers.
Against Kling and Hailuo: those have friendlier consumer apps and free daily credits; Wan counters with open weights and better economics at API volume. For a creator who wants to own their pipeline, Wan is the default choice.
Preguntas frecuentes
- Is Wan free?
- The open-weight versions (Wan 2.1/2.2) are genuinely free: download the weights and run them locally in ComfyUI on your own GPU, with no per-clip cost. The newer hosted versions (2.5, 2.7) are paid per second via API or through hosted platforms' credit systems.
- What's the difference between Wan 2.2, 2.5, and 2.7?
- 2.2 is the light, fast, open workhorse (silent, ~5s clips). 2.5 is the hosted quality upgrade — steadier motion, finer detail. 2.7 is the premium tier: native audio and clips up to about fifteen seconds in one pass. Our per-version guides cover each in detail.
- What do I need to run Wan locally?
- A modern GPU with substantial VRAM, ComfyUI, and the open Wan checkpoint for your workflow (text-to-video or image-to-video). Setup takes an afternoon; generation after that is free and unlimited.
Guías prácticas
Modelos relacionados
Recibe las nuevas guías por email
Un email cuando publicamos nuevas guías y análisis de modelos. Sin spam, cancela cuando quieras.
