Model Details

Domain:

Video

Task:

Video generation

Text-to-video

Model Access:

Open weights (unrestricted)

AI Tools Usage

This model is commonly used behind the scenes in AI tools.

Introduction

This repository contains our T2V-A14B model, which supports generating 5s videos at both 480P and 720P resolutions. Built with a Mixture-of-Experts (MoE) architecture, it delivers outstanding video generation quality. On our new benchmark Wan-Bench 2.0, the model surpasses leading commercial models across most key evaluation dimensions.

Benchmarking

FLOPs: 4.2e+23

Notes: 6 FLOP / token / parameter * 14 * 10^9 active parameters * 5 * 10^12 tokens [assumed, see dataset size notes] = 4.2e+23 FLOP
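The arithmetic in the note above can be reproduced with a minimal sketch. It assumes the common 6 FLOP per token per parameter rule of thumb, which is the factor the note uses. The function and constant names are hypothetical and chosen only for illustration. The 5 trillion token count is the assumed figure from the dataset size notes, not a reported value.

```python
# Sketch of the compute estimate: FLOP ~= 6 * active parameters * training tokens.
# All names here are illustrative; the inputs are the figures quoted in the note.

FLOP_PER_TOKEN_PER_PARAM = 6      # rule-of-thumb factor used in the note
ACTIVE_PARAMS = 14 * 10**9        # 14B active parameters (the "A14B" in the model name)
TRAINING_TOKENS = 5 * 10**12      # ASSUMED: ~5 trillion tokens, per the size notes


def estimate_training_flop(active_params, tokens,
                           flop_per_token_per_param=FLOP_PER_TOKEN_PER_PARAM):
    """Return the estimated training compute in FLOP."""
    return flop_per_token_per_param * active_params * tokens


total_flop = estimate_training_flop(ACTIVE_PARAMS, TRAINING_TOKENS)
print(f"{total_flop:.1e}")  # prints 4.2e+23
```

Because the estimate is linear in the token count, any revision to the assumed dataset size scales the result proportionally.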

Training

Training Code Accessibility: Apache 2.0 (https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B)

Inference code: https://github.com/Wan-Video/Wan2.2

Size Notes: The Wan2.2 release states: "Wan2.2 is trained on a significantly larger data, with +65.6% more images and +83.2% more videos." For Wan 2.1, the authors report: "Wan has seen large-scale data comprising billions of images and videos, amounting to O(1) trillions of tokens in total." Based on these two statements, we estimate that this model was trained on ~5 trillion tokens (with "Likely" confidence).