- SOTA Performance: Wan2.1 consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
- Multiple Tasks: Wan2.1 excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation.
- Visual Text Generation: Wan2.1 is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications.
- Powerful Video VAE: Wan-VAE delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.
Notes: "Through extensive experimentation, the model is validated at scale, reaching 14 billion parameters. Subsequently, Wan has seen large-scale data comprising billions of images and videos, amounting to O(1) trillions of tokens in total." So likely between 1T and 10T tokens. Assume 3T. Transformer architecture, so 6ND should be a decent approximation. 6ND = 6 * 14e9 * 3e12 ~= 2.5e+23 FLOP
Size Notes: "Wan has seen large-scale data comprising billions of images and videos, amounting to O(1) trillions of tokens in total." with "Likely" confidence, assuming ~3 trillion
Notes: 14B parameters.