This repository contains our T2V-A14B model, which supports generating 5s videos at both 480P and 720P resolutions. Built with a Mixture-of-Experts (MoE) architecture, it delivers outstanding video generation quality. On our new benchmark Wan-Bench 2.0, the model surpasses leading commercial models across most key evaluation dimensions.
Compute notes: 6 FLOP / token / parameter * 14 * 10^9 active parameters * 5 * 10^12 tokens [assumed; see dataset size notes] = 4.2e+23 FLOP
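The estimate above uses the common "6 * N * D" rule of thumb: about 6 FLOP per parameter per training token for the forward and backward passes combined. For this MoE model, N is the active parameter count (14B) rather than the total. The short sketch below reproduces the arithmetic. The 5 * 10^12 token count is the assumed value from the dataset size notes, not a figure reported by the authors, and the function name is illustrative only.

```python
# Minimal sketch of the 6 * N * D training-compute approximation.
# N = active parameters per token (MoE), D = training tokens (assumed).
FLOP_PER_TOKEN_PER_PARAM = 6  # forward + backward rule of thumb


def training_flop(active_params: float, tokens: float) -> float:
    """Estimate training compute in FLOP as 6 * N * D (hypothetical helper)."""
    return FLOP_PER_TOKEN_PER_PARAM * active_params * tokens


active_params = 14e9  # 14B active parameters
tokens = 5e12         # ~5 trillion tokens, an assumption, not a reported value
estimate = training_flop(active_params, tokens)
print(f"{estimate:.1e} FLOP")  # 4.2e+23 FLOP
```

Because the token count is only an estimate, the result scales linearly with it. Doubling the assumed tokens doubles the compute estimate.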
Size Notes: For Wan2.2, the authors state: "Wan2.2 is trained on a significantly larger data, with +65.6% more images and +83.2% more videos." For Wan 2.1, they report: "Wan has seen large-scale data comprising billions of images and videos, amounting to O(1) trillions of tokens in total." On this basis, we estimate that this model was trained on ~5 trillion tokens (confidence: "Likely").
Notes: 14B active parameters.