We are releasing Kandinsky 5.0 Video Lite, the first model in the new Kandinsky 5 series. The model runs at 768x512 resolution and, despite its compact size of just 2 billion parameters, demonstrates quality superior to previous versions of Kandinsky and most current open-source state-of-the-art solutions. A key focus is efficiency: the model requires fewer resources and generates faster. This result was achieved through a comprehensive approach that spans data collection and preparation, pretraining, and fine-tuning. We explored modern architecture optimization methods and applied our own developments to balance quality and speed.
To create the pretraining dataset, we collected 6 billion images and 35 million videos, then sliced the videos (using the PySceneDetect scene change detector) into 1.5 billion short scenes ranging from 2 to 60 seconds. We then filtered out samples that were too low-resolution (up to 256 pixels on the shortest side), were duplicates or near-duplicates, were watermarked, were overloaded with text (document photos, etc.), or were not dynamic enough. From the remaining data, we selected the 124 million scenes and 520 million images that were the most aesthetically pleasing and technically sound.
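The filtering stage described above can be pictured as a chain of simple predicates. The sketch below is illustrative only and is not our actual pipeline: the `Scene` fields, the `keep_scene` and `filter_scenes` helpers, and the text and motion thresholds are all hypothetical, and it assumes that upstream detectors have already produced a watermark flag, a text-coverage ratio, a motion score, and a duplicate key for each scene. Only the 256-pixel cutoff comes from the text, read here as "256 pixels or less on the shortest side".

```python
from dataclasses import dataclass

# From the text: samples with a shortest side of up to 256 px are dropped.
LOW_RES_SHORT_SIDE = 256

# Hypothetical thresholds, chosen only for illustration.
MAX_TEXT_AREA_RATIO = 0.3
MIN_MOTION_SCORE = 0.1


@dataclass(frozen=True)
class Scene:
    content_hash: str       # hypothetical key for (near-)duplicate detection
    width: int
    height: int
    has_watermark: bool     # assumed output of a watermark detector
    text_area_ratio: float  # assumed fraction of the frame covered by text
    motion_score: float     # assumed measure: 0 = static, higher = more dynamic


def keep_scene(scene: Scene, seen_hashes: set[str]) -> bool:
    """Return True if the scene passes every filter; remember its hash if kept."""
    if min(scene.width, scene.height) <= LOW_RES_SHORT_SIDE:
        return False  # too low-resolution
    if scene.content_hash in seen_hashes:
        return False  # duplicate of a scene we already kept
    if scene.has_watermark:
        return False  # watermarked
    if scene.text_area_ratio > MAX_TEXT_AREA_RATIO:
        return False  # overloaded with text
    if scene.motion_score < MIN_MOTION_SCORE:
        return False  # not dynamic enough
    seen_hashes.add(scene.content_hash)
    return True


def filter_scenes(scenes: list[Scene]) -> list[Scene]:
    """Apply all filters in order, keeping the first copy of each duplicate."""
    seen: set[str] = set()
    return [s for s in scenes if keep_scene(s, seen)]
```

Ordering the cheap checks (resolution, hash lookup) before the detector-based ones is a common design choice at this scale, since most rejected samples never need the more expensive signals.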