Training Physical AI systems in digital environments requires a physical world simulator. We introduce Cosmos-Predict2, the latest version of the Cosmos world model, designed to simulate and predict the future state of the world as video. Cosmos-Predict2 features four models: Cosmos-Predict2-2B-Text2Image and Cosmos-Predict2-14B-Text2Image, which perform text-to-image generation to create high-quality images from text descriptions, and Cosmos-Predict2-2B-Video2World and Cosmos-Predict2-14B-Video2World, which perform video-to-world generation to produce visual simulations from image or video inputs. To accelerate the development of world models for Physical AI, we make our code, model weights, and benchmark (PBench) available under the NVIDIA Open Model License.
Notes: Possibly over 1e25 FLOP, since the original "Cosmos-1.0-Diffusion-14B-Video2World" was just under 1e25 FLOP and this appears to be an improved model. However, training details have not yet been released.