Training Physical AI systems in digital environments requires a physical world simulator. We introduce Cosmos-Predict2, the latest version of the Cosmos world model, designed to simulate and predict the future state of the world as video. Cosmos-Predict2 comprises four models: Cosmos-Predict2-2B-Text2Image and Cosmos-Predict2-14B-Text2Image, which generate high-quality images from text descriptions, and Cosmos-Predict2-2B-Video2World and Cosmos-Predict2-14B-Video2World, which produce visual simulations from image or video inputs. To accelerate the development of world models for Physical AI, we release our code, model weights, and benchmark (PBench) under the NVIDIA Open Model License.
Notes: High-quality (but not SOTA) text-to-image generation; compute plausibly over 1e23 FLOP but unlikely to exceed 1e25 FLOP.