Model Details

Domain:

Image generation

Task:

Text-to-image

Image generation

Model Access:

Open weights (restricted use)

AI Tools Usage

This model is commonly used behind the scenes in AI tools.

Introduction

Training Physical AI systems in digital environments requires a physical world simulator. We introduce Cosmos-Predict2, the latest version of the Cosmos world model, designed to simulate and predict the future state of the world as video. Cosmos-Predict2 comprises four models: Cosmos-Predict2-2B-Text2Image and Cosmos-Predict2-14B-Text2Image generate high-quality images from text descriptions, while Cosmos-Predict2-2B-Video2World and Cosmos-Predict2-14B-Video2World produce visual simulations from image or video inputs. To accelerate the development of world models for Physical AI, we make our code, model weights, and benchmark (PBench) available under the NVIDIA Open Model License.

Benchmarking

Notes: High-quality (but not SOTA) text-to-image generation. Training compute is plausibly over 1e23 FLOP and unlikely to be over 1e25 FLOP.
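To give a feel for what the 1e23 and 1e25 FLOP bounds in the note would imply, the sketch below converts each bound into an approximate amount of training data. It assumes the common rule of thumb that training compute is about 6 × parameters × training tokens. That heuristic is not stated in the source and is only a rough fit for image-generation models. It also takes the nominal sizes in the model names (2B and 14B) as parameter counts. The function and variable names are illustrative only.

```python
# Assumption (not from the source): training FLOP ~= 6 * parameters * tokens.
FLOP_PER_PARAM_PER_TOKEN = 6

# Nominal parameter counts read off the model names; exact counts may differ.
MODEL_PARAMS = {
    "Cosmos-Predict2-2B-Text2Image": 2e9,
    "Cosmos-Predict2-14B-Text2Image": 14e9,
}

# Compute bounds quoted in the note.
BOUNDS = {
    "lower (1e23 FLOP)": 1e23,
    "upper (1e25 FLOP)": 1e25,
}


def tokens_for_compute(total_flop: float, n_params: float) -> float:
    """Training tokens implied by a compute budget under the 6*N*D heuristic."""
    return total_flop / (FLOP_PER_PARAM_PER_TOKEN * n_params)


def implied_tokens() -> dict:
    """Map (model name, bound label) to the implied number of training tokens."""
    return {
        (model, bound): tokens_for_compute(flop, n_params)
        for model, n_params in MODEL_PARAMS.items()
        for bound, flop in BOUNDS.items()
    }


if __name__ == "__main__":
    for (model, bound), tokens in implied_tokens().items():
        print(f"{model}, {bound}: {tokens:.2e} tokens")
```

Under these assumptions, the 14B model would need on the order of a trillion training tokens (or the diffusion-training equivalent) to reach the lower bound. This illustrates why the note describes that bound as plausible rather than certain.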

Training

Training Code Accessibility:

NVIDIA license (termination clause + attribution requirements): https://huggingface.co/nvidia/Cosmos-Predict2-14B-Text2Image

Apache 2.0 for code: https://github.com/nvidia-cosmos/cosmos-predict2/tree/main