We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for fidelity using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256×256 and 3.85 on ImageNet 512×512.
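The classifier guidance described above amounts to shifting the reverse-process mean at each denoising step by the scaled gradient of a classifier's log-probability for the target class. A minimal PyTorch sketch of one guided sampling step is below, assuming hypothetical diffusion_model and classifier callables and a guidance scale s; this is a simplified illustration, not the authors' released code.

import torch

def classifier_guided_step(diffusion_model, classifier, x_t, t, y, s=1.0):
    # Reverse-process mean and variance predicted by the diffusion model (assumed interface).
    mean, variance = diffusion_model(x_t, t)

    # Gradient of log p(y | x_t) with respect to the noisy input x_t.
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
        selected = log_probs[range(len(y)), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]

    # Shift the mean by s * Sigma * grad; larger s trades diversity for fidelity.
    guided_mean = mean + s * variance * grad
    return guided_mean + variance.sqrt() * torch.randn_like(x_t)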
Notes: The largest run with their architecture improvements is the ImageNet 512 variant. Table 7 suggests utilization is around 30% for the largest models (though we only see 256×256 and the 128→512 upsampler). Table 10: the ImageNet 512 variant took 1,914 V100-days of training. Compute estimate: 125e12 FLOP/sec (V100 peak FP16) × 1,914 days × 24 h/day × 3600 sec/h × 0.3 utilization ≈ 6.2e21 FLOP.
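A quick check of the compute estimate, assuming V100 peak FP16 tensor-core throughput of 125 TFLOP/s and the ~30% utilization inferred from Table 7:

peak_flops = 125e12      # FLOP/s, V100 peak FP16 (assumption)
v100_days = 1914         # Table 10, ImageNet 512 variant
utilization = 0.3
total_flop = peak_flops * v100_days * 24 * 3600 * utilization
print(f"{total_flop:.1e} FLOP")  # ~6.2e+21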
Size Notes: The biggest models are trained on ImageNet 512×512. ImageNet ILSVRC has 1,281,167 images in the training set, though it is possible some were filtered due to size. Note that a smaller model was trained on LSUN {bedroom, horse, cat}, which forms a larger dataset: 3,033,042 + 2,000,340 + 1,657,266 = 6,690,648 images. Epochs ≈ (1,940,000 iterations × batch size 256) / 1,300,000 images ≈ 382 epochs.
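The epoch estimate above follows from the reported training length, assuming 1,940,000 iterations at batch size 256 over roughly 1.3M ImageNet training images:

iterations = 1_940_000
batch_size = 256
dataset_size = 1_300_000   # rounded ImageNet training set size
epochs = iterations * batch_size / dataset_size
print(round(epochs))  # ~382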
Notes: The largest model is the ImageNet 512 variant, with 559M parameters.