Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. An even larger model trained on a mixture of ImageNet and web images is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of our features.
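For concreteness, here is a minimal sketch (PyTorch) of the core idea: flatten each image into a 1D sequence of discrete pixel values and train a decoder-only Transformer to predict each pixel from the ones before it, with no 2D inductive bias. The model sizes, vocabulary size, and sequence length below are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of autoregressive pixel prediction, iGPT-style.
# Hyperparameters are illustrative; the paper's models are far larger.
import torch
import torch.nn as nn

class PixelGPT(nn.Module):
    def __init__(self, vocab_size=512, seq_len=1024, d_model=256,
                 n_heads=8, n_layers=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)
        # Causal mask: position i may only attend to positions <= i,
        # which is what makes the prediction autoregressive.
        mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, tokens):                  # tokens: (B, T) int64
        T = tokens.size(1)
        h = self.tok_emb(tokens) + self.pos_emb[:, :T]
        h = self.blocks(h, mask=self.mask[:T, :T])
        return self.head(h)                     # (B, T, vocab) logits

# One training step: predict pixel t+1 from pixels 0..t.
model = PixelGPT()
pixels = torch.randint(0, 512, (2, 1024))       # stand-in for flattened images
logits = model(pixels[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 512), pixels[:, 1:].reshape(-1))
loss.backward()
```

The "linear probe" evaluation mentioned above then freezes a trained model of this kind and fits only a linear classifier on features taken from an intermediate layer, so the probe accuracy measures how much class information the unsupervised representation already contains.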
Notes: Taken from https://www.lesswrong.com/posts/wfpdejMWog4vEDLDg/ai-and-compute-trend-isn-t-predictive-of-what-is-happening: "There's no compute data for the largest model, iGPT-XL. But based on the FLOP/s increase from GPT-3 XL (same num of params as iGPT-L) to GPT-3 6.7B (same num of params as iGPT-XL), I think it required 5 times more compute: 3.3 * 10^22 FLOP."
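The arithmetic behind the quoted 5x figure can be made explicit. The parameter counts below are the published GPT-3 sizes; the assumption that training compute scales roughly linearly with parameter count at fixed training data (compute ~ 6 * params * tokens) is ours, used to reproduce the ratio in the quote.

```python
# Back-of-the-envelope check of the quoted estimate. At fixed token
# count, the GPT-3 XL -> GPT-3 6.7B compute ratio stands in for the
# iGPT-L -> iGPT-XL ratio, since the parameter counts roughly match.
gpt3_xl_params   = 1.3e9   # GPT-3 XL, roughly iGPT-L's size
gpt3_6_7b_params = 6.7e9   # GPT-3 6.7B, roughly iGPT-XL's size
ratio = gpt3_6_7b_params / gpt3_xl_params   # ~5.2, rounded to 5x

igpt_xl_flop = 3.3e22               # figure quoted above for iGPT-XL
igpt_l_flop  = igpt_xl_flop / 5     # implied ~6.6e21 FLOP for iGPT-L
print(f"ratio ~{ratio:.1f}x; implied iGPT-L ~{igpt_l_flop:.1e} FLOP")
```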
Size Notes: "We use the ImageNet ILSVRC 2012 training dataset, splitting off 4% as our experimental validation set and report results on the ILSVRC 2012 validation set as our test set." From https://image-net.org/challenges/LSVRC/2012/: "The goal of this competition is to estimate the content of photographs for the purpose of retrieval and automatic annotation using a subset of the large hand-labeled ImageNet dataset (10,000,000 labeled images depicting 10,000+ object categories) as training."
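A minimal sketch of the split described in the quote, assuming the standard 1,281,167-image ILSVRC 2012 training set; the file names and seed are illustrative, not taken from the paper.

```python
# Hold 4% of the ILSVRC 2012 training images out as an experimental
# validation set, keeping the official validation set as the test set.
import random

train_files = [f"train/img_{i:07d}.JPEG" for i in range(1_281_167)]
rng = random.Random(0)                   # fixed seed for reproducibility
rng.shuffle(train_files)

n_val = int(0.04 * len(train_files))     # ~51k images held out
val_split, train_split = train_files[:n_val], train_files[n_val:]
print(f"{len(train_split)} train / {len(val_split)} experimental val")
```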
Notes: Source: https://openai.com/blog/image-gpt/#rfref53