Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. The original code repository can be found at https://github.com/openai/whisper. Compared to the Whisper large model, the large-v2 model is trained for 2.5x more epochs with added regularization for improved performance.
FLOPs: 1.1e+23
Notes: "Compared to the Whisper large model, the large-v2 model is trained for 2.5x more epochs with added regularization for improved performance." We (roughly) estimated Whisper v1 as 4.65e22 FLOPs; 2.5x that is 1.1625e23, or ~1.1e23.
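The estimate above can be sketched as a two-line calculation (the v1 figure of 4.65e22 FLOPs is the rough estimate stated in the notes, and the linear scaling with epoch count is an assumption):

```python
# Rough FLOPs estimate for Whisper large-v2.
# Assumptions: Whisper v1 (large) cost ~4.65e22 training FLOPs, and
# "2.5x more epochs" scales training compute linearly.
whisper_v1_flops = 4.65e22
epoch_multiplier = 2.5
whisper_v2_flops = whisper_v1_flops * epoch_multiplier
print(f"{whisper_v2_flops:.3e}")  # 1.162e+23, rounded in the entry to ~1.1e+23
```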
Training Code Accessibility: Apache 2.0 for weights; code for v1 is MIT: https://github.com/openai/whisper
Size Notes: "When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zeroshot transfer setting without the need for any finetuning." 13,680 words/h (estimate) * 680,000h = 9,302,400,000 words
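The dataset-size estimate in words can be checked directly (the 13,680 words-per-hour figure is the rough speech-rate estimate stated in the notes, not a value from the paper):

```python
# Training-data size in words.
# Assumption: ~13,680 spoken words per hour of audio (rough estimate).
words_per_hour = 13_680
hours_of_audio = 680_000
total_words = words_per_hour * hours_of_audio
print(f"{total_words:,}")  # 9,302,400,000
```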
Parameters: 1,550,000,000
Notes: 1550M