We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that, at the same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve state-of-the-art performance on the miniF2F benchmark, automatically solving multiple challenging problems drawn from high school olympiads.
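The abstract's definition of expert iteration (proof search interleaved with learning) can be made concrete with a toy loop. The sketch below is purely illustrative and is not the paper's method: the scalar "skill" stand-in for a model, the logistic success probability, and the update rule are all assumptions chosen to show how solving easy statements first makes harder ones reachable.

```python
import math
import random

# Toy sketch of expert iteration: alternate a proof-search phase over all
# unsolved statements with a learning phase on the newly found proofs.
# Statements are just difficulty scores; the "model" is a scalar skill level.

def proof_search(skill, difficulty, attempts=16):
    """Stand-in for proof search: each attempt succeeds with probability
    sigmoid(skill - difficulty), so harder statements need a stronger model."""
    p = 1.0 / (1.0 + math.exp(difficulty - skill))
    return any(random.random() < p for _ in range(attempts))

def expert_iteration(statements, skill=0.0, iterations=8):
    solved = set()
    for it in range(iterations):
        # Proof-search phase: attempt every statement not yet solved.
        newly_solved = {s for s in statements - solved if proof_search(skill, s)}
        solved |= newly_solved
        # Learning phase: "fine-tune" on the new proofs, raising the skill.
        skill += 0.5 * math.sqrt(len(newly_solved))
        print(f"iter {it}: solved {len(solved)}/{len(statements)}, skill={skill:.2f}")
    return solved

if __name__ == "__main__":
    random.seed(0)
    # Statements of varied difficulty form an implicit curriculum:
    # the easy ones solved early fuel the learning that unlocks harder ones.
    expert_iteration(statements=set(range(20)))
```

Running the toy shows the curriculum effect described in the abstract: the solved count grows across iterations as learning on easy proofs raises the success rate on harder statements.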
Notes:
Pretraining: 6 FLOP/parameter/token * 774,000,000 parameters * 372,000,000,000 tokens = 1.727568e+21 FLOP. Unknown number of epochs -> "Likely" confidence.
Expert iteration: 312,000,000,000,000 FLOP/GPU/sec * 48,000 GPU-hours * 3,600 sec/hour * 0.3 [assumed utilization] = 1.617408e+22 FLOP.
Total: 1.617408e+22 FLOP + 1.727568e+21 FLOP = 1.7901648e+22 FLOP.
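A quick check of the arithmetic above; the 6 FLOP/parameter/token rule of thumb and the 0.3 utilization factor are the note's own assumptions, carried over unchanged:

```python
# Reproduce the compute estimate from the notes above.
params = 774e6               # model parameters
tokens = 372e9               # pretraining tokens (number of epochs unknown)
pretrain_flop = 6 * params * tokens  # 6 FLOP/parameter/token rule of thumb

peak_flop_per_gpu = 312e12   # FLOP/GPU/sec
gpu_hours = 48_000
utilization = 0.3            # assumed utilization from the notes
expert_iter_flop = peak_flop_per_gpu * gpu_hours * 3_600 * utilization

total_flop = pretrain_flop + expert_iter_flop
print(f"pretraining:      {pretrain_flop:.6e} FLOP")    # 1.727568e+21
print(f"expert iteration: {expert_iter_flop:.6e} FLOP") # 1.617408e+22
print(f"total:            {total_flop:.7e} FLOP")       # 1.7901648e+22
```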
Size Notes: The table on p. 12 gives the WebMath dataset size in GB of code. Uncompressed code probably has a similar number of tokens per gigabyte to natural language text, on the order of 3e8 tokens per GB.
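The conversion this note implies, as a one-line sketch; the dataset size below is a placeholder, since the actual GB figure comes from the paper's table and is not reproduced here:

```python
# Tokens-per-GB heuristic from the note above.
dataset_gb = 100        # placeholder, not the paper's value (see table on p. 12)
tokens_per_gb = 3e8     # note's assumption for uncompressed code
estimated_tokens = dataset_gb * tokens_per_gb
print(f"~{estimated_tokens:.2e} tokens")
```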