We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that, at the same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve state-of-the-art performance on the miniF2F benchmark, automatically solving multiple challenging problems drawn from high school olympiads.
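The abstract's definition of expert iteration (proof search interleaved with learning) can be made concrete with a toy loop. The sketch below is purely illustrative and is not the paper's method: the scalar "skill" stand-in for a model, the logistic success probability, and the update rule are all assumptions chosen to show how solving easy statements first makes harder ones reachable.

```python
import math
import random

# Toy sketch of expert iteration: alternate a proof-search phase over all
# unsolved statements with a learning phase on the newly found proofs.
# Statements are just difficulty scores; the "model" is a scalar skill level.

def proof_search(skill, difficulty, attempts=16):
    """Stand-in for proof search: each attempt succeeds with probability
    sigmoid(skill - difficulty), so harder statements need a stronger model."""
    p = 1.0 / (1.0 + math.exp(difficulty - skill))
    return any(random.random() < p for _ in range(attempts))

def expert_iteration(statements, skill=0.0, iterations=8):
    solved = set()
    for it in range(iterations):
        # Proof-search phase: attempt every statement not yet solved.
        newly_solved = {s for s in statements - solved if proof_search(skill, s)}
        solved |= newly_solved
        # Learning phase: "fine-tune" on the new proofs, raising the skill.
        skill += 0.5 * math.sqrt(len(newly_solved))
        print(f"iter {it}: solved {len(solved)}/{len(statements)}, skill={skill:.2f}")
    return solved

if __name__ == "__main__":
    random.seed(0)
    # Statements of varied difficulty form an implicit curriculum:
    # the easy ones solved early fuel the learning that unlocks harder ones.
    expert_iteration(statements=set(range(20)))
```

Running the toy shows the curriculum effect described in the abstract: the solved count grows across iterations as learning on easy proofs raises the success rate on harder statements.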
Notes:
Pretraining: 6 FLOP/parameter/token * 774,000,000 parameters * 372,000,000,000 tokens = 1.727568e+21 FLOP. Unknown number of epochs -> "Likely" confidence.
Expert iteration: 312,000,000,000,000 FLOP/GPU/sec * 48,000 GPU-hours * 3,600 sec/hour * 0.3 [assumed utilization] = 1.617408e+22 FLOP.
Total: 1.617408e+22 FLOP + 1.727568e+21 FLOP = 1.7901648e+22 FLOP.
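A quick check of the arithmetic above; the 6 FLOP/parameter/token rule of thumb and the 0.3 utilization factor are the note's own assumptions, carried over unchanged:

```python
# Reproduce the compute estimate from the notes above.
params = 774e6               # model parameters
tokens = 372e9               # pretraining tokens (number of epochs unknown)
pretrain_flop = 6 * params * tokens  # 6 FLOP/parameter/token rule of thumb

peak_flop_per_gpu = 312e12   # FLOP/GPU/sec
gpu_hours = 48_000
utilization = 0.3            # assumed utilization from the notes
expert_iter_flop = peak_flop_per_gpu * gpu_hours * 3_600 * utilization

total_flop = pretrain_flop + expert_iter_flop
print(f"pretraining:      {pretrain_flop:.6e} FLOP")    # 1.727568e+21
print(f"expert iteration: {expert_iter_flop:.6e} FLOP") # 1.617408e+22
print(f"total:            {total_flop:.7e} FLOP")       # 1.7901648e+22
```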
Size Notes: The table on p. 12 gives the WebMath dataset size in GB of code. Uncompressed code probably has a similar number of tokens per gigabyte to natural language text, on the order of 3e8 tokens per GB.
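The conversion this note implies, as a one-line sketch; the dataset size below is a placeholder, since the actual GB figure comes from the paper's table and is not reproduced here:

```python
# Tokens-per-GB heuristic from the note above.
dataset_gb = 100        # placeholder, not the paper's value (see table on p. 12)
tokens_per_gb = 3e8     # note's assumption for uncompressed code
estimated_tokens = dataset_gb * tokens_per_gb
print(f"~{estimated_tokens:.2e} tokens")
```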