Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning [1,2,3,4], owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges [1,5], resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004.
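The abstract describes the core loop: a symbolic deduction engine runs until it gets stuck, then the language model proposes auxiliary constructions to unblock it. Below is a rough sketch of that loop; every name in it (symbolic_engine, language_model, deduction_closure, propose_constructions, ...) is a hypothetical placeholder, not the actual AlphaGeometry API, and the real system beam-searches over many candidates rather than acting greedily.

```python
# Rough sketch of the neuro-symbolic loop described above. All names are
# hypothetical placeholders, not the actual AlphaGeometry interface.

def solve(problem, language_model, symbolic_engine, max_constructions=16):
    """Try to prove `problem.goal` from `problem.premises`."""
    premises = list(problem.premises)
    for _ in range(max_constructions):
        # 1. Run the symbolic engine to its deduction closure.
        derived = symbolic_engine.deduction_closure(premises)
        if problem.goal in derived:
            # 2. Trace the dependency graph back into a human-readable proof.
            return symbolic_engine.traceback(problem.goal)
        # 3. Deduction is stuck: ask the language model for auxiliary
        #    constructions (new points) and add the top candidate.
        #    This sketch is greedy for brevity; the real system beam-searches.
        candidates = language_model.propose_constructions(premises, problem.goal)
        if not candidates:
            break
        premises.append(candidates[0])
    return None  # no proof found within the construction budget
```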
Notes: Training details. Don't think there's enough info for a FLOP estimate. "Our customized tokenizer is trained with ‘word’ mode using SentencePiece [36] and has a vocabulary size of 757. We limit the maximum context length to 1,024 tokens and use T5-style relative position embedding [37]. Sequence packing [38,39] is also used because more than 90% of our sequences are under 200 in length. During training, a dropout [40] rate of 5% is applied pre-attention and post-dense. A 4 × 4 slice of TPUv3 (ref. 41) is used as its hardware accelerator. For pretraining, we train the transformer with a batch size of 16 per core and a cosine learning-rate schedule that decays from 0.01 to 0.001 in 10,000,000 steps. For fine-tuning, we maintain the final learning rate of 0.001 for another 1,000,000 steps."
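The quoted schedule (cosine decay from 0.01 to 0.001 over 10,000,000 pretraining steps, then a constant 0.001 for another 1,000,000 fine-tuning steps) is concrete enough to write down. The sketch below assumes a standard half-period cosine between the two endpoints; the exact functional form is not stated in the quote, so treat this as an illustration rather than the authors' implementation.

```python
import math

def learning_rate(step: int,
                  peak_lr: float = 0.01,
                  final_lr: float = 0.001,
                  pretrain_steps: int = 10_000_000) -> float:
    """Cosine decay from peak_lr to final_lr over pretraining, then constant.

    Assumes a standard half-period cosine; the quote only gives the endpoints
    and step counts. Fine-tuning (another 1,000,000 steps) keeps the final
    learning rate of 0.001.
    """
    if step >= pretrain_steps:
        return final_lr
    progress = step / pretrain_steps                     # 0 -> 1
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 -> 0
    return final_lr + (peak_lr - final_lr) * cosine

# Spot checks at the start, midpoint and end of pretraining.
print(f"{learning_rate(0):.4f}")           # 0.0100
print(f"{learning_rate(5_000_000):.4f}")   # 0.0055
print(f"{learning_rate(10_000_000):.4f}")  # 0.0010
```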
Training Code Accessibility: Apache 2.0, https://github.com/google-deepmind/alphageometry. The data is synthetic, so it can be reproduced using the open code.
Hardware: Google TPU v3
Size Notes: 100 million theorem-proof examples. "By using existing symbolic engines on a diverse set of random theorem premises, we extracted 100 million synthetic theorems and their proofs, many with more than 200 proof steps, four times longer than the average proof length of olympiad theorems."
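The quoted pipeline (run existing symbolic engines on random theorem premises, then extract the derived theorems together with their proofs) can be sketched at a high level as follows. Every function name here (sample_premises, deduction_closure, traceback) is a hypothetical placeholder standing in for the open-source implementation, not its actual interface.

```python
import random

def generate_examples(symbolic_engine, num_runs: int, seed: int = 0):
    """Sketch of the synthetic-data loop: random premises -> deduction
    closure -> traceback of a minimal proof for every derived statement."""
    rng = random.Random(seed)
    examples = []
    for _ in range(num_runs):
        premises = symbolic_engine.sample_premises(rng)         # random premises
        closure = symbolic_engine.deduction_closure(premises)   # all derivable facts
        for statement in closure:
            proof = symbolic_engine.traceback(statement)        # minimal proof steps
            examples.append({
                "premises": premises,
                "conclusion": statement,
                "proof": proof,
                "num_steps": len(proof),
            })
    return examples
```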
Parameters: 151,000,000
Notes: "Overall, the transformer has 151 million parameters, excluding embedding layers at its input and output heads."