Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort, the structures of around 100,000 unique proteins have been determined, but this represents a small fraction of the billions of known protein sequences. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence—the structure prediction component of the ‘protein folding problem’—has been an important open research problem for more than 50 years. Despite recent progress, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.
Notes: Training compute estimate:
123 teraFLOPS per TPU v3 chip * 128 cores * (1 chip / 2 cores) * 11 days * 40% utilization ≈ 2.99e21 FLOP
https://www.wolframalpha.com/input?i=123+teraFLOPS+*+128+*+11+days+*+0.4 (this query omits the 1 chip / 2 cores factor, so its result should be halved)
The 128 cores and the 11 days (approximately 1 week of initial training plus approximately 4 days of fine-tuning) come from the paper's "Training regimen" section: "We train the model on Tensor Processing Unit (TPU) v3 with a batch size of 1 per TPU core, hence the model uses 128 TPU v3 cores. [...] The initial training stage takes approximately 1 week, and the fine-tuning stage takes approximately 4 additional days."
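A minimal Python sketch of this arithmetic; the 123 TFLOP/s per-chip peak and the 40% utilization are assumptions of this estimate, not figures from the paper:

    # Training compute estimate for AlphaFold 2 (sketch of the arithmetic above).
    PEAK_FLOPS_PER_CHIP = 123e12  # assumed TPU v3 peak throughput, FLOP/s per chip
    CORES = 128                   # from the paper's "Training regimen" section
    CHIPS = CORES / 2             # a TPU v3 chip has 2 cores
    DAYS = 7 + 4                  # ~1 week initial training + ~4 days fine-tuning
    UTILIZATION = 0.4             # assumed average hardware utilization

    seconds = DAYS * 24 * 60 * 60
    total_flop = PEAK_FLOPS_PER_CHIP * CHIPS * seconds * UTILIZATION
    print(f"{total_flop:.2e} FLOP")  # -> 2.99e+21 FLOP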
Size Notes: Three types of input data to the network:
(1) Amino acid sequence
(2) Multiple sequence alignments (MSAs) of sequences from evolutionarily related proteins
(3) Template structures (3D atom coordinates of homologous structures), where available
Training data is drawn from two datasets that are sampled with different probabilities.
Supplementary Material, Section 1.2.4, Training data: "With 75% probability a training example comes from the self-distillation set (see subsection 1.3) and with 25% probability the training example is a known structure from the Protein Data Bank".
Supplementary Material, Section 1.3, Self-distillation dataset: "This gives a final dataset of 355,993 sequences". An initial model was used to predict structures for these sequences.
PDB dataset size in 2020 (https://www.rcsb.org/stats/growth/growth-released-structures): 172,788 structures.
Therefore, the estimated number of protein structures available for training (for which amino acid sequence, MSA and homologue template information is also available as input to the network) is 355,993 + 172,788 = 528,781 [~530k].
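A short Python sketch of the dataset arithmetic and the 75%/25% sampling mixture described above; the dataset labels are placeholders for illustration, not identifiers from the AlphaFold codebase:

    import random

    # Dataset sizes quoted in the notes above.
    SELF_DISTILLATION = 355_993  # self-distillation sequences (Suppl. Section 1.3)
    PDB_2020 = 172_788           # PDB structures released by end of 2020

    print(SELF_DISTILLATION + PDB_2020)  # -> 528781, i.e. ~530k training structures

    def sample_training_source() -> str:
        """Mimic the sampling mixture in Suppl. Section 1.2.4 (sketch only)."""
        return "self_distillation" if random.random() < 0.75 else "pdb"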
Notes: https://arxiv.org/abs/2207.05477 reimplements AlphaFold 2 more efficiently and reports that the original model has 93M parameters (Table 1).
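For context, a hedged sketch of how such a count can be reproduced: DeepMind's open-source AlphaFold 2 implementation is written in JAX/Haiku, where model weights form a pytree; `params` below is a toy placeholder for such a tree, not code from that repository:

    import jax
    import jax.numpy as jnp

    # Toy placeholder pytree; in the real codebase `params` would be the Haiku
    # parameter dict loaded from an AlphaFold checkpoint.
    params = {"linear": {"w": jnp.zeros((256, 128)), "b": jnp.zeros((128,))}}

    # Total parameter count: sum element counts over all array leaves.
    n_params = sum(x.size for x in jax.tree_util.tree_leaves(params))
    print(f"{n_params:,}")  # -> 32,896 for this toy tree; ~93M reported for AlphaFold 2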