Deep learning models that predict functional genomic measurements from DNA sequence are powerful tools for deciphering the genetic regulatory code. Existing methods trade off between input sequence length and prediction resolution, thereby limiting their modality scope and performance. We present AlphaGenome, which takes as input 1 megabase of DNA sequence and predicts thousands of functional genomic tracks up to single base pair resolution across diverse modalities – including gene expression, transcription initiation, chromatin accessibility, histone modifications, transcription factor binding, chromatin contact maps, splice site usage, and splice junction coordinates and strength. Trained on human and mouse genomes, AlphaGenome matches or exceeds the strongest respective available external models on 24 out of 26 evaluations on variant effect prediction. AlphaGenome’s ability to simultaneously score variant effects across all modalities accurately recapitulates the mechanisms of clinically-relevant variants near the TAL1 oncogene. To facilitate broader use, we provide tools for making genome track and variant effect predictions from sequence.

Get API key | Quick start | Installation | Documentation | Community | Terms of Use
The AlphaGenome API provides access to AlphaGenome, Google DeepMind’s unifying model for deciphering the regulatory code within DNA sequences. This repository contains client-side code, examples and documentation to help you use the AlphaGenome API.
AlphaGenome offers multimodal predictions, encompassing diverse functional outputs such as gene expression, splicing patterns, chromatin features, and contact maps (see diagram below). The model analyzes DNA sequences of up to 1 million base pairs in length and can deliver predictions at single base-pair resolution for most outputs. AlphaGenome achieves state-of-the-art performance across a range of genomic prediction benchmarks, including numerous diverse variant effect prediction tasks (detailed in Avsec et al. 2025).
The API is offered free of charge for non-commercial use (subject to the terms of use). Query rates vary based on demand. The API is well suited to small and medium-scale analyses – such as analysing a limited number of genomic regions or variants requiring thousands of predictions – but is likely not suitable for large-scale analyses requiring more than 1 million predictions. Once you obtain your API key, you can easily get started by following our Quick Start Guide, or watching our AlphaGenome 101 tutorial.

The documentation also covers a set of comprehensive tutorials, variant scoring strategies to efficiently score variant effects, and a visualization library to generate matplotlib figures for the different output modalities.
We cover additional details of the capabilities and limitations in our documentation. For support and feedback:
The quickest way to get started is to run our example notebooks in Google Colab. Here are some starter notebooks:
Alternatively, you can dive straight in by following the installation guide and start writing code! Here's an example of making a variant prediction:
```python
from alphagenome.data import genome
from alphagenome.models import dna_client
from alphagenome.visualization import plot_components
import matplotlib.pyplot as plt

API_KEY = 'MyAPIKey'
model = dna_client.create(API_KEY)

# Define the genomic interval of interest and the variant to score.
interval = genome.Interval(chromosome='chr22', start=35677410, end=36725986)
variant = genome.Variant(
    chromosome='chr22',
    position=36201698,
    reference_bases='A',
    alternate_bases='C',
)

# Predict RNA-seq tracks for both the reference and alternate alleles.
outputs = model.predict_variant(
    interval=interval,
    variant=variant,
    ontology_terms=['UBERON:0001157'],
    requested_outputs=[dna_client.OutputType.RNA_SEQ],
)

# Overlay the REF and ALT RNA-seq tracks in a single figure.
plot_components.plot(
    [
        plot_components.OverlaidTracks(
            tdata={
                'REF': outputs.reference.rna_seq,
                'ALT': outputs.alternate.rna_seq,
            },
            colors={'REF': 'dimgrey', 'ALT': 'red'},
        ),
    ],
    interval=outputs.reference.rna_seq.interval.resize(2**15),
    # Annotate the location of the variant as a vertical line.
    annotations=[plot_components.VariantAnnotation([variant], alpha=0.8)],
)
plt.show()
```
> [!TIP]
> You may optionally wish to create a Python Virtual Environment to prevent conflicts with your system's Python environment.
To install alphagenome, clone a local copy of the repository and run pip install:
```sh
$ git clone https://github.com/google-deepmind/alphagenome.git
$ pip install ./alphagenome
```
See the documentation for information on alternative installation strategies.
If you use AlphaGenome in your research, please cite using:
```bibtex
@article{alphagenome,
  title={{AlphaGenome}: advancing regulatory variant effect prediction with a unified {DNA} sequence model},
  author={Avsec, {\v Z}iga and Latysheva, Natasha and Cheng, Jun and Novati, Guido and Taylor, Kyle R. and Ward, Tom and Bycroft, Clare and Nicolaisen, Lauren and Arvaniti, Eirini and Pan, Joshua and Thomas, Raina and Dutordoir, Vincent and Perino, Matteo and De, Soham and Karollus, Alexander and Gayoso, Adam and Sargeant, Toby and Mottram, Anne and Wong, Lai Hong and Drot{\'a}r, Pavol and Kosiorek, Adam and Senior, Andrew and Tanburn, Richard and Applebaum, Taylor and Basu, Souradeep and Hassabis, Demis and Kohli, Pushmeet},
  year={2025},
  doi={https://doi.org/10.1101/2025.06.25.661532},
  publisher={Cold Spring Harbor Laboratory},
  journal={bioRxiv}
}
```
AlphaGenome communicates with and/or references the following separate libraries and packages:
We thank all their contributors and maintainers!
Copyright 2024 Google LLC
All software in this repository is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0. For the avoidance of doubt, as noted above, the API is offered free of charge for non-commercial use subject to the Terms of Use.
Examples and documentation to help you use the AlphaGenome API are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode.
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.
Your use of any third-party software, libraries or code referenced in the materials in this repository (including the libraries listed in the Acknowledgments section) may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.
A modified version of the GENCODE dataset (which can be found here: https://www.gencodegenes.org/human/releases.html) is released with the client code package for illustrative purposes, and is available with reference to the following:
Notes on training compute:

Pre-training: "Each gradient step processed a batch size of 64 samples using 8-way sequence parallelism, requiring 512 TPUv3 cores, with pre-training runs typically completing in approximately 4 hours." "Distillation using many teacher models (e.g., 64; orange crosses)"

1.23e14 FLOP/s per TPUv3 chip × (512 cores / 2 cores per chip) × 4 hours × 3600 s/hour × 0.3 (assumed utilization) × 64 training runs ≈ 8.7058022e+21 FLOP ("Likely" confidence)

Distillation: "Distillation training was performed without sequence parallelism across 64 NVIDIA H100 GPUs, with a batch size of 64 (effectively one sample per GPU). Each GPU loaded a different frozen teacher model from the pool of 64 pretrained all-folds models. <..> The distillation process ran for 250,000 steps, taking approximately 3 days"

9.894e14 FLOP/s per GPU × 64 GPUs × 3 days × 24 hours/day × 3600 s/hour × 0.3 (assumed utilization) ≈ 4.9238876e+21 FLOP

Total: 8.7058022e+21 FLOP + 4.9238876e+21 FLOP ≈ 1.362969e+22 FLOP. "Likely" confidence, because the exact number of pre-training runs is uncertain.
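The arithmetic above can be double-checked with a short script. The peak FLOP/s figures, the 0.3 utilization, the 64 pre-training runs, and the 2-cores-per-chip TPUv3 conversion are the assumptions stated in the notes, not independently verified numbers:

```python
# Sanity-check of the training-compute estimate quoted above.
TPU_V3_PEAK = 123e12   # assumed peak FLOP/s per TPUv3 chip
H100_PEAK = 989.4e12   # assumed peak FLOP/s per H100 GPU
UTILIZATION = 0.3      # assumed hardware utilization

# 512 cores / 2 cores per chip, 4 hours, 64 pre-training runs.
pretrain_flop = TPU_V3_PEAK * (512 / 2) * 4 * 3600 * UTILIZATION * 64
# 64 H100 GPUs for 3 days.
distill_flop = H100_PEAK * 64 * 3 * 24 * 3600 * UTILIZATION
total_flop = pretrain_flop + distill_flop

print(f'pre-training: {pretrain_flop:.4e} FLOP')  # 8.7058e+21
print(f'distillation: {distill_flop:.4e} FLOP')   # 4.9239e+21
print(f'total:        {total_flop:.4e} FLOP')     # 1.3630e+22
```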
Notes on model size: "AlphaGenome has approximately 450 million trainable parameters (20% in the encoder, 28% in the sequence transformer, 15% in the pairwise blocks, 25% in the decoder, and 12% in the output embedding and prediction heads)"
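The quoted percentages sum to 100%, so the approximate per-component parameter counts can be derived directly; the figures below are illustrative back-of-envelope values from the 450M total, not exact layer-by-layer counts:

```python
# Approximate parameter counts per component, from the quoted percentages.
TOTAL_PARAMS = 450e6
split = {
    'encoder': 0.20,
    'sequence transformer': 0.28,
    'pairwise blocks': 0.15,
    'decoder': 0.25,
    'output embedding + prediction heads': 0.12,
}
# The percentages should account for the whole model.
assert abs(sum(split.values()) - 1.0) < 1e-9

for name, frac in split.items():
    print(f'{name}: ~{TOTAL_PARAMS * frac / 1e6:.1f}M parameters')
# e.g. encoder: ~90.0M parameters
```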