Now we’re officially releasing Gemma 2 to researchers and developers globally. Available in both 9 billion (9B) and 27 billion (27B) parameter sizes, Gemma 2 is higher-performing and more efficient at inference than the first generation, with significant safety advancements built in. In fact, at 27B, it offers competitive alternatives to models more than twice its size, delivering the kind of performance that was only possible with proprietary models as recently as December. And that’s now achievable on a single NVIDIA H100 Tensor Core GPU or TPU host, significantly reducing deployment costs.
FLOPs: 2.106e+24
Notes: For the 27B model, we train on an 8x24x32 configuration of TPUv5p, totaling 6,144 chips, trained on 13T tokens. Compute estimated as 6ND = 6 × 27,000,000,000 × 13,000,000,000,000 = 2.106e+24 FLOP.
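The FLOP figure above follows directly from the 6ND approximation. A minimal sketch of that arithmetic (values are the parameter and token counts quoted in this entry; variable names are illustrative only):

```python
# Sketch of the compute estimate in the notes above, using the common
# approximation C ≈ 6 * N * D (N = parameters, D = training tokens).
# Values are the ones quoted in this entry, not independently sourced.

N = 27_000_000_000         # Gemma 2 27B parameter count
D = 13_000_000_000_000     # training tokens (13T)

flops = 6 * N * D
print(f"Estimated training compute: {flops:.3e} FLOP")  # -> 2.106e+24

# TPUv5p slice quoted for the 27B run: 8 x 24 x 32
chips = 8 * 24 * 32
print(f"TPUv5p chips: {chips}")  # -> 6144
```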
Training Code Accessibility: Gemma 2 is available under our commercially friendly Gemma license, giving developers and researchers the ability to share and commercialize their innovations.
Training Dataset: Unspecified, unreleased
Dataset Size: 13,000,000,000,000 tokens (13 trillion)
Hardware: Google TPU v5p
Hardware Quantity: 6,144
Dataset Notes: Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content. Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions. Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.
Size Notes: We train Gemma 2 27B on 13 trillion tokens of primarily English data.
Parameters: 27,000,000,000