Now we’re officially releasing Gemma 2 to researchers and developers globally. Available in both 9 billion (9B) and 27 billion (27B) parameter sizes, Gemma 2 is higher-performing and more efficient at inference than the first generation, with significant safety advancements built in. In fact, at 27B, it offers competitive alternatives to models more than twice its size, delivering the kind of performance that was only possible with proprietary models as recently as December. And that’s now achievable on a single NVIDIA H100 Tensor Core GPU or TPU host, significantly reducing deployment costs.
Notes: "For the 27B model, we train on an 8x24x32 configuration of TPUv5p, totaling 6144 chips." Trained on 13T tokens. Training-compute estimate: 6ND = 6 * 2.7e10 * 1.3e13 ≈ 2.106e24 FLOPs (N = parameters, D = training tokens).
Size Notes: "We train Gemma 2 27B on 13 trillion tokens of primarily-English data"
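A minimal sketch of the arithmetic above, assuming the standard C ≈ 6·N·D training-compute approximation (N = parameter count, D = training tokens); the chip count and token count are the figures quoted from the Gemma 2 report, and the variable names are illustrative:

```python
# Back-of-the-envelope check of the figures quoted above.

# TPUv5p slice reported for the 27B model: 8 x 24 x 32 topology.
chips = 8 * 24 * 32
print(f"TPUv5p chips: {chips}")  # 6144

# Standard C ≈ 6*N*D estimate of training FLOPs,
# with N = 27B parameters and D = 13T tokens.
N_PARAMS = 27e9
N_TOKENS = 13e12
flops = 6 * N_PARAMS * N_TOKENS
print(f"Estimated training compute: {flops:.3e} FLOPs")  # ~2.106e+24
```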