Now we’re officially releasing Gemma 2 to researchers and developers globally. Available in both 9 billion (9B) and 27 billion (27B) parameter sizes, Gemma 2 is higher-performing and more efficient at inference than the first generation, with significant safety advancements built in. In fact, at 27B, it offers competitive alternatives to models more than twice its size, delivering the kind of performance that was only possible with proprietary models as recently as December. And that’s now achievable on a single NVIDIA H100 Tensor Core GPU or TPU host, significantly reducing deployment costs.
Notes: "For the 2.6B model, we train on a 2x16x16 configuration of TPUv5e, totaling 512 chips." Compute via 6ND: 6 FLOP/token/parameter × 2.6e9 parameters × 2e12 tokens = 3.12e22 FLOP.
Size Notes: "the 2.6B on 2 trillion tokens"
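The notes above can be checked with a minimal sketch of the standard 6ND back-of-the-envelope estimate for dense-transformer training compute (roughly 6 FLOP per parameter per training token, covering forward and backward passes); the parameter and token counts are the ones quoted in the notes:

```python
# 6ND approximation for training compute of the 2.6B model on 2T tokens.
N = 2.6e9    # parameters (2.6 billion, per the notes)
D = 2.0e12   # training tokens (2 trillion, per the notes)

flops = 6 * N * D  # ~6 FLOP per parameter per token (forward + backward)
print(f"{flops:.3e}")  # 3.120e+22, matching the 3.12e+22 FLOP figure above

# Sanity check on the quoted TPUv5e topology: 2 x 16 x 16 = 512 chips.
chips = 2 * 16 * 16
print(chips)  # 512
```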