Notes: 6ND = 6 × 2,506,434,560 parameters × 3×10^12 training tokens = 4.5115822e+22 FLOP (assuming 1 epoch).
Size Notes: "Gemma 2B and 7B are trained on 3T and 6T tokens respectively of primarily-English data from web documents, mathematics, and code." The paper does not explicitly state that this is a single epoch, but I expect that it is.
Notes: Table 2, sum of embedding and non-embedding parameters for the 2B model: 524,550,144 + 1,981,884,416 = 2,506,434,560.
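The arithmetic above can be reproduced in a short script; the parameter counts are taken from Table 2 of the Gemma report, and the single-epoch token count is the assumption stated in the notes.

```python
# Parameter counts for Gemma 2B (Table 2 of the Gemma report).
embedding_params = 524_550_144
non_embedding_params = 1_981_884_416
total_params = embedding_params + non_embedding_params  # 2,506,434,560

# 3T training tokens, assuming a single epoch.
training_tokens = 3e12

# 6ND approximation for training compute in FLOP.
flop = 6 * total_params * training_tokens

print(total_params)       # 2506434560
print(f"{flop:.7e}")      # 4.5115822e+22
```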