Notes: training compute via C ≈ 6ND: 6 * 1.6B params * 2T tokens = 1.92e22 FLOPs for one pass over the data; if the two epochs below mean the 2T-token dataset is seen twice, total tokens are 4T and compute doubles to 3.84e22
Size Notes: "model pre-trained on 2 trillion tokens of diverse multilingual and code datasets for two epochs."
Notes: Table under Model Architecture gives exact parameter count
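A quick sanity check of the arithmetic above, using the standard C ≈ 6ND approximation; N = 1.6e9 and D = 2e12 tokens per epoch are the figures from these notes, and the two-epoch doubling is an assumption about how the quoted pretraining description should be read:

```python
# Sanity check of the training-compute estimate C ≈ 6 * N * D.
# Assumed inputs (from the notes above): N = 1.6B params, D = 2T tokens/epoch.
N = 1.6e9   # parameter count
D = 2e12    # tokens per epoch

flops_per_epoch = 6 * N * D
print(f"one epoch:  {flops_per_epoch:.3e} FLOPs")      # 1.920e+22
print(f"two epochs: {2 * flops_per_epoch:.3e} FLOPs")  # 3.840e+22
```

The two-epoch line only applies if "2 trillion tokens ... for two epochs" means the dataset is traversed twice; if 2T is already the total token count, the per-epoch figure is the answer.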