> In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens.
FLOPs: 1.092e+23
Notes: 7e9 parameters × 2.6e12 tokens × 6 FLOP per parameter per token = 1.092e23 FLOP (see the sketch after the fields below). The paper also mentions 1,024 NVIDIA A800 GPUs at 180 TFLOPS per GPU.
Dataset Size: 2,600,000,000,000 tokens (2.6 trillion)
Hardware: NVIDIA A800
Hardware Quantity: 1,024
Parameters: 7,000,000,000
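
As a rough cross-check of the FLOPs figure above, here is a minimal Python sketch of the standard 6 × N × D training-compute approximation for the 7B model, along with the implied wall-clock time on the reported cluster. The parameter, token, and GPU figures come from the record above; treating the 180 TFLOPS/GPU number as sustained (rather than peak) throughput is an assumption.

```python
# Rough training-compute estimate for Baichuan 2-7B using the standard
# 6 * N * D approximation: FLOPs ~= 6 x parameters x training tokens.
params = 7e9        # 7 billion parameters (from the record above)
tokens = 2.6e12     # 2.6 trillion training tokens (from the abstract)

flops = 6 * params * tokens
print(f"Estimated training compute: {flops:.3e} FLOP")  # -> 1.092e+23

# Cross-check: implied wall-clock time on the reported cluster.
# Assumption: the 180 TFLOPS/GPU figure from the paper is sustained
# (achieved) throughput, held constant for the whole run.
gpus = 1024
flops_per_gpu_per_s = 180e12

seconds = flops / (gpus * flops_per_gpu_per_s)
print(f"Implied training time: {seconds / 86400:.1f} days")  # ~6.9 days
```

The ~6.9 days is a lower bound under these assumptions; any utilization below 180 TFLOPS/GPU, restarts, or evaluation pauses would lengthen the actual run.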