FLOPs: 1.79e+24
Notes: V2.5 is a merge of V2-coder and V2-chat. V2-coder was trained for 6T additional tokens from an intermediate checkpoint of V2 that had been trained for 4.2T tokens (10.2T total). V2-chat is fine-tuned from V2, which saw 8.2T tokens in pre-training. Unique training tokens: 8.2T + 6T = 14.2T. FLOPs: 6 * 21B * 14.2T = 1.7892e24.
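A minimal sketch of the FLOPs arithmetic above, assuming the standard 6 * N * D training-compute approximation with N = active parameters and D = unique training tokens (all variable names here are illustrative):

```python
# Training-compute estimate via the rule of thumb FLOPs ~= 6 * N * D,
# where N counts *active* parameters (21B for this MoE; 236B total)
# and D counts unique training tokens.

ACTIVE_PARAMS = 21e9          # 21B active parameters
V2_CHAT_TOKENS = 8.2e12       # tokens V2-chat saw in pre-training
CODER_EXTRA_TOKENS = 6e12     # additional tokens trained into V2-coder

# Unique tokens across both merged branches: 8.2T + 6T = 14.2T
unique_tokens = V2_CHAT_TOKENS + CODER_EXTRA_TOKENS

flops = 6 * ACTIVE_PARAMS * unique_tokens
print(f"{flops:.4e}")  # 1.7892e+24
```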
Size Notes: The original V2 had a dataset of 8.1T unique tokens, and V2-coder added a further 1.391T unique tokens of code and math. But it appears no additional training was done to combine them into this model.
Parameters: 236,000,000,000
Notes: 21B active params, 236B total