On October 27, 2023, at the 2023 China National Computer Congress (CNCC), Zhipu AI launched its fully self-developed third-generation base large model, ChatGLM3, along with a related series of products.
Notes: Highly speculative. Assume 1 epoch on 1.4T tokens. 6 FLOP/token/param * 1.4e12 tokens * 6e9 params = 50.4 * 10^21 = 5.04 * 10^22 FLOP.
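As a sanity check on the arithmetic, a minimal Python sketch of the same estimate; every input (6 FLOP per token per parameter, ~1.4T tokens, 6B parameters, one epoch) is an assumption stated above, not a confirmed figure:

```python
# Rough training-compute estimate under the stated assumptions
# (1 epoch, ~1.4 trillion tokens, 6 billion parameters, 6 FLOP/token/param).
flop_per_token_per_param = 6
tokens = 1.4e12   # assumed dataset size, carried over from ChatGLM2-6B
params = 6e9      # 6B parameters

training_flop = flop_per_token_per_param * tokens * params
print(f"Estimated training compute: {training_flop:.2e} FLOP")  # ~5.04e+22 FLOP
```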
Size Notes: "ChatGLM-6B was pre-trained on approximately one trillion tokens of Chinese and English corpus." "By further realizing more diverse training datasets, more sufficient training steps, and more optimized training strategies, ChatGLM3-6B topped 42 benchmarks across semantics, mathematics, reasoning, code, and knowledge." The ChatGLM website states that the latest ChatGLM service is based on (and upgraded from) ChatGLM2, which was trained on 1.4T tokens; assume ChatGLM3 was trained on at least the same number of tokens. The Kolors paper (https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf) confirms the dataset size: "Consequently, in Kolors, we utilize the open-source ChatGLM3-6B-Base as text encoder, which has been pre-trained with over 1.4 trillion bilingual tokens, resulting in a robust capability for Chinese language understanding." Sources: https://chatglm.cn/ https://github.com/THUDM/ChatGLM2-6B/blob/main/README_EN.md https://www.zhipuai.cn/en/news/76
Notes: 6B parameters, from https://arxiv.org/abs/2406.12793