Notes: "It took us 65 days to train the model on a pool of 800 A100 graphics cards and 1.7 TB of online texts, books, and countless other sources."
Size Notes: 1.7 TB of data, ~300B tokens (from the GitHub repo: https://github.com/yandex/YaLM-100B). I've assumed that 1 token corresponds to 1 word in Russian.
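A quick back-of-the-envelope check of that assumption (a sketch; the 1.7 TB and 300B figures are from the notes above, the bytes-per-token reasoning is mine):

```python
# Sanity check: if 1.7 TB of text yields ~300B tokens, how many bytes per token?
# A token ~ 1 Russian word is plausible only if the implied average word size fits
# UTF-8 Cyrillic text (2 bytes per character, plus whitespace).
dataset_bytes = 1.7e12           # 1.7 TB, decimal units (assumption)
token_count = 300e9              # ~300B tokens, per the repo README

bytes_per_token = dataset_bytes / token_count
print(f"{bytes_per_token:.2f} bytes per token")  # ~5.67
```

At ~5.7 bytes per token, that is only a few Cyrillic characters per "word", so the 1-token-per-word estimate is rough at best.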
Notes: 100B parameters