FLOPs: 2.2e+23
Notes: "It took us 65 days to train the model on a pool of 800 A100 graphics cards and 1.7 TB of online texts, books, and countless other sources."
Training Code Accessibility: Apache 2.0 for the weights; training details are described, but no training code is released: https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6
Hardware: NVIDIA A100
Hardware Quantity: 800
Size Notes: 1.7 TB of data, ~300B tokens (from the GitHub repo: https://github.com/yandex/YaLM-100B). I've assumed that 1 token corresponds to 1 word in Russian.
Parameters: 100,000,000,000
Notes: 100B
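
As a rough cross-check of the FLOPs entry (a sketch under assumptions, not figures from Yandex): the common 6·N·D approximation with 100B parameters and 300B tokens lands in the same order of magnitude as the recorded 2.2e+23, and the reported 800 A100s over 65 days imply a modest hardware utilization. The A100 peak throughput value and the 6·N·D rule of thumb below are assumptions introduced here.

```python
# Hedged sketch: sanity-checking the recorded FLOPs figure two ways.
# Peak per-GPU throughput is an assumption (A100 dense BF16 peak),
# not a number from the YaLM-100B release.

LISTED_FLOPS = 2.2e23   # value recorded above
PARAMS = 100e9          # 100B parameters (from the record)
TOKENS = 300e9          # ~300B training tokens (from the record)

# (1) Parameter/token approximation: training compute ~= 6 * N * D
flops_6nd = 6 * PARAMS * TOKENS
print(f"6*N*D estimate: {flops_6nd:.1e} FLOPs")  # ~1.8e+23, same order as 2.2e+23

# (2) Hardware/time view: utilization implied by 800 A100s over 65 days
GPUS = 800
SECONDS = 65 * 86400
PEAK_BF16_FLOPS = 312e12  # assumption: A100 dense BF16 peak, 312 TFLOP/s per GPU
implied_util = LISTED_FLOPS / (GPUS * SECONDS * PEAK_BF16_FLOPS)
print(f"Implied hardware utilization: {implied_util:.0%}")  # roughly 16%
```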