Notes: 6ND = 6 * 7B parameters * 1.5T tokens = 6.3e22 FLOP. "Falcon-7B is a 7B parameters causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license." Table 1 of the Falcon paper (https://arxiv.org/pdf/2311.16867) reports 730 petaflop-days, which converts to 730 * 1e15 * 24 * 3600 = 6.3072e22 FLOP, consistent with the 6ND estimate.
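A minimal arithmetic sketch cross-checking the two compute figures above, assuming N = 7e9 parameters and D = 1.5e12 tokens from the quoted model card, and the 730 petaflop-days reported in Table 1 of the Falcon paper:

```python
# Cross-check of the two training-compute estimates for Falcon-7B.
# Assumptions: N = 7e9 parameters, D = 1.5e12 tokens, 730 petaflop-days reported.

N = 7e9                      # parameters
D = 1.5e12                   # training tokens
flop_6nd = 6 * N * D         # standard 6ND approximation
print(f"6ND estimate:      {flop_6nd:.4e} FLOP")      # ~6.3000e+22

petaflop_days = 730
flop_reported = petaflop_days * 1e15 * 24 * 3600      # petaFLOP/s-days -> FLOP
print(f"Reported compute:  {flop_reported:.4e} FLOP") # ~6.3072e+22

# The two estimates agree to within ~0.1%.
print(f"Relative difference: {abs(flop_6nd - flop_reported) / flop_reported:.2%}")
```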
Size Notes: 1.125e12 words (1,125,000,000,000), assuming 0.75 words per token applied to the 1.5T training tokens. "Falcon-7B is a 7B parameters causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license."
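A short sketch of the token-to-word conversion used for the size note; the 0.75 words-per-token ratio is an assumed rule of thumb, not a figure from the Falcon paper:

```python
# Dataset size in words, assuming 0.75 words per token (rule of thumb).
tokens = 1.5e12
words_per_token = 0.75
words = tokens * words_per_token
print(f"{words:.4e} words")  # 1.1250e+12, i.e. 1,125,000,000,000 words
```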
Notes: 7B parameters.