FLOPs: 2.4e+23
Notes: C = 6ND = 6 * 40B * 1,000B = 2.4e+23 FLOP (assuming one epoch). Cross-check against Table 1 of the Falcon paper (https://arxiv.org/pdf/2311.16867), which reports 2,800 petaflop-days: 2,800 * 1e15 * 24 * 3600 = 2.4192e+23 FLOP.
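A minimal sketch of the two estimates above, using only values from this record (the 6ND rule of thumb and the petaflop-day conversion):

```python
# Cross-check of the two compute estimates in the notes above.
# All values are taken directly from this record.

N = 40e9    # parameters (40B)
D = 1000e9  # training tokens (1,000B, one epoch)

# Transformer training-compute approximation: C = 6 * N * D
c_6nd = 6 * N * D
print(f"6ND estimate:          {c_6nd:.4e} FLOP")      # 2.4000e+23

# Budget reported in the Falcon paper: 2,800 petaflop-days
c_reported = 2800 * 1e15 * 24 * 3600
print(f"petaflop-day estimate: {c_reported:.4e} FLOP") # 2.4192e+23
```

The two figures agree to within about 1%, which is why the record lists 2.4e+23 as the headline value.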
Training Code Accessibility: Apache 2.0
Training Dataset: RefinedWeb
Dataset Size: 1,000,000,000,000 tokens
Hardware: NVIDIA A100
Hardware Quantity: 384
Dataset Notes: Falcon-40B was trained on 1,000B tokens of RefinedWeb, a high-quality filtered and deduplicated web dataset which we enhanced with curated corpora. Significant components from our curated corpora were inspired by The Pile (Gao et al., 2020).
Size Notes: 1,000B tokens ~= 750B words
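A quick check of the conversion in the size note; the 0.75 words-per-token ratio is the rule of thumb implied by the note itself, not a measured value:

```python
# Token-to-word conversion behind the size note above.
tokens = 1000e9
words_per_token = 0.75  # rule of thumb implied by "1,000B tokens ~= 750B words"
print(f"~{tokens * words_per_token:.3g} words")  # ~7.5e+11, i.e. ~750B
```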
Parameters: 40,000,000,000
Notes: Model comes in 7B and 40B variants.