DCLM-Baseline-7B is a 7 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. This model is designed to showcase the effectiveness of systematic data curation techniques for improving language model performance.
Notes: 6 FLOP/parameter/token × 7×10^9 parameters × 2.5×10^12 tokens = 1.05×10^23 FLOP
Size Notes: Total Training Tokens: 2.5T
Notes: 7B parameters
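
A minimal sketch of the training compute estimate above, using the standard C ≈ 6·N·D approximation (6 FLOP per parameter per training token); the parameter count and token count are the figures stated in the notes:

```python
# Training compute estimate for DCLM-Baseline-7B via the 6*N*D rule of thumb.
FLOP_PER_PARAM_PER_TOKEN = 6
N_PARAMS = 7e9      # 7B parameters
N_TOKENS = 2.5e12   # 2.5T training tokens

training_flop = FLOP_PER_PARAM_PER_TOKEN * N_PARAMS * N_TOKENS
print(f"Estimated training compute: {training_flop:.2e} FLOP")  # ~1.05e+23 FLOP
```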