DCLM-Baseline-7B is a 7 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. This model is designed to showcase the effectiveness of systematic data curation techniques for improving language model performance.
FLOPs: 1.05e+23
Notes: 6 FLOP / parameter / token * 7 * 10^9 parameters * 2.5 * 10^12 tokens = 1.05e+23 FLOP
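The compute estimate above uses the standard 6ND approximation (6 FLOP per parameter per training token). A minimal sketch of the arithmetic, with the parameter and token counts from this record:

```python
import math

def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute via the 6 * N * D rule
    (6 FLOP per parameter per training token)."""
    return 6 * params * tokens

# DCLM-Baseline-7B: 7e9 parameters, 2.5e12 training tokens
flops = training_flops(7e9, 2.5e12)
print(f"{flops:.3g}")  # 1.05e+23
assert math.isclose(flops, 1.05e23)
```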
Training Code Accessibility: Apple Sample Code license (copyright-only, no patent rights) https://huggingface.co/apple/DCLM-7B
Hardware: NVIDIA H100 SXM5 80GB
Size Notes: Total Training Tokens: 2.5T
Parameters: 7,000,000,000
Notes: 7B