FLOPs: 4.2e+22
Notes: "As shown in Table 3, nearly all of the training budget was spent on the base MPT-7B model, which took ~9.5 days to train on 440xA100-40GB GPUs, and cost ~$200k."
Training Code Accessibility: Apache 2.0. "Our MPT model series is: Licensed for commercial use (unlike LLaMA)." Code here: https://github.com/mosaicml/llm-foundry/tree/main/scripts/train/yamls/pretrain
Hardware: NVIDIA A100 SXM4 40 GB
Hardware Quantity: 440
Parameters: 7,000,000,000
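
The FLOPs figure matches the standard 6ND approximation for dense transformer training compute, assuming MosaicML's publicly reported ~1T training tokens for MPT-7B (a token count not listed in this record). Below is a minimal sketch of that cross-check, with assumed values marked in comments; it also back-solves the GPU utilization and per-GPU-hour rate implied by the hardware, duration, and cost fields above.

```python
# Cross-check the record's FLOPs figure with the 6*N*D approximation.
params = 7e9    # Parameters field
tokens = 1e12   # ASSUMED: ~1T training tokens (MosaicML's reported figure,
                # not part of this record)
flops_6nd = 6 * params * tokens
print(f"6*N*D estimate: {flops_6nd:.2e} FLOP")  # 4.20e+22, matches the record

# Implied utilization from the hardware and duration fields.
gpus = 440                # Hardware Quantity field
days = 9.5                # from the Notes quote
a100_bf16_peak = 312e12   # ASSUMED: A100 dense BF16 peak, FLOP/s
peak_flops = gpus * a100_bf16_peak * days * 24 * 3600
print(f"Implied utilization: {flops_6nd / peak_flops:.0%}")  # ~37%

# Implied rate from the ~$200k cost in the Notes quote.
gpu_hours = gpus * days * 24
print(f"Implied rate: ${200_000 / gpu_hours:.2f}/GPU-hour")  # ~$1.99
```

Both derived values, roughly 37% utilization and about $2/GPU-hour, are plausible for a 2023-era A100 cluster, which supports the internal consistency of the record.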