Notes: "As shown in Table 3, nearly all of the training budget was spent on the base MPT-7B model, which took ~9.5 days to train on 440xA100-40GB GPUs, and cost ~$200k."
Training
Training Code Accessibility: Apache 2.0
"Our MPT model series is:
Licensed for commercial use (unlike LLaMA).
Training configuration YAMLs: https://github.com/mosaicml/llm-foundry/tree/main/scripts/train/yamls/pretrain