Notes: "As shown in Table 3, nearly all of the training budget was spent on the base MPT-7B model, which took ~9.5 days to train on 440xA100-40GB GPUs, and cost ~$200k."
Training
Training Code Accessibility: Apache 2.0
"Our MPT model series is:
Licensed for commercial use (unlike LLaMA).
Training configuration YAMLs: https://github.com/mosaicml/llm-foundry/tree/main/scripts/train/yamls/pretrain