MPT-30B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. The model was trained by MosaicML and is part of the Mosaic Pretrained Transformer (MPT) family, which uses a modified transformer architecture optimized for efficient training and inference. MPT-30B has several features that differentiate it from other LLMs: an 8k-token context window (which can be further extended via finetuning; see MPT-7B-StoryWriter), support for context-length extrapolation via ALiBi, and efficient inference and training via FlashAttention. It also has strong coding abilities thanks to its pretraining mix. MPT models can be served efficiently with both standard HuggingFace pipelines and NVIDIA's FasterTransformer. The size of MPT-30B was chosen specifically so that it is easy to deploy on a single GPU, either a 1xA100-80GB in 16-bit precision or a 1xA100-40GB in 8-bit precision.
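As a rough sketch of that single-GPU deployment path through a standard HuggingFace pipeline (assuming the mosaicml/mpt-30b checkpoint on the Hub; exact keyword arguments may differ across transformers versions):

```python
# Minimal sketch: loading MPT-30B via the standard HuggingFace transformers API.
# Assumes the mosaicml/mpt-30b checkpoint and a single A100-class GPU:
# 16-bit weights on an 80GB card, or 8-bit quantization on a 40GB card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "mosaicml/mpt-30b"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# 16-bit load (fits on 1xA100-80GB); for the 40GB card, pass load_in_8bit=True
# instead of torch_dtype (requires the bitsandbytes package).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # 16-bit precision
    trust_remote_code=True,       # MPT ships custom modeling code
    device_map="auto",            # place the weights on the available GPU
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("MosaicML is", max_new_tokens=32)[0]["generated_text"])
```

The `trust_remote_code=True` flag is what lets the stock transformers library run MPT's custom attention (ALiBi, FlashAttention) without a bespoke serving stack.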
Notes: According to their blog post, "MPT-30B FLOPs ~= 6 * 30e9 [params] * 1.05e12 [tokens] = 1.89e23 FLOPs" (see the quick check after these notes).
Size Notes: ~4T tokens available across sources, but the model was only trained on 1.05T of these.
Parameter Notes: 30B parameters.
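A quick sanity check of the 6ND approximation quoted in the compute note (N = parameters, D = training tokens; only the values from the note are used):

```python
# Check of the quoted training-compute estimate: FLOPs ~= 6 * N * D.
params = 30e9          # N: model parameters (30B)
tokens = 1.05e12       # D: training tokens (1.05T)
flops = 6 * params * tokens
print(f"{flops:.3e}")  # 1.890e+23, matching the quoted 1.89e23 FLOPs
```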