Today, we are excited to introduce DBRX, an open, general-purpose LLM created by Databricks. Across a range of standard benchmarks, DBRX sets a new state-of-the-art for established open LLMs. Moreover, it provides the open community and enterprises building their own LLMs with capabilities that were previously limited to closed model APIs; according to our measurements, it surpasses GPT-3.5, and it is competitive with Gemini 1.0 Pro. It is an especially capable code model, surpassing specialized models like CodeLLaMA-70B on programming, in addition to its strength as a general-purpose LLM.
Notes: Mixture of Experts (MoE). Training compute estimate: 36 billion active params * 12 trillion tokens * 6 FLOP/param/token ~= 2.6e24 FLOP (https://www.wolframalpha.com/input?i=6+FLOP+*+36+billion+*+12+trillion). It was trained on 3072 NVIDIA H100s, but over an unclear timeframe (the end-to-end process took three months, including evals and red-teaming).
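A minimal sketch of this estimate in Python, using the standard C ~= 6ND approximation. The cross-check against the 3072 H100s assumes a peak BF16 throughput of ~989 TFLOP/s per GPU and a 40% MFU; both are illustrative assumptions, not figures reported by Databricks.

```python
# Back-of-the-envelope training-compute estimate: C ~= 6 * N * D,
# i.e. 6 FLOP per active parameter per training token.
active_params = 36e9   # active parameters per token (MoE)
tokens = 12e12         # training tokens

train_flop = 6 * active_params * tokens
print(f"Estimated training compute: {train_flop:.2e} FLOP")  # ~2.59e24

# Rough cross-check against the reported 3072 H100s.
# Assumptions (not reported numbers): ~989 TFLOP/s peak dense BF16 per
# H100 SXM, and ~40% model FLOP utilization (MFU).
h100_peak_flops = 989e12
num_gpus = 3072
assumed_mfu = 0.40

seconds = train_flop / (num_gpus * h100_peak_flops * assumed_mfu)
print(f"Implied training time: ~{seconds / 86400:.0f} days at {assumed_mfu:.0%} MFU")
```

Under these assumptions the pretraining run comes out to roughly 25 days of GPU time, which is consistent with a three-month end-to-end process that also includes evals and red-teaming.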
Size Notes: 12T tokens is roughly equivalent to 9T words, though since the training data includes code, it is not literally 9T words of natural language.
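The token-to-word conversion above relies on a rule-of-thumb ratio of about 0.75 words per token for English text; that ratio is an assumption here, not a figure reported by Databricks.

```python
# Rough token-to-word conversion using a common rule of thumb (~0.75 words/token).
tokens = 12e12
words_per_token = 0.75  # assumption for English-heavy text
print(f"~{tokens * words_per_token:.1e} words")  # ~9.0e12, i.e. ~9T words
```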
Notes: 132B total parameters in a mixture-of-experts architecture, with 36B parameters active on any given input.