Today, we are excited to introduce DBRX, an open, general-purpose LLM created by Databricks. Across a range of standard benchmarks, DBRX sets a new state-of-the-art for established open LLMs. Moreover, it provides the open community and enterprises building their own LLMs with capabilities that were previously limited to closed model APIs; according to our measurements, it surpasses GPT-3.5, and it is competitive with Gemini 1.0 Pro. It is an especially capable code model, surpassing specialized models like CodeLLaMA-70B on programming, in addition to its strength as a general-purpose LLM.
Notes: Mixture of Experts (MoE). Training compute estimate: 36 billion active params * 12 trillion tokens * 6 FLOP/param/token ~= 2.6e24 FLOP (https://www.wolframalpha.com/input?i=6+FLOP+*+36+billion+*+12+trillion). It was trained on 3072 NVIDIA H100s, but over an unclear timeframe (the end-to-end process took three months, including evals and red-teaming).
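A minimal sketch of this estimate in Python, using the standard C ~= 6ND approximation. The cross-check against the 3072 H100s assumes a peak BF16 throughput of ~989 TFLOP/s per GPU and a 40% MFU; both are illustrative assumptions, not figures reported by Databricks.

```python
# Back-of-the-envelope training-compute estimate: C ~= 6 * N * D,
# i.e. 6 FLOP per active parameter per training token.
active_params = 36e9   # active parameters per token (MoE)
tokens = 12e12         # training tokens

train_flop = 6 * active_params * tokens
print(f"Estimated training compute: {train_flop:.2e} FLOP")  # ~2.59e24

# Rough cross-check against the reported 3072 H100s.
# Assumptions (not reported numbers): ~989 TFLOP/s peak dense BF16 per
# H100 SXM, and ~40% model FLOP utilization (MFU).
h100_peak_flops = 989e12
num_gpus = 3072
assumed_mfu = 0.40

seconds = train_flop / (num_gpus * h100_peak_flops * assumed_mfu)
print(f"Implied training time: ~{seconds / 86400:.0f} days at {assumed_mfu:.0%} MFU")
```

Under these assumptions the pretraining run comes out to roughly 25 days of GPU time, which is consistent with a three-month end-to-end process that also includes evals and red-teaming.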
Size Notes: 12T tokens is roughly equivalent to 9T words, though since the training data includes code, it is not literally 9T words of natural language.
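The token-to-word conversion above relies on a rule-of-thumb ratio of about 0.75 words per token for English text; that ratio is an assumption here, not a figure reported by Databricks.

```python
# Rough token-to-word conversion using a common rule of thumb (~0.75 words/token).
tokens = 12e12
words_per_token = 0.75  # assumption for English-heavy text
print(f"~{tokens * words_per_token:.1e} words")  # ~9.0e12, i.e. ~9T words
```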
Notes: 132B total parameters in a mixture-of-experts architecture, with 36B parameters active on any given input.