We’re sharing the first models in the Llama 4 herd, which will enable people to build more personalized multimodal experiences. ... Llama 4 Behemoth, a 288 billion active parameter model with 16 experts that is our most powerful yet and among the world’s smartest LLMs. Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we’re excited to share more details about it even while it’s still in flight. Download the Llama 4 Scout and Llama 4 Maverick models today on llama.com and Hugging Face. Try Meta AI built with Llama 4 in WhatsApp, Messenger, Instagram Direct, and on the web.
Notes: Behemoth's training dataset is at least 30T tokens (https://ai.meta.com/blog/llama-4-multimodal-intelligence/). Compute estimate: 6 FLOP/parameter/token * 288 * 10^9 active parameters * 30 * 10^12 tokens = 5.184e+25 FLOP (see the sketch below the notes).
Size Notes: "The overall data mixture for training consisted of more than 30 trillion tokens, which is more than double the Llama 3 pre-training mixture and includes diverse text, image, and video datasets."
Notes: "Llama 4 Behemoth, a 288 billion active parameter model with 16 experts that is our most powerful yet and among the world’s smartest LLMs."