We’re sharing the first models in the Llama 4 herd, which will enable people to build more personalized multimodal experiences. ... Llama 4 Behemoth, a 288 billion active parameter model with 16 experts that is our most powerful yet and among the world’s smartest LLMs. Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we’re excited to share more details about it even while it’s still in flight. Download the Llama 4 Scout and Llama 4 Maverick models today on llama.com and Hugging Face. Try Meta AI built with Llama 4 in WhatsApp, Messenger, Instagram Direct, and on the web.
Notes: Behemoth's training dataset is at least 30T tokens (https://ai.meta.com/blog/llama-4-multimodal-intelligence/). Compute estimate: 6 FLOP/parameter/token * 288 * 10^9 active parameters * 30 * 10^12 tokens = 5.184e+25 FLOP (see the sketch below the notes).
Size Notes: "The overall data mixture for training consisted of more than 30 trillion tokens, which is more than double the Llama 3 pre-training mixture and includes diverse text, image, and video datasets."
Notes: "Llama 4 Behemoth, a 288 billion active parameter model with 16 experts that is our most powerful yet and among the world’s smartest LLMs."