Model Details

Domain:

Task:

Model Access:

Open weights (restricted use)

Introduction

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks.

Model developer: Meta

Model Architecture: Llama 3.3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Benchmarking

FLOPs

6.86e+24

Notes:

Parameter-based estimate (6ND): 6 FLOP / parameter / token * 70*10^9 parameters * 15*10^12 tokens = 6.3e+24 FLOP

Hardware-based estimate: 7000000 GPU-hours * 3600 sec / hour * 989500000000000 FLOP / second * 0.3 [assumed utilization] = 7.48062e+24 FLOP

Geometric mean of the two estimates: sqrt(7.48062e+24 * 6.3e+24) = 6.8649768e+24 FLOP
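The arithmetic in the notes above can be reproduced with a short script. This is a minimal sketch, not an official calculation: the function and constant names are hypothetical, the input values are copied from the notes, and the 0.3 utilization is the same assumption the notes make.

```python
import math

# Inputs copied from the notes; names are illustrative only.
PARAMS = 70e9                  # model parameters (N)
TOKENS = 15e12                 # pretraining tokens (D)
GPU_HOURS = 7_000_000          # total GPU-hours quoted in the notes
PEAK_FLOP_PER_SEC = 989.5e12   # per-GPU peak throughput used in the notes
UTILIZATION = 0.3              # assumed hardware utilization (an assumption)


def flop_from_parameters(params: float, tokens: float) -> float:
    """Parameter-based estimate: 6 FLOP per parameter per token (6ND)."""
    return 6 * params * tokens


def flop_from_hardware(gpu_hours: float, peak: float, utilization: float) -> float:
    """Hardware-based estimate: GPU-seconds times effective throughput."""
    return gpu_hours * 3600 * peak * utilization


def geometric_mean(a: float, b: float) -> float:
    """Combine two positive estimates by their geometric mean."""
    return math.sqrt(a * b)


param_estimate = flop_from_parameters(PARAMS, TOKENS)
hardware_estimate = flop_from_hardware(GPU_HOURS, PEAK_FLOP_PER_SEC, UTILIZATION)
combined_estimate = geometric_mean(param_estimate, hardware_estimate)

print(f"parameter-based: {param_estimate:.4e} FLOP")
print(f"hardware-based:  {hardware_estimate:.4e} FLOP")
print(f"geometric mean:  {combined_estimate:.4e} FLOP")
```

The geometric mean is a natural way to combine two estimates that differ by a multiplicative factor, since it sits halfway between them on a log scale. Changing the assumed utilization shifts only the hardware-based estimate, so the combined figure moves with the square root of that change.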

Training

Training Code Accessibility

License: A custom commercial license, the Llama 3.3 Community License Agreement, is available at: https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/LICENSE

"Llama 3.3 is intended for commercial and research use in multiple languages."

Hardware

NVIDIA H100 SXM5 80GB

Size Notes:

"Overview: Llama 3.3 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples.

Data Freshness: The pretraining data has a cutoff of December 2023."