We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Notes: 1.4e12 tokens * 6.52e10 parameters * 6 FLOP/token/parameter = 5.5e23 FLOP
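For reference, a minimal sketch of the C ≈ 6·N·D approximation behind the figure above, using the same values as the note (the helper name is illustrative, not from the source):

```python
def training_flop(n_params: float, n_tokens: float, flop_per_param_token: float = 6.0) -> float:
    """Approximate training compute as C ~ 6 * N * D (6 FLOP per parameter per token)."""
    return flop_per_param_token * n_params * n_tokens

# LLaMA-65B: ~6.52e10 parameters trained on ~1.4e12 tokens
print(f"{training_flop(6.52e10, 1.4e12):.2e} FLOP")  # 5.48e+23, i.e. ~5.5e23 FLOP
```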
Size Notes: Table 1 indicates that the 1.4T training tokens were obtained by sampling some sub-datasets for more than one epoch and others for less. Correcting for this:
(1.1 epoch * 3.3TB) + (1.06 epoch * 0.783TB) + ... = 1.4T tokens
5.24 epoch-TB = 1.4T tokens
5.24 epoch-TB * 1000 GB/TB * 200M token/GB = 1.4T tokens
1.05T epoch*token = 1.4T tokens
1 epoch = 1.34T tokens
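A short sketch reproducing the epoch-weighted sum above. The full per-subset disk sizes and epoch counts are assumed here from Table 1 of the paper/model card (the note only writes out the first two terms), and the last step is one reading of the note's final division:

```python
# Disk size (TB) and sampling epochs per subset, as reported in Table 1
# of the LLaMA paper / model card (assumed here; the note elides all but
# the first two terms).
subsets = {
    "CommonCrawl":   (3.300, 1.10),
    "C4":            (0.783, 1.06),
    "Github":        (0.328, 0.64),
    "Wikipedia":     (0.083, 2.45),
    "Books":         (0.085, 2.23),
    "ArXiv":         (0.092, 1.06),
    "StackExchange": (0.078, 1.03),
}

epoch_tb = sum(size_tb * epochs for size_tb, epochs in subsets.values())
print(f"{epoch_tb:.2f} epoch-TB")  # ~5.24

# Naive conversion at 200M tokens/GB, as in the note.
naive_epoch_tokens = epoch_tb * 1000 * 200e6
print(f"{naive_epoch_tokens:.2e} epoch-tokens")  # ~1.05e12

# The note equates this naive estimate with the actual 1.4T tokens seen;
# treating "1 epoch" as ~1e12 tokens at the naive rate recovers its final figure.
tokens_per_epoch = 1.4e12 / (naive_epoch_tokens / 1e12)
print(f"{tokens_per_epoch:.2e} tokens per epoch")  # ~1.34e12
```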
Notes: Model card, Table 1: https://github.com/facebookresearch/llama/blob/53011c3d7946dadb8274a4c5c7586ab54edf792d/MODEL_CARD.md