"We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community."
FLOPs: 4e+22
Notes: 1T tokens * 6.7B parameters * 6 FLOP/token/parameter = 4e22 FLOP
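The note above applies the common rule of thumb that transformer training costs roughly 6 FLOP per parameter per token (6·N·D). A minimal sketch of that arithmetic, using the figures stated in the note:

```python
# Rule-of-thumb training compute: ~6 FLOP per parameter per token (6*N*D).
params = 6.7e9   # LLaMA-7B parameter count, per table 2 of the paper
tokens = 1e12    # 1 trillion training tokens
flops = 6 * params * tokens
print(f"{flops:.2e}")  # 4.02e+22, rounded to 4e+22 above
```

This is an order-of-magnitude estimate; it ignores activation recomputation and other overheads.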
Training Code Accessibility: "we are releasing our model under a noncommercial license focused on research use cases" https://ai.meta.com/blog/large-language-model-llama-meta-ai/
Hardware: NVIDIA A100
Size Notes: 1 trillion tokens * 0.75 words/token = 750 billion words
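The size note uses a rough tokens-to-words conversion of about 0.75 words per token, a common heuristic for English text:

```python
tokens = 1e12           # 1 trillion training tokens
words_per_token = 0.75  # rough heuristic for English text
words = tokens * words_per_token
print(f"{words:.0f}")   # 750000000000 words, i.e. 750 billion
```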
Parameters: 6,700,000,000
Notes: 6.7B parameters, per table 2: https://arxiv.org/pdf/2302.13971.pdf