> "We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community."
Notes: Training compute for LLaMA-7B: 1T tokens * 6.7B parameters * 6 FLOP/token/parameter ≈ 4e22 FLOP (see the code sketch below)
Size Notes: 1 trillion tokens * 0.75 words/token = 750 billion words
Notes: 6.7B parameters for LLaMA-7B, per Table 2 of the paper: https://arxiv.org/pdf/2302.13971.pdf
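
A minimal sketch of the arithmetic in the notes above, using the standard C ≈ 6ND training-compute approximation; the parameter count (6.7B), token count (~1T), and the 0.75 words/token ratio are the values quoted in the notes, not figures taken from the paper's own compute accounting.

```python
# Rough training-compute and dataset-size estimates for LLaMA-7B,
# using the C ~ 6 * N * D approximation and the figures quoted above.

def training_flop(params: float, tokens: float, flop_per_token_per_param: float = 6.0) -> float:
    """Approximate training compute: ~6 FLOP per parameter per training token."""
    return flop_per_token_per_param * params * tokens

def dataset_words(tokens: float, words_per_token: float = 0.75) -> float:
    """Rough token-to-word conversion (assumed ratio of 0.75 words per token)."""
    return tokens * words_per_token

params = 6.7e9   # LLaMA-7B parameter count (Table 2 of the paper)
tokens = 1.0e12  # ~1T training tokens

print(f"Training compute ~ {training_flop(params, tokens):.1e} FLOP")  # ~4.0e22 FLOP
print(f"Dataset size     ~ {dataset_words(tokens):.2e} words")         # ~7.5e11 words
```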