"We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community."
FLOPs: 4e+22
Notes: 1T tokens * 6.7B parameters * 6 FLOP/token/parameter = 4e22 FLOP
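The note above applies the common rule of thumb that transformer training costs roughly 6 FLOP per parameter per token (6·N·D). A minimal sketch of that arithmetic, using the figures stated in the note:

```python
# Rule-of-thumb training compute: ~6 FLOP per parameter per token (6*N*D).
params = 6.7e9   # LLaMA-7B parameter count, per table 2 of the paper
tokens = 1e12    # 1 trillion training tokens
flops = 6 * params * tokens
print(f"{flops:.2e}")  # 4.02e+22, rounded to 4e+22 above
```

This is an order-of-magnitude estimate; it ignores activation recomputation and other overheads.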
Training Code Accessibility: "we are releasing our model under a noncommercial license focused on research use cases" https://ai.meta.com/blog/large-language-model-llama-meta-ai/
Hardware: NVIDIA A100
Size Notes: 1 trillion tokens * 0.75 words/token = 750 billion words
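The size note uses a rough tokens-to-words conversion of about 0.75 words per token, a common heuristic for English text:

```python
tokens = 1e12           # 1 trillion training tokens
words_per_token = 0.75  # rough heuristic for English text
words = tokens * words_per_token
print(f"{words:.0f}")   # 750000000000 words, i.e. 750 billion
```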
Parameters: 6,700,000,000
Notes: 6.7B parameters, per table 2: https://arxiv.org/pdf/2302.13971.pdf