Model Description: GPT-Neo 2.7B is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. GPT-Neo refers to the class of models, while 2.7B represents the number of parameters of this particular pre-trained model.
Training data: GPT-Neo 2.7B was trained on the Pile, a large scale curated dataset created by EleutherAI for the purpose of training this model.
Training procedure: This model was trained for 420 billion tokens over 400,000 steps. It was trained as a masked autoregressive language model, using cross-entropy loss.
Notes: source: https://www.aitracker.org/
6 FLOP/token/parameter * 2.7 * 10^9 parameters * 420 * 10^9 tokens [see dataset size notes] = 6.804 * 10^21 FLOP
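The arithmetic above follows the common 6ND heuristic (roughly 6 FLOP per parameter per training token). A minimal Python sketch of the same calculation, using only the parameter and token counts quoted in these notes:

    # Training-compute estimate via the 6*N*D approximation.
    # Parameter and token counts come from the notes above; the
    # factor of 6 is the standard heuristic, not a measured value.
    params = 2.7e9              # GPT-Neo 2.7B parameter count
    tokens = 420e9              # training tokens (see dataset size notes)
    flop_per_param_token = 6

    training_flop = flop_per_param_token * params * tokens
    print(f"Estimated training compute: {training_flop:.3e} FLOP")
    # -> Estimated training compute: 6.804e+21 FLOP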
Size Notes:
"In aggregate, the Pile consists of over 825GiB of raw text data" (see GPT-NeoX)
"This model was trained for 420 billion tokens over 400,000 steps. It was trained as a masked autoregressive language model, using cross-entropy loss." https://huggingface.co/EleutherAI/gpt-neo-2.7B
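As a rough consistency check of the quoted figures, the sketch below derives the implied tokens-per-step batch size, and, under an assumed bytes-per-token ratio (an assumption, not a reported figure), the approximate number of passes that 420 billion tokens represents over the 825 GiB Pile:

    # Rough consistency check of the quoted training figures.
    total_tokens = 420e9
    total_steps = 400_000
    tokens_per_step = total_tokens / total_steps
    print(f"Implied batch size: {tokens_per_step:,.0f} tokens/step")  # ~1,050,000

    pile_bytes = 825 * 2**30       # "over 825GiB of raw text data"
    bytes_per_token = 4            # assumed ratio for BPE-tokenized English text, not reported
    pile_tokens_est = pile_bytes / bytes_per_token
    print(f"Approx. passes over the Pile: {total_tokens / pile_tokens_est:.1f}")  # ~1.9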
Notes: source: https://www.eleuther.ai/projects/gpt-neo/
Note: Directory of LLMs (https://docs.google.com/spreadsheets/d/1gc6yse74XCwBx028HV_cvdxwXkmXejVjkO-Mz2uwE0k/edit#gid=0) gives a somewhat lower estimate (2e9)