Jurassic-1 is a pair of auto-regressive language models recently released by AI21 Labs, consisting of J1-Jumbo, a 178B-parameter model, and J1-Large, a 7B-parameter model. We describe their architecture and training, and evaluate their performance relative to GPT-3. The evaluation is in terms of perplexity, as well as zero-shot and few-shot learning. To that end, we developed a zero-shot and few-shot test suite, which we made publicly available (https://github.com/ai21labs/lm-evaluation) as a shared resource for the evaluation of mega language models.
Notes: see here https://docs.google.com/document/d/1B8x6XYcmB1u6Tmq3VcbAtj5bzhDaj2TcIPyK6Wpupx4/edit — training-compute estimate: 6 * 178B params * 300B tokens ≈ 3.204e+23 FLOPs
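The arithmetic in the note above follows the common 6·N·D rule of thumb (roughly 6 FLOPs per parameter per training token); a minimal sketch, assuming that approximation:

```python
# Rough training-compute estimate via the common 6*N*D approximation:
# ~6 FLOPs per parameter per token (N = parameters, D = training tokens).
N = 178e9   # J1-Jumbo parameter count
D = 300e9   # training tokens (from the paper)
flops = 6 * N * D
print(f"{flops:.4e} FLOPs")  # 3.2040e+23 FLOPs
```

This matches the 3.204e+23 figure in the note; the 6·N·D estimate ignores attention-specific terms, so it is a back-of-the-envelope number, not a measured cost.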
Size Notes: "Our model was trained with the conventional self-supervised auto-regressive training objective on 300B tokens drawn from publicly available resources" — 1 token ≈ 0.75 words
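Applying the note's ~0.75 words-per-token rule of thumb to the 300B-token training set gives the corpus size in words; a quick sketch (the 0.75 factor is the heuristic from the note, not a measured ratio):

```python
# Approximate word count of the training corpus, using the rule of thumb
# that 1 token corresponds to ~0.75 English words.
tokens = 300e9
words_per_token = 0.75  # heuristic from the note
words = tokens * words_per_token
print(f"~{words / 1e9:.0f}B words")  # ~225B words
```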
Notes: "Jurassic-1 models come in two sizes, where the Jumbo version, at 178B parameters, is the largest and most sophisticated language model ever released for general use by developers."