Jurassic-1 is a pair of auto-regressive language models recently released by AI21 Labs, consisting of J1-Jumbo, a 178B-parameter model, and J1-Large, a 7B-parameter model. We describe their architecture and training, and evaluate their performance relative to GPT-3. The evaluation is in terms of perplexity, as well as zero-shot and few-shot learning. To that end, we developed a zero-shot and few-shot test suite, which we made publicly available (https://github.com/ai21labs/lm-evaluation) as a shared resource for the evaluation of mega language models.
Notes: see here https://docs.google.com/document/d/1B8x6XYcmB1u6Tmq3VcbAtj5bzhDaj2TcIPyK6Wpupx4/edit — training-compute estimate: 6 * 178B params * 300B tokens ≈ 3.204e+23 FLOPs
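The arithmetic in the note above follows the common 6·N·D rule of thumb (roughly 6 FLOPs per parameter per training token); a minimal sketch, assuming that approximation:

```python
# Rough training-compute estimate via the common 6*N*D approximation:
# ~6 FLOPs per parameter per token (N = parameters, D = training tokens).
N = 178e9   # J1-Jumbo parameter count
D = 300e9   # training tokens (from the paper)
flops = 6 * N * D
print(f"{flops:.4e} FLOPs")  # 3.2040e+23 FLOPs
```

This matches the 3.204e+23 figure in the note; the 6·N·D estimate ignores attention-specific terms, so it is a back-of-the-envelope number, not a measured cost.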
Size Notes: "Our model was trained with the conventional self-supervised auto-regressive training objective on 300B tokens drawn from publicly available resources" — 1 token ≈ 0.75 words
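Applying the note's ~0.75 words-per-token rule of thumb to the 300B-token training set gives the corpus size in words; a quick sketch (the 0.75 factor is the heuristic from the note, not a measured ratio):

```python
# Approximate word count of the training corpus, using the rule of thumb
# that 1 token corresponds to ~0.75 English words.
tokens = 300e9
words_per_token = 0.75  # heuristic from the note
words = tokens * words_per_token
print(f"~{words / 1e9:.0f}B words")  # ~225B words
```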
Notes: "Jurassic-1 models come in two sizes, where the Jumbo version, at 178B parameters, is the largest and most sophisticated language model ever released for general use by developers."