The Yi series models are large language models trained from scratch by developers at 01.AI.
Notes: The paper states "The dataset we use contains Chinese & English only. We used approximately 3T tokens" — this most likely means the model was trained on ~3T tokens, not merely that the dataset contains 3T tokens. If so: 34B params * 3T tokens * 6 ≈ 6.1e23 FLOP.
Size Notes: "language models pretrained from scratch on 3.1T highly-engineered large amount of data, and finetuned on a small but meticulously polished alignment data."
Notes: 34B parameters.
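The arithmetic in the notes above can be checked with the standard 6ND compute approximation (roughly 6 FLOP per parameter per training token); the sketch below simply plugs in the figures quoted from the paper:

```python
# Training-compute estimate via the common 6 * N * D approximation.
# Figures from the notes: N = 34B parameters, D ~ 3T tokens
# (the paper's size notes elsewhere say 3.1T).
n_params = 34e9
n_tokens = 3e12

flops = 6 * n_params * n_tokens
print(f"{flops:.2e}")  # 6.12e+23 FLOP

# With the 3.1T-token figure instead:
print(f"{6 * n_params * 3.1e12:.2e}")  # 6.32e+23 FLOP
```

The two token counts (3T vs. 3.1T) bracket the estimate between roughly 6.1e23 and 6.3e23 FLOP.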