The Yi series models are large language models trained from scratch by developers at 01.AI.
FLOPs: 6.1e+23
Notes: "The dataset we use contains Chinese & English only. We used approximately 3T tokens" sounds like this means it was trained on 3T tokens, not necessarily that the dataset contains 3T tokens? If so, 34b * 3T * 6 = 6.1e23
Training Code Accessibility: No training code is released; commercial use requires applying for a license (https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt). The model (https://huggingface.co/01-ai/Yi-34B-Chat) is available under Apache 2.0, with the request: "If you create derivative works based on this model, please include the following attribution in your derivative works: ...."
Hardware: NVIDIA A100
Hardware Quantity: 128
Size Notes: "language models pretrained from scratch on 3.1T highly-engineered large amount of data, and finetuned on a small but meticulously polished alignment data."
Parameters: 34,000,000,000
Notes: 34B