Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
FLOPs3.65664e+23
Notes: https://bigscience.huggingface.co/blog/bloom Blog post says 117 days. 384 A100 GPUs * 314 TFLOPS throughput per GPU * 117 days * 0.3 (utilization assumption) = 3.65664e23 https://www.wolframalpha.com/input?i=384+*+314+TFLOPS+*+117+days+*+0.3
Training Code Accessibilityresponsible use restrictions: https://bigscience.huggingface.co/blog/the-bigscience-rail-license
Training DatasetBigScience ROOTS Corpus
Dataset Size379000000000
HardwareNVIDIA A100 SXM4 80 GB
Hardware Quantity384
Dataset Notes: In total, 1.6 terabytes of pre-processed text was converted into 350 billion unique tokens as BLOOM's training datasets. arXiv:2210.15424 "BLOOM was trained on the ROOTS corpus (Lauren¸con et al., 2022), a composite collection of 498 Hugging Face datasets (Lhoest et al., 2021) amounting to 1.61 terabytes of text that span 46 natural languages and 13 programming languages. A high-level overview of this dataset can be seen in Figure 3, while a detailed itemized list of every language along with its linguistic genus, family and macroarea is presented in Table 1
Size Notes: Table 3.5 https://arxiv.org/pdf/2211.05100 366B (pretrain) + 13B (finetune) = 379B tokens total
Parameters176247271424
Notes: See "Technical Specifications" on Hugging Face: https://huggingface.co/bigscience/bloom