Introducing the latest additions to our Stable LM 2 language model series: a 12 billion parameter base model and an instruction-tuned variant, trained on 2 trillion tokens in seven languages: English, Spanish, German, Italian, French, Portuguese, and Dutch. This medium-sized model balances strong performance, efficiency, memory requirements, and speed, following the Stable LM 2 1.6B framework detailed in our previously released technical report. With this release, we're extending our model range, offering a transparent and powerful tool for developers to innovate in AI language technology. Soon, we plan to introduce a long-context variant of these models, which will be available on Hugging Face upon release. From Hugging Face: Stable LM 2 12B is a 12.1 billion parameter decoder-only language model pre-trained on 2 trillion tokens of diverse multilingual and code datasets for two epochs.
FLOPs: 2.91e+23
Notes: 6 × 12,143,605,760 params × 2T tokens × 2 epochs ≈ 2.91e23 FLOPs (the standard 6ND approximation: 2 FLOPs per parameter for the forward pass and 4 for the backward pass, per token). Trained on 384 H100s (AWS P5 instances).
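The estimate above can be reproduced directly; a minimal sketch using the parameter and token counts from this entry:

```python
# Training-compute estimate via the 6ND approximation:
# FLOPs ≈ 6 × parameters × training tokens (here, tokens per epoch × epochs).
params = 12_143_605_760               # exact count from the HF model card
tokens_per_epoch = 2_000_000_000_000  # 2T tokens
epochs = 2

flops = 6 * params * tokens_per_epoch * epochs
print(f"{flops:.2e}")  # → 2.91e+23
```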
Training Code Accessibility: Requires Stability AI Membership. Free for non-commercial use; $20/month for commercial use if under $1M in annual revenue, under $1M in institutional funding, and under 1M monthly active users. The repository is under the Apache 2.0 license and includes detailed hyperparameters and training details: https://github.com/Stability-AI/StableLM/blob/main/LICENSE
Training Dataset: RefinedWeb, RedPajama-Data, The Pile, StarCoder, CulturaX
Dataset Size: 2000000000000
Hardware: NVIDIA H100 SXM5 80GB
Dataset Notes: The dataset comprises a filtered mixture of open-source large-scale datasets available on the Hugging Face Hub: a Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023) and The Pile (Gao et al., 2020), both without the Books3 subset, and StarCoder (Li et al., 2023). We further supplement our training with multilingual data from CulturaX (Nguyen et al., 2023), in particular from its OSCAR corpora, as well as restructured data in the style of Yuan & Liu (2022).
Size Notes: 2T tokens
Parameters: 12143605760
Notes: Precise number given in the HF model card