Description: Pleias-nano-1.2b-Preview is a transformer base model, entirely pretrained from scratch, using an architecture similar to Llama/GPT-NeoX for easier deployment/inference. It includes the following features, which would apply to any responsibly trained variant:
- Trained only on open data under a permissive license and in compliance with the European AI Act. By design, all Pleias models are unable to output copyrighted content.
- Extensive multilingual support for the main European languages.
- A new tokenizer designed for enhanced document processing tasks and better multilingual support.
- An extremely low level of toxicity and problematic content.
Pleias-nano-1.2b-Preview has demonstrated unusual abilities for multilingual generation in its size range. Fully supported languages include English, French, Spanish, German, Italian, Dutch, Latin and Portuguese. Given its size, Pleias-nano-1.2b-Preview can run on CPU without any compression loss. We provide a first GGUF variant as part of our release.
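As a rough illustration of CPU inference with the GGUF variant, the sketch below uses llama-cpp-python; the model file name, context length, and prompt are placeholders, not details taken from the release.

```python
# Minimal sketch: CPU inference with a GGUF build via llama-cpp-python.
# The model path is a placeholder; substitute the actual GGUF file
# distributed with the Pleias-nano-1.2b-Preview release.
from llama_cpp import Llama

llm = Llama(
    model_path="pleias-nano-1.2b-preview.gguf",  # placeholder file name
    n_ctx=2048,      # context window (assumed value)
    n_threads=4,     # CPU threads
)

out = llm(
    "La Révolution française a commencé en",  # French prompt, as an example of multilingual generation
    max_tokens=64,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```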
Notes:
FLOP from parameters and tokens: 6 FLOP / parameter / token * 1.2 * 10^9 parameters * 5 * 10^12 tokens = 3.6e+22 FLOP
FLOP from hardware and training time: 9.894e+14 FLOP / GPU / sec [bf16 assumed] * 192 GPUs * 5 days * 24 hours / day * 3600 sec / hour * 0.3 [assumed utilization] = 2.4619438e+22 FLOP
Geometric mean: sqrt(3.6e+22 * 2.4619438e+22) = 2.9770787e+22 FLOP
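A minimal sketch reproducing the arithmetic above; the bf16 throughput and 0.3 utilization are the same assumptions stated in the notes, not measured values.

```python
# Reproduce the two compute estimates and their geometric mean.
params = 1.2e9            # parameters
tokens = 5e12             # training tokens (~3 epochs)
flop_per_param_token = 6  # standard 6 FLOP / parameter / token approximation

training_estimate = flop_per_param_token * params * tokens  # 3.6e+22 FLOP

peak_flops = 9.894e14     # FLOP / GPU / sec, bf16 assumed
gpus = 192
seconds = 5 * 24 * 3600   # 5 days of training
utilization = 0.3         # assumed utilization

hardware_estimate = peak_flops * gpus * seconds * utilization  # ~2.46e+22 FLOP

# Final estimate: geometric mean of the two approaches
final = (training_estimate * hardware_estimate) ** 0.5         # ~2.98e+22 FLOP
print(f"{training_estimate:.3e} {hardware_estimate:.3e} {final:.3e}")
```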
Size Notes: "Training schedule includes 518,000 steps (batch size 1,024) on over three epochs (nearly 5 trillions tokens):"
Notes: 1.2B