This report presents Granite 3.0, a new set of lightweight, state-of-the-art, open foundation models ranging in scale from 400 million to 8 billion active parameters. Equipped with native support for multilingual tasks, coding, and function calling, along with strong safety performance, these models target enterprise use cases, including on-premise and on-device settings. Evaluations on a comprehensive set of tasks demonstrate that our models consistently reach state-of-the-art performance for their size (as shown in Figures 1 and 2). This report also discloses technical details of pre-training and post-training that may help the research community accelerate the collective effort to develop open foundation models. We publicly release pre-trained and post-trained versions of all our Granite 3.0 models under a standard permissive Apache 2.0 license allowing both research and commercial use. With support from the open source community, the Granite 3.0 models have been integrated with a range of existing tools for quantization, fine-tuning, and deployment.
Notes:
6ND estimate: 6 FLOP/token/parameter × 2.5 × 10^9 parameters × 12 × 10^12 tokens = 1.8 × 10^23 FLOP.
Reported budget: "All our Granite 3.0 models are trained using a compute budget of 8.35 × 10^23 FLOPS." Apportioned by this model's share of power consumption: 8.35 × 10^23 × 174.6 / (174.6 + 757.0 + 64.5 + 121.2) ≈ 1.30 × 10^23 FLOP.
Hardware estimate: 192,030 GPU-hours × 3,600 s/hour × 989.5 × 10^12 FLOP/GPU/s × 0.3 (assumed utilization) ≈ 2.05 × 10^23 FLOP.
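As a sanity check on the arithmetic above, here is a minimal Python sketch reproducing the three estimates. All constants are taken from the notes; the power-consumption split across the four models and the 30% hardware utilization are assumptions carried over from the notes, not figures confirmed elsewhere in the report.

```python
# Sketch of the three compute estimates from the notes above.

# 1) 6ND approximation: FLOP ≈ 6 * parameters * training tokens.
params = 2.5e9           # 2.5B parameters
tokens = 12e12           # 12T training tokens
flop_6nd = 6 * params * tokens
print(f"6ND estimate:      {flop_6nd:.2e} FLOP")   # ~1.80e+23

# 2) Share of the reported total budget (8.35e23 FLOP for all Granite 3.0 models),
#    apportioned by this model's share of power consumption (assumed split).
total_budget = 8.35e23
power = [174.6, 757.0, 64.5, 121.2]                # per-model power consumption
flop_share = total_budget * power[0] / sum(power)
print(f"Budget-share est.: {flop_share:.2e} FLOP")  # ~1.30e+23

# 3) Hardware estimate: GPU-hours * peak FLOP/s per GPU * assumed utilization.
gpu_hours = 192_030
peak_flops_per_gpu = 989.5e12
utilization = 0.3                                   # assumed
flop_hw = gpu_hours * 3600 * peak_flops_per_gpu * utilization
print(f"Hardware estimate: {flop_hw:.2e} FLOP")     # ~2.05e+23
```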
Size Notes: 12T tokens
Notes: 2.5B parameters