This report presents Granite 3.0, a new set of lightweight, state-of-the-art, open foundation models ranging in scale from 400 million to 8 billion active parameters. With native support for multilingual use, coding, and function calling, along with strong safety performance, these models target enterprise use cases, including on-premise and on-device settings. Evaluations on a comprehensive set of tasks demonstrate that our models consistently reach state-of-the-art performance for their size (as shown in Figures 1 and 2). This report also discloses technical details of pre-training and post-training that may help the research community accelerate collective efforts to develop open foundation models. We publicly release pre-trained and post-trained versions of all our Granite 3.0 models under a standard permissive Apache 2.0 license, allowing both research and commercial use. With support from the open source community, the Granite 3.0 models have been integrated with a range of existing tools for quantization, fine-tuning, and deployment.
Notes:
6ND estimate: 6 FLOP/parameter/token * 8.1e9 parameters * 12e12 tokens = 5.832e+23 FLOP.
Reported compute budget: "All our Granite 3.0 models are trained using a compute budget of 8.35 × 10^23 FLOPS." Apportioning that budget to this model by its share of power consumption: 8.35e23 * 757.0 (this model's power consumption) / (174.6 + 757.0 + 64.5 + 121.2) [sum across the Granite 3.0 models] = 5.6573436e+23 FLOP.
Hardware estimate: 832102 GPU-hours * 3600 sec/hour * 9.895e14 FLOP/GPU/sec * 0.3 [assumed utilization] = 8.8923412e+23 FLOP.
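For reference, a minimal Python sketch reproducing the three estimates above. All constants are copied from the note (the 0.3 utilization is the note's own assumption); the variable names and the script itself are illustrative, not from the Granite 3.0 report.

```python
# Reproduce the three training-compute estimates from the note above.

PARAMS = 8.1e9   # active parameters (8.1B)
TOKENS = 12e12   # pre-training tokens (12T)

# 1) 6ND approximation: training FLOP ~= 6 * parameters * tokens
flop_6nd = 6 * PARAMS * TOKENS  # ~5.832e23 FLOP

# 2) Share of the reported 8.35e23 FLOP budget, apportioned by this model's
#    power consumption relative to the sum across the Granite 3.0 models
budget = 8.35e23
this_model_power = 757.0
total_power = 174.6 + 757.0 + 64.5 + 121.2
flop_budget_share = budget * this_model_power / total_power  # ~5.657e23 FLOP

# 3) Hardware-based estimate: GPU-hours * peak FLOP/s per GPU * assumed utilization
gpu_hours = 832_102
peak_flop_per_gpu_sec = 9.895e14   # FLOP/GPU/sec, as given in the note
utilization = 0.3                  # assumed utilization from the note
flop_hardware = gpu_hours * 3600 * peak_flop_per_gpu_sec * utilization  # ~8.892e23 FLOP

print(f"6ND estimate:          {flop_6nd:.4e} FLOP")
print(f"Budget-share estimate: {flop_budget_share:.4e} FLOP")
print(f"Hardware estimate:     {flop_hardware:.4e} FLOP")
```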
Size Notes: 12T tokens
Notes: 8.1B parameters