Notes: "The training run for gpt-oss-120b required 2.1 million H100-hours to complete" (2.1e6 hours)*(1,979 H100 FLOP/s)*(30% utilization)*(60*60) = 4.49e24 They also do post training similar to o3, which we assume adds at least 10% as much compute, so we multiply this estimate by 1.1 to get 4.94e24
Size Notes: Training tokens ≈ (pretraining FLOPs)/(6 * 5.1B active parameters) ≈ 4.49e24/(6 * 5.1e9) ≈ 1.5e14 tokens.
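A short sketch of the token estimate, using the standard FLOPs ≈ 6*N*D approximation with the pretraining FLOP figure and active parameter count from the notes above:

```python
# Sketch of the dataset-size estimate: tokens ≈ FLOPs / (6 * active params).
pretraining_flop = 4.49e24  # pretraining estimate from the note above
active_params = 5.1e9       # active parameters per token for gpt-oss-120b

tokens = pretraining_flop / (6 * active_params)
print(f"estimated training tokens: {tokens:.2e}")  # ~1.5e14
```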
Notes: Total parameters: 116.83B