Notes: "The training run for gpt-oss-120b required 2.1 million H100-hours to complete, with gpt-oss-20b needing almost 10x fewer" assuming "almost 10x fewer" means ~9x fewer: 4.94e24/9 = 5.49e23
Size Notes: Training dataset size estimated as (pretraining FLOP) / (6 × 3.6B active parameters); see the sketch below.
Notes: Total parameters: 20.91B
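A minimal arithmetic sketch of the two estimates above, assuming the 4.94e24 FLOP figure for gpt-oss-120b and the ~9x reading of "almost 10x fewer"; the resulting compute and token counts are implied estimates under those assumptions, not reported numbers.

```python
# Sketch of the gpt-oss-20b compute and dataset-size estimates.
# The 4.94e24 FLOP figure and the ~9x ratio are assumptions from the notes.

flops_120b = 4.94e24        # assumed pretraining compute for gpt-oss-120b (FLOP)
flops_20b = flops_120b / 9  # "almost 10x fewer" read as ~9x fewer

active_params = 3.6e9       # active parameters per token for gpt-oss-20b
# Standard approximation C ≈ 6 * N_active * D, so D ≈ C / (6 * N_active)
tokens = flops_20b / (6 * active_params)

print(f"gpt-oss-20b compute: {flops_20b:.2e} FLOP")    # ~5.49e23 FLOP
print(f"implied dataset size: {tokens:.2e} tokens")    # ~2.5e13 tokens
```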