FLOPs: 5.49e+23
Notes: "The training run for gpt-oss-120b required 2.1 million H100-hours to complete, with gpt-oss-20b needing almost 10x fewer" assuming "almost 10x fewer" means ~9x fewer: 4.94e24/9 = 5.49e23
Hardware: NVIDIA H100 SXM5 80GB
Size Notes: Training tokens estimated as (pretraining FLOPs) / (6 × 3.6B active parameters), via the standard C ≈ 6ND approximation.
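A short sketch of that size estimate, assuming C ≈ 6·N·D with N the active parameter count and D the training tokens (both the FLOPs and the 3.6B active-parameter figure are from the fields above):

    # Estimate training tokens from compute via C ≈ 6 * N * D.
    pretraining_flops = 5.49e23       # gpt-oss-20b compute estimate from above
    active_params = 3.6e9             # active parameters per token (MoE)
    tokens = pretraining_flops / (6 * active_params)
    print(f"Estimated training tokens: {tokens:.3e}")  # ~2.54e13 (~25 trillion tokens)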
Parameters: 20,910,000,000
Notes: Total parameters: 20.91B