> We introduce Tri-21B, our flagship large language model that redefines the efficiency frontier in LLM training. By achieving state-of-the-art performance with only 2.3T training tokens, we demonstrate that exceptional capabilities don't require excessive computational resources.
Notes: 2.95E+23 FLOP (reported). Estimate: 6 FLOP/parameter/token × 20.73B parameters × 2.3T tokens ≈ 2.86074E+23 FLOP.
Size Notes: 2.3T training tokens
Notes: 20.73B parameters
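A minimal sketch of the 6·N·D compute estimate used in the note above, assuming N = 20.73B parameters and D = 2.3T tokens; the factor of 6 FLOP per parameter per token is the common forward-plus-backward approximation, and the function name here is illustrative, not from the source.

```python
def training_flops(n_params: float, n_tokens: float, flop_per_param_token: float = 6.0) -> float:
    """Approximate total training compute via the 6*N*D rule of thumb."""
    return flop_per_param_token * n_params * n_tokens

if __name__ == "__main__":
    # 6 * 20.73e9 * 2.3e12 ≈ 2.86074e+23 FLOP, slightly below the reported 2.95e+23
    estimate = training_flops(20.73e9, 2.3e12)
    print(f"Estimated training compute: {estimate:.5e} FLOP")
```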