Notes: Trained using NVIDIA NeMo (https://blogs.nvidia.com/blog/nemo-amazon-titan/) on 13,760 NVIDIA A100 GPUs across 1,720 P4d nodes (8 GPUs per node). Training took 48 days. Source: https://importai.substack.com/p/import-ai-365-wmd-benchmark-amazon
Counting operations: 6 × 200e9 parameters × 4e12 tokens = 4.8e24 FLOP.
GPU usage: 312e12 FLOP/s (A100 peak BF16 throughput) × 0.3 utilization × 13,760 GPUs × 1,152 hours × 3,600 s/hour ≈ 5.34e24 FLOP.
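As a check on the arithmetic, here is a minimal Python sketch reproducing both estimates. All input values come from the notes above; the 0.3 utilization factor is the assumption already stated there.

```python
# Sketch of the two training-compute estimates (inputs from the notes above).

# Estimate 1: counting operations, using the standard 6 * parameters * tokens rule.
params = 200e9            # 200B dense parameters
tokens = 4e12             # 4T training tokens
flop_counting = 6 * params * tokens
print(f"Counting operations: {flop_counting:.3e} FLOP")  # 4.800e+24

# Estimate 2: hardware usage (peak throughput * utilization * GPUs * seconds).
peak_flops = 312e12       # A100 peak BF16 throughput, FLOP/s
utilization = 0.3         # assumed utilization rate
num_gpus = 13_760         # 1,720 P4d nodes * 8 A100s each
seconds = 48 * 24 * 3600  # 48 days = 1,152 hours
flop_hardware = peak_flops * utilization * num_gpus * seconds
print(f"GPU usage: {flop_hardware:.4e} FLOP")            # 5.3413e+24
```

The two estimates agree to within about 10%, which is the usual sanity check between the parameter-count method and the hardware-time method.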
Size Notes: 4T tokens of training data, based on comments from Amazon engineer James Hamilton in a 2024 talk: https://perspectives.mvdirona.com/2024/01/cidr-2024/ Also cited here: https://lifearchitect.ai/titan/
Notes: 200B-parameter dense model. Source: https://importai.substack.com/p/import-ai-365-wmd-benchmark-amazon