GLM-130B (ICLR 2023) is an open bilingual (English & Chinese) bidirectional dense model with 130 billion parameters, pre-trained using the General Language Model (GLM) algorithm. It is designed to support inference with the full 130B parameters on a single A100 (40G * 8) or V100 (32G * 8) server. As of July 3rd, 2022, GLM-130B had been trained on over 400 billion text tokens (200B each for Chinese and English).
Notes: "96 NVIDIA A100 (40G * 8) servers for 2 months" 312 TFLOPS/GPU * 96 servers * 8 GPU/server * 2 months * 32.5% utilization = 4.037e23 utilization rate - citation from the paper: "we report hardware FLOPs utilization (HFU) of 43.3% and model FLOPs utilization (MFU) of 32.5% due to re-materialization." Aligns pretty well with 6ND: 6 * 400B * 130B = 3.12E23 Geometric mean: sqrt(4.037e23 * 3.12e23) = 3.549e23
Size Notes: 400B tokens. "We completed the 400B-token training and evaluation of GLM-130B in July, and subsequently released the model and pre-training details in August 2022." (from https://arxiv.org/pdf/2406.12793). "As of July 3rd, 2022, GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English)"
Notes: Dense model