Nanbeige2-16B-Chat is the latest 16B model developed by Nanbeige Lab, trained on 4.5T tokens of high-quality data during the training phase. During the alignment phase, we first trained the model on 1 million samples through Supervised Fine-Tuning (SFT), then applied curriculum learning with 400,000 high-quality samples of greater difficulty, and finally incorporated human feedback through Direct Preference Optimization (DPO), culminating in Nanbeige2-16B-Chat. The model achieves strong performance across various authoritative benchmark datasets.
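A minimal usage sketch, assuming the checkpoint loads through the Hugging Face transformers library; trust_remote_code=True is presumed necessary because the repository ships a custom modeling_nanbeige.py, and the plain-text prompt is illustrative rather than the model's official chat format:

# Hedged sketch: load Nanbeige2-16B-Chat via transformers (assumptions noted above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nanbeige/Nanbeige2-16B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

# Plain-text prompt; the model's own chat formatting may differ.
inputs = tokenizer("What is curriculum learning?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))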
Notes: The model has 15.8B parameters and was trained on 4.5T tokens during the training phase (https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat). It appears to be entirely transformer-based (https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat/blob/main/modeling_nanbeige.py). Assuming training ran for 1 epoch, the 6ND approximation gives: training compute = active parameters per forward pass × training tokens × 6 FLOP per parameter per token ≈ 1.5e10 parameters (15.8B, rounded) × 4.5e12 tokens × 6 ≈ 4.05e23 FLOP.
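The same arithmetic as a short worked example (parameter and token counts are the figures reported on the model card; the 1.5e10 rounding follows the note above):

# 6ND approximation: training compute ≈ 6 FLOP per parameter per training token.
params = 1.5e10   # 15.8B parameters, rounded to 1.5e10 as in the note above
tokens = 4.5e12   # 4.5T training tokens, assuming a single epoch
compute = 6 * params * tokens
print(f"{compute:.2e} FLOP")  # 4.05e+23 FLOP
# Using the unrounded 1.58e10 parameter count instead gives ≈ 4.27e+23 FLOP.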
Size Notes: The model was trained on 4.5T tokens during the training phase (https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat).
Notes: https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat