Nanbeige2-16B-Chat is the latest 16B model developed by the Nanbeige Lab, trained on 4.5T tokens of high-quality data. During the alignment phase, the model was first trained with Supervised Fine-Tuning (SFT) on 1 million samples, then taken through curriculum learning on 400,000 high-quality samples of greater difficulty, and finally refined with human feedback via Direct Preference Optimization (DPO), culminating in Nanbeige2-16B-Chat. The model achieves superior performance across various authoritative benchmark datasets.
FLOPs: 4.05e+23
Notes: The model has 15.8B parameters and was trained on 4.5T tokens during the training phase (https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat). It appears to be entirely transformer-based (https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat/blob/main/modeling_nanbeige.py). Assuming training was done for 1 epoch, the 6ND approximation gives: training compute = (active parameters per forward pass) * (training tokens) * 6 FLOP/parameter/token ≈ 1.5e10 parameters * 4.5e12 tokens * 6 FLOP/parameter/token = 4.05e23 FLOP (parameter count rounded from 15.8B to ~1.5e10).
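For reference, a minimal sketch of the 6ND calculation above, assuming a dense transformer trained for a single epoch and using the rounded parameter count of ~1.5e10 from the note:

```python
# Minimal sketch of the 6ND training-compute approximation.
# Assumptions (from the note above): dense transformer, 1 epoch over the data,
# parameter count rounded to ~1.5e10, 4.5e12 training tokens.
def training_compute_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute as 6 * N * D FLOP."""
    return 6.0 * n_params * n_tokens

print(f"{training_compute_flops(1.5e10, 4.5e12):.3g}")  # ~4.05e+23 FLOP
```

Using the unrounded 1.58e10 parameters instead would give roughly 4.3e23 FLOP; the value reported above follows the rounded figure.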
Training Code Accessibility: Apache 2.0, though commercial use requires an application: "If you intend to use the Nanbeige Models or its derivatives for commercial purposes, please submit application materials to meet the requirements of the Nanbeige Models Community License Agreement by contacting nanbeige@126.com" (https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat). No training code is available at https://github.com/Nanbeige/Nanbeige/blob/main/README_EN.md.
Size Notes: The model was trained on 4.5T tokens during the training phase (https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat).
Parameters: 15,800,000,000
Notes: https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat