Today, we are excited to announce the open-sourcing of Ling 2.0, a family of MoE-based large language models that combine SOTA performance with high efficiency. The first released version, Ling-mini-2.0, is compact yet powerful: it has 16B total parameters, of which only 1.4B are activated per input token (789M non-embedding). Trained on more than 20T tokens of high-quality data and enhanced through multi-stage supervised fine-tuning and reinforcement learning, Ling-mini-2.0 achieves remarkable improvements in complex reasoning and instruction following. With just 1.4B activated parameters, it performs at the top tier among sub-10B dense LLMs and even matches or surpasses much larger MoE models.
Notes: 6 FLOP/parameter/token × 1.4e9 active parameters × 2e13 tokens = 1.68e23 FLOP
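The compute figure above follows the standard 6·N·D rule of thumb, counting only the activated (not total) parameters. A minimal sketch of the arithmetic, using the parameter and token counts stated in the announcement:

```python
# Back-of-envelope training-compute estimate via C ≈ 6 * N * D,
# i.e., roughly 6 FLOP per active parameter per training token.
# Parameter and token counts are taken from the announcement above;
# the factor of 6 is the usual forward+backward rule of thumb.

FLOP_PER_PARAM_PER_TOKEN = 6
active_params = 1.4e9      # activated parameters per token (not the 16B total)
training_tokens = 20e12    # "more than 20T tokens", so this is a lower bound

training_flop = FLOP_PER_PARAM_PER_TOKEN * active_params * training_tokens
print(f"Estimated training compute: {training_flop:.2e} FLOP")  # ~1.68e+23
```

Because only the activated parameters enter the estimate, the 16B total parameter count does not change this figure; using the 789M non-embedding count instead would give a correspondingly lower bound.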
Size Notes: Trained on more than 20T tokens of high-quality data and enhanced through multi-stage supervised fine-tuning and reinforcement learning
Notes: It has 16B total parameters, but only 1.4B are activated per input token (non-embedding 789M).