Today, Ling-flash-2.0 is officially open-sourced! 🚀 Following the release of the language model Ling-mini-2.0 and the thinking model Ring-mini-2.0, we are now open-sourcing the third MoE LLM under the Ling 2.0 architecture: Ling-flash-2.0, a language model with 100B total parameters and 6.1B activated parameters (4.8B non-embedding). Trained on 20T+ tokens of high-quality data and refined with supervised fine-tuning and multi-stage reinforcement learning, Ling-flash-2.0 achieves SOTA performance among dense models under 40B parameters, despite activating only ~6B parameters. It is also highly competitive with MoE models that have larger activated and total parameter counts. Notably, it delivers outstanding performance in complex reasoning, code generation, and frontend development.
Notes: 6 FLOP/parameter/token × 6.1e9 active parameters × 2e13 tokens = 7.32e+23 FLOP
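For reference, a minimal sketch of how the figure above is derived, assuming the standard 6·N·D training-compute approximation applied to the activated (not total) parameter count; the constants are taken directly from the note, and the formula is the usual heuristic rather than an official figure:

```python
# Rough training-compute estimate for Ling-flash-2.0 using the common
# C ≈ 6 * N * D approximation (forward + backward FLOP per parameter per token).
# Parameter and token counts come from the announcement; applying the dense-model
# heuristic to the MoE's activated parameters is an assumption, not an official figure.

FLOP_PER_PARAM_PER_TOKEN = 6   # ~2 forward + ~4 backward FLOP per parameter per token
ACTIVE_PARAMS = 6.1e9          # activated parameters per token (6.1B)
TRAINING_TOKENS = 20e12        # 20T+ tokens of pre-training data

compute_flop = FLOP_PER_PARAM_PER_TOKEN * ACTIVE_PARAMS * TRAINING_TOKENS
print(f"Estimated training compute: {compute_flop:.2e} FLOP")  # ~7.32e+23 FLOP
```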
Size Notes: Trained on 20T+ tokens of high-quality data, followed by supervised fine-tuning and multi-stage reinforcement learning
Notes: 100B total parameters and 6.1B activated parameters (4.8B non-embedding).