Today, Ling-flash-2.0 is officially open-sourced! 🚀 Following the release of the language model Ling-mini-2.0 and the thinking model Ring-mini-2.0, we are now open-sourcing the third MoE LLM under the Ling 2.0 architecture: Ling-flash-2.0, a language model with 100B total parameters and 6.1B activated parameters (4.8B non-embedding). Trained on 20T+ tokens of high-quality data and refined with supervised fine-tuning and multi-stage reinforcement learning, Ling-flash-2.0 achieves SOTA performance among dense models under 40B parameters, despite activating only ~6B parameters. It is also highly competitive with MoE models that have larger activated and total parameter counts. Notably, it delivers outstanding performance in complex reasoning, code generation, and frontend development.
Notes: 6 FLOP/parameter/token × 6.1e9 active parameters × 2e13 tokens = 7.32e+23 FLOP
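For reference, a minimal sketch of how the figure above is derived, assuming the standard 6·N·D training-compute approximation applied to the activated (not total) parameter count; the constants are taken directly from the note, and the formula is the usual heuristic rather than an official figure:

```python
# Rough training-compute estimate for Ling-flash-2.0 using the common
# C ≈ 6 * N * D approximation (forward + backward FLOP per parameter per token).
# Parameter and token counts come from the announcement; applying the dense-model
# heuristic to the MoE's activated parameters is an assumption, not an official figure.

FLOP_PER_PARAM_PER_TOKEN = 6   # ~2 forward + ~4 backward FLOP per parameter per token
ACTIVE_PARAMS = 6.1e9          # activated parameters per token (6.1B)
TRAINING_TOKENS = 20e12        # 20T+ tokens of pre-training data

compute_flop = FLOP_PER_PARAM_PER_TOKEN * ACTIVE_PARAMS * TRAINING_TOKENS
print(f"Estimated training compute: {compute_flop:.2e} FLOP")  # ~7.32e+23 FLOP
```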
Size Notes: Trained on 20T+ tokens of high-quality data, followed by supervised fine-tuning and multi-stage reinforcement learning
Notes: 100B total parameters and 6.1B activated parameters (4.8B non-embedding).