Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on 11.2T high-quality tokens and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. To foster further research, we open-source intermediate training checkpoints at every one trillion tokens, providing valuable insights into the learning dynamics of large language models.
Compute Notes:
Parameter-count method: 6 FLOP/parameter/token * 14e9 active parameters * 11.328e12 tokens = 9.51552e+23 FLOP
Hardware method: 989e12 FLOP/GPU/sec [H800 assumed] * 1,456,000 GPU-hours * 3,600 sec/hour * 0.3 [assumed utilization] = 1.55518272e+24 FLOP
Geometric mean of the two estimates: sqrt(9.51552e+23 * 1.55518272e+24) = 1.2164856e+24 FLOP
Size Notes:
pre-training: "11.2T high-quality tokens"
long context: 128B tokens
total: 11.328T tokens
Parameter Notes: "a large-scale MoE model that activates 14 billion parameters out of a total of 142 billion parameters"
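For reference, a minimal Python sketch reproducing the compute estimate above. It assumes an H800 peak of 989 TFLOP/s and 30% utilization, as stated in the notes; variable names are illustrative, not from the paper.

```python
# Rough reproduction of the dots.llm1 training-compute estimate.
# Inputs are taken from the notes above; the H800 peak throughput and
# 30% utilization are assumptions, not figures reported by the paper.

active_params = 14e9               # activated parameters per token (of 142e9 total)
tokens = 11.2e12 + 128e9           # 11.2T pre-training + 128B long context = 11.328T

# Method 1: parameter/token approximation (6 FLOP per active parameter per token)
flop_from_params = 6 * active_params * tokens                 # ~9.52e23 FLOP

# Method 2: hardware accounting
peak_flops_per_gpu = 989e12        # FLOP/s, H800 assumed
gpu_hours = 1_456_000
utilization = 0.3                  # assumed
flop_from_hardware = peak_flops_per_gpu * gpu_hours * 3600 * utilization  # ~1.56e24 FLOP

# Combine the two estimates with a geometric mean
estimate = (flop_from_params * flop_from_hardware) ** 0.5     # ~1.22e24 FLOP

print(f"params-based: {flop_from_params:.4e} FLOP")
print(f"hardware-based: {flop_from_hardware:.4e} FLOP")
print(f"geometric mean: {estimate:.4e} FLOP")
```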