Qwen1.5-110B is the largest model in the Qwen1.5 series, and it is also the first in the series with over 100 billion parameters. It demonstrates competitive performance against the very recently released SOTA model Llama-3-70B and is significantly better than the 72B model. This tells us that there is still a lot of room for better performance through model size scaling. While the release of Llama-3 indicates the significance of scaling data to an extremely large scale, we believe we can get the best of both worlds by scaling both data and model size in our future release. Stay tuned for Qwen2!
Notes: the lower bound is taken from the Qwen1.5-72B training compute estimate
Size Notes: A Qwen developer gave token counts for other models in the series in this GitHub issue: https://github.com/QwenLM/Qwen2/issues/97. The 110B model was asked about but received no response; the 7B, 14B, and 72B models were reportedly trained on 4T, 4T, and 3T tokens respectively. In another issue from the Qwen2 repository: "We are not authorized to share the details right now but the rough number is over 3T tokens for Qwen1.5 and over 7T tokens for Qwen2." https://github.com/QwenLM/Qwen2/issues/562
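To make the lower-bound reasoning concrete, the sketch below applies the standard 6*N*D approximation for dense transformers (training FLOPs roughly 6 x parameters x training tokens) to the token counts quoted above. This is an illustrative estimate, not an official figure; since the 110B token count was never disclosed, the 3T figure reported for the 72B model is reused as an assumed lower bound.

# Rough training-compute estimates via the common 6 * N * D approximation
# (training FLOPs ~= 6 x parameter count x training tokens) for dense transformers.
# Token counts come from the GitHub issues cited above; the 110B count is NOT
# public, so the 72B figure (>=3T tokens) is used as an assumed lower bound.

def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs for a dense transformer."""
    return 6 * params * tokens

models = {
    "Qwen1.5-7B":   (7e9,   4e12),
    "Qwen1.5-14B":  (14e9,  4e12),
    "Qwen1.5-72B":  (72e9,  3e12),
    "Qwen1.5-110B": (110e9, 3e12),  # token count unconfirmed; assumed lower bound
}

for name, (n_params, n_tokens) in models.items():
    print(f"{name}: ~{training_flops(n_params, n_tokens):.2e} FLOPs")

Under these assumptions the 110B lower bound comes out around 2.0e24 FLOPs, versus roughly 1.3e24 FLOPs for the 72B model.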
Notes: 110B