Qwen1.5-110B is the largest model in the Qwen1.5 series, and it is also the first in the series with over 100 billion parameters. It demonstrates competitive performance against the very recently released SOTA model Llama-3-70B and is significantly better than the 72B model. This tells us that there is still a lot of room for better performance through model size scaling. While the release of Llama-3 indicates the significance of scaling data to an extremely large scale, we believe we can get the best of both worlds by scaling both data and model size in our future release. Stay tuned for Qwen2!
Notes: the lower bound is taken from the Qwen1.5-72B training compute estimate
Size Notes: A Qwen developer gave token counts for other models in the series in this GitHub issue: https://github.com/QwenLM/Qwen2/issues/97. The 110B model was asked about but received no response; the 7B, 14B, and 72B models were reportedly trained on 4T, 4T, and 3T tokens respectively. In another issue from the Qwen2 repository: "We are not authorized to share the details right now but the rough number is over 3T tokens for Qwen1.5 and over 7T tokens for Qwen2." https://github.com/QwenLM/Qwen2/issues/562
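To make the lower-bound reasoning concrete, the sketch below applies the standard 6*N*D approximation for dense transformers (training FLOPs roughly 6 x parameters x training tokens) to the token counts quoted above. This is an illustrative estimate, not an official figure; since the 110B token count was never disclosed, the 3T figure reported for the 72B model is reused as an assumed lower bound.

# Rough training-compute estimates via the common 6 * N * D approximation
# (training FLOPs ~= 6 x parameter count x training tokens) for dense transformers.
# Token counts come from the GitHub issues cited above; the 110B count is NOT
# public, so the 72B figure (>=3T tokens) is used as an assumed lower bound.

def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs for a dense transformer."""
    return 6 * params * tokens

models = {
    "Qwen1.5-7B":   (7e9,   4e12),
    "Qwen1.5-14B":  (14e9,  4e12),
    "Qwen1.5-72B":  (72e9,  3e12),
    "Qwen1.5-110B": (110e9, 3e12),  # token count unconfirmed; assumed lower bound
}

for name, (n_params, n_tokens) in models.items():
    print(f"{name}: ~{training_flops(n_params, n_tokens):.2e} FLOPs")

Under these assumptions the 110B lower bound comes out around 2.0e24 FLOPs, versus roughly 1.3e24 FLOPs for the 72B model.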
Notes: 110B