Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previously released Qwen, the improvements include: 8 model sizes, comprising 0.5B, 1.8B, 4B, 7B, 14B, 32B, and 72B dense models, plus an MoE model of 14B with 2.7B activated; significant performance improvement in human preference for chat models; multilingual support for both base and chat models; stable support of 32K context length for models of all sizes; and no need for trust_remote_code.
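As a minimal sketch of the last point, Qwen1.5 checkpoints are integrated natively into Hugging Face transformers (reportedly from 4.37 onward), so loading works without trust_remote_code. The model id below is an assumption for the 14B base checkpoint these notes refer to:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model id for the 14B base checkpoint discussed here.
model_id = "Qwen/Qwen1.5-14B"

# No trust_remote_code flag needed: the Qwen2 architecture ships with transformers.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Qwen1.5 is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```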
Notes: 6 FLOP / parameter / token * 14*10^9 parameters * 4*10^12 tokens = 3.36e+23 FLOP
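A minimal sketch of this arithmetic, assuming the standard 6ND approximation for dense transformer training compute (6 FLOP per parameter per training token); the token count comes from the GitHub response linked below:

```python
# Training-compute estimate: FLOP ≈ 6 * N (parameters) * D (tokens).
params = 14e9                    # Qwen1.5-14B parameter count
tokens = 4e12                    # ~4 trillion pretraining tokens (per QwenLM/Qwen2 issue #97)
flop_per_param_per_token = 6

training_flop = flop_per_param_per_token * params * tokens
print(f"{training_flop:.2e} FLOP")   # -> 3.36e+23 FLOP
```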
Size Notes: 4 trillion training tokens, per this response: https://github.com/QwenLM/Qwen2/issues/97
Notes: 14B parameters