In recent months, our focus has been on developing a “good” model while optimizing the developer experience. Just before the Chinese New Year, we are introducing Qwen1.5, the next iteration in our Qwen series. With Qwen1.5, we are open-sourcing base and chat models across six sizes: 0.5B, 1.8B, 4B, 7B, 14B, and 72B. In line with tradition, we’re also providing quantized models, including Int4 and Int8 GPTQ models, as well as AWQ and GGUF quantized models. To enhance the developer experience, we’ve merged Qwen1.5’s code into Hugging Face transformers, making it accessible with transformers>=4.37.0 without needing trust_remote_code.
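As a minimal sketch of what this integration looks like in practice, the snippet below loads a chat model with plain transformers and generates a reply. The Hub ID Qwen/Qwen1.5-0.5B-Chat is assumed to follow the Qwen/Qwen1.5-&lt;size&gt;-Chat naming pattern; any of the six sizes can be substituted.

```python
# Minimal sketch: loading a Qwen1.5 chat model with plain transformers
# (>= 4.37.0), no trust_remote_code needed. The model ID below assumes
# the Qwen/Qwen1.5-<size>-Chat naming pattern on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B-Chat"  # assumed Hub ID; swap in any size
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Chat models ship a chat template, so apply_chat_template builds the prompt.
messages = [{"role": "user", "content": "Give me a short introduction to Qwen."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the architecture now lives in transformers itself, the same AutoModelForCausalLM call works for every size without custom modeling code.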
Notes: 3T training tokens (https://github.com/QwenLM/Qwen2/issues/97). Using the standard C ≈ 6ND approximation, training compute for the 72B model is 6 × 72 billion parameters × 3 trillion tokens ≈ 1.3e24 FLOP (see the sketch after these notes).
Size Notes: 3 trillion training tokens, per this response: https://github.com/QwenLM/Qwen2/issues/97
Notes: 72B parameters.
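A quick sanity check of the arithmetic above, as a minimal sketch (variable names are illustrative, not from the source):

```python
# Sanity check of the C ≈ 6 * N * D training-compute estimate.
n_params = 72e9   # N: 72B parameters
n_tokens = 3e12   # D: 3T training tokens (QwenLM/Qwen2 issue #97)
compute_flop = 6 * n_params * n_tokens
print(f"{compute_flop:.2e} FLOP")  # -> 1.30e+24 FLOP, i.e. ~1.3e24
```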