After months of effort, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring to you:

- Pretrained and instruction-tuned models in 5 sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B;
- Training on data in 27 additional languages besides English and Chinese;
- State-of-the-art performance on a large number of benchmark evaluations;
- Significantly improved performance in coding and mathematics;
- Extended context length support up to 128K tokens with Qwen2-7B-Instruct and Qwen2-72B-Instruct.

(Technical report to follow)
Notes: 72 billion params, 7 trillion tokens. Estimated training compute, using the ~6 FLOPs per parameter per token rule: 6 * 72 billion * 7 trillion ≈ 3.02e24 FLOPs.
Size Notes: "All models were pre-trained on a high-quality, large-scale dataset comprising over 7 trillion tokens, covering a wide range of domains and languages."
Notes: 72.71B parameters in total, of which 70.21B are non-embedding parameters
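As a sanity check on that estimate, here is a minimal sketch of the arithmetic, assuming the standard ~6 FLOPs per parameter per token approximation for dense transformer training; the function and variable names are illustrative only:

```python
# Rough training-compute estimate using the ~6 * params * tokens heuristic.
# The 6x constant is the usual approximation for a dense transformer's
# forward + backward pass; figures are taken from the notes above.

def training_flops(params: float, tokens: float, flops_per_param_token: float = 6.0) -> float:
    """Approximate total training FLOPs."""
    return flops_per_param_token * params * tokens

total_params = 72.71e9   # 72.71B total parameters (Qwen2-72B)
tokens = 7e12            # "over 7 trillion tokens"

print(f"{training_flops(total_params, tokens):.2e} FLOPs")  # ~3.05e+24
print(f"{training_flops(72e9, tokens):.2e} FLOPs")          # ~3.02e+24 with the rounded 72B figure
```

Using the exact 72.71B parameter count nudges the estimate to roughly 3.05e24 FLOPs, but the rounded 72B figure gives essentially the same order-of-magnitude result.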