> After months of efforts, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring to you:
>
> - Pretrained and instruction-tuned models of 5 sizes, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B;
> - Having been trained on data in 27 additional languages besides English and Chinese;
> - State-of-the-art performance in a large number of benchmark evaluations;
> - Significantly improved performance in coding and mathematics;
> - Extended context length support up to 128K tokens with Qwen2-7B-Instruct and Qwen2-72B-Instruct.
>
> (Technical report to follow)
Notes: Training compute estimated as 6 FLOP/parameter/token × 5e8 parameters × 1.2e13 tokens = 3.6e22 FLOP.
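The note applies the standard 6·N·D approximation for dense-model training compute. A minimal sketch of that arithmetic, assuming the figures stated in the note (≈5e8 parameters, ≈1.2e13 tokens); the function name is illustrative and not from any Qwen codebase:

```python
# Minimal sketch of the 6 * N * D training-compute approximation used in the note above.
# Parameter and token counts are the ones stated in the note, not official release figures.

def training_compute_flop(n_params: float, n_tokens: float,
                          flop_per_param_per_token: float = 6.0) -> float:
    """Approximate total training compute as 6 * parameters * tokens."""
    return flop_per_param_per_token * n_params * n_tokens

if __name__ == "__main__":
    params = 5e8      # ~0.5B parameters, per the note
    tokens = 1.2e13   # ~12T training tokens, per the note
    print(f"{training_compute_flop(params, tokens):.2e} FLOP")  # prints 3.60e+22 FLOP
```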
Size Notes: Table 1 of the Qwen2 technical report.