Model Details

Domain:

Task:

Mathematical reasoning

Quantitative reasoning

Code generation

Translation

Model Access:

Open weights (unrestricted)

AI Tools Usage

This model is commonly used behind the scenes in AI tools.

Introduction

Today, we are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general capabilities, etc., when compared to other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. Additionally, the small MoE model, Qwen3-30B-A3B, outcompetes QwQ-32B, which has 10 times as many activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2.5-72B-Instruct.

Benchmarking

FLOPs: 4.75e+24

Notes: 6 FLOP / parameter / token * 22*10^9 active parameters * 36*10^12 tokens = 4.752e+24 FLOP
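The arithmetic in the note above can be reproduced with a short script. This is a minimal sketch of the 6 FLOP per parameter per token rule of thumb as stated in the note; the function and constant names are illustrative assumptions, not part of any Qwen tooling.

```python
# Rough training-compute estimate using the figures from the note above.
# All names here are hypothetical and chosen only for illustration.
FLOP_PER_PARAM_PER_TOKEN = 6      # rule-of-thumb factor from the note
ACTIVE_PARAMS = 22 * 10**9        # activated parameters per token
TRAINING_TOKENS = 36 * 10**12     # 36T training tokens


def training_flop(active_params: int, tokens: int,
                  flop_per_param_token: int = FLOP_PER_PARAM_PER_TOKEN) -> int:
    """Return the estimated training compute in FLOP."""
    return flop_per_param_token * active_params * tokens


estimate = training_flop(ACTIVE_PARAMS, TRAINING_TOKENS)
print(f"{estimate:.3e} FLOP")  # prints "4.752e+24 FLOP"
```

The estimate uses the 22 billion activated parameters rather than the 235 billion total, since only the activated parameters take part in the computation for each token in a mixture-of-experts model.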

Training

Training Code Accessibility: Apache 2.0 https://huggingface.co/Qwen/Qwen3-235B-A22B

Size Notes: 36T tokens

Parameters

Parameters: 235000000000

Notes: 235 billion total parameters and 22 billion activated parameters

Number of Layers: 94

Number of Attention Heads (GQA): 64 for Q and 4 for KV

Number of Experts: 128

Number of Activated Experts: 8

Context Length: 32,768 tokens natively and 131,072 tokens with YaRN
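A few ratios follow directly from the figures in the notes above. The sketch below collects them in a small dataclass; the class name `Qwen3MoESpec` and its field names are hypothetical and do not correspond to the model's actual configuration file or to any library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qwen3MoESpec:
    """Hypothetical container for the figures listed in the notes."""
    total_params: int = 235 * 10**9
    active_params: int = 22 * 10**9
    num_layers: int = 94
    num_q_heads: int = 64
    num_kv_heads: int = 4
    num_experts: int = 128
    num_active_experts: int = 8
    native_context: int = 32_768
    yarn_context: int = 131_072

    def active_param_fraction(self) -> float:
        # Share of all parameters used for each token.
        return self.active_params / self.total_params

    def queries_per_kv_head(self) -> int:
        # Under GQA, each KV head is shared by this many query heads.
        return self.num_q_heads // self.num_kv_heads

    def expert_fraction(self) -> float:
        # Share of experts routed to for each token.
        return self.num_active_experts / self.num_experts

    def yarn_scale(self) -> float:
        # Ratio of the extended context length to the native one.
        return self.yarn_context / self.native_context


spec = Qwen3MoESpec()
print(f"active parameter fraction: {spec.active_param_fraction():.3f}")
print(f"query heads per KV head:   {spec.queries_per_kv_head()}")
print(f"experts used per token:    {spec.expert_fraction():.4f}")
print(f"YaRN context scale:        {spec.yarn_scale():.1f}")
```

With these numbers, each KV head serves 16 query heads, 8 of 128 experts (6.25%) are active per token, and the YaRN context is 4 times the native length. The active parameter share (about 9.4%) is higher than the expert share, presumably because non-expert components are used for every token.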
