GigaChat-20B-A3B is the first open MoE model from Russia. Few groups worldwide manage to train MoE architectures with good quality. GigaChat is trained mainly on Russian text, so unlike Qwen and LLaMA fine-tunes it does not make grammar and punctuation errors and does not switch to other languages mid-conversation. GigaChat-20B-A3B combines cheap inference with good metrics. The model is well suited for research, for example, concentration.
Notes: 6 FLOP/token/parameter * 3.3 * 10^9 active parameters * 5 * 10^12 tokens [speculative] = 9.9e+22 FLOP
Size Notes: "GigaChat-20B-A3B обучался на триллионах токенов преимущественно русского текста" ("GigaChat-20B-A3B was trained on trillions of tokens of mostly Russian text"). Speculatively assuming ~5T tokens.
Notes: 20B total parameters, 3.3B active parameters
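A minimal sketch of the compute estimate above, assuming the standard 6 FLOP/token/parameter heuristic and the speculative ~5T token count (neither figure is officially confirmed):

```python
# Rough training-compute estimate for GigaChat-20B-A3B.
# Assumptions: 6 FLOP per token per *active* parameter (forward + backward heuristic)
# and a speculative ~5T training tokens inferred from "trillions of tokens".

active_params = 3.3e9            # active parameters per token (MoE: 3.3B of 20B total)
train_tokens = 5e12              # speculative token count
flop_per_token_per_param = 6     # dense-training compute heuristic

total_flop = flop_per_token_per_param * active_params * train_tokens
print(f"Estimated training compute: {total_flop:.2e} FLOP")  # 9.90e+22 FLOP
```

Note that for MoE models only the active parameters (3.3B), not the total parameter count (20B), enter the per-token compute estimate.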