GigaChat-20B-A3B is the first open MoE model released in Russia; worldwide, few teams have managed to train MoE architectures to a high standard of quality. GigaChat is trained primarily on Russian text, so unlike fine-tunes of Qwen and LLaMA it does not make grammar and punctuation errors and does not switch to other languages mid-conversation. GigaChat-20B-A3B combines cheap inference with strong benchmark results, and the model is also well suited for research.
FLOPs: 9.9e+22
Notes: 6 FLOP / token / parameter * 3.3 * 10^9 active parameters * 5 * 10^12 tokens [speculatively] = 9.9e+22 FLOP
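The arithmetic behind this estimate, written out as a minimal sketch. It uses the common 6 * N * D approximation for training FLOPs with N taken as the active parameter count; the 5e12 token figure is the speculative assumption noted below.

```python
# Sketch of the compute estimate above (6 * N_active * D approximation).
# The training token count is speculative; see the Size Notes.
ACTIVE_PARAMS = 3.3e9          # active parameters per token (MoE routing)
TRAINING_TOKENS = 5e12         # speculative estimate
FLOP_PER_TOKEN_PER_PARAM = 6   # forward + backward pass rule of thumb

training_flop = FLOP_PER_TOKEN_PER_PARAM * ACTIVE_PARAMS * TRAINING_TOKENS
print(f"{training_flop:.1e}")  # 9.9e+22
```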
Training Code Accessibility: MIT license (https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct-v1.5)
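A minimal sketch of loading the released checkpoint with Hugging Face transformers, assuming the repo is compatible with the standard AutoModelForCausalLM API and ships a chat template; the model card should be checked for exact requirements (e.g., whether trust_remote_code is needed).

```python
# Hedged usage sketch; exact loading options may differ from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-sage/GigaChat-20B-A3B-instruct-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Привет! Чем ты можешь помочь?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```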
Size Notes: "GigaChat-20B-A3B обучался на триллионах токенов преимущественно русского текста" ("GigaChat-20B-A3B was trained on trillions of tokens of mostly Russian text"). Speculatively assuming ~5T tokens.
Parameters: 20,000,000,000
Notes: 20B total parameters; 3.3B active parameters.
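Rough arithmetic behind the "cheap inference" point: only the active parameters contribute to per-token compute, so the model runs at roughly the cost of a ~3.3B dense model despite its 20B total size. The ~2 FLOP per parameter per token forward-pass rule used below is a standard approximation, not a reported figure.

```python
# Assumed ~2 FLOP / parameter / token for a forward pass (rule of thumb).
TOTAL_PARAMS = 20e9
ACTIVE_PARAMS = 3.3e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
dense_flop_per_token = 2 * TOTAL_PARAMS
moe_flop_per_token = 2 * ACTIVE_PARAMS

print(f"active fraction: {active_fraction:.1%}")                     # 16.5%
print(f"dense 20B forward FLOP/token: {dense_flop_per_token:.1e}")   # 4.0e+10
print(f"MoE 3.3B-active FLOP/token:   {moe_flop_per_token:.1e}")     # 6.6e+09
```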