GigaChat-20B-A3B is the first open MoE model released in Russia; worldwide, few teams have managed to train MoE architectures to a high standard of quality. GigaChat is trained primarily on Russian text, so unlike fine-tunes of Qwen and LLaMA it does not make grammar and punctuation errors and does not switch to other languages mid-conversation. GigaChat-20B-A3B combines cheap inference with strong benchmark results, and the model is also well suited for research.
FLOPs: 9.9e+22
Notes: 6 FLOP / token / parameter * 3.3 * 10^9 active parameters * 5 * 10^12 tokens [speculatively] = 9.9e+22 FLOP
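The arithmetic behind this estimate, written out as a minimal sketch. It uses the common 6 * N * D approximation for training FLOPs with N taken as the active parameter count; the 5e12 token figure is the speculative assumption noted below.

```python
# Sketch of the compute estimate above (6 * N_active * D approximation).
# The training token count is speculative; see the Size Notes.
ACTIVE_PARAMS = 3.3e9          # active parameters per token (MoE routing)
TRAINING_TOKENS = 5e12         # speculative estimate
FLOP_PER_TOKEN_PER_PARAM = 6   # forward + backward pass rule of thumb

training_flop = FLOP_PER_TOKEN_PER_PARAM * ACTIVE_PARAMS * TRAINING_TOKENS
print(f"{training_flop:.1e}")  # 9.9e+22
```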
Training Code Accessibility: MIT license (https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct-v1.5)
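A minimal sketch of loading the released checkpoint with Hugging Face transformers, assuming the repo is compatible with the standard AutoModelForCausalLM API and ships a chat template; the model card should be checked for exact requirements (e.g., whether trust_remote_code is needed).

```python
# Hedged usage sketch; exact loading options may differ from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-sage/GigaChat-20B-A3B-instruct-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Привет! Чем ты можешь помочь?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```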
Size Notes: "GigaChat-20B-A3B обучался на триллионах токенов преимущественно русского текста" ("GigaChat-20B-A3B was trained on trillions of tokens of mostly Russian text"). Speculatively assuming ~5T tokens.
Parameters: 20,000,000,000
Notes: 20B total parameters; 3.3B active parameters.
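Rough arithmetic behind the "cheap inference" point: only the active parameters contribute to per-token compute, so the model runs at roughly the cost of a ~3.3B dense model despite its 20B total size. The ~2 FLOP per parameter per token forward-pass rule used below is a standard approximation, not a reported figure.

```python
# Assumed ~2 FLOP / parameter / token for a forward pass (rule of thumb).
TOTAL_PARAMS = 20e9
ACTIVE_PARAMS = 3.3e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
dense_flop_per_token = 2 * TOTAL_PARAMS
moe_flop_per_token = 2 * ACTIVE_PARAMS

print(f"active fraction: {active_fraction:.1%}")                     # 16.5%
print(f"dense 20B forward FLOP/token: {dense_flop_per_token:.1e}")   # 4.0e+10
print(f"MoE 3.3B-active FLOP/token:   {moe_flop_per_token:.1e}")     # 6.6e+09
```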