Notes: "Model was trained using Deepspeed and Megatron libraries, on 300B tokens dataset for 3 epochs, around 45 days on 512 V100. After that model was finetuned 1 epoch with sequence length 2048 around 20 days on 200 GPU A100 on additional data" 512 GPUs * 125000000000000 FLOPs/s [peak] * 45 days * 24 hours * 3600 s * 0.3 + 200 GPUs * 312000000000000 FLOPs/s [peak for fp16] * 20 days * 24 hours * 3600 s * 0.3 = 1.0699776e+23 they probably used fp16 as in their similar project: https://habr.com/ru/companies/sberdevices/articles/780334/ 6ND = 6*13B*300B*3 = 70200*10^18 = 7*10^24
Notes: 13B parameters