In this work we present the experiments which led to the creation of our BERT- and ELECTRA-based German language models, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM), we were able to attain SoTA performance across a set of document classification and named entity recognition (NER) tasks for both the base and large model sizes. We adopt an evaluation-driven approach in training these models, and our results indicate that both adding more data and utilizing WWM improve model performance. By benchmarking against existing German models, we show that these models are the best German models to date. Our trained models will be made publicly available to the research community.
Notes:
Hardware-based estimate: "Large models were trained on pods of 16 TPUs v3 (128 cores)." (Section 4.1), i.e. 64 TPU v3 chips, trained for 7 days (Table 2).
FLOPs = (num TPU v3 chips) * (peak FLOP/s per chip) * (training time in seconds) * (assumed utilization rate) = (64) * (123 * 10**12) * (7 * 24 * 3600) * (0.3) ≈ 1.4e21
This agrees with the 6ND estimate (C = 6 * parameters * tokens):
Tokens seen: 512 (seq len) * 1024 (batch size) * 1 million (steps) = 5.24e11
FLOPs: 6 * 335M * 5.24e11 ≈ 1.05e21
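A minimal Python sketch reproducing the two compute estimates above; the chip count, peak throughput per TPU v3 chip, the 0.3 utilization rate, and the C = 6ND approximation are the assumptions stated in the note, not values reported directly by the paper.

```python
# Hardware-based estimate, assuming 64 TPU v3 chips (16 devices, 128 cores),
# 123 TFLOP/s peak per chip, 7 days of training, and 30% utilization.
num_chips = 64
peak_flops_per_chip = 123e12        # peak FLOP/s per TPU v3 chip (assumed)
train_seconds = 7 * 24 * 3600       # 7 days (Table 2)
utilization = 0.3                   # assumed utilization rate
hardware_estimate = num_chips * peak_flops_per_chip * train_seconds * utilization
print(f"Hardware-based estimate: {hardware_estimate:.2e} FLOPs")  # ~1.43e21

# 6ND estimate: C = 6 * parameters * tokens seen.
seq_len = 512
batch_size = 1024
steps = 1_000_000
tokens_seen = seq_len * batch_size * steps   # ~5.24e11 tokens
params = 335e6                               # 335M parameters (Table 5)
sixnd_estimate = 6 * params * tokens_seen
print(f"6ND estimate: {sixnd_estimate:.2e} FLOPs")  # ~1.05e21
```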
Size Notes:
Training data size: 163.4 GB, from Table 1 in the paper.
Assuming 167M words per GB (German language) and 4/3 tokens per word: 163.4 * 167M * 4/3 ≈ 36,383,733,333 tokens.
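The same conversion as a short Python sketch; the words-per-GB and tokens-per-word ratios are the assumptions stated in the note.

```python
# Convert corpus size (GB) to an approximate token count under the note's assumptions.
data_gb = 163.4               # training corpus size (Table 1)
words_per_gb = 167e6          # assumed words per GB of German text
tokens_per_word = 4 / 3       # assumed subword tokens per word
tokens = data_gb * words_per_gb * tokens_per_word
print(f"{tokens:,.0f} tokens")   # ~36,383,733,333 (~36.4B tokens)
```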
Notes: 335M parameters, from Table 5 of the paper.