AraBERT is an Arabic pretrained language model based on Google's BERT architecture. AraBERT uses the same BERT-Base configuration. More details are available in the AraBERT Paper and in the AraBERT Meetup. There are two versions of the model, AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were split using the Farasa Segmenter. We evaluate AraBERT models on different downstream tasks and compare them to mBERT and other state-of-the-art models (to the best of our knowledge). The tasks were Sentiment Analysis on 6 different datasets (HARD, ASTD-Balanced, ArsenTD-Lev, LABR), Named Entity Recognition with the ANERcorp, and Arabic Question Answering on Arabic-SQuAD and ARCD.
Notes:
6 FLOP / parameter / token * (128*13440*250000 + 512*2056*300000) total training tokens [see dataset size notes] * 371000000 parameters = 1.6603324e+21 FLOP
123000000000000 FLOP / chip / sec * 64 chips [= 128 cores] * 168 hours * 3600 sec / hour * 0.3 [assumed utilization] = 1.4282957e+21 FLOP
sqrt(1.6603324e+21 * 1.4282957e+21) = 1.5399499e+21 FLOP
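A minimal sketch reproducing the arithmetic above (my own reproduction, not the authors' code): a 6 * parameters * tokens estimate, a hardware-time estimate at an assumed 30% utilization, and their geometric mean.

```python
import math

# Token-based estimate: 6 FLOP per parameter per training token
params = 371e6
tokens = 128 * 13440 * 250_000 + 512 * 2056 * 300_000  # seq_len * batch * steps, both phases
flop_tokens = 6 * params * tokens

# Hardware-time estimate: 64 chips (128 cores) for 168 hours at 123 TFLOP/s per chip
chip_flops = 123e12
chips = 64
hours = 168
utilization = 0.3  # assumed utilization
flop_hardware = chip_flops * chips * hours * 3600 * utilization

flop_estimate = math.sqrt(flop_tokens * flop_hardware)
print(f"{flop_tokens:.4e}, {flop_hardware:.4e}, {flop_estimate:.4e}")
# ~1.6603e+21, ~1.4283e+21, ~1.5399e+21 FLOP
```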
Size Notes:
Number of examples with sequence length 128 / 512: 520M / 245M
Seq len 128 (batch size / num of steps): 13440 / 250K
Seq len 512 (batch size / num of steps): 2056 / 300K
Dataset size: 9932800000 tokens (see AraGPT2-Mega dataset size notes)
(128*13440*250000 + 512*2056*300000) / 9932800000 = 75 epochs
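The epoch figure follows from dividing total training tokens by the dataset size; a short check of that division:

```python
# Training tokens seen vs. dataset size, reproducing the ~75 epoch figure above
dataset_tokens = 9_932_800_000  # from the AraGPT2-Mega dataset size notes
training_tokens = 128 * 13440 * 250_000 + 512 * 2056 * 300_000
print(training_tokens / dataset_tokens)  # ~75 epochs
```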
Notes: 371M parameters