AraBERT is an Arabic pretrained language model based on Google's BERT architecture. AraBERT uses the same BERT-Base configuration. More details are available in the AraBERT Paper and in the AraBERT Meetup. There are two versions of the model, AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were split using the Farasa Segmenter. We evaluate AraBERT models on different downstream tasks and compare them to mBERT and other state-of-the-art models (to the best of our knowledge). The tasks were Sentiment Analysis on 6 different datasets (HARD, ASTD-Balanced, ArsenTD-Lev, LABR), Named Entity Recognition with the ANERcorp, and Arabic Question Answering on Arabic-SQuAD and ARCD.
Notes:
6 FLOP / parameter / token * (128*13440*250000 + 512*2056*300000) total training tokens [see dataset size notes] * 371000000 parameters = 1.6603324e+21 FLOP
123000000000000 FLOP / chip / sec * 64 chips [= 128 cores] * 168 hours * 3600 sec / hour * 0.3 [assumed utilization] = 1.4282957e+21 FLOP
sqrt(1.6603324e+21 * 1.4282957e+21) = 1.5399499e+21 FLOP
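A minimal sketch reproducing the arithmetic above (my own reproduction, not the authors' code): a 6 * parameters * tokens estimate, a hardware-time estimate at an assumed 30% utilization, and their geometric mean.

```python
import math

# Token-based estimate: 6 FLOP per parameter per training token
params = 371e6
tokens = 128 * 13440 * 250_000 + 512 * 2056 * 300_000  # seq_len * batch * steps, both phases
flop_tokens = 6 * params * tokens

# Hardware-time estimate: 64 chips (128 cores) for 168 hours at 123 TFLOP/s per chip
chip_flops = 123e12
chips = 64
hours = 168
utilization = 0.3  # assumed utilization
flop_hardware = chip_flops * chips * hours * 3600 * utilization

flop_estimate = math.sqrt(flop_tokens * flop_hardware)
print(f"{flop_tokens:.4e}, {flop_hardware:.4e}, {flop_estimate:.4e}")
# ~1.6603e+21, ~1.4283e+21, ~1.5399e+21 FLOP
```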
Size Notes:
Number of examples with sequence length 128 / 512: 520M / 245M
Seq len 128 (batch size / num of steps): 13440 / 250K
Seq len 512 (batch size / num of steps): 2056 / 300K
Dataset size: 9932800000 tokens (see AraGPT2-Mega dataset size notes)
(128*13440*250000 + 512*2056*300000) / 9932800000 = 75 epochs
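The epoch figure follows from dividing total training tokens by the dataset size; a short check of that division:

```python
# Training tokens seen vs. dataset size, reproducing the ~75 epoch figure above
dataset_tokens = 9_932_800_000  # from the AraGPT2-Mega dataset size notes
training_tokens = 128 * 13440 * 250_000 + 512 * 2056 * 300_000
print(training_tokens / dataset_tokens)  # ~75 epochs
```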
Notes: 371M parameters