This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. To develop this family of models, we use a combination of training-aware neural architecture search and scaling, to jointly optimize training speed and parameter efficiency. The models were searched from the search space enriched with new ops such as Fused-MBConv. Our experiments show that EfficientNetV2 models train much faster than state-of-the-art models while being up to 6.8x smaller. Our training can be further sped up by progressively increasing the image size during training, but it often causes a drop in accuracy. To compensate for this accuracy drop, we propose to adaptively adjust regularization (e.g., dropout and data augmentation) as well, such that we can achieve both fast training and good accuracy. With progressive learning, our EfficientNetV2 significantly outperforms previous models on ImageNet and CIFAR/Cars/Flowers datasets. By pretraining on the same ImageNet21k, our EfficientNetV2 achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming the recent ViT by 2.0% accuracy while training 5x-11x faster using the same computing resources. Code will be available at this https URL.
Notes: Table 7, page 7: 45 hours on 32 TPUv3 cores. "Each v3 TPU chip contains two TensorCores." TPU performance per chip = 123e12 FLOP/s 32 cores = 16 chips 123e12 FLOP/s per chip * (32 cores / 2 cores per chip) * 45 hours * 3600 seconds/hour * 0.30 utilization = 9.56e19 FLOP https://www.wolframalpha.com/input?i=123+terahertz+*+16+*+45+hours+*+0.3
Size Notes: "ImageNet21k (Russakovsky et al., 2015) contains about 13M training images with 21,841 classes. The original ImageNet21k doesn’t have train/eval split, so we reserve randomly picked 100,000 images as validation set and use the remaining as training set... After pretrained on ImageNet21k, each model is finetuned on ILSVRC2012 for 15 epochs using cosine learning rate decay." 12.9M + 1.28M ~= 14,180,000
Notes: 208M for XL version (Table 7, page 7)