Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a controlled environment, that is, the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore whether self-supervision lives up to its expectations by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting. Interestingly, we also observe that self-supervised models are good few-shot learners, achieving 77.9% top-1 with access to only 10% of ImageNet.
Notes: Numbers from section 3.2; the authors specifically mention using mixed precision training. 6125 ms/batch * 114,890 batches = 8.14 days (they round to 8 in the text). 512 GPUs * 8.14 days * 24 h/day * 3600 s/h * 125 TFLOP/s * 0.4 (assumed utilization) = 1.800e22 FLOP. "on 512 V100 32GB NVIDIA GPUs. Training this model on 1 billion images requires 114,890 training iterations for a batch size of 8,704 images, summing to 8 days of training over 512 GPUs."
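A minimal sketch of the compute arithmetic above, assuming (as stated in the note, not the paper) a 125 TFLOP/s V100 mixed-precision peak and 40% utilization; the variable names are ours:

```python
# Back-of-the-envelope training-compute estimate for SEER, following the note above.

MS_PER_ITERATION = 6125      # reported time per batch, in milliseconds
ITERATIONS = 114_890         # training iterations for 1B images at batch size 8,704
NUM_GPUS = 512               # V100 32GB GPUs
PEAK_FLOPS = 125e12          # assumed V100 tensor-core peak, FLOP/s
UTILIZATION = 0.4            # assumed fraction of peak actually achieved

wall_clock_seconds = MS_PER_ITERATION / 1000 * ITERATIONS
wall_clock_days = wall_clock_seconds / 86_400          # ~8.14 days, rounded to 8 in the paper

total_flop = NUM_GPUS * wall_clock_seconds * PEAK_FLOPS * UTILIZATION

print(f"Wall-clock time: {wall_clock_days:.2f} days")          # ~8.14
print(f"Estimated training compute: {total_flop:.3e} FLOP")    # ~1.800e+22
```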
Size Notes: "Overall, we train on 1B images for a total of 122K iterations."
Notes: From abstract: "Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters..."