Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which benefits from increasing parameters and training data in the way ViTs do. Unlike recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as its core operator, so the model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also performs adaptive spatial aggregation conditioned on input and task information. As a result, InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns from massive data with large-scale parameters, as ViTs do. The effectiveness of the model is demonstrated on challenging benchmarks including ImageNet, COCO, and ADE20K. Notably, InternImage-H achieves a new record of 65.4 mAP on COCO test-dev and 62.9 mIoU on ADE20K, outperforming current leading CNNs and ViTs. The code will be released at this https URL.
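Since the abstract hinges on deformable convolution with input-conditioned spatial aggregation, here is a minimal sketch of that operator family using torchvision's DCNv2-style `deform_conv2d`. This is not the paper's actual DCNv3 operator; the module and head names (`DeformableBlock`, `offset_head`, `mask_head`) are illustrative assumptions, not from the paper.

```python
# Sketch of an input-conditioned deformable convolution block. Offsets and
# modulation scalars are predicted from the input itself, which is what
# gives the operator adaptive (rather than fixed-grid) spatial aggregation.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class DeformableBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        k = kernel_size
        self.k = k
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)
        # 2 offset values (dy, dx) per sampling point, 1 modulation scalar each.
        self.offset_head = nn.Conv2d(channels, 2 * k * k, 3, padding=1)
        self.mask_head = nn.Conv2d(channels, k * k, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_head(x)             # (N, 2*k*k, H, W)
        masks = torch.sigmoid(self.mask_head(x))  # (N, k*k, H, W)
        return deform_conv2d(x, offsets, self.weight,
                             padding=self.k // 2, mask=masks)


x = torch.randn(1, 64, 56, 56)
y = DeformableBlock(64)(x)
print(y.shape)  # torch.Size([1, 64, 56, 56])
```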
Notes: InternImage-H is pre-trained for 30 epochs on a 427-million-image joint dataset drawn from the public Laion-400M [61], YFCC-15M [62], and CC12M [63] datasets, and then fine-tuned on ImageNet-1K (1,281,167 images) for 20 epochs. Table 2 reports 188 GFLOPs per forward pass for InternImage-H at 224 resolution, and 1,478 GFLOPs at 640. Table 7 gives the training resolution as "224/640", so presumably pre-training was done at 224x224 and some fine-tuning at 640x640. It is not clear how much training was done at each resolution, but the high-resolution stage is typically a small fraction of total training (e.g., Noisy Student finds it sufficient to train for 350 epochs at the smaller resolution and then fine-tune at the higher resolution for 1.5 epochs), so we ignore the additional FLOPs from high-resolution training. Total training FLOPs: 188e9 FLOP/image * (427M images * 30 epochs + 1.281M images * 20 epochs) * 3 (to account for the backward pass) ≈ 7.24e21.
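Re-deriving the estimate above as a quick check; all inputs are taken from the notes, and the backward pass is approximated with the standard factor of 3x the forward pass:

```python
# Training-compute estimate for InternImage-H from the numbers in the notes.
forward_flops_per_image = 188e9        # Table 2: GFLOPs at 224x224
pretrain_images = 427e6 * 30           # 427M joint dataset, 30 epochs
finetune_images = 1_281_167 * 20       # ImageNet-1K, 20 epochs
# Factor of 3: one forward pass plus roughly 2x for the backward pass.
total = 3 * forward_flops_per_image * (pretrain_images + finetune_images)
print(f"{total:.3e}")                  # ~7.239e+21 FLOP
```

Note that the fine-tuning term contributes only ~0.2% of the total; the estimate is dominated by pre-training on the 427M joint dataset.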
Notes: 1.08B parameters (Table 1).