internimage's Introduction

[中文版本]

We are currently receiving a large number of issues; our team will review and resolve them one by one, so please stay tuned.

INTERN-2.5: Multimodal Multitask General Large Model


The official implementation of

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.

[Paper] [Blog in Chinese]

Highlights

  • 👍 The strongest open-source universal visual backbone, with up to 3 billion parameters
  • 🏆 90.1% Top-1 accuracy on ImageNet, the highest among open-source models
  • 🏆 65.5 mAP on the COCO object detection benchmark, the only model to exceed 65.0 mAP

Related Projects

Foundation Models

  • Uni-Perceiver: A Pre-training unified architecture for generic perception for zero-shot and few-shot tasks
  • Uni-Perceiver v2: A generalist model for large-scale vision and vision-language tasks
  • M3I-Pretraining: One-stage pre-training paradigm via maximizing multi-modal mutual information
  • InternVL: The largest open-source vision/vision-language foundation model (14B) to date

Autonomous Driving

  • BEVFormer: A cutting-edge baseline for camera-based 3D detection
  • BEVFormer v2: Adapting modern image backbones to Bird's-Eye-View recognition via perspective supervision

Application in Challenges

News

  • Jan 22, 2024: 🚀 Support DCNv4 in InternImage!
  • Mar 14, 2023: 🚀 "INTERN-2.5" is released!
  • Feb 28, 2023: 🚀 InternImage is accepted to CVPR 2023!
  • Nov 18, 2022: 🚀 InternImage-XL merged into BEVFormer v2 achieves state-of-the-art performance of 63.4 NDS on nuScenes Camera Only.
  • Nov 10, 2022: 🚀 InternImage-H achieves a new record 65.4 mAP on COCO detection test-dev and 62.9 mIoU on ADE20K, outperforming previous models by a large margin.

History

  • Models/APIs for other downstream tasks
  • Support CVPR 2023 Workshop on End-to-End Autonomous Driving, see here
  • Support Segment Anything
  • Support extracting intermediate features, see here
  • Low-cost training with DeepSpeed, see here
  • Compiling-free .whl package of DCNv3 operator, see here
  • InternImage-H(1B)/G(3B)
  • TensorRT inference for classification/detection/segmentation models
  • Classification code of the InternImage series
  • InternImage-T/S/B/L/XL ImageNet-1K pretrained model
  • InternImage-L/XL ImageNet-22K pretrained model
  • InternImage-T/S/B/L/XL detection and instance segmentation model
  • InternImage-T/S/B/L/XL semantic segmentation model

Introduction

"INTERN-2.5" is a powerful multimodal multitask general model jointly released by SenseTime and Shanghai AI Laboratory. It consists of large-scale vision foundation model "InternImage", pre-training method "M3I-Pretraining", generic decoder "Uni-Perceiver" series, and generic encoder for autonomous driving perception "BEVFormer" series.

Applications

🌅 Image Modality Tasks

"INTERN-2.5" achieved an impressive Top-1 accuracy of 90.1% on the ImageNet benchmark dataset using only publicly available data for image classification. Apart from two undisclosed models trained with additional datasets by Google and Microsoft, "INTERN-2.5" is the only open-source model that achieves a Top-1 accuracy of over 90.0%, and it is also the largest model in scale worldwide.

"INTERN-2.5" outperformed all other models worldwide on the COCO object detection benchmark dataset with a remarkable mAP of 65.5, making it the only model that surpasses 65 mAP in the world.

"INTERN-2.5" also demonstrated world's best performance on 16 other important visual benchmark datasets, covering a wide range of tasks such as classification, detection, and segmentation, making it the top-performing model across multiple domains.

Performance

  • Classification
    Image Classification: ImageNet 90.1
    Scene Classification: Places365 61.2, Places205 71.7
    Long-Tail Classification: iNaturalist 2018 92.3
  • Detection
    Conventional Object Detection: COCO 65.5, VOC 2007 94.0, VOC 2012 97.2, OpenImage 74.1
    Long-Tail Object Detection: LVIS minival 65.8, LVIS val 63.2
    Autonomous Driving Object Detection: BDD100K 38.8, nuScenes 64.8
    Dense Object Detection: CrowdHuman 97.2
  • Segmentation
    Semantic Segmentation: ADE20K 62.9, COCO Stuff-10K 59.6, Pascal Context 70.3
    Street Segmentation: CityScapes 86.1
    RGBD Segmentation: NYU Depth V2 69.7

🌁 📖 Image and Text Cross-Modal Tasks

Image-Text Retrieval: "INTERN-2.5" can quickly locate and retrieve the most semantically relevant images based on textual content requirements. This capability can be applied to both videos and image collections and can be further combined with object detection boxes to enable a variety of applications, helping users quickly and easily find the required image resources. For example, it can return the relevant images specified by the text in the album.

Image-To-Text: "INTERN-2.5" has a strong understanding capability in various aspects of visual-to-text tasks such as image captioning, visual question answering, visual reasoning, and optical character recognition. For example, in the context of autonomous driving, it can enhance the scene perception and understanding capabilities, assist the vehicle in judging traffic signal status, road signs, and other information, and provide effective perception information support for vehicle decision-making and planning.

Performance

  • Image Captioning: COCO Caption 148.2
  • Fine-tuning Image-Text Retrieval: COCO Caption 76.4, Flickr30k 94.8
  • Zero-shot Image-Text Retrieval: Flickr30k 89.1

Released Models

Open-source Visual Pretrained Models
name pretrain pre-training resolution #param download
InternImage-L ImageNet-22K 384x384 223M ckpt
InternImage-XL ImageNet-22K 384x384 335M ckpt
InternImage-H Joint 427M 384x384 1.08B ckpt
InternImage-G - 384x384 3B ckpt
ImageNet-1K Image Classification
name pretrain resolution acc@1 #param FLOPs download
InternImage-T ImageNet-1K 224x224 83.5 30M 5G ckpt | cfg
InternImage-S ImageNet-1K 224x224 84.2 50M 8G ckpt | cfg
InternImage-B ImageNet-1K 224x224 84.9 97M 16G ckpt | cfg
InternImage-L ImageNet-22K 384x384 87.7 223M 108G ckpt | cfg
InternImage-XL ImageNet-22K 384x384 88.0 335M 163G ckpt | cfg
InternImage-H Joint 427M 640x640 89.6 1.08B 1478G ckpt | cfg
InternImage-G - 512x512 90.1 3B 2700G ckpt | cfg
COCO Object Detection and Instance Segmentation
backbone method schd box mAP mask mAP #param FLOPs download
InternImage-T Mask R-CNN 1x 47.2 42.5 49M 270G ckpt | cfg
InternImage-T Mask R-CNN 3x 49.1 43.7 49M 270G ckpt | cfg
InternImage-S Mask R-CNN 1x 47.8 43.3 69M 340G ckpt | cfg
InternImage-S Mask R-CNN 3x 49.7 44.5 69M 340G ckpt | cfg
InternImage-B Mask R-CNN 1x 48.8 44.0 115M 501G ckpt | cfg
InternImage-B Mask R-CNN 3x 50.3 44.8 115M 501G ckpt | cfg
InternImage-L Cascade 1x 54.9 47.7 277M 1399G ckpt | cfg
InternImage-L Cascade 3x 56.1 48.5 277M 1399G ckpt | cfg
InternImage-XL Cascade 1x 55.3 48.1 387M 1782G ckpt | cfg
InternImage-XL Cascade 3x 56.2 48.8 387M 1782G ckpt | cfg
backbone method box mAP (val/test) #param FLOPs download
InternImage-H DINO (TTA) 65.0 / 65.4 2.18B TODO TODO
InternImage-G DINO (TTA) 65.3 / 65.5 3B TODO TODO
ADE20K Semantic Segmentation
backbone method resolution mIoU (ss/ms) #param FLOPs download
InternImage-T UperNet 512x512 47.9 / 48.1 59M 944G ckpt | cfg
InternImage-S UperNet 512x512 50.1 / 50.9 80M 1017G ckpt | cfg
InternImage-B UperNet 512x512 50.8 / 51.3 128M 1185G ckpt | cfg
InternImage-L UperNet 640x640 53.9 / 54.1 256M 2526G ckpt | cfg
InternImage-XL UperNet 640x640 55.0 / 55.3 368M 3142G ckpt | cfg
InternImage-H UperNet 896x896 59.9 / 60.3 1.12B 3566G ckpt | cfg
InternImage-H Mask2Former 896x896 62.5 / 62.9 1.31B 4635G ckpt | cfg
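
As a quick sanity check after downloading any of the checkpoints above, a minimal sketch for inspecting a file with plain PyTorch; the 'model'/'state_dict' keys and the file name are assumptions about the checkpoint layout rather than a documented interface:

import torch

# Hypothetical file name; use whichever ckpt was downloaded from the tables above.
ckpt = torch.load('internimage_t_1k_224.pth', map_location='cpu')

# Checkpoints are usually dicts; the weights commonly sit under 'model' or 'state_dict'.
state_dict = ckpt.get('model', ckpt.get('state_dict', ckpt)) if isinstance(ckpt, dict) else ckpt

num_params = sum(v.numel() for v in state_dict.values() if hasattr(v, 'numel'))
print(f'{len(state_dict)} tensors, ~{num_params / 1e6:.1f}M parameters')
print(list(state_dict.keys())[:5])  # peek at the first few parameter names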
Main Results of FPS

Export classification model from pytorch to tensorrt

Export detection model from pytorch to tensorrt

Export segmentation model from pytorch to tensorrt

name resolution #param FLOPs batch 1 FPS (TensorRT)
InternImage-T 224x224 30M 5G 156
InternImage-S 224x224 50M 8G 129
InternImage-B 224x224 97M 16G 116
InternImage-L 384x384 223M 108G 56
InternImage-XL 384x384 335M 163G 47
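
For reference, a rough sketch of how batch-1 throughput is commonly measured on the PyTorch side; this is not the script used to produce the TensorRT numbers above, and the stand-in model below is only a placeholder:

import time
import torch

@torch.no_grad()
def measure_fps(model, resolution=224, warmup=20, iters=100, device='cuda'):
    """Average batch-1 forward FPS, synchronizing CUDA around the timed loop."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, resolution, resolution, device=device)
    for _ in range(warmup):
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)

# Placeholder model; substitute an InternImage classifier built from the configs above.
print(f'{measure_fps(torch.nn.Conv2d(3, 64, 3, padding=1)):.1f} img/s')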

Before using mmdeploy to convert our PyTorch models to TensorRT, please make sure you have built the DCNv3 custom operator correctly. You can build it with the following commands:

export MMDEPLOY_DIR=/the/root/path/of/MMDeploy

# prepare our custom ops, you can find it at InternImage/tensorrt/modulated_deform_conv_v3
cp -r modulated_deform_conv_v3 ${MMDEPLOY_DIR}/csrc/mmdeploy/backend_ops/tensorrt

# build custom ops
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} -DCUDNN_DIR=${CUDNN_DIR} ..
make -j$(nproc) && make install

# install the mmdeploy after building custom ops
cd ${MMDEPLOY_DIR}
pip install -e .

For more details on building custom ops, please refer to this document.
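
After building, one way to check that the TRTDCNv3 plugin is actually visible to TensorRT from Python; the library path and file name below are assumptions about the MMDeploy build output, so adjust them to your build directory:

import ctypes
import tensorrt as trt

# Assumed location/name of the ops library produced by the build above.
ctypes.CDLL('/the/root/path/of/MMDeploy/build/lib/libmmdeploy_tensorrt_ops.so')

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, '')

plugins = [creator.name for creator in trt.get_plugin_registry().plugin_creator_list]
print('TRTDCNv3 registered:', any('DCNv3' in name for name in plugins))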

Citations

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{wang2022internimage,
  title={InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions},
  author={Wang, Wenhai and Dai, Jifeng and Chen, Zhe and Huang, Zhenhang and Li, Zhiqi and Zhu, Xizhou and Hu, Xiaowei and Lu, Tong and Lu, Lewei and Li, Hongsheng and others},
  journal={arXiv preprint arXiv:2211.05778},
  year={2022}
}

@inproceedings{zhu2022uni,
  title={Uni-perceiver: Pre-training unified architecture for generic perception for zero-shot and few-shot tasks},
  author={Zhu, Xizhou and Zhu, Jinguo and Li, Hao and Wu, Xiaoshi and Li, Hongsheng and Wang, Xiaohua and Dai, Jifeng},
  booktitle={CVPR},
  pages={16804--16815},
  year={2022}
}

@article{zhu2022unimoe,
  title={Uni-perceiver-moe: Learning sparse generalist models with conditional moes},
  author={Zhu, Jinguo and Zhu, Xizhou and Wang, Wenhai and Wang, Xiaohua and Li, Hongsheng and Wang, Xiaogang and Dai, Jifeng},
  journal={arXiv preprint arXiv:2206.04674},
  year={2022}
}

@article{li2022uni,
  title={Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks},
  author={Li, Hao and Zhu, Jinguo and Jiang, Xiaohu and Zhu, Xizhou and Li, Hongsheng and Yuan, Chun and Wang, Xiaohua and Qiao, Yu and Wang, Xiaogang and Wang, Wenhai and others},
  journal={arXiv preprint arXiv:2211.09808},
  year={2022}
}

@article{yang2022bevformer,
  title={BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision},
  author={Yang, Chenyu and Chen, Yuntao and Tian, Hao and Tao, Chenxin and Zhu, Xizhou and Zhang, Zhaoxiang and Huang, Gao and Li, Hongyang and Qiao, Yu and Lu, Lewei and others},
  journal={arXiv preprint arXiv:2211.10439},
  year={2022}
}

@article{su2022towards,
  title={Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information},
  author={Su, Weijie and Zhu, Xizhou and Tao, Chenxin and Lu, Lewei and Li, Bin and Huang, Gao and Qiao, Yu and Wang, Xiaogang and Zhou, Jie and Dai, Jifeng},
  journal={arXiv preprint arXiv:2211.09807},
  year={2022}
}

@inproceedings{li2022bevformer,
  title={Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers},
  author={Li, Zhiqi and Wang, Wenhai and Li, Hongyang and Xie, Enze and Sima, Chonghao and Lu, Tong and Qiao, Yu and Dai, Jifeng},
  booktitle={ECCV},
  pages={1--18},
  year={2022},
}

internimage's People

Contributors

charlie-cyw, czczup, duinodu, li-qingyun, masahiroogawa, tongwwt, weiyun1025, whai362, wofmanaf, yeshenglong1, zeqiang-lai, zhenhanghuang, zhiqi-li


internimage's Issues

Running code with cpu instead of cuda

I get an error installing CUDA to run the code. Could I run it on CPU instead? I found cpu and cuda folders inside the src folder in the code, but I'm not sure how to switch to running on CPU. Thanks so much for your support.

TypeError: forward() missing 1 required positional argument: 'im2col_step'

from __future__ import absolute_import
from __future__ import print_function
from __future__ import division

import torch
import torch.nn.functional as F
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from torch.cuda.amp import custom_bwd, custom_fwd
from ops_dcnv3.modules import dcnv3


class DCNv3Function(Function):
    @staticmethod
    @custom_fwd
    def forward(
            ctx, input, offset, mask,
            kernel_h, kernel_w, stride_h, stride_w,
            pad_h, pad_w, dilation_h, dilation_w,
            group, group_channels, offset_scale, im2col_step):
        ctx.kernel_h = kernel_h
        ctx.kernel_w = kernel_w
        ctx.stride_h = stride_h
        ctx.stride_w = stride_w
        ctx.pad_h = pad_h
        ctx.pad_w = pad_w
        ctx.dilation_h = dilation_h
        ctx.dilation_w = dilation_w
        ctx.group = group
        ctx.group_channels = group_channels
        ctx.offset_scale = offset_scale
        ctx.im2col_step = im2col_step
        output = DCNv3Function.forward(
            input, offset, mask, kernel_h, kernel_w, stride_h, stride_w,
            pad_h, pad_w, dilation_h, dilation_w,
            group, group_channels, offset_scale, im2col_step)
        ctx.save_for_backward(input, offset, mask)

        return output
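
A likely source of this error (an assumption, not something confirmed in this thread): custom torch.autograd.Function subclasses are meant to be invoked through .apply(...), which supplies the ctx argument automatically, and in the upstream dcnv3_func.py the inner call goes to the compiled DCNv3 extension rather than back into DCNv3Function.forward. Calling forward(...) directly shifts every argument by one position, which yields exactly this kind of missing-argument TypeError. A minimal sketch of the calling convention, using a toy Function in place of DCNv3:

import torch
from torch.autograd import Function

class ScaleFunction(Function):
    """Toy Function used only to illustrate the .apply() calling convention."""

    @staticmethod
    def forward(ctx, input, scale):
        ctx.scale = scale
        return input * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * ctx.scale, None

x = torch.randn(4, requires_grad=True)

# Correct: .apply() fills in ctx and routes the call through autograd.
y = ScaleFunction.apply(x, 2.0)

# Incorrect: calling forward() directly makes `x` take the place of ctx, so the
# real arguments shift by one and Python reports a missing positional argument:
# ScaleFunction.forward(x, 2.0)  # TypeError: forward() missing 1 required positional argument: 'scale'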

eval

Hello, I use InternImage as my model backbone. Why does the training loss drop normally, but the mAP on the validation set is all 0?

I don't understand this shared weight in the article, can you explain it in detail, please?

To remedy this problem, we borrow the idea from the separable convolution [56] and detach the original convolution weights wk into depth-wise and point-wise parts, where the depth-wise part is responsible by the original location-aware modulation scalar mk, and the point-wise part is the shared projection weights w among sampling points.

Can you provide some information? Thank you very much!
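
For readers with the same question, a paraphrase of the formulation from the paper (the notation below is a sketch; defer to the paper for the exact definitions). DCNv2 applies a separate projection weight to every sampling point,

$$\mathbf{y}(p_0)=\sum_{k=1}^{K}\mathbf{w}_k\, m_k\, \mathbf{x}(p_0+p_k+\Delta p_k),$$

while DCNv3 splits that weight into a depth-wise part and a point-wise part: within each group $g$, one projection $\mathbf{w}_g$ is shared by all $K$ sampling points (the point-wise part), and only the scalar modulation $m_{gk}$, normalized over the $K$ points, remains specific to each location and sampling point (the depth-wise part),

$$\mathbf{y}(p_0)=\sum_{g=1}^{G}\mathbf{w}_g\sum_{k=1}^{K} m_{gk}\, \mathbf{x}_g(p_0+p_k+\Delta p_{gk}).$$

So "the point-wise part is the shared projection weights w among sampling points" means the per-point weights $\mathbf{w}_k$ of DCNv2 are replaced by one shared $\mathbf{w}_g$ per group plus cheap per-point scalars, which is the separable-convolution idea the sentence refers to.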

tensorrt and FPS

Hi all, quite impressed with your great work!
I noticed the table of Main Results of FPS; would you provide the code for the time measurement?
Also, do you plan to upload models for edge applications (TensorRT, ONNX, ...)?

export.py for segmentation model

Hello! I'm attempting to build an ONNX model for segmentation inference, and I noticed there is an export.py in the classification folder, but not in the segmentation folder...

Is this a possibility in the future? Or, a release of onnx direct?

Thank you!

output image size in segmentation

Hello,

thanks for the awesome repo! I am trying to adapt this code for a segmentation task in our lab. I needed to strip the model part out and insert it into our script, but I encountered a small issue here.

I started by copying the entire segmentation/mmseg_custom/models/backbones/intern_image.py into the notebook and initialized the network as shown in your training script, with the config in upernet_internimage_l_640_160k_ade20k.py.

However, when I tried to run inference on a (1, 3, 640, 640) image using the code below

model = dict(
    backbone=dict(
        _delete_=True,
        type="InternImage",
        core_op="DCNv3",
        channels=160,
        depths=[5, 5, 22, 5],
        groups=[10, 20, 40, 80],
        mlp_ratio=4.0,
        drop_path_rate=0.4,
        norm_layer="LN",
        layer_scale=1.0,
        offset_scale=2.0,
        post_norm=True,
        with_cp=False,
        out_indices=(0, 1, 2, 3),
        # init_cfg=dict(type="Pretrained", checkpoint=pretrained),
    ),
    decode_head=dict(num_classes=150, in_channels=[160, 320, 640, 1280]),
    auxiliary_head=dict(num_classes=150, in_channels=640),
    test_cfg=dict(mode="whole"),
)
model['type'] = 'InternImage'

net = build_segmentor(model).cuda()
data = torch.rand(1, 3, 640, 640).cuda()
out = net(data)
[each.shape for each in out]

I got these as the output shapes:

[torch.Size([1, 160, 160, 160]),
 torch.Size([1, 320, 80, 80]),
 torch.Size([1, 640, 40, 40]),
 torch.Size([1, 1280, 20, 20])]

I understand this is because of the stem at the beginning of the network, and the model is designed to output a result at each resolution scale.

My question is: how do I get a segmentation result with the same x and y shape, e.g. 640 by 640?

I couldn't figure out how to build my own decode_head as specified in upernet_internimage_l_640_160k_ade20k.py using the mmseg package.

many thanks,
Michael
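
The four tensors above are the backbone's multi-scale feature maps. To get a prediction at the input resolution, the full segmentor (decode head included) has to run; in the mmseg 0.x EncoderDecoder API, encode_decode does this and resizes the logits back to the image size. A rough sketch under that assumption, with hypothetical config paths:

import torch
import mmseg_custom  # noqa: F401 -- registers the InternImage backbone; assumes running from the segmentation/ folder
from mmcv.utils import Config
from mmseg.models import build_segmentor

# Hypothetical path; substitute the actual config used above.
cfg = Config.fromfile('configs/ade20k/upernet_internimage_l_640_160k_ade20k.py')
net = build_segmentor(cfg.model).cuda().eval()

data = torch.rand(1, 3, 640, 640).cuda()
# Minimal image meta expected by mmseg 0.x.
img_metas = [dict(ori_shape=(640, 640, 3), img_shape=(640, 640, 3),
                  pad_shape=(640, 640, 3), scale_factor=1.0, flip=False)]

with torch.no_grad():
    # Runs backbone + decode head, then resizes the logits to the input size.
    seg_logits = net.encode_decode(data, img_metas)

print(seg_logits.shape)  # expected: (1, num_classes, 640, 640)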

_pickle.UnpicklingError: invalid load key, '\xda'.

Hello, thanks for your reply. My last question has been solved, but I've run into another problem. When I tried to load the pre-trained model, I got the error _pickle.UnpicklingError: invalid load key, '\xda'. It seems the pre-trained model file is broken, isn't it?

Traceback (most recent call last):
  File "/home/zyp/下载/pytorch/InternImage-master/segmentation/mytest.py", line 3, in <module>
    model = torch.load(r'checkpoint_dir/upernet_internimage_t_512_160k_ade20k.pth')
  File "/home/zyp/anaconda3/envs/pytorch_cp37/lib/python3.7/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/zyp/anaconda3/envs/pytorch_cp37/lib/python3.7/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xda'.

Consulting AI Infra information for InternImage.

Hi, I couldn't find information about the training infrastructure in this paper. Can you provide the following information?

  1. the number of GPUs
  2. the type of GPU
  3. the training framework (PyTorch, TensorFlow, etc.)
  4. the time cost for full training

Training results differ from those reported in the paper

Dear authors, hello!
I tried running your code on our school's server with InternImage-T | Mask R-CNN | 1x, using two A100s, and trained for 12 epochs from the pretrained model. I did not change any parameters in the config file, but the final results differ considerably from the experimental results in the paper.
Your results: box mAP 47.2 | mask mAP 42.5
My output: (screenshot QQ截图20230310150646.png, upload did not complete)
Could you tell me what might be going on?

Unable to run

I tried to debug train.py in the segmentation folder on Windows, but ran into the following problem (screenshot).
After single-stepping I found that the bug is triggered when importing mmcv, even though I have already installed that library following the README.
At the point of the error I tried decoding the string with the GBK encoding, and it parses successfully (screenshot).
I searched the whole disk for that string and found it in cpp_extension.py under the install directory Python\Python39\Lib\site-packages\torch\utils (screenshot).
I'm not sure whether this is an environment problem, or whether it can be fixed on Windows.
Thanks.

Problems encountered when installing DCNv3

Hi, I ran into the following problems.
1. Running sh ./make.sh directly produces the following error (screenshot).
2. I looked at make.sh and found it is just the following python command, so I ran that command directly, but got another error (screenshot).
My CUDA path is indeed there (screenshot).
From the error it looks like there is an extra ':' in front of the path, but I'm not familiar with the relevant code and cannot locate where it goes wrong.
Is there a good way to solve this?
Thanks.

`CUBLAS_STATUS_INTERNAL_ERROR` on training segmentation

Hi there, firstly thank you very much for your work. Upon trying to use your backbone to train a segmentation model, I run into a CUBLAS_STATUS_INTERNAL_ERROR:

2023-03-10 22:05:40,534 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
2023-03-10 22:05:40,534 - mmseg - INFO - Checkpoints will be saved to mmsegmentation/work_dirs/internimage_base_512 by HardDiskBackend.
2023-03-10 22:05:46,860 - mmseg - INFO - Iter [20/160000]       lr: 7.600e-07, eta: 13:43:03, time: 0.309, data_time: 0.014, memory: 6998, decode.loss_ce: nan, decode.acc_seg: 7.1505, aux.loss_ce: nan, aux.acc_seg: 7.1649, loss: nan
Traceback (most recent call last):
  File "mmsegmentation/train.py", line 162, in <module>
    train_segmentor(model, datasets, cfg, distributed=False, validate=True, 
  File "mmsegmentation/mmseg/apis/train.py", line 194, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 138, in run
    iter_runner(iter_loaders[i], **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 62, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "mmsegmentation/mmseg/models/segmentors/base.py", line 138, in train_step
    losses = self(**data_batch)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
  File "mmsegmentation/mmseg/models/segmentors/base.py", line 108, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 139, in forward_train
    x = self.extract_feat(img)
  File "mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 65, in extract_feat
    x = self.backbone(img)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 479, in forward
    x, x_ = level(x, return_wo_downsample=True)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 316, in forward
    x = blk(x)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 252, in forward
    x = _inner_forward(x)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 242, in _inner_forward
    x = x + self.drop_path(self.gamma1 * self.norm1(self.dcn(x)))
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/ops_dcnv3/modules/dcnv3.py", line 276, in forward
    x = self.output_proj(x)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

I compiled DCNv3 and test.py runs without an error.
CUBLAS_STATUS_INTERNAL_ERROR does not occur with other native mmsegmentation configs/backbones.

Do you know what could be the cause of this issue?
Thank you very much!

CUDA 11.3
PyTorch 1.11.0
cuDNN 8.2.0
torchvision 0.12.0

All conda packages:

# Name                    Version                   Build  Channel
addict                    2.4.0                    pypi_0    pypi
blas                      1.0                         mkl  
brotlipy                  0.7.0           py39h27cfd23_1003  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2022.4.26            h06a4308_0  
certifi                   2022.6.15        py39h06a4308_0  
cffi                      1.15.0           py39hd667e15_1  
charset-normalizer        2.0.12                   pypi_0    pypi
click                     7.1.2                    pypi_0    pypi
colorama                  0.4.5                    pypi_0    pypi
cryptography              37.0.1           py39h9ce1e76_0  
cudatoolkit               11.3.1               h2bc3f7f_2  
cycler                    0.11.0                   pypi_0    pypi
dcnv3                     1.0                      pypi_0    pypi
ffmpeg                    4.3                  hf484d3e_0    pytorch
filelock                  3.9.0                    pypi_0    pypi
fonttools                 4.33.3                   pypi_0    pypi
freetype                  2.11.0               h70c0345_0  
giflib                    5.2.1                h7b6447c_0  
gmp                       6.2.1                h295c915_3  
gnutls                    3.6.15               he1e5248_0  
huggingface-hub           0.13.1                   pypi_0    pypi
idna                      3.3                pyhd3eb1b0_0  
importlib-metadata        4.11.4                   pypi_0    pypi
intel-openmp              2021.4.0          h06a4308_3561  
jpeg                      9e                   h7f8727e_0  
kiwisolver                1.4.3                    pypi_0    pypi
lame                      3.100                h7b6447c_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 11.2.0               h1234567_1  
libiconv                  1.16                 h7f8727e_2  
libidn2                   2.3.2                h7f8727e_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              11.2.0               h1234567_1  
libtasn1                  4.16.0               h27cfd23_0  
libtiff                   4.2.0                h2818925_1  
libunistring              0.9.10               h27cfd23_0  
libuv                     1.40.0               h7b6447c_0  
libwebp                   1.2.2                h55f646e_0  
libwebp-base              1.2.2                h7f8727e_0  
lz4-c                     1.9.3                h295c915_1  
markdown                  3.3.7                    pypi_0    pypi
matplotlib                3.5.2                    pypi_0    pypi
mkl                       2021.4.0           h06a4308_640  
mkl-service               2.4.0            py39h7f8727e_0  
mkl_fft                   1.3.1            py39hd3c417c_0  
mkl_random                1.2.2            py39h51133e4_0  
mmcls                     0.23.1                   pypi_0    pypi
mmcv-full                 1.5.3                    pypi_0    pypi
mmdet                     2.28.1                   pypi_0    pypi
mmsegmentation            0.25.0                    dev_0    <develop>
model-index               0.1.11                   pypi_0    pypi
ncurses                   6.3                  h7f8727e_2  
nettle                    3.7.3                hbbd107a_1  
numpy                     1.23.0                   pypi_0    pypi
numpy-base                1.22.3           py39hf524024_0  
opencv-python             4.6.0.66                 pypi_0    pypi
openh264                  2.1.1                h4ff587b_0  
openmim                   0.1.6                    pypi_0    pypi
openssl                   1.1.1o               h7f8727e_0  
ordered-set               4.1.0                    pypi_0    pypi
packaging                 21.3                     pypi_0    pypi
pandas                    1.4.3                    pypi_0    pypi
pillow                    9.1.1                    pypi_0    pypi
pip                       21.2.4           py39h06a4308_0  
prettytable               3.3.0                    pypi_0    pypi
pycocotools               2.0.6                    pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0  
pyopenssl                 22.0.0             pyhd3eb1b0_0  
pyparsing                 3.0.9                    pypi_0    pypi
pysocks                   1.7.1            py39h06a4308_0  
python                    3.9.12               h12debd9_1  
python-dateutil           2.8.2                    pypi_0    pypi
pytorch                   1.11.0          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pytz                      2022.1                   pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
readline                  8.1.2                h7f8727e_1  
requests                  2.28.0                   pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
setuptools                61.2.0           py39h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.38.5               hc218d9a_0  
tabulate                  0.8.10                   pypi_0    pypi
termcolor                 2.2.0                    pypi_0    pypi
terminaltables            3.1.10                   pypi_0    pypi
timm                      0.6.11                   pypi_0    pypi
tk                        8.6.12               h1ccaba5_0  
torchaudio                0.11.0               py39_cu113    pytorch
torchvision               0.12.0               py39_cu113    pytorch
tqdm                      4.65.0                   pypi_0    pypi
typing-extensions         4.2.0                    pypi_0    pypi
typing_extensions         4.1.1              pyh06a4308_0  
tzdata                    2022a                hda174b7_0  
urllib3                   1.26.9           py39h06a4308_0  
wcwidth                   0.2.5                    pypi_0    pypi
wheel                     0.37.1             pyhd3eb1b0_0  
xz                        5.2.5                h7f8727e_1  
yacs                      0.1.8                    pypi_0    pypi
yapf                      0.32.0                   pypi_0    pypi
zipp                      3.8.0                    pypi_0    pypi
zlib                      1.2.12               h7f8727e_2  
zstd                      1.5.2                ha4553b6_0  

Occluded object detection

How well can the model detect occluded objects from different categories lying on top of each other?

Is there any simple way to segment a single image?

Thank you for your awesome repo. I am looking for a way to segment a single image. However, it seems the code only provides evaluation on the entire ADE20K dataset. Is there any way for me to segment a single image? I looked into the code but it is quite complicated :D
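
Since the segmentation code builds on mmsegmentation 0.x, the standard single-image API should apply here; a minimal sketch, assuming the config/checkpoint pair below (adjust the paths to whichever model you use):

import mmseg_custom  # noqa: F401 -- registers InternImage; assumes running from the segmentation/ folder
from mmseg.apis import init_segmentor, inference_segmentor, show_result_pyplot

# Hypothetical paths; point these at a config/checkpoint pair from the model zoo above.
config = 'configs/ade20k/upernet_internimage_t_512_160k_ade20k.py'
checkpoint = 'checkpoint_dir/seg/upernet_internimage_t_512_160k_ade20k.pth'

model = init_segmentor(config, checkpoint, device='cuda:0')
result = inference_segmentor(model, 'demo.jpg')          # list with one H x W label map
show_result_pyplot(model, 'demo.jpg', result, opacity=0.5)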

Want to reproduce some results but don't have enough GPU memory

Hello, dear authors. I currently have 8 RTX 3090 GPUs, but each has only 22 GB of memory, and I find that the parameters of InternImage-H far exceed the memory of a single card. How should I set the parameters in order to reproduce your results?

cannot export trt

Exporting to ONNX works, but exporting to TensorRT fails as follows:

[03/15/2023-15:00:34] [TRT] [E] ModelImporter.cpp:773: While parsing node number 78 [TRTDCNv3 -> "onnx::MatMul_732"]:
[03/15/2023-15:00:34] [TRT] [E] ModelImporter.cpp:774: --- Begin node ---
[03/15/2023-15:00:34] [TRT] [E] ModelImporter.cpp:775: input: "mmdeploy::TRTDCNv3_685"
input: "mmdeploy::TRTDCNv3_710"
input: "mmdeploy::TRTDCNv3_731"
output: "onnx::MatMul_732"
name: "TRTDCNv3_78"
op_type: "TRTDCNv3"
attribute {
  name: "dilation_h"
  i: 1
  type: INT
}
attribute {
  name: "dilation_w"
  i: 1
  type: INT
}
attribute {
  name: "group_channels"
  i: 16
  type: INT
}
attribute {
  name: "group"
  i: 4
  type: INT
}
attribute {
  name: "im2col_step"
  i: 256
  type: INT
}
attribute {
  name: "kernel_h"
  i: 3
  type: INT
}
attribute {
  name: "kernel_w"
  i: 3
  type: INT
}
attribute {
  name: "offset_scale"
  f: 1
  type: FLOAT
}
attribute {
  name: "pad_h"
  i: 1
  type: INT
}
attribute {
  name: "pad_w"
  i: 1
  type: INT
}
attribute {
  name: "stride_h"
  i: 1
  type: INT
}
attribute {
  name: "stride_w"
  i: 1
  type: INT
}
domain: "mmdeploy"

[03/15/2023-15:00:34] [TRT] [E] ModelImporter.cpp:776: --- End node ---
[03/15/2023-15:00:34] [TRT] [E] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:4870 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
Traceback (most recent call last):
  File "export.py", line 123, in <module>
    main()
  File "export.py", line 118, in main
    onnx2trt(args)
  File "export.py", line 85, in onnx2trt
    max_workspace_size=2**30,
  File "****/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 177, in from_onnx
    raise RuntimeError(f'Failed to parse onnx, {error_msgs}')
RuntimeError: Failed to parse onnx, In node 78 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

The DCNv3 build succeeds; running python test.py gives the following results:

foward time cost: 0.04141324281692505
>>> time cost: im2col_step 256; input torch.Size([512, 64, 64, 64]); points 9 
foward time cost: 0.042035584449768064
>>> time cost: im2col_step 512; input torch.Size([512, 64, 64, 64]); points 9 
foward time cost: 0.042629106044769285

mmdeploy 0.13 was built from source; the output of python tools/check_env.py is as follows:

2023-03-15 14:57:51,724 - mmdeploy - INFO - 

2023-03-15 14:57:51,725 - mmdeploy - INFO - **********Environmental information**********
2023-03-15 14:57:52,004 - mmdeploy - INFO - sys.platform: linux
2023-03-15 14:57:52,004 - mmdeploy - INFO - Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
2023-03-15 14:57:52,004 - mmdeploy - INFO - CUDA available: True
2023-03-15 14:57:52,004 - mmdeploy - INFO - GPU 0: Tesla T4
2023-03-15 14:57:52,004 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2023-03-15 14:57:52,004 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.3, V11.3.58
2023-03-15 14:57:52,004 - mmdeploy - INFO - GCC: gcc (GCC) 7.5.0
2023-03-15 14:57:52,004 - mmdeploy - INFO - PyTorch: 1.11.0
2023-03-15 14:57:52,004 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

2023-03-15 14:57:52,004 - mmdeploy - INFO - TorchVision: 0.12.0
2023-03-15 14:57:52,004 - mmdeploy - INFO - OpenCV: 4.5.4
2023-03-15 14:57:52,004 - mmdeploy - INFO - MMCV: 1.5.0
2023-03-15 14:57:52,005 - mmdeploy - INFO - MMCV Compiler: GCC 7.3
2023-03-15 14:57:52,005 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2023-03-15 14:57:52,005 - mmdeploy - INFO - MMDeploy: 0.13.0+02d5a09
2023-03-15 14:57:52,005 - mmdeploy - INFO - 

2023-03-15 14:57:52,005 - mmdeploy - INFO - **********Backend information**********
2023-03-15 14:57:52,065 - mmdeploy - INFO - tensorrt:   8.2.4.2
2023-03-15 14:57:52,065 - mmdeploy - INFO - tensorrt custom ops:        Available
2023-03-15 14:57:52,100 - mmdeploy - INFO - ONNXRuntime:        1.14.1
2023-03-15 14:57:52,100 - mmdeploy - INFO - ONNXRuntime-gpu:    None
2023-03-15 14:57:52,100 - mmdeploy - INFO - ONNXRuntime custom ops:     NotAvailable
2023-03-15 14:57:52,100 - mmdeploy - INFO - pplnn:      None
2023-03-15 14:57:52,101 - mmdeploy - INFO - ncnn:       None
2023-03-15 14:57:52,103 - mmdeploy - INFO - snpe:       None
2023-03-15 14:57:52,104 - mmdeploy - INFO - openvino:   2022.3.0
2023-03-15 14:57:52,105 - mmdeploy - INFO - torchscript:        1.11.0
2023-03-15 14:57:52,105 - mmdeploy - INFO - torchscript custom ops:     NotAvailable
2023-03-15 14:57:52,139 - mmdeploy - INFO - rknn-toolkit:       None
2023-03-15 14:57:52,139 - mmdeploy - INFO - rknn2-toolkit:      None
2023-03-15 14:57:52,140 - mmdeploy - INFO - ascend:     None
2023-03-15 14:57:52,140 - mmdeploy - INFO - coreml:     None
2023-03-15 14:57:52,141 - mmdeploy - INFO - tvm:        None
2023-03-15 14:57:52,141 - mmdeploy - INFO - 

2023-03-15 14:57:52,141 - mmdeploy - INFO - **********Codebase information**********
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmdet:      2.20.0
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmseg:      0.30.0
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmcls:      0.23.0
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmocr:      0.4.1
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmedit:     0.16.1
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmdet3d:    None
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmpose:     0.25.1
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmrotate:   None
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmaction:   None

Issues when exporting the ONNX model and deploying it with C++/ONNXRuntime

I prefer ORT deployment because it allows quickly switching between CUDA/TensorRT/DML/OpenVINO as the inference backend. Below are some problems from my own deployment; a solution would be much appreciated if one exists.

Taking the classification model internimage_t_1k_224 as an example: when exporting to ONNX following the tutorial, a large number of warnings appear:
WARNING: The shape inference of mmdeploy::TRTDCNv3 type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
But the export eventually succeeds.

The export command is the one from the README:
python export.py --model_name internimage_t_1k_224 --ckpt_dir /path/to/ckpt/dir --onnx

But when deploying with C++/ORT, this error occurs:
Fatal error: mmdeploy:TRTDCNv3(-1) is not a registered function/op
This should be because "DCNv3" is a custom operator that has not been implemented in the C++ format required by ONNXRuntime custom operators.

Following issue #41, I changed CORE_OP: 'DCNv3' in "./classification/configs/internimage_t_1k_224.yaml" to CORE_OP: 'DCNv3_pytorch' (I'm not sure this is a valid thing to do 😥) to use the pure-PyTorch DCNv3 implementation, but the export still produces a large number of warnings and then fails:

Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Traceback (most recent call last):
  File ".\export.py", line 122, in <module>
    main()
  File ".\export.py", line 113, in main
    torch2onnx(args, cfg)
  File ".\export.py", line 61, in torch2onnx
    torch.onnx.export(model,
  File "D:\Python\Python38\lib\site-packages\torch\onnx\__init__.py", line 350, in export
    return utils.export(
  File "D:\Python\Python38\lib\site-packages\torch\onnx\utils.py", line 163, in export
    _export(
  File "D:\Python\Python38\lib\site-packages\torch\onnx\utils.py", line 1110, in _export
    ) = graph._export_onnx(  # type: ignore[attr-defined]
RuntimeError: Could not allocate bytes object!

Using InternImage for Object Detection without Segmentation

Hello,

I hope you are doing well. I am working on a project where I would like to use InternImage solely for object detection, without involving segmentation. I attempted to use it with Cascade R-CNN, but I encountered an error during the process.

Here is the error message I received:

2023-03-15 13:39:35,378 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
2023-03-15 13:39:35,422 - mmdet - INFO - Checkpoints will be saved to /content/drive/MyDrive/FETP/HealthSit/Phase_02_1/InternImage/detection/work_dirs/mod_cascade_internimage_l_fpn_3x_coco by HardDiskBackend.
Traceback (most recent call last):
  File "/content/drive/MyDrive/FETP/HealthSit/Phase_02_1/InternImage/detection/./train.py", line 247, in <module>
    main()
  File "/content/drive/MyDrive/FETP/HealthSit/Phase_02_1/InternImage/detection/./train.py", line 237, in main
    train_detector(model,
  File "/usr/local/lib/python3.9/dist-packages/mmdet/apis/train.py", line 246, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.9/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/dist-packages/torch/_utils.py", line 457, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/custom.py", line 220, in __getitem__
    data = self.prepare_train_img(idx)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/custom.py", line 243, in prepare_train_img
    return self.pipeline(results)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/pipelines/loading.py", line 398, in __call__
    results = self._load_masks(results)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/pipelines/loading.py", line 350, in _load_masks
    [self._poly2mask(mask, h, w) for mask in gt_masks], h, w)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/pipelines/loading.py", line 350, in <listcomp>
    [self._poly2mask(mask, h, w) for mask in gt_masks], h, w)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/pipelines/loading.py", line 308, in _poly2mask
    elif isinstance(mask_ann['counts'], list):
TypeError: 'NoneType' object is not subscriptable

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4737) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
./train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-03-15_13:39:43
  host      : 9677141c6259
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 4737)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================"

From my understanding, it seems that the CascadeRoIHead might require segmentation annotations. I tried using Faster RCNN with InternImage as well but was unsuccessful. I believe that being able to use InternImage for object detection without segmentation could potentially improve performance in certain scenarios.

Could you please provide any guidance or suggestions on how to achieve this? I would really appreciate your help in resolving this issue.

Thank you very much for your time and assistance.

Best regards,
Suppasit Srisaeng

[Error] Inference with onnxruntime

Hi, thanks for sharing this excellent work. I'm trying to use InternImage in ONNX format. When I export the model from PyTorch to ONNX, the warning WARNING: The shape inference of mmdeploy::TRTDCNv3 type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. appears.

Besides, when I run inference with the exported ONNX model using onnxruntime, the error onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /data/123/internimage_t_1k_224.onnx failed: Fatal error: mmdeploy:TRTDCNv3(-1) is not a registered function/op occurs. Can you give me some advice on solving this problem?
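
One note on this error (an assumption rather than an official answer): the exported graph contains the custom mmdeploy::TRTDCNv3 node, for which this repository ships only a TensorRT plugin; onnxruntime can execute the node only if an ORT kernel for that op is built and registered, for example via SessionOptions.register_custom_ops_library. A minimal sketch of that registration path, with a hypothetical library path:

import onnxruntime as ort

# Hypothetical path: an onnxruntime custom-op library that actually implements the
# DCNv3 kernel would have to be built first; the plugin under
# InternImage/tensorrt/modulated_deform_conv_v3 targets TensorRT, not ONNXRuntime.
custom_ops_lib = '/path/to/libmmdeploy_onnxruntime_ops.so'

so = ort.SessionOptions()
so.register_custom_ops_library(custom_ops_lib)

session = ort.InferenceSession('internimage_t_1k_224.onnx', so,
                               providers=['CPUExecutionProvider'])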

classification train failed, no such file meta_data/train.txt

When training the classification model as described in the doc, it fails:

Traceback (most recent call last):
  File "/home/liuzhe/github/InternImage/classification/main.py", line 661, in <module>
    main(config)
  File "/home/liuzhe/github/InternImage/classification/main.py", line 170, in main
    data_loader_val, data_loader_test, mixup_fn = build_loader(config)
  File "/home/liuzhe/github/InternImage/classification/dataset/build.py", line 58, in build_loader
    dataset_train, config.MODEL.NUM_CLASSES = build_dataset('train',
  File "/home/liuzhe/github/InternImage/classification/dataset/build.py", line 158, in build_dataset
    dataset = ImageCephDataset(root,
  File "/home/liuzhe/github/InternImage/classification/dataset/cached_image_folder.py", line 310, in __init__
    parser = ParserCephImage(root=root,
  File "/home/liuzhe/github/InternImage/classification/dataset/cached_image_folder.py", line 383, in __init__
    with open(osp.join(annotation_root, f'{split}.txt'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'meta_data/train.txt'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 846478) of binary: /home/miniconda3/envs/lz_ray/bin/python

Encounter error in dcnv3

(two screenshots attached)

I am trying to check whether the model can run; gpu1 is free, but an error occurs in the forward function:

ATen/OpMathType.h: no such file or directory

I'm trying to compile DCNv3 with PyTorch 1.9.0, and the compiler gives me this error. After checking the PyTorch code on GitHub, it appears that OpMathType.h was only added after PyTorch 1.10, yet the README.md in the detection folder says pytorch >= 1.8.0. Is there a solution for my error? I'm not sure.

it's my mistake

I can't import DCNv3 in the file dcnv3_func.py. Can you tell me how to compile the operator?

About hardware

For InternImage-XL and InternImage-H, how many A100s did you use?
And how long did it take to complete the full pre-training on the large-scale joint dataset (e.g. how long with 8 × A100s or 32 × A100s)?

Also, how much RAM and hard-drive storage is required to handle such a large dataset?

Thanks

Checkpoint file

I get a "could not find checkpoint" error when running the test.py file in the detection folder as described in README.md.
For example, to evaluate the InternImage-T with a single GPU:

python test.py configs/mask_rcnn/mask_rcnn_internimage_t_fpn_1x_coco.py checkpoint_dir/det/mask_rcnn_internimage_t_fpn_1x_coco.pth --eval bbox segm
Error: 
 File "test.py", line 208, in main
    checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
  File "/home/huyen/anaconda3/envs/internimage/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 581, in load_checkpoint
    checkpoint = _load_checkpoint(filename, map_location, logger)
  File "/home/huyen/anaconda3/envs/internimage/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 520, in _load_checkpoint
    return CheckpointLoader.load_checkpoint(filename, map_location, logger)
  File "/home/huyen/anaconda3/envs/internimage/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 285, in load_checkpoint
    return checkpoint_loader(filename, map_location)
  File "/home/huyen/anaconda3/envs/internimage/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 301, in load_from_local
    raise FileNotFoundError(f'{filename} can not be found.')
FileNotFoundError: checkpoint_dir/det/mask_rcnn_internimage_t_fpn_1x_coco.py can not be found.
