internimage's Introduction

[中文版本]

We are currently receiving a large number of issues; our team will review and resolve them one by one, so please stay tuned.

INTERN-2.5: Multimodal Multitask General Large Model


The official implementation of

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.

[Paper] [Blog in Chinese]

Highlights

  • 👍 The strongest open-source universal visual backbone, with up to 3 billion parameters
  • 🏆 90.1% Top-1 accuracy on ImageNet, the highest among open-source models
  • 🏆 65.5 mAP on the COCO object detection benchmark, the only model to exceed 65.0 mAP

Related Projects

Foundation Models

  • Uni-Perceiver: A Pre-training unified architecture for generic perception for zero-shot and few-shot tasks
  • Uni-Perceiver v2: A generalist model for large-scale vision and vision-language tasks
  • M3I-Pretraining: One-stage pre-training paradigm via maximizing multi-modal mutual information
  • InternVL: The largest open-source vision/vision-language foundation model (14B) to date

Autonomous Driving

  • BEVFormer: A cutting-edge baseline for camera-based 3D detection
  • BEVFormer v2: Adapting modern image backbones to Bird's-Eye-View recognition via perspective supervision

Application in Challenges

News

  • Jan 22, 2024: 🚀 Support DCNv4 in InternImage!
  • Mar 14, 2023: 🚀 "INTERN-2.5" is released!
  • Feb 28, 2023: 🚀 InternImage is accepted to CVPR 2023!
  • Nov 18, 2022: 🚀 InternImage-XL merged into BEVFormer v2 achieves state-of-the-art performance of 63.4 NDS on nuScenes Camera Only.
  • Nov 10, 2022: 🚀 InternImage-H achieves a new record 65.4 mAP on COCO detection test-dev and 62.9 mIoU on ADE20K, outperforming previous models by a large margin.

History

  • Models/APIs for other downstream tasks
  • Support CVPR 2023 Workshop on End-to-End Autonomous Driving, see here
  • Support Segment Anything
  • Support extracting intermediate features, see here
  • Low-cost training with DeepSpeed, see here
  • Compiling-free .whl package of DCNv3 operator, see here
  • InternImage-H(1B)/G(3B)
  • TensorRT inference for classification/detection/segmentation models
  • Classification code of the InternImage series
  • InternImage-T/S/B/L/XL ImageNet-1K pretrained model
  • InternImage-L/XL ImageNet-22K pretrained model
  • InternImage-T/S/B/L/XL detection and instance segmentation model
  • InternImage-T/S/B/L/XL semantic segmentation model

Introduction

"INTERN-2.5" is a powerful multimodal multitask general model jointly released by SenseTime and Shanghai AI Laboratory. It consists of large-scale vision foundation model "InternImage", pre-training method "M3I-Pretraining", generic decoder "Uni-Perceiver" series, and generic encoder for autonomous driving perception "BEVFormer" series.

Applications

🌅 Image Modality Tasks

"INTERN-2.5" achieved an impressive Top-1 accuracy of 90.1% on the ImageNet benchmark dataset using only publicly available data for image classification. Apart from two undisclosed models trained with additional datasets by Google and Microsoft, "INTERN-2.5" is the only open-source model that achieves a Top-1 accuracy of over 90.0%, and it is also the largest model in scale worldwide.

"INTERN-2.5" outperformed all other models worldwide on the COCO object detection benchmark dataset with a remarkable mAP of 65.5, making it the only model that surpasses 65 mAP in the world.

"INTERN-2.5" also demonstrated world's best performance on 16 other important visual benchmark datasets, covering a wide range of tasks such as classification, detection, and segmentation, making it the top-performing model across multiple domains.

Performance

  • Classification
    Image Classification: ImageNet 90.1
    Scene Classification: Places365 61.2, Places205 71.7
    Long-Tail Classification: iNaturalist 2018 92.3
  • Detection
    Conventional Object Detection: COCO 65.5, VOC 2007 94.0, VOC 2012 97.2, OpenImage 74.1
    Long-Tail Object Detection: LVIS minival 65.8, LVIS val 63.2
    Autonomous Driving Object Detection: BDD100K 38.8, nuScenes 64.8
    Dense Object Detection: CrowdHuman 97.2
  • Segmentation
    Semantic Segmentation: ADE20K 62.9, COCO Stuff-10K 59.6, Pascal Context 70.3
    Street Segmentation: CityScapes 86.1
    RGBD Segmentation: NYU Depth V2 69.7

🌁 📖 Image and Text Cross-Modal Tasks

Image-Text Retrieval: "INTERN-2.5" can quickly locate and retrieve the most semantically relevant images based on textual content requirements. This capability can be applied to both videos and image collections and can be further combined with object detection boxes to enable a variety of applications, helping users quickly and easily find the required image resources. For example, it can return the relevant images specified by the text in the album.

Image-To-Text: "INTERN-2.5" has a strong understanding capability in various aspects of visual-to-text tasks such as image captioning, visual question answering, visual reasoning, and optical character recognition. For example, in the context of autonomous driving, it can enhance the scene perception and understanding capabilities, assist the vehicle in judging traffic signal status, road signs, and other information, and provide effective perception information support for vehicle decision-making and planning.

Performance

  • Image Captioning: COCO Caption 148.2
  • Fine-tuning Image-Text Retrieval: COCO Caption 76.4, Flickr30k 94.8
  • Zero-shot Image-Text Retrieval: Flickr30k 89.1

Released Models

Open-source Visual Pretrained Models
name pretrain pre-training resolution #param download
InternImage-L ImageNet-22K 384x384 223M ckpt
InternImage-XL ImageNet-22K 384x384 335M ckpt
InternImage-H Joint 427M 384x384 1.08B ckpt
InternImage-G - 384x384 3B ckpt
ImageNet-1K Image Classification
name pretrain resolution acc@1 #param FLOPs download
InternImage-T ImageNet-1K 224x224 83.5 30M 5G ckpt | cfg
InternImage-S ImageNet-1K 224x224 84.2 50M 8G ckpt | cfg
InternImage-B ImageNet-1K 224x224 84.9 97M 16G ckpt | cfg
InternImage-L ImageNet-22K 384x384 87.7 223M 108G ckpt | cfg
InternImage-XL ImageNet-22K 384x384 88.0 335M 163G ckpt | cfg
InternImage-H Joint 427M 640x640 89.6 1.08B 1478G ckpt | cfg
InternImage-G - 512x512 90.1 3B 2700G ckpt | cfg
COCO Object Detection and Instance Segmentation
backbone method schd box mAP mask mAP #param FLOPs download
InternImage-T Mask R-CNN 1x 47.2 42.5 49M 270G ckpt | cfg
InternImage-T Mask R-CNN 3x 49.1 43.7 49M 270G ckpt | cfg
InternImage-S Mask R-CNN 1x 47.8 43.3 69M 340G ckpt | cfg
InternImage-S Mask R-CNN 3x 49.7 44.5 69M 340G ckpt | cfg
InternImage-B Mask R-CNN 1x 48.8 44.0 115M 501G ckpt | cfg
InternImage-B Mask R-CNN 3x 50.3 44.8 115M 501G ckpt | cfg
InternImage-L Cascade 1x 54.9 47.7 277M 1399G ckpt | cfg
InternImage-L Cascade 3x 56.1 48.5 277M 1399G ckpt | cfg
InternImage-XL Cascade 1x 55.3 48.1 387M 1782G ckpt | cfg
InternImage-XL Cascade 3x 56.2 48.8 387M 1782G ckpt | cfg
backbone method box mAP (val/test) #param FLOPs download
InternImage-H DINO (TTA) 65.0 / 65.4 2.18B TODO TODO
InternImage-G DINO (TTA) 65.3 / 65.5 3B TODO TODO
ADE20K Semantic Segmentation
backbone method resolution mIoU (ss/ms) #param FLOPs download
InternImage-T UperNet 512x512 47.9 / 48.1 59M 944G ckpt | cfg
InternImage-S UperNet 512x512 50.1 / 50.9 80M 1017G ckpt | cfg
InternImage-B UperNet 512x512 50.8 / 51.3 128M 1185G ckpt | cfg
InternImage-L UperNet 640x640 53.9 / 54.1 256M 2526G ckpt | cfg
InternImage-XL UperNet 640x640 55.0 / 55.3 368M 3142G ckpt | cfg
InternImage-H UperNet 896x896 59.9 / 60.3 1.12B 3566G ckpt | cfg
InternImage-H Mask2Former 896x896 62.5 / 62.9 1.31B 4635G ckpt | cfg
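
As a quick sanity check after downloading any of the checkpoints above, a minimal sketch for inspecting a file with plain PyTorch; the 'model'/'state_dict' keys and the file name are assumptions about the checkpoint layout rather than a documented interface:

import torch

# Hypothetical file name; use whichever ckpt was downloaded from the tables above.
ckpt = torch.load('internimage_t_1k_224.pth', map_location='cpu')

# Checkpoints are usually dicts; the weights commonly sit under 'model' or 'state_dict'.
state_dict = ckpt.get('model', ckpt.get('state_dict', ckpt)) if isinstance(ckpt, dict) else ckpt

num_params = sum(v.numel() for v in state_dict.values() if hasattr(v, 'numel'))
print(f'{len(state_dict)} tensors, ~{num_params / 1e6:.1f}M parameters')
print(list(state_dict.keys())[:5])  # peek at the first few parameter names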
Main Results of FPS

Export classification model from pytorch to tensorrt

Export detection model from pytorch to tensorrt

Export segmentation model from pytorch to tensorrt

name resolution #param FLOPs batch 1 FPS (TensorRT)
InternImage-T 224x224 30M 5G 156
InternImage-S 224x224 50M 8G 129
InternImage-B 224x224 97M 16G 116
InternImage-L 384x384 223M 108G 56
InternImage-XL 384x384 335M 163G 47
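
For reference, a rough sketch of how batch-1 throughput is commonly measured on the PyTorch side; this is not the script used to produce the TensorRT numbers above, and the stand-in model below is only a placeholder:

import time
import torch

@torch.no_grad()
def measure_fps(model, resolution=224, warmup=20, iters=100, device='cuda'):
    """Average batch-1 forward FPS, synchronizing CUDA around the timed loop."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, resolution, resolution, device=device)
    for _ in range(warmup):
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)

# Placeholder model; substitute an InternImage classifier built from the configs above.
print(f'{measure_fps(torch.nn.Conv2d(3, 64, 3, padding=1)):.1f} img/s')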

Before using mmdeploy to convert our PyTorch models to TensorRT, please make sure you have built the DCNv3 custom operator correctly. You can build it with the following commands:

export MMDEPLOY_DIR=/the/root/path/of/MMDeploy

# prepare our custom ops, you can find it at InternImage/tensorrt/modulated_deform_conv_v3
cp -r modulated_deform_conv_v3 ${MMDEPLOY_DIR}/csrc/mmdeploy/backend_ops/tensorrt

# build custom ops
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} -DCUDNN_DIR=${CUDNN_DIR} ..
make -j$(nproc) && make install

# install the mmdeploy after building custom ops
cd ${MMDEPLOY_DIR}
pip install -e .

For more details on building custom ops, please refer to this document.
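
After building, one way to check that the TRTDCNv3 plugin is actually visible to TensorRT from Python; the library path and file name below are assumptions about the MMDeploy build output, so adjust them to your build directory:

import ctypes
import tensorrt as trt

# Assumed location/name of the ops library produced by the build above.
ctypes.CDLL('/the/root/path/of/MMDeploy/build/lib/libmmdeploy_tensorrt_ops.so')

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, '')

plugins = [creator.name for creator in trt.get_plugin_registry().plugin_creator_list]
print('TRTDCNv3 registered:', any('DCNv3' in name for name in plugins))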

Citations

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{wang2022internimage,
  title={InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions},
  author={Wang, Wenhai and Dai, Jifeng and Chen, Zhe and Huang, Zhenhang and Li, Zhiqi and Zhu, Xizhou and Hu, Xiaowei and Lu, Tong and Lu, Lewei and Li, Hongsheng and others},
  journal={arXiv preprint arXiv:2211.05778},
  year={2022}
}

@inproceedings{zhu2022uni,
  title={Uni-perceiver: Pre-training unified architecture for generic perception for zero-shot and few-shot tasks},
  author={Zhu, Xizhou and Zhu, Jinguo and Li, Hao and Wu, Xiaoshi and Li, Hongsheng and Wang, Xiaohua and Dai, Jifeng},
  booktitle={CVPR},
  pages={16804--16815},
  year={2022}
}

@article{zhu2022unimoe,
  title={Uni-perceiver-moe: Learning sparse generalist models with conditional moes},
  author={Zhu, Jinguo and Zhu, Xizhou and Wang, Wenhai and Wang, Xiaohua and Li, Hongsheng and Wang, Xiaogang and Dai, Jifeng},
  journal={arXiv preprint arXiv:2206.04674},
  year={2022}
}

@article{li2022uni,
  title={Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks},
  author={Li, Hao and Zhu, Jinguo and Jiang, Xiaohu and Zhu, Xizhou and Li, Hongsheng and Yuan, Chun and Wang, Xiaohua and Qiao, Yu and Wang, Xiaogang and Wang, Wenhai and others},
  journal={arXiv preprint arXiv:2211.09808},
  year={2022}
}

@article{yang2022bevformer,
  title={BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision},
  author={Yang, Chenyu and Chen, Yuntao and Tian, Hao and Tao, Chenxin and Zhu, Xizhou and Zhang, Zhaoxiang and Huang, Gao and Li, Hongyang and Qiao, Yu and Lu, Lewei and others},
  journal={arXiv preprint arXiv:2211.10439},
  year={2022}
}

@article{su2022towards,
  title={Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information},
  author={Su, Weijie and Zhu, Xizhou and Tao, Chenxin and Lu, Lewei and Li, Bin and Huang, Gao and Qiao, Yu and Wang, Xiaogang and Zhou, Jie and Dai, Jifeng},
  journal={arXiv preprint arXiv:2211.09807},
  year={2022}
}

@inproceedings{li2022bevformer,
  title={Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers},
  author={Li, Zhiqi and Wang, Wenhai and Li, Hongyang and Xie, Enze and Sima, Chonghao and Lu, Tong and Qiao, Yu and Dai, Jifeng},
  booktitle={ECCV},
  pages={1--18},
  year={2022},
}

internimage's People

Contributors

charlie-cyw, czczup, duinodu, li-qingyun, masahiroogawa, tongwwt, weiyun1025, whai362, wofmanaf, yeshenglong1, zeqiang-lai, zhenhanghuang, zhiqi-li


internimage's Issues

Running code with cpu instead of cuda

I get an error installing CUDA to run the code. Could I run it on CPU instead? I found cpu and cuda folders inside the src folder in the code, but I'm not sure how to switch to running on CPU. Thanks so much for your support.

TypeError: forward() missing 1 required positional argument: 'im2col_step'

from __future__ import absolute_import
from __future__ import print_function
from __future__ import division

import torch
import torch.nn.functional as F
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from torch.cuda.amp import custom_bwd, custom_fwd
from ops_dcnv3.modules import dcnv3


class DCNv3Function(Function):
    @staticmethod
    @custom_fwd
    def forward(
            ctx, input, offset, mask,
            kernel_h, kernel_w, stride_h, stride_w,
            pad_h, pad_w, dilation_h, dilation_w,
            group, group_channels, offset_scale, im2col_step):
        ctx.kernel_h = kernel_h
        ctx.kernel_w = kernel_w
        ctx.stride_h = stride_h
        ctx.stride_w = stride_w
        ctx.pad_h = pad_h
        ctx.pad_w = pad_w
        ctx.dilation_h = dilation_h
        ctx.dilation_w = dilation_w
        ctx.group = group
        ctx.group_channels = group_channels
        ctx.offset_scale = offset_scale
        ctx.im2col_step = im2col_step
        output = DCNv3Function.forward(
            input, offset, mask, kernel_h, kernel_w, stride_h, stride_w,
            pad_h, pad_w, dilation_h, dilation_w,
            group, group_channels, offset_scale, im2col_step)
        ctx.save_for_backward(input, offset, mask)

        return output
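
A likely source of this error (an assumption, not something confirmed in this thread): custom torch.autograd.Function subclasses are meant to be invoked through .apply(...), which supplies the ctx argument automatically, and in the upstream dcnv3_func.py the inner call goes to the compiled DCNv3 extension rather than back into DCNv3Function.forward. Calling forward(...) directly shifts every argument by one position, which yields exactly this kind of missing-argument TypeError. A minimal sketch of the calling convention, using a toy Function in place of DCNv3:

import torch
from torch.autograd import Function

class ScaleFunction(Function):
    """Toy Function used only to illustrate the .apply() calling convention."""

    @staticmethod
    def forward(ctx, input, scale):
        ctx.scale = scale
        return input * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * ctx.scale, None

x = torch.randn(4, requires_grad=True)

# Correct: .apply() fills in ctx and routes the call through autograd.
y = ScaleFunction.apply(x, 2.0)

# Incorrect: calling forward() directly makes `x` take the place of ctx, so the
# real arguments shift by one and Python reports a missing positional argument:
# ScaleFunction.forward(x, 2.0)  # TypeError: forward() missing 1 required positional argument: 'scale'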

eval

Hello, I use InternImage as my model backbone. Why does the training loss drop normally, but the mAP on the validation set is all 0?

I don't understand this shared weight in the article, can you explain it in detail, please?

To remedy this problem, we borrow the idea from the separable convolution [56] and detach the original convolution weights wk into depth-wise and point-wise parts, where the depth-wise part is responsible by the original location-aware modulation scalar mk, and the point-wise part is the shared projection weights w among sampling points.

Can you provide some information? Thank you very much!
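
For readers with the same question, a paraphrase of the formulation from the paper (the notation below is a sketch; defer to the paper for the exact definitions). DCNv2 applies a separate projection weight to every sampling point,

$$\mathbf{y}(p_0)=\sum_{k=1}^{K}\mathbf{w}_k\, m_k\, \mathbf{x}(p_0+p_k+\Delta p_k),$$

while DCNv3 splits that weight into a depth-wise part and a point-wise part: within each group $g$, one projection $\mathbf{w}_g$ is shared by all $K$ sampling points (the point-wise part), and only the scalar modulation $m_{gk}$, normalized over the $K$ points, remains specific to each location and sampling point (the depth-wise part),

$$\mathbf{y}(p_0)=\sum_{g=1}^{G}\mathbf{w}_g\sum_{k=1}^{K} m_{gk}\, \mathbf{x}_g(p_0+p_k+\Delta p_{gk}).$$

So "the point-wise part is the shared projection weights w among sampling points" means the per-point weights $\mathbf{w}_k$ of DCNv2 are replaced by one shared $\mathbf{w}_g$ per group plus cheap per-point scalars, which is the separable-convolution idea the sentence refers to.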

tensorrt and FPS

Hi all, quite impressed with your great work!
I noticed the table of Main Results of FPS; would you provide the code for the time measurement?
Also, do you plan to upload models for edge applications (TensorRT, ONNX, ...)?

export.py for segmentation model

Hello! I'm attempting to build an ONNX model for segmentation inference, and I noticed there is an export.py in the classification folder, but not in the segmentation folder...

Is this a possibility in the future? Or, a release of onnx direct?

Thank you!

output image size in segmentation

Hello,

thanks for the awesome repo! I am trying to adapt this code for a segmentation task in our lab. I needed to strip the model part out and insert it into our script, but I encountered a small issue here.

I started by copying the entire segmentation/mmseg_custom/models/backbones/intern_image.py into the notebook and initialized the network as shown in your training script, with the config in upernet_internimage_l_640_160k_ade20k.py.

However, when I tried to run inference on a (1, 3, 640, 640) image using the code below

model = dict(
    backbone=dict(
        _delete_=True,
        type="InternImage",
        core_op="DCNv3",
        channels=160,
        depths=[5, 5, 22, 5],
        groups=[10, 20, 40, 80],
        mlp_ratio=4.0,
        drop_path_rate=0.4,
        norm_layer="LN",
        layer_scale=1.0,
        offset_scale=2.0,
        post_norm=True,
        with_cp=False,
        out_indices=(0, 1, 2, 3),
        # init_cfg=dict(type="Pretrained", checkpoint=pretrained),
    ),
    decode_head=dict(num_classes=150, in_channels=[160, 320, 640, 1280]),
    auxiliary_head=dict(num_classes=150, in_channels=640),
    test_cfg=dict(mode="whole"),
)
model['type'] = 'InternImage'

net = build_segmentor(model).cuda()
data = torch.rand(1, 3, 640, 640).cuda()
out = net(data)
[each.shape for each in out]

I got these as the output shapes:

[torch.Size([1, 160, 160, 160]),
 torch.Size([1, 320, 80, 80]),
 torch.Size([1, 640, 40, 40]),
 torch.Size([1, 1280, 20, 20])]

I understand this is because of the stem at the beginning of the network, and the model is designed to output a result at each resolution scale.

My question is: how do I get a segmentation result with the same x and y shape, e.g. 640 by 640?

I couldn't figure out how to build my own decode_head as specified in upernet_internimage_l_640_160k_ade20k.py using the mmseg package.

many thanks,
Michael
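
The four tensors above are the backbone's multi-scale feature maps. To get a prediction at the input resolution, the full segmentor (decode head included) has to run; in the mmseg 0.x EncoderDecoder API, encode_decode does this and resizes the logits back to the image size. A rough sketch under that assumption, with hypothetical config paths:

import torch
import mmseg_custom  # noqa: F401 -- registers the InternImage backbone; assumes running from the segmentation/ folder
from mmcv.utils import Config
from mmseg.models import build_segmentor

# Hypothetical path; substitute the actual config used above.
cfg = Config.fromfile('configs/ade20k/upernet_internimage_l_640_160k_ade20k.py')
net = build_segmentor(cfg.model).cuda().eval()

data = torch.rand(1, 3, 640, 640).cuda()
# Minimal image meta expected by mmseg 0.x.
img_metas = [dict(ori_shape=(640, 640, 3), img_shape=(640, 640, 3),
                  pad_shape=(640, 640, 3), scale_factor=1.0, flip=False)]

with torch.no_grad():
    # Runs backbone + decode head, then resizes the logits to the input size.
    seg_logits = net.encode_decode(data, img_metas)

print(seg_logits.shape)  # expected: (1, num_classes, 640, 640)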

_pickle.UnpicklingError: invalid load key, '\xda'.

Hello, thanks for your reply. My last question has been solved, but I've run into another problem. When I tried to load the pre-trained model, I got the error _pickle.UnpicklingError: invalid load key, '\xda'. It seems the pre-trained model file is broken, isn't it?

Traceback (most recent call last):
  File "/home/zyp/下载/pytorch/InternImage-master/segmentation/mytest.py", line 3, in <module>
    model = torch.load(r'checkpoint_dir/upernet_internimage_t_512_160k_ade20k.pth')
  File "/home/zyp/anaconda3/envs/pytorch_cp37/lib/python3.7/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/zyp/anaconda3/envs/pytorch_cp37/lib/python3.7/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xda'.

Consulting AI Infra information for InternImage.

Hi, I couldn't find information about the training infrastructure in this paper. Can you provide the following information?

  1. the number of GPUs
  2. the type of GPU
  3. the training framework (PyTorch, TensorFlow, etc.)
  4. the time cost for full training

Training results differ from those reported in the paper

Dear authors, hello!
I tried running your code on our school's server with InternImage-T | Mask R-CNN | 1x, using two A100s, and trained for 12 epochs from the pretrained model. I did not change any parameters in the config file, but the final results differ considerably from the experimental results in the paper.
Your results: box mAP 47.2 | mask mAP 42.5
My output: (screenshot QQ截图20230310150646.png, upload did not complete)
Could you tell me what might be going on?

Unable to run

I tried to debug train.py in the segmentation folder on Windows, but ran into the following problem (screenshot).
After single-stepping I found that the bug is triggered when importing mmcv, even though I have already installed that library following the README.
At the point of the error I tried decoding the string with the GBK encoding, and it parses successfully (screenshot).
I searched the whole disk for that string and found it in cpp_extension.py under the install directory Python\Python39\Lib\site-packages\torch\utils (screenshot).
I'm not sure whether this is an environment problem, or whether it can be fixed on Windows.
Thanks.

Problems encountered when installing DCNv3

Hi, I ran into the following problems.
1. Running sh ./make.sh directly produces the following error (screenshot).
2. I looked at make.sh and found it is just the following python command, so I ran that command directly, but got another error (screenshot).
My CUDA path is indeed there (screenshot).
From the error it looks like there is an extra ':' in front of the path, but I'm not familiar with the relevant code and cannot locate where it goes wrong.
Is there a good way to solve this?
Thanks.

`CUBLAS_STATUS_INTERNAL_ERROR` on training segmentation

Hi there, firstly thank you very much for your work. Upon trying to use your backbone to train a segmentation model, I run into a CUBLAS_STATUS_INTERNAL_ERROR:

2023-03-10 22:05:40,534 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
2023-03-10 22:05:40,534 - mmseg - INFO - Checkpoints will be saved to mmsegmentation/work_dirs/internimage_base_512 by HardDiskBackend.
2023-03-10 22:05:46,860 - mmseg - INFO - Iter [20/160000]       lr: 7.600e-07, eta: 13:43:03, time: 0.309, data_time: 0.014, memory: 6998, decode.loss_ce: nan, decode.acc_seg: 7.1505, aux.loss_ce: nan, aux.acc_seg: 7.1649, loss: nan
Traceback (most recent call last):
  File "mmsegmentation/train.py", line 162, in <module>
    train_segmentor(model, datasets, cfg, distributed=False, validate=True, 
  File "mmsegmentation/mmseg/apis/train.py", line 194, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 138, in run
    iter_runner(iter_loaders[i], **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 62, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "mmsegmentation/mmseg/models/segmentors/base.py", line 138, in train_step
    losses = self(**data_batch)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
  File "mmsegmentation/mmseg/models/segmentors/base.py", line 108, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 139, in forward_train
    x = self.extract_feat(img)
  File "mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 65, in extract_feat
    x = self.backbone(img)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 479, in forward
    x, x_ = level(x, return_wo_downsample=True)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 316, in forward
    x = blk(x)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 252, in forward
    x = _inner_forward(x)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 242, in _inner_forward
    x = x + self.drop_path(self.gamma1 * self.norm1(self.dcn(x)))
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/ops_dcnv3/modules/dcnv3.py", line 276, in forward
    x = self.output_proj(x)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

I compiled DCNv3 and test.py runs without an error.
CUBLAS_STATUS_INTERNAL_ERROR does not occur with other native mmsegmentation configs/backbones.

Do you know what could be the cause of this issue?
Thank you very much!

CUDA 11.3
PyTorch 1.11.0
cuDNN 8.2.0
torchvision 0.12.0

All conda packages:

# Name                    Version                   Build  Channel
addict                    2.4.0                    pypi_0    pypi
blas                      1.0                         mkl  
brotlipy                  0.7.0           py39h27cfd23_1003  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2022.4.26            h06a4308_0  
certifi                   2022.6.15        py39h06a4308_0  
cffi                      1.15.0           py39hd667e15_1  
charset-normalizer        2.0.12                   pypi_0    pypi
click                     7.1.2                    pypi_0    pypi
colorama                  0.4.5                    pypi_0    pypi
cryptography              37.0.1           py39h9ce1e76_0  
cudatoolkit               11.3.1               h2bc3f7f_2  
cycler                    0.11.0                   pypi_0    pypi
dcnv3                     1.0                      pypi_0    pypi
ffmpeg                    4.3                  hf484d3e_0    pytorch
filelock                  3.9.0                    pypi_0    pypi
fonttools                 4.33.3                   pypi_0    pypi
freetype                  2.11.0               h70c0345_0  
giflib                    5.2.1                h7b6447c_0  
gmp                       6.2.1                h295c915_3  
gnutls                    3.6.15               he1e5248_0  
huggingface-hub           0.13.1                   pypi_0    pypi
idna                      3.3                pyhd3eb1b0_0  
importlib-metadata        4.11.4                   pypi_0    pypi
intel-openmp              2021.4.0          h06a4308_3561  
jpeg                      9e                   h7f8727e_0  
kiwisolver                1.4.3                    pypi_0    pypi
lame                      3.100                h7b6447c_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 11.2.0               h1234567_1  
libiconv                  1.16                 h7f8727e_2  
libidn2                   2.3.2                h7f8727e_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              11.2.0               h1234567_1  
libtasn1                  4.16.0               h27cfd23_0  
libtiff                   4.2.0                h2818925_1  
libunistring              0.9.10               h27cfd23_0  
libuv                     1.40.0               h7b6447c_0  
libwebp                   1.2.2                h55f646e_0  
libwebp-base              1.2.2                h7f8727e_0  
lz4-c                     1.9.3                h295c915_1  
markdown                  3.3.7                    pypi_0    pypi
matplotlib                3.5.2                    pypi_0    pypi
mkl                       2021.4.0           h06a4308_640  
mkl-service               2.4.0            py39h7f8727e_0  
mkl_fft                   1.3.1            py39hd3c417c_0  
mkl_random                1.2.2            py39h51133e4_0  
mmcls                     0.23.1                   pypi_0    pypi
mmcv-full                 1.5.3                    pypi_0    pypi
mmdet                     2.28.1                   pypi_0    pypi
mmsegmentation            0.25.0                    dev_0    <develop>
model-index               0.1.11                   pypi_0    pypi
ncurses                   6.3                  h7f8727e_2  
nettle                    3.7.3                hbbd107a_1  
numpy                     1.23.0                   pypi_0    pypi
numpy-base                1.22.3           py39hf524024_0  
opencv-python             4.6.0.66                 pypi_0    pypi
openh264                  2.1.1                h4ff587b_0  
openmim                   0.1.6                    pypi_0    pypi
openssl                   1.1.1o               h7f8727e_0  
ordered-set               4.1.0                    pypi_0    pypi
packaging                 21.3                     pypi_0    pypi
pandas                    1.4.3                    pypi_0    pypi
pillow                    9.1.1                    pypi_0    pypi
pip                       21.2.4           py39h06a4308_0  
prettytable               3.3.0                    pypi_0    pypi
pycocotools               2.0.6                    pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0  
pyopenssl                 22.0.0             pyhd3eb1b0_0  
pyparsing                 3.0.9                    pypi_0    pypi
pysocks                   1.7.1            py39h06a4308_0  
python                    3.9.12               h12debd9_1  
python-dateutil           2.8.2                    pypi_0    pypi
pytorch                   1.11.0          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pytz                      2022.1                   pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
readline                  8.1.2                h7f8727e_1  
requests                  2.28.0                   pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
setuptools                61.2.0           py39h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.38.5               hc218d9a_0  
tabulate                  0.8.10                   pypi_0    pypi
termcolor                 2.2.0                    pypi_0    pypi
terminaltables            3.1.10                   pypi_0    pypi
timm                      0.6.11                   pypi_0    pypi
tk                        8.6.12               h1ccaba5_0  
torchaudio                0.11.0               py39_cu113    pytorch
torchvision               0.12.0               py39_cu113    pytorch
tqdm                      4.65.0                   pypi_0    pypi
typing-extensions         4.2.0                    pypi_0    pypi
typing_extensions         4.1.1              pyh06a4308_0  
tzdata                    2022a                hda174b7_0  
urllib3                   1.26.9           py39h06a4308_0  
wcwidth                   0.2.5                    pypi_0    pypi
wheel                     0.37.1             pyhd3eb1b0_0  
xz                        5.2.5                h7f8727e_1  
yacs                      0.1.8                    pypi_0    pypi
yapf                      0.32.0                   pypi_0    pypi
zipp                      3.8.0                    pypi_0    pypi
zlib                      1.2.12               h7f8727e_2  
zstd                      1.5.2                ha4553b6_0  

Occluded object detection

How well can the model detect occluded objects from different categories lying on top of each other?

Is there any simple way to segment a single image?

Thank you for your awesome repo. I am looking for a way to segment a single image. However, it seems the code only provides evaluation on the entire ADE20K dataset. Is there any way for me to segment a single image? I looked into the code but it is quite complicated :D
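
Since the segmentation code builds on mmsegmentation 0.x, the standard single-image API should apply here; a minimal sketch, assuming the config/checkpoint pair below (adjust the paths to whichever model you use):

import mmseg_custom  # noqa: F401 -- registers InternImage; assumes running from the segmentation/ folder
from mmseg.apis import init_segmentor, inference_segmentor, show_result_pyplot

# Hypothetical paths; point these at a config/checkpoint pair from the model zoo above.
config = 'configs/ade20k/upernet_internimage_t_512_160k_ade20k.py'
checkpoint = 'checkpoint_dir/seg/upernet_internimage_t_512_160k_ade20k.pth'

model = init_segmentor(config, checkpoint, device='cuda:0')
result = inference_segmentor(model, 'demo.jpg')          # list with one H x W label map
show_result_pyplot(model, 'demo.jpg', result, opacity=0.5)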

Want to reproduce some results but don't have enough GPU memory

Hello, dear authors. I currently have 8 RTX 3090 GPUs, but each has only 22 GB of memory, and I find that the parameters of InternImage-H far exceed the memory of a single card. How should I set the parameters in order to reproduce your results?

cannot export trt

Exporting to ONNX works, but exporting to TensorRT fails as follows:

[03/15/2023-15:00:34] [TRT] [E] ModelImporter.cpp:773: While parsing node number 78 [TRTDCNv3 -> "onnx::MatMul_732"]:
[03/15/2023-15:00:34] [TRT] [E] ModelImporter.cpp:774: --- Begin node ---
[03/15/2023-15:00:34] [TRT] [E] ModelImporter.cpp:775: input: "mmdeploy::TRTDCNv3_685"
input: "mmdeploy::TRTDCNv3_710"
input: "mmdeploy::TRTDCNv3_731"
output: "onnx::MatMul_732"
name: "TRTDCNv3_78"
op_type: "TRTDCNv3"
attribute {
  name: "dilation_h"
  i: 1
  type: INT
}
attribute {
  name: "dilation_w"
  i: 1
  type: INT
}
attribute {
  name: "group_channels"
  i: 16
  type: INT
}
attribute {
  name: "group"
  i: 4
  type: INT
}
attribute {
  name: "im2col_step"
  i: 256
  type: INT
}
attribute {
  name: "kernel_h"
  i: 3
  type: INT
}
attribute {
  name: "kernel_w"
  i: 3
  type: INT
}
attribute {
  name: "offset_scale"
  f: 1
  type: FLOAT
}
attribute {
  name: "pad_h"
  i: 1
  type: INT
}
attribute {
  name: "pad_w"
  i: 1
  type: INT
}
attribute {
  name: "stride_h"
  i: 1
  type: INT
}
attribute {
  name: "stride_w"
  i: 1
  type: INT
}
domain: "mmdeploy"

[03/15/2023-15:00:34] [TRT] [E] ModelImporter.cpp:776: --- End node ---
[03/15/2023-15:00:34] [TRT] [E] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:4870 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
Traceback (most recent call last):
  File "export.py", line 123, in <module>
    main()
  File "export.py", line 118, in main
    onnx2trt(args)
  File "export.py", line 85, in onnx2trt
    max_workspace_size=2**30,
  File "****/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 177, in from_onnx
    raise RuntimeError(f'Failed to parse onnx, {error_msgs}')
RuntimeError: Failed to parse onnx, In node 78 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

The DCNv3 build succeeds; running python test.py gives the following results:

foward time cost: 0.04141324281692505
>>> time cost: im2col_step 256; input torch.Size([512, 64, 64, 64]); points 9 
foward time cost: 0.042035584449768064
>>> time cost: im2col_step 512; input torch.Size([512, 64, 64, 64]); points 9 
foward time cost: 0.042629106044769285

mmdeploy 0.13 was built from source; the output of python tools/check_env.py is as follows:

2023-03-15 14:57:51,724 - mmdeploy - INFO - 

2023-03-15 14:57:51,725 - mmdeploy - INFO - **********Environmental information**********
2023-03-15 14:57:52,004 - mmdeploy - INFO - sys.platform: linux
2023-03-15 14:57:52,004 - mmdeploy - INFO - Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
2023-03-15 14:57:52,004 - mmdeploy - INFO - CUDA available: True
2023-03-15 14:57:52,004 - mmdeploy - INFO - GPU 0: Tesla T4
2023-03-15 14:57:52,004 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2023-03-15 14:57:52,004 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.3, V11.3.58
2023-03-15 14:57:52,004 - mmdeploy - INFO - GCC: gcc (GCC) 7.5.0
2023-03-15 14:57:52,004 - mmdeploy - INFO - PyTorch: 1.11.0
2023-03-15 14:57:52,004 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

2023-03-15 14:57:52,004 - mmdeploy - INFO - TorchVision: 0.12.0
2023-03-15 14:57:52,004 - mmdeploy - INFO - OpenCV: 4.5.4
2023-03-15 14:57:52,004 - mmdeploy - INFO - MMCV: 1.5.0
2023-03-15 14:57:52,005 - mmdeploy - INFO - MMCV Compiler: GCC 7.3
2023-03-15 14:57:52,005 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2023-03-15 14:57:52,005 - mmdeploy - INFO - MMDeploy: 0.13.0+02d5a09
2023-03-15 14:57:52,005 - mmdeploy - INFO - 

2023-03-15 14:57:52,005 - mmdeploy - INFO - **********Backend information**********
2023-03-15 14:57:52,065 - mmdeploy - INFO - tensorrt:   8.2.4.2
2023-03-15 14:57:52,065 - mmdeploy - INFO - tensorrt custom ops:        Available
2023-03-15 14:57:52,100 - mmdeploy - INFO - ONNXRuntime:        1.14.1
2023-03-15 14:57:52,100 - mmdeploy - INFO - ONNXRuntime-gpu:    None
2023-03-15 14:57:52,100 - mmdeploy - INFO - ONNXRuntime custom ops:     NotAvailable
2023-03-15 14:57:52,100 - mmdeploy - INFO - pplnn:      None
2023-03-15 14:57:52,101 - mmdeploy - INFO - ncnn:       None
2023-03-15 14:57:52,103 - mmdeploy - INFO - snpe:       None
2023-03-15 14:57:52,104 - mmdeploy - INFO - openvino:   2022.3.0
2023-03-15 14:57:52,105 - mmdeploy - INFO - torchscript:        1.11.0
2023-03-15 14:57:52,105 - mmdeploy - INFO - torchscript custom ops:     NotAvailable
2023-03-15 14:57:52,139 - mmdeploy - INFO - rknn-toolkit:       None
2023-03-15 14:57:52,139 - mmdeploy - INFO - rknn2-toolkit:      None
2023-03-15 14:57:52,140 - mmdeploy - INFO - ascend:     None
2023-03-15 14:57:52,140 - mmdeploy - INFO - coreml:     None
2023-03-15 14:57:52,141 - mmdeploy - INFO - tvm:        None
2023-03-15 14:57:52,141 - mmdeploy - INFO - 

2023-03-15 14:57:52,141 - mmdeploy - INFO - **********Codebase information**********
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmdet:      2.20.0
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmseg:      0.30.0
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmcls:      0.23.0
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmocr:      0.4.1
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmedit:     0.16.1
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmdet3d:    None
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmpose:     0.25.1
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmrotate:   None
2023-03-15 14:57:52,143 - mmdeploy - INFO - mmaction:   None

Issues when exporting the ONNX model and deploying it with C++/ONNXRuntime

I prefer ORT deployment because it allows quickly switching between CUDA/TensorRT/DML/OpenVINO as the inference backend. Below are some problems from my own deployment; a solution would be much appreciated if one exists.

Taking the classification model internimage_t_1k_224 as an example: when exporting to ONNX following the tutorial, a large number of warnings appear:
WARNING: The shape inference of mmdeploy::TRTDCNv3 type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
But the export eventually succeeds.

The export command is the one from the README:
python export.py --model_name internimage_t_1k_224 --ckpt_dir /path/to/ckpt/dir --onnx

But when deploying with C++/ORT, this error occurs:
Fatal error: mmdeploy:TRTDCNv3(-1) is not a registered function/op
This should be because "DCNv3" is a custom operator that has not been implemented in the C++ format required by ONNXRuntime custom operators.

Following issue #41, I changed CORE_OP: 'DCNv3' in "./classification/configs/internimage_t_1k_224.yaml" to CORE_OP: 'DCNv3_pytorch' (I'm not sure this is a valid thing to do 😥) to use the pure-PyTorch DCNv3 implementation, but the export still produces a large number of warnings and then fails:

Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Traceback (most recent call last):
  File ".\export.py", line 122, in <module>
    main()
  File ".\export.py", line 113, in main
    torch2onnx(args, cfg)
  File ".\export.py", line 61, in torch2onnx
    torch.onnx.export(model,
  File "D:\Python\Python38\lib\site-packages\torch\onnx\__init__.py", line 350, in export
    return utils.export(
  File "D:\Python\Python38\lib\site-packages\torch\onnx\utils.py", line 163, in export
    _export(
  File "D:\Python\Python38\lib\site-packages\torch\onnx\utils.py", line 1110, in _export
    ) = graph._export_onnx(  # type: ignore[attr-defined]
RuntimeError: Could not allocate bytes object!

Using InternImage for Object Detection without Segmentation

Hello,

I hope you are doing well. I am working on a project where I would like to use InternImage solely for object detection, without involving segmentation. I attempted to use it with Cascade R-CNN, but I encountered an error during the process.

Here is the error message I received:

2023-03-15 13:39:35,378 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
2023-03-15 13:39:35,422 - mmdet - INFO - Checkpoints will be saved to /content/drive/MyDrive/FETP/HealthSit/Phase_02_1/InternImage/detection/work_dirs/mod_cascade_internimage_l_fpn_3x_coco by HardDiskBackend.
Traceback (most recent call last):
  File "/content/drive/MyDrive/FETP/HealthSit/Phase_02_1/InternImage/detection/./train.py", line 247, in <module>
    main()
  File "/content/drive/MyDrive/FETP/HealthSit/Phase_02_1/InternImage/detection/./train.py", line 237, in main
    train_detector(model,
  File "/usr/local/lib/python3.9/dist-packages/mmdet/apis/train.py", line 246, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.9/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/dist-packages/torch/_utils.py", line 457, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/custom.py", line 220, in __getitem__
    data = self.prepare_train_img(idx)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/custom.py", line 243, in prepare_train_img
    return self.pipeline(results)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/pipelines/loading.py", line 398, in __call__
    results = self._load_masks(results)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/pipelines/loading.py", line 350, in _load_masks
    [self._poly2mask(mask, h, w) for mask in gt_masks], h, w)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/pipelines/loading.py", line 350, in <listcomp>
    [self._poly2mask(mask, h, w) for mask in gt_masks], h, w)
  File "/usr/local/lib/python3.9/dist-packages/mmdet/datasets/pipelines/loading.py", line 308, in _poly2mask
    elif isinstance(mask_ann['counts'], list):
TypeError: 'NoneType' object is not subscriptable

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4737) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
./train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-03-15_13:39:43
  host      : 9677141c6259
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 4737)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================"

From my understanding, it seems that the CascadeRoIHead might require segmentation annotations. I tried using Faster RCNN with InternImage as well but was unsuccessful. I believe that being able to use InternImage for object detection without segmentation could potentially improve performance in certain scenarios.

Could you please provide any guidance or suggestions on how to achieve this? I would really appreciate your help in resolving this issue.

Thank you very much for your time and assistance.

Best regards,
Suppasit Srisaeng

[Error] Inference with onnxruntime

Hi, thanks for sharing this excellent work. I'm trying to use InternImage in ONNX format. When I export the model from PyTorch to ONNX, the warning WARNING: The shape inference of mmdeploy::TRTDCNv3 type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. appears.

Besides, when I run inference with the exported ONNX model using onnxruntime, the error onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /data/123/internimage_t_1k_224.onnx failed: Fatal error: mmdeploy:TRTDCNv3(-1) is not a registered function/op occurs. Can you give me some advice on solving this problem?
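
One note on this error (an assumption rather than an official answer): the exported graph contains the custom mmdeploy::TRTDCNv3 node, for which this repository ships only a TensorRT plugin; onnxruntime can execute the node only if an ORT kernel for that op is built and registered, for example via SessionOptions.register_custom_ops_library. A minimal sketch of that registration path, with a hypothetical library path:

import onnxruntime as ort

# Hypothetical path: an onnxruntime custom-op library that actually implements the
# DCNv3 kernel would have to be built first; the plugin under
# InternImage/tensorrt/modulated_deform_conv_v3 targets TensorRT, not ONNXRuntime.
custom_ops_lib = '/path/to/libmmdeploy_onnxruntime_ops.so'

so = ort.SessionOptions()
so.register_custom_ops_library(custom_ops_lib)

session = ort.InferenceSession('internimage_t_1k_224.onnx', so,
                               providers=['CPUExecutionProvider'])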

classification train failed, no such file meta_data/train.txt

When training the classification model as described in the doc, it fails:

Traceback (most recent call last):
  File "/home/liuzhe/github/InternImage/classification/main.py", line 661, in <module>
    main(config)
  File "/home/liuzhe/github/InternImage/classification/main.py", line 170, in main
    data_loader_val, data_loader_test, mixup_fn = build_loader(config)
  File "/home/liuzhe/github/InternImage/classification/dataset/build.py", line 58, in build_loader
    dataset_train, config.MODEL.NUM_CLASSES = build_dataset('train',
  File "/home/liuzhe/github/InternImage/classification/dataset/build.py", line 158, in build_dataset
    dataset = ImageCephDataset(root,
  File "/home/liuzhe/github/InternImage/classification/dataset/cached_image_folder.py", line 310, in __init__
    parser = ParserCephImage(root=root,
  File "/home/liuzhe/github/InternImage/classification/dataset/cached_image_folder.py", line 383, in __init__
    with open(osp.join(annotation_root, f'{split}.txt'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'meta_data/train.txt'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 846478) of binary: /home/miniconda3/envs/lz_ray/bin/python

Encounter error in dcnv3

(two screenshots attached)

I am trying to check whether the model can run; gpu1 is free, but an error occurs in the forward function:

ATen/OpMathType.h: no such file or directory

I'm trying to compile DCNv3 with PyTorch 1.9.0, and the compiler gives me this error. After checking the PyTorch code on GitHub, it appears that OpMathType.h was only added after PyTorch 1.10, yet the README.md in the detection folder says pytorch >= 1.8.0. Is there a solution for my error? I'm not sure.

it's my mistake

I can't import DCNv3 in the file dcnv3_func.py. Can you tell me how to compile the operator?

About hardware

For InternImage-XL and InternImage-H, how many A100s did you use?
And how long did it take to complete the full pre-training on the large-scale joint dataset (e.g. how long with 8 × A100s or 32 × A100s)?

Also, how much RAM and hard-drive storage is required to handle such a large dataset?

Thanks

Checkpoint file

I get a "could not find checkpoint" error when running the test.py file in the detection folder as described in README.md.
For example, to evaluate the InternImage-T with a single GPU:

python test.py configs/mask_rcnn/mask_rcnn_internimage_t_fpn_1x_coco.py checkpoint_dir/det/mask_rcnn_internimage_t_fpn_1x_coco.pth --eval bbox segm
Error: 
 File "test.py", line 208, in main
    checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
  File "/home/huyen/anaconda3/envs/internimage/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 581, in load_checkpoint
    checkpoint = _load_checkpoint(filename, map_location, logger)
  File "/home/huyen/anaconda3/envs/internimage/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 520, in _load_checkpoint
    return CheckpointLoader.load_checkpoint(filename, map_location, logger)
  File "/home/huyen/anaconda3/envs/internimage/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 285, in load_checkpoint
    return checkpoint_loader(filename, map_location)
  File "/home/huyen/anaconda3/envs/internimage/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 301, in load_from_local
    raise FileNotFoundError(f'{filename} can not be found.')
FileNotFoundError: checkpoint_dir/det/mask_rcnn_internimage_t_fpn_1x_coco.py can not be found.
