
swintransformer / swin-transformer-object-detection


This project is forked from open-mmlab/mmdetection.


This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Home Page: https://arxiv.org/abs/2103.14030

License: Apache License 2.0

Languages: Python 99.84%, Shell 0.07%, Dockerfile 0.09%
Topics: mscoco, swin-transformer, cascade, mask-rcnn, object-detection, reppoints, swin

swin-transformer-object-detection's Introduction

Swin Transformer for Object Detection

This repo contains the supported code and configuration files to reproduce object detection results of Swin Transformer. It is based on mmdetection.

Updates

05/11/2021 Models for MoBY are released

04/12/2021 Initial commits

Results and Models

Mask R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 1x | 43.7 | 39.8 | 48M | 267G | config | github/baidu | github/baidu
Swin-T | ImageNet-1K | 3x | 46.0 | 41.6 | 48M | 267G | config | github/baidu | github/baidu
Swin-S | ImageNet-1K | 3x | 48.5 | 43.3 | 69M | 359G | config | github/baidu | github/baidu

Cascade Mask R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 1x | 48.1 | 41.7 | 86M | 745G | config | github/baidu | github/baidu
Swin-T | ImageNet-1K | 3x | 50.4 | 43.7 | 86M | 745G | config | github/baidu | github/baidu
Swin-S | ImageNet-1K | 3x | 51.9 | 45.0 | 107M | 838G | config | github/baidu | github/baidu
Swin-B | ImageNet-1K | 3x | 51.9 | 45.0 | 145M | 982G | config | github/baidu | github/baidu

RepPoints V2

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 3x | 50.0 | - | 45M | 283G | config | github | github

Mask RepPoints V2

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 3x | 50.4 | 43.8 | 47M | 292G | config | github | github


Results of MoBY with Swin Transformer

Mask R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 1x | 43.6 | 39.6 | 48M | 267G | config | github/baidu | github/baidu
Swin-T | ImageNet-1K | 3x | 46.0 | 41.7 | 48M | 267G | config | github/baidu | github/baidu

Cascade Mask R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 1x | 48.1 | 41.5 | 86M | 745G | config | github/baidu | github/baidu
Swin-T | ImageNet-1K | 3x | 50.2 | 43.5 | 86M | 745G | config | github/baidu | github/baidu

Notes:

  • The drop path rate needs to be tuned for best practice.
  • MoBY pre-trained models can be downloaded from MoBY with Swin Transformer.

Usage

Installation

Please refer to get_started.md for installation and dataset preparation.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
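
For single-image inference from Python, the standard mmdetection high-level API can also be used. This is a minimal sketch, not part of the repo's own docs; the checkpoint path is a placeholder for a downloaded model file:

from mmdet.apis import init_detector, inference_detector, show_result_pyplot

config_file = 'configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py'
checkpoint_file = 'checkpoints/mask_rcnn_swin_tiny_3x.pth'  # placeholder path

# Build the model from the config and load the trained weights.
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on one image and show detections above a 0.3 score threshold.
result = inference_detector(model, 'demo/demo.jpg')
show_result_pyplot(model, 'demo/demo.jpg', result, score_thr=0.3)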

Training

To train a detector with pre-trained models, run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments] 

For example, to train a Cascade Mask R-CNN model with a Swin-T backbone and 8 GPUs, run:

tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL> 

Note: use_checkpoint is used to save GPU memory. Please refer to this page for more details.

Apex (optional):

We use apex for mixed precision training by default. To install apex, run:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

If you would like to disable apex, modify the type of runner as EpochBasedRunner and comment out the following code block in the configuration files:

# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
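
As a rough sketch, after disabling apex the corresponding config lines might look like the following (assuming the plain mmcv EpochBasedRunner and its default optimizer hook; max_epochs depends on the schedule in use):

# runner switched from the apex-aware EpochBasedRunnerAmp to the plain runner
runner = dict(type='EpochBasedRunner', max_epochs=36)

# with the DistOptimizerHook block commented out, the default mmcv
# optimizer hook takes over; fp16 remains disabled
optimizer_config = dict(grad_clip=None)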

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Other Links

Image Classification: See Swin Transformer for Image Classification.

Semantic Segmentation: See Swin Transformer for Semantic Segmentation.

Self-Supervised Learning: See MoBY with Swin Transformer.

Video Recognition: See Video Swin Transformer.

swin-transformer-object-detection's People

Contributors

aemikachow, chrisfsj2051, daavoo, erotemic, gt9505, hellock, hhaandroid, impiga, innerlee, johnson-wang, jshilong, korabelnikov, liaopeiyuan, lindahua, melikovk, mxbonn, myownskyw7, oceanpang, runningleon, ryanxli, shinya7y, thangvubk, tianyuandu, v-qjqs, wangruohui, wswday, xvjiarui, yhcao6, yuzhj, zwwwayne


swin-transformer-object-detection's Issues

index out of bounds

When I run the command
python tools/train.py ./configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py
the error is:
/opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [32,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
.....
RuntimeError: transform: failed to synchronize: cudaErrorAssert: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered

Custom Dataset Training Runtime error

Hi all,

I am getting this error while running the tools/train.py file on Google Colab.
"RuntimeError: Default process group has not been initialized, please make sure to call init_process_group."

I do not know how to approach this issue. Any help will be appreciated. Thanks!

IndexError list index out of range

File "/content/Swin-Transformer-Object-Detection/mmdet/datasets/coco.py", line 267, in _segm2json
if isinstance(segms[i]['counts'], bytes):
IndexError: list index out of range

Hi, I'm trying to train Swin on my custom dataset.

The dataset runs fine on mmdetection Mask R-CNNs, and the training runs fine on Swin for several epochs. But after a while, I get an IndexError during eval. Has anyone encountered this error?

mAP differs from the reported result

Hi, I run the config of Swin-Tiny 1x with the mask rcnn setting: configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py

And set the use_fp16=False

optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=False,
)

But my mAP differs slightly from the reported result:

bbox_mAP: 0.4320 (report is 0.437), bbox_mAP_50: 0.6590, bbox_mAP_75: 0.4700, bbox_mAP_s: 0.2790, bbox_mAP_m: 0.4630, bbox_mAP_l: 0.5680, bbox_mAP_copypaste: 0.432 0.659 0.470 0.279 0.463 0.568, 
segm_mAP: 0.3950 (report is 0.398), segm_mAP_50: 0.6270, segm_mAP_75: 0.4230, segm_mAP_s: 0.2310, segm_mAP_m: 0.4280, segm_mAP_l: 0.5460, segm_mAP_copypaste: 0.395 0.627 0.423 0.231 0.428 0.546

I wonder if this is due to run-to-run fluctuation or to use_fp16=False?
Looking forward to your reply. Thanks!

ERROR: Unexpected segmentation fault encountered in worker.

ๅœจ่ฟ่กŒไธญๅ‡บ็Žฐไปฅไธ‹้”™่ฏฏใ€‚
2021-04-21 19:29:24,344 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
ERROR: Unexpected segmentation fault encountered in worker.
Traceback (most recent call last):
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 872, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/multiprocessing/queues.py", line 104, in get
if not self._poll(timeout):
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/multiprocessing/connection.py", line 257, in poll
return self._poll(timeout)
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
r = wait([self], timeout)
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/multiprocessing/connection.py", line 920, in wait
ready = selector.select(timeout)
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 109760) is killed by signal: Segmentation fault.

Pretrained Image model in README

In the README, it says the model used ImageNet1K as the pretrained weights; however, in the paper, all detection results are using ImageNet22K.


May I know which one is correct? Thanks.

How to modify the number of training pictures

When I used a custom dataset for training, I found that there were 2100 pictures in the COCO-format train set, but only 580 pictures were actually used during training. How can I change this?
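
Not an official answer, but one common reason in mmdetection-based codebases is that training images without any annotations are filtered out by default. A hedged sketch of keeping them via the CocoDataset filter_empty_gt option (paths below are placeholders):

data = dict(
    train=dict(
        type='CocoDataset',
        filter_empty_gt=False,  # keep images that have no annotations
        ann_file='path/to/train.json',        # placeholder
        img_prefix='path/to/train_images/'))  # placeholder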

Why apex instead of mmdet's fp16?

Thank you for publishing the great work!

We use apex for mixed precision training by default.
# do not use mmdet version fp16

Why does the code use apex instead of mmdet's fp16?
Are there any differences in AP, training speed, or inference speed?

SyntaxError: invalid syntax 'dataset_type': 'CocoDataset' lib2to3.pgen2.parse.ParseError: bad input: type=3, value="'dataset_type'", context=('\n', (148, 0))

Traceback (most recent call last):
File "/opt/python3.7/lib/python3.7/site-packages/yapf/yapflib/pytree_utils.py", line 122, in ParseCodeToTree
tree = parser_driver.parse_string(code, debug=False)
File "/opt/python3.7/lib/python3.7/lib2to3/pgen2/driver.py", line 104, in parse_string
return self.parse_tokens(tokens, debug)
File "/opt/python3.7/lib/python3.7/lib2to3/pgen2/driver.py", line 72, in parse_tokens
if p.addtoken(type, value, (prefix, start)):
File "/opt/python3.7/lib/python3.7/lib2to3/pgen2/parse.py", line 159, in addtoken
raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=3, value="'dataset_type'", context=('\n', (148, 0))

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tools/train.py", line 188, in
main()
File "tools/train.py", line 129, in main
cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config)))
File "/opt/python3.7/lib/python3.7/site-packages/mmcv/utils/config.py", line 458, in dump
f.write(self.pretty_text)
File "/opt/python3.7/lib/python3.7/site-packages/mmcv/utils/config.py", line 413, in pretty_text
text, _ = FormatCode(text, style_config=yapf_style, verify=True)
File "/opt/python3.7/lib/python3.7/site-packages/yapf/yapflib/yapf_api.py", line 147, in FormatCode
tree = pytree_utils.ParseCodeToTree(unformatted_source)
File "/opt/python3.7/lib/python3.7/site-packages/yapf/yapflib/pytree_utils.py", line 128, in ParseCodeToTree
raise e
File "/opt/python3.7/lib/python3.7/site-packages/yapf/yapflib/pytree_utils.py", line 126, in ParseCodeToTree
ast.parse(code)
File "/opt/python3.7/lib/python3.7/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "", line 148
'dataset_type': 'CocoDataset'
^
SyntaxError: invalid syntax

I guess this is a problem with my environment setup. It is my first time using mmdetection and PyTorch, so I am not very familiar with them. Could someone who has been through this give me some pointers? Thanks.

convergence problem for Cascade Mask RCNN

Thanks for sharing your impressive work.

Recently, I trained a Swin-based Cascade Mask R-CNN and Mask R-CNN on my own dataset of about 1.5k single-class images. The training of Mask R-CNN is relatively stable. However, the loss of Cascade Mask R-CNN only converges to around 1.6 and doesn't decrease any further. Its validation performance, meanwhile, is lower than the Mask R-CNN counterpart.

They are trained with the same settings; could you give me some suggestions for fine-tuning the Cascade Mask R-CNN?

The testing results of the whole dataset is empty

Hello, I get the following errors in both training and testing, with both 2 GPUs and a single GPU.
How can I solve this problem? (After 36 epochs of training, the same errors are displayed.) The following is an example of the error during testing:

(mm290) rth1@lab412-rth1:~/lws/Swin-Transformer-Object-Detection-master$ python tools/test.py configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py work_dirs/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco/epoch_36.pth --eval bbox segm
loading annotations into memory...
Done (t=0.64s)
creating index...
index created!
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 5.1 task/s, elapsed: 978s, ETA: 0s
Evaluating bbox...
Loading and preparing results...
The testing results of the whole dataset is empty.

about swin pre-trained model

2021-06-01 01:32:06,714 - mmdet - INFO - load model from: checkpoints/cascade_mask_rcnn_swin_base_patch4_window7.pth
Traceback (most recent call last):
File "tools/train.py", line 187, in
main()
File "tools/train.py", line 161, in main
test_cfg=cfg.get('test_cfg'))
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/builder.py", line 77, in build_detector
return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/builder.py", line 34, in build
return build_from_cfg(cfg, registry, default_args)
File "/usr/local/lib/python3.6/dist-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
return obj_cls(**args)
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/detectors/cascade_rcnn.py", line 25, in init
pretrained=pretrained)
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/detectors/two_stage.py", line 48, in init
self.init_weights(pretrained=pretrained)
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/detectors/two_stage.py", line 68, in init_weights
self.backbone.init_weights(pretrained=pretrained)
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/backbones/swin_transformer.py", line 595, in init_weights
load_checkpoint(self, pretrained, strict=False, logger=logger)
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmcv_custom/checkpoint.py", line 340, in load_checkpoint
table_current = model.state_dict()[table_key]
KeyError: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'

pytorch==1.7.0
Is there an error in the pre-trained weights?
Could you provide a working Swin Transformer pre-trained model? Thanks.

Checkpoint Save error in google colab

During training, when a checkpoint was about to be saved (after some epochs, not after every epoch), the following error came up:

Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
    main()
  File "tools/train.py", line 183, in main
    meta=meta)
  File "/content/drive/My Drive/Swin-Transformer-Object-Detection/mmdet/apis/train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/checkpoint.py", line 69, in after_train_epoch
    self._save_checkpoint(runner)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/dist_utils.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/checkpoint.py", line 75, in _save_checkpoint
    self.out_dir, save_optimizer=self.save_optimizer, **self.args)
  File "/content/drive/My Drive/Swin-Transformer-Object-Detection/mmcv_custom/runner/epoch_based_runner.py", line 60, in save_checkpoint
    save_checkpoint(self.model, filepath, optimizer=optimizer, meta=meta)
  File "/content/drive/My Drive/Swin-Transformer-Object-Detection/mmcv_custom/runner/checkpoint.py", line 58, in save_checkpoint
    checkpoint['amp'] = apex.amp.state_dict()
AttributeError: module 'apex' has no attribute 'amp'

How to extract roi features of Swin

I'm trying to extract features for each bbox via the roi_extractor in mmdet.
Currently, I run

import mmcv
from mmdet.models import build_roi_extractor

config_file = '../configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py'
config = mmcv.Config.fromfile(config_file)
model = build_roi_extractor(config.model)

and get KeyError: 'CascadeRCNN is not in the roi_extractor registry'

Note that I've already run python setup.py develop.

Is there any other way, or is there something I missed? Thanks.
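
For context, build_roi_extractor expects only the roi_extractor sub-config, so passing the whole model config raises that registry KeyError. A minimal sketch (assuming the mmdet 2.x API used in this repo) of building the full detector and reaching the extractor instead:

import mmcv
from mmdet.models import build_detector

config_file = '../configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py'
cfg = mmcv.Config.fromfile(config_file)

# Build the whole CascadeRCNN detector from its config.
model = build_detector(cfg.model,
                       train_cfg=cfg.get('train_cfg'),
                       test_cfg=cfg.get('test_cfg'))

# For a cascade RoI head this is typically a ModuleList, one extractor per stage.
bbox_roi_extractor = model.roi_head.bbox_roi_extractor
print(bbox_roi_extractor)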

Error during training in last iteration of first epoch

While training, in the last iteration (i.e. for the last batch of images), it throws this error:

Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
    main()
  File "tools/train.py", line 183, in main
    meta=meta)
  File "/content/drive/My Drive/Swin-Transformer-Object-Detection/mmdet/apis/train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    for _ in range(epochs):
TypeError: 'NoneType' object cannot be interpreted as an integer

Is it not getting any images in that batch, or is it something else?

Model usage in an FPN-architecture

Hi,

Just looking for some advice on how to use the current implementation of Swin Transformer in an FPN-based detector model. Does the current implementation work out of the box, or must some modifications be made to the model?

Thanks.

What is the error? There is only one class in CLASSES. I don't know if that's the reason. What should I do about it? Thanks!

Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
    main()
  File "tools/train.py", line 183, in main
    meta=meta)
  File "/home/server/文档/DETR2/Swin/mmdet/apis/train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/server/.local/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/server/.local/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 45, in train
    self.call_hook('before_train_epoch')
  File "/home/server/.local/lib/python3.6/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/server/文档/DETR2/Swin/mmdet/datasets/utils.py", line 150, in before_train_epoch
    self._check_head(runner)
  File "/home/server/文档/DETR2/Swin/mmdet/datasets/utils.py", line 137, in _check_head
    (f'The num_classes ({module.num_classes}) in '
AssertionError: The num_classes (1) in Shared2FCBBoxHead of MMDataParallel does not matches the length of CLASSES 5) in CocoDataset

coco.py bug for only one class

Hi,
So I am training with just one class; in coco.py I set

CLASSES = ('person')

but later on, when the class count consistency is checked,

assert module.num_classes == len(dataset.CLASSES)

len(dataset.CLASSES) = len('person') = 6

but if there is more than one class, it's fine, because dataset.CLASSES is then a tuple.
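
For reference, this is just Python tuple syntax: a one-element tuple needs a trailing comma, otherwise the parentheses are dropped and CLASSES stays a plain string whose len() counts characters:

# ('person') is just the string 'person', so len() returns 6 (its character count).
CLASSES = ('person')
print(len(CLASSES))   # 6

# The trailing comma makes it a one-element tuple, so len() returns 1.
CLASSES = ('person',)
print(len(CLASSES))   # 1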

FLOPs and FPS measurement.

Dear authors,
Thanks for your great work.

  1. How do you measure the FLOPs of the detection model, please? (See the sketch after this list for one common approach.)
  2. I use the benchmark.py provided by mmdet to measure FPS on one V100-32G, but get a much lower FPS. The same happens with the classification model: approx. 687 imgs/s for Swin-T with batch size 64. I also tried on one V100-16G with CUDA 10.2. It is faster but still lower than the paper-reported number (737 vs 755). Could you please provide any suggestions?
    Thanks a lot in advance!
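
Not an official answer, but a rough sketch of how FLOPs are commonly counted for mmdetection models, following the approach of mmdetection's FLOPs tool (assumes the detector implements forward_dummy; the 3x1280x800 input shape is only an example):

import mmcv
from mmcv.cnn import get_model_complexity_info
from mmdet.models import build_detector

cfg = mmcv.Config.fromfile(
    'configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py')
model = build_detector(cfg.model,
                       train_cfg=cfg.get('train_cfg'),
                       test_cfg=cfg.get('test_cfg')).cuda()
model.eval()

# FLOPs are counted on a dummy forward pass through backbone, neck and heads.
model.forward = model.forward_dummy
flops, params = get_model_complexity_info(model, (3, 1280, 800))
print(flops, params)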

AssertionError: Default process group is not initialized

Hello,
I want to use swin for an instance segmentation problem.
I installed mmdetection and NVIDIA apex, and cloned the git repo.
I configured everything as usual in mmdetection.
But when I ran the training API I got this error.
It's the first time I have seen this error.
Does anyone have an idea?
Thanks


KeyError: 'SwinTransformer is not in the models registry'

I am using this model for custom training on my dataset in Colab. As I started training, I got this error:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
return obj_cls(**args)
File "/content/drive/MyDrive/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 27, in __init__
init_cfg=init_cfg)
File "/content/drive/MyDrive/mmdetection/mmdet/models/detectors/two_stage.py", line 26, in __init__
self.backbone = build_backbone(backbone)
File "/content/drive/MyDrive/mmdetection/mmdet/models/builder.py", line 19, in build_backbone
return BACKBONES.build(cfg)
File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 210, in build
return self.build_func(*args, **kwargs, registry=self)
File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
f'{obj_type} is not in the {registry.name} registry')
KeyError: 'SwinTransformer is not in the models registry'

During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
 File "tools/train.py", line 187, in <module>
 main()
  File "tools/train.py", line 161, in main
test_cfg=cfg.get('test_cfg'))
File "/content/drive/MyDrive/mmdetection/mmdet/models/builder.py", line 58, in build_detector
cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 210, in build
return self.build_func(*args, **kwargs, registry=self)
File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
 raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "CascadeRCNN: 'SwinTransformer is not in the models registry'"

Here is my config file:

2021-05-13 12:30:00,473 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.10 (default, May  3 2021, 02:48:31) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla P100-PCIE-16GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1+cu101
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.1, CUDNN_VERSION=7.6.3, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.9.1+cu101
OpenCV: 4.1.2
MMCV: 1.3.3
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.0
MMDetection: 2.12.0+41bb93f
------------------------------------------------------------

2021-05-13 12:30:04,393 - mmdet - INFO - Distributed training: False
2021-05-13 12:30:08,323 - mmdet - INFO - Config:
model = dict(
    type='CascadeRCNN',
    pretrained='./moby_cascade_mask_rcnn_swin_tiny_patch4_window7_3x.pth',
    backbone=dict(
        type='SwinTransformer',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4.0,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.2,
        ape=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        use_checkpoint=False),
    neck=dict(
        type='FPN',
        in_channels=[96, 192, 384, 768],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='CascadeRoIHead',
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0))
        ],
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=80,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_across_levels=False,
            nms_pre=2000,
            nms_post=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_across_levels=False,
            nms_pre=1000,
            nms_post=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100,
            mask_thr_binary=0.5)))
dataset_type = 'COCODataset'
data_root = '/content/drive/MyDrive/layout/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='AutoAugment',
        policies=[[{
            'type':
            'Resize',
            'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                          (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                          (736, 1333), (768, 1333), (800, 1333)],
            'multiscale_mode':
            'value',
            'keep_ratio':
            True
        }],
                  [{
                      'type': 'Resize',
                      'img_scale': [(400, 1333), (500, 1333), (600, 1333)],
                      'multiscale_mode': 'value',
                      'keep_ratio': True
                  }, {
                      'type': 'RandomCrop',
                      'crop_type': 'absolute_range',
                      'crop_size': (384, 600),
                      'allow_negative_crop': True
                  }, {
                      'type':
                      'Resize',
                      'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                    (576, 1333), (608, 1333), (640, 1333),
                                    (672, 1333), (704, 1333), (736, 1333),
                                    (768, 1333), (800, 1333)],
                      'multiscale_mode':
                      'value',
                      'override':
                      True,
                      'keep_ratio':
                      True
                  }]]),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='COCODataset',
        ann_file='/content/drive/MyDrive/layout/train.json',
        img_prefix='/content/drive/MyDrive/layout/train/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='AutoAugment',
                policies=[[{
                    'type':
                    'Resize',
                    'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                  (576, 1333), (608, 1333), (640, 1333),
                                  (672, 1333), (704, 1333), (736, 1333),
                                  (768, 1333), (800, 1333)],
                    'multiscale_mode':
                    'value',
                    'keep_ratio':
                    True
                }],
                          [{
                              'type': 'Resize',
                              'img_scale': [(400, 1333), (500, 1333),
                                            (600, 1333)],
                              'multiscale_mode': 'value',
                              'keep_ratio': True
                          }, {
                              'type': 'RandomCrop',
                              'crop_type': 'absolute_range',
                              'crop_size': (384, 600),
                              'allow_negative_crop': True
                          }, {
                              'type':
                              'Resize',
                              'img_scale': [(480, 1333), (512, 1333),
                                            (544, 1333), (576, 1333),
                                            (608, 1333), (640, 1333),
                                            (672, 1333), (704, 1333),
                                            (736, 1333), (768, 1333),
                                            (800, 1333)],
                              'multiscale_mode':
                              'value',
                              'override':
                              True,
                              'keep_ratio':
                              True
                          }]]),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ]),
    val=dict(
        type='COCODataset',
        ann_file='/content/drive/MyDrive/layout/valid.json',
        img_prefix='/content/drive/MyDrive/layout/valid/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='COCODataset',
        ann_file='/content/drive/MyDrive/layout/valid.json',
        img_prefix='/content/drive/MyDrive/layout/valid/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(metric=['bbox', 'segm'])
optimizer = dict(
    type='AdamW',
    lr=0.0001,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    paramwise_cfg=dict(
        custom_keys=dict(
            absolute_pos_embed=dict(decay_mult=0.0),
            relative_position_bias_table=dict(decay_mult=0.0),
            norm=dict(decay_mult=0.0))))
optimizer_config = dict(
    grad_clip=None,
    type='DistOptimizerHook',
    update_interval=1,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[27, 33])
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
checkpoint_config = dict(interval=5)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = '/content/drive/MyDrive/Swin-Transformer-Object-Detection/moby_cascade_mask_rcnn_swin_tiny_patch4_window7_3x.pth'
resume_from = None
workflow = [('train', 1)]
fp16 = None
work_dir = './work_dirs/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco'
gpu_ids = range(0, 1)

Comment out the configuration related to the mask

Hello, when I was training on my dataset, I found that I need to comment out the mask-related settings in the following files. What exactly should I comment out in these files?
1. cascade_mask_rcnn_swin_small_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x.py (the config I used)
2. config/base/model/cascade_mask_rcnn_swin_fpn.py

using the config and pretrained files for detection

Thanks for your amazing work!
I'm sorry if this is a very basic question but I just got started with mmdet. How can I train the Swin-Transformer given in the configs and models for only the detection task (that is, without using the mask part)?
Again, sorry for the basic question and thanks for sharing this code with all of us!

missing keys in source state_dict

patch_embed.proj.weight, patch_embed.proj.bias, patch_embed.norm.weight, patch_embed.norm.bias, layers.0.blocks.0.norm1.weight, layers.0.blocks.0.norm1.bias, layers.0.blocks.0.attn.relative_position_bias_table, layers.0.blocks.0.attn.relative_position_index, layers.0.blocks.0.attn.qkv.weight, layers.0.blocks.0.attn.qkv.bias, layers.0.blocks.0.attn.proj.weight, layers.0.blocks.0.attn.proj.bias, layers.0.blocks.0.norm2.weight, layers.0.blocks.0.norm2.bias, layers.0.blocks.0.mlp.fc1.weight, layers.0.blocks.0.mlp.fc1.bias, layers.0.blocks.0.mlp.fc2.weight, layers.0.blocks.0.mlp.fc2.bias, layers.0.blocks.1.norm1.weight, layers.0.blocks.1.norm1.bias, layers.0.blocks.1.attn.relative_position_bias_table, layers.0.blocks.1.attn.relative_position_index, layers.0.blocks.1.attn.qkv.weight, layers.0.blocks.1.attn.qkv.bias, layers.0.blocks.1.attn.proj.weight, layers.0.blocks.1.attn.proj.bias, layers.0.blocks.1.norm2.weight, layers.0.blocks.1.norm2.bias, layers.0.blocks.1.mlp.fc1.weight, layers.0.blocks.1.mlp.fc1.bias, layers.0.blocks.1.mlp.fc2.weight, layers.0.blocks.1.mlp.fc2.bias, layers.0.downsample.reduction.weight, layers.0.downsample.norm.weight, layers.0.downsample.norm.bias, layers.1.blocks.0.norm1.weight, layers.1.blocks.0.norm1.bias, layers.1.blocks.0.attn.relative_position_bias_table, layers.1.blocks.0.attn.relative_position_index, layers.1.blocks.0.attn.qkv.weight, layers.1.blocks.0.attn.qkv.bias, layers.1.blocks.0.attn.proj.weight, layers.1.blocks.0.attn.proj.bias, layers.1.blocks.0.norm2.weight, layers.1.blocks.0.norm2.bias, layers.1.blocks.0.mlp.fc1.weight, layers.1.blocks.0.mlp.fc1.bias, layers.1.blocks.0.mlp.fc2.weight, layers.1.blocks.0.mlp.fc2.bias, layers.1.blocks.1.norm1.weight, layers.1.blocks.1.norm1.bias, layers.1.blocks.1.attn.relative_position_bias_table, layers.1.blocks.1.attn.relative_position_index, layers.1.blocks.1.attn.qkv.weight, layers.1.blocks.1.attn.qkv.bias, layers.1.blocks.1.attn.proj.weight, layers.1.blocks.1.attn.proj.bias, layers.1.blocks.1.norm2.weight, layers.1.blocks.1.norm2.bias, layers.1.blocks.1.mlp.fc1.weight, layers.1.blocks.1.mlp.fc1.bias, layers.1.blocks.1.mlp.fc2.weight, layers.1.blocks.1.mlp.fc2.bias, layers.1.downsample.reduction.weight, layers.1.downsample.norm.weight, layers.1.downsample.norm.bias, layers.2.blocks.0.norm1.weight, layers.2.blocks.0.norm1.bias, layers.2.blocks.0.attn.relative_position_bias_table, layers.2.blocks.0.attn.relative_position_index, layers.2.blocks.0.attn.qkv.weight, layers.2.blocks.0.attn.qkv.bias, layers.2.blocks.0.attn.proj.weight, layers.2.blocks.0.attn.proj.bias, layers.2.blocks.0.norm2.weight, layers.2.blocks.0.norm2.bias, layers.2.blocks.0.mlp.fc1.weight, layers.2.blocks.0.mlp.fc1.bias, layers.2.blocks.0.mlp.fc2.weight, layers.2.blocks.0.mlp.fc2.bias, layers.2.blocks.1.norm1.weight, layers.2.blocks.1.norm1.bias, layers.2.blocks.1.attn.relative_position_bias_table, layers.2.blocks.1.attn.relative_position_index, layers.2.blocks.1.attn.qkv.weight, layers.2.blocks.1.attn.qkv.bias, layers.2.blocks.1.attn.proj.weight, layers.2.blocks.1.attn.proj.bias, layers.2.blocks.1.norm2.weight, layers.2.blocks.1.norm2.bias, layers.2.blocks.1.mlp.fc1.weight, layers.2.blocks.1.mlp.fc1.bias, layers.2.blocks.1.mlp.fc2.weight, layers.2.blocks.1.mlp.fc2.bias, layers.2.blocks.2.norm1.weight, layers.2.blocks.2.norm1.bias, layers.2.blocks.2.attn.relative_position_bias_table, layers.2.blocks.2.attn.relative_position_index, layers.2.blocks.2.attn.qkv.weight, layers.2.blocks.2.attn.qkv.bias, layers.2.blocks.2.attn.proj.weight, 
layers.2.blocks.2.attn.proj.bias, layers.2.blocks.2.norm2.weight, layers.2.blocks.2.norm2.bias, layers.2.blocks.2.mlp.fc1.weight, layers.2.blocks.2.mlp.fc1.bias, layers.2.blocks.2.mlp.fc2.weight, layers.2.blocks.2.mlp.fc2.bias, layers.2.blocks.3.norm1.weight, layers.2.blocks.3.norm1.bias, layers.2.blocks.3.attn.relative_position_bias_table, layers.2.blocks.3.attn.relative_position_index, layers.2.blocks.3.attn.qkv.weight, layers.2.blocks.3.attn.qkv.bias, layers.2.blocks.3.attn.proj.weight, layers.2.blocks.3.attn.proj.bias, layers.2.blocks.3.norm2.weight, layers.2.blocks.3.norm2.bias, layers.2.blocks.3.mlp.fc1.weight, layers.2.blocks.3.mlp.fc1.bias, layers.2.blocks.3.mlp.fc2.weight, layers.2.blocks.3.mlp.fc2.bias, layers.2.blocks.4.norm1.weight, layers.2.blocks.4.norm1.bias, layers.2.blocks.4.attn.relative_position_bias_table, layers.2.blocks.4.attn.relative_position_index, layers.2.blocks.4.attn.qkv.weight, layers.2.blocks.4.attn.qkv.bias, layers.2.blocks.4.attn.proj.weight, layers.2.blocks.4.attn.proj.bias, layers.2.blocks.4.norm2.weight, layers.2.blocks.4.norm2.bias, layers.2.blocks.4.mlp.fc1.weight, layers.2.blocks.4.mlp.fc1.bias, layers.2.blocks.4.mlp.fc2.weight, layers.2.blocks.4.mlp.fc2.bias, layers.2.blocks.5.norm1.weight, layers.2.blocks.5.norm1.bias, layers.2.blocks.5.attn.relative_position_bias_table, layers.2.blocks.5.attn.relative_position_index, layers.2.blocks.5.attn.qkv.weight, layers.2.blocks.5.attn.qkv.bias, layers.2.blocks.5.attn.proj.weight, layers.2.blocks.5.attn.proj.bias, layers.2.blocks.5.norm2.weight, layers.2.blocks.5.norm2.bias, layers.2.blocks.5.mlp.fc1.weight, layers.2.blocks.5.mlp.fc1.bias, layers.2.blocks.5.mlp.fc2.weight, layers.2.blocks.5.mlp.fc2.bias, layers.2.downsample.reduction.weight, layers.2.downsample.norm.weight, layers.2.downsample.norm.bias, layers.3.blocks.0.norm1.weight, layers.3.blocks.0.norm1.bias, layers.3.blocks.0.attn.relative_position_bias_table, layers.3.blocks.0.attn.relative_position_index, layers.3.blocks.0.attn.qkv.weight, layers.3.blocks.0.attn.qkv.bias, layers.3.blocks.0.attn.proj.weight, layers.3.blocks.0.attn.proj.bias, layers.3.blocks.0.norm2.weight, layers.3.blocks.0.norm2.bias, layers.3.blocks.0.mlp.fc1.weight, layers.3.blocks.0.mlp.fc1.bias, layers.3.blocks.0.mlp.fc2.weight, layers.3.blocks.0.mlp.fc2.bias, layers.3.blocks.1.norm1.weight, layers.3.blocks.1.norm1.bias, layers.3.blocks.1.attn.relative_position_bias_table, layers.3.blocks.1.attn.relative_position_index, layers.3.blocks.1.attn.qkv.weight, layers.3.blocks.1.attn.qkv.bias, layers.3.blocks.1.attn.proj.weight, layers.3.blocks.1.attn.proj.bias, layers.3.blocks.1.norm2.weight, layers.3.blocks.1.norm2.bias, layers.3.blocks.1.mlp.fc1.weight, layers.3.blocks.1.mlp.fc1.bias, layers.3.blocks.1.mlp.fc2.weight, layers.3.blocks.1.mlp.fc2.bias, norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias

model = dict(
    pretrained='/storage/wjb/AlignPS/pretrained/swin_tiny_patch4_window7_224.pth',
    backbone=dict(
        type='SwinTransformer',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4.,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.2,
        ape=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        use_checkpoint=False),

How to train a custom dataset?

Hi, as I checked train.py, I can't see a train-dir argument to specify my training directory. Is it possible to train on a custom dataset, and should the dataset follow the COCO format?
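
For context (not an official answer): mmdetection-style repos take the dataset paths from the config file rather than from a command-line directory flag. A hedged sketch of a config override for a hypothetical custom COCO-format dataset (all paths and class names below are placeholders):

_base_ = './mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py'

# Placeholder class names for the custom dataset.
classes = ('class_a', 'class_b')

data = dict(
    train=dict(
        classes=classes,
        ann_file='path/to/train.json',
        img_prefix='path/to/train_images/'),
    val=dict(
        classes=classes,
        ann_file='path/to/val.json',
        img_prefix='path/to/val_images/'),
    test=dict(
        classes=classes,
        ann_file='path/to/val.json',
        img_prefix='path/to/val_images/'))

# The head's num_classes (e.g. model.roi_head.bbox_head.num_classes) also has
# to match len(classes); it can be set here or via --cfg-options.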

python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/apis/train.py", line 75, in <listcomp> seed=cfg.seed) for ds in dataset TypeError: object of type 'int' has no len()

(swin-detection) wangxiao@wx:~/Documents/Swin-Transformer-Object-Detection$ python tools/train.py ./configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py
apex is not installed
apex is not installed
apex is not installed
apex is not installed
2021-05-16 20:27:11,937 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
CUDA available: True
GPU 0,1: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda-10.0
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.0
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.5.2
MMCV: 1.3.4
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 10.0
MMDetection: 2.11.0+02baa30

2021-05-16 20:27:14,467 - mmdet - INFO - Distributed training: False
2021-05-16 20:27:16,458 - mmdet - INFO - Config:
model = dict(
type='MaskRCNN',
pretrained=None,
backbone=dict(
type='SwinTransformer',
embed_dim=96,
depths=[2, 2, 6, 2],
num_heads=[3, 6, 12, 24],
window_size=7,
mlp_ratio=4.0,
qkv_bias=True,
qk_scale=None,
drop_rate=0.0,
attn_drop_rate=0.0,
drop_path_rate=0.2,
ape=False,
patch_norm=True,
out_indices=(0, 1, 2, 3),
use_checkpoint=True),
neck=dict(
type='FPN',
in_channels=[96, 192, 384, 768],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0.0, 0.0, 0.0, 0.0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0.0, 0.0, 0.0, 0.0],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
mask_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
mask_head=dict(
type='FCNMaskHead',
num_convs=4,
in_channels=256,
conv_out_channels=256,
num_classes=80,
loss_mask=dict(
type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=-1,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100,
mask_thr_binary=0.5)))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='AutoAugment',
policies=[[{
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
'multiscale_mode':
'value',
'keep_ratio':
True
}],
[{
'type': 'Resize',
'img_scale': [(400, 1333), (500, 1333), (600, 1333)],
'multiscale_mode': 'value',
'keep_ratio': True
}, {
'type': 'RandomCrop',
'crop_type': 'absolute_range',
'crop_size': (384, 600),
'allow_negative_crop': True
}, {
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
'multiscale_mode':
'value',
'override':
True,
'keep_ratio':
True
}]]),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type='CocoDataset',
ann_file='data/coco/annotations/instances_train2017.json',
img_prefix='data/coco/train2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='AutoAugment',
policies=[[{
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
'multiscale_mode':
'value',
'keep_ratio':
True
}],
[{
'type': 'Resize',
'img_scale': [(400, 1333), (500, 1333),
(600, 1333)],
'multiscale_mode': 'value',
'keep_ratio': True
}, {
'type': 'RandomCrop',
'crop_type': 'absolute_range',
'crop_size': (384, 600),
'allow_negative_crop': True
}, {
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333),
(544, 1333), (576, 1333),
(608, 1333), (640, 1333),
(672, 1333), (704, 1333),
(736, 1333), (768, 1333),
(800, 1333)],
'multiscale_mode':
'value',
'override':
True,
'keep_ratio':
True
}]]),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]),
val=dict(
type='CocoDataset',
ann_file='data/coco/annotations/instances_val2017.json',
img_prefix='data/coco/val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]),
test=dict(
type='CocoDataset',
ann_file='data/coco/annotations/instances_val2017.json',
img_prefix='data/coco/val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]))
evaluation = dict(metric=['bbox', 'segm'])
optimizer = dict(
type='AdamW',
lr=0.0001,
betas=(0.9, 0.999),
weight_decay=0.05,
paramwise_cfg=dict(
custom_keys=dict(
absolute_pos_embed=dict(decay_mult=0.0),
relative_position_bias_table=dict(decay_mult=0.0),
norm=dict(decay_mult=0.0))))
optimizer_config = dict(
grad_clip=None,
type='DistOptimizerHook',
update_interval=1,
coalesce=True,
bucket_size_mb=-1,
use_fp16=True)
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[27, 33])
runner = dict(type='EpochBasedRunner', max_epochs=36)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
fp16 = None
work_dir = './work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco'
gpu_ids = 0

loading annotations into memory...
Done (t=9.13s)
creating index...
index created!
Traceback (most recent call last):
File "tools/train.py", line 159, in
main()
File "tools/train.py", line 155, in main
meta=meta)
File "/home/wangxiao/anaconda3/envs/swin-detection/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/apis/train.py", line 75, in train_detector
seed=cfg.seed) for ds in dataset
File "/home/wangxiao/anaconda3/envs/swin-detection/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/apis/train.py", line 75, in
seed=cfg.seed) for ds in dataset
TypeError: object of type 'int' has no len()

Hi, I ran into the error shown above. How can I solve it? Thanks.
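
One detail worth checking in the posted config is gpu_ids = 0, whereas the second config further down this page has gpu_ids = range(0, 1). In mmdet 2.11, train_detector passes len(cfg.gpu_ids) to build_dataloader inside the very list comprehension where this traceback ends, so a bare int there would raise exactly this TypeError. A minimal sketch of the difference, using the values from the two configs on this page (treat this as a guess based on the posted config, not a confirmed diagnosis):

# Hedged sketch: cfg.gpu_ids must be an iterable of device ids, not a bare int,
# because mmdet's train_detector calls len(cfg.gpu_ids) when building data loaders.
# gpu_ids = 0           # as in the config above: len(0) -> "object of type 'int' has no len()"
gpu_ids = range(0, 1)   # as in the second config on this page: len(gpu_ids) == 1

num_gpus = len(gpu_ids)
print(f'Training on {num_gpus} GPU(s): {list(gpu_ids)}')

If this is the cause, letting tools/train.py set the field from its --gpu-ids argument, rather than hard-coding an int in the config, should avoid the error.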

Mask RCNN pretrained error

I have already tried the tiny and small versions; neither of them runs.
Training without pretrained weights works fine, but the training time is a problem.
The following is the config:

model = dict(
    type='MaskRCNN',
    pretrained='mask_rcnn_swin_tiny_patch4_window7.pth',
    backbone=dict(
        type='SwinTransformer',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4.0,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.2,
        ape=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        use_checkpoint=False),
    neck=dict(
        type='FPN',
        in_channels=[96, 192, 384, 768],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=43,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=43,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            mask_size=28,
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100,
            mask_thr_binary=0.5)))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
customed = [
    'speedlimit-20', 'speedlimit-30', 'speedlimit-50', 'speedlimit-60',
    'speedlimit-70', 'speedlimit-80', 'restrict-end-80', 'speedlimit-100',
    'speedlimit-120', 'no-overtake', 'no-overtake-truck',
    'priority-next-intersect', 'priority-road', 'giveaway', 'stop',
    'no-traffic-bothways', 'no-truck', 'no-entry', 'danger', 'bend-left',
    'bend-right', 'bend', 'uneven-road', 'slippery-road', 'road-narrow',
    'construction', 'traffic-signal', 'pedestrian-crossing', 'school-crossing',
    'cycle-crossing', 'snow', 'animals', 'restriction-ends', 'go-right',
    'go-left', 'go-straight', 'go-right-straight', 'go-left-straight',
    'keep-right', 'keep-left', 'roundabout', 'restrict-ends-overtaking',
    'restrict-ends-overtaking-truck'
]
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='AutoAugment',
        policies=[[{
            'type':
            'Resize',
            'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                          (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                          (736, 1333), (768, 1333), (800, 1333)],
            'multiscale_mode':
            'value',
            'keep_ratio':
            True
        }],
                  [{
                      'type': 'Resize',
                      'img_scale': [(400, 1333), (500, 1333), (600, 1333)],
                      'multiscale_mode': 'value',
                      'keep_ratio': True
                  }, {
                      'type': 'RandomCrop',
                      'crop_type': 'absolute_range',
                      'crop_size': (384, 600),
                      'allow_negative_crop': True
                  }, {
                      'type':
                      'Resize',
                      'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                    (576, 1333), (608, 1333), (640, 1333),
                                    (672, 1333), (704, 1333), (736, 1333),
                                    (768, 1333), (800, 1333)],
                      'multiscale_mode':
                      'value',
                      'override':
                      True,
                      'keep_ratio':
                      True
                  }]]),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_train2017.json',
        img_prefix='data/coco/train2017/',
        classes=[
            'speedlimit-20', 'speedlimit-30', 'speedlimit-50', 'speedlimit-60',
            'speedlimit-70', 'speedlimit-80', 'restrict-end-80',
            'speedlimit-100', 'speedlimit-120', 'no-overtake',
            'no-overtake-truck', 'priority-next-intersect', 'priority-road',
            'giveaway', 'stop', 'no-traffic-bothways', 'no-truck', 'no-entry',
            'danger', 'bend-left', 'bend-right', 'bend', 'uneven-road',
            'slippery-road', 'road-narrow', 'construction', 'traffic-signal',
            'pedestrian-crossing', 'school-crossing', 'cycle-crossing', 'snow',
            'animals', 'restriction-ends', 'go-right', 'go-left',
            'go-straight', 'go-right-straight', 'go-left-straight',
            'keep-right', 'keep-left', 'roundabout',
            'restrict-ends-overtaking', 'restrict-ends-overtaking-truck'
        ],
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='AutoAugment',
                policies=[[{
                    'type':
                    'Resize',
                    'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                  (576, 1333), (608, 1333), (640, 1333),
                                  (672, 1333), (704, 1333), (736, 1333),
                                  (768, 1333), (800, 1333)],
                    'multiscale_mode':
                    'value',
                    'keep_ratio':
                    True
                }],
                          [{
                              'type': 'Resize',
                              'img_scale': [(400, 1333), (500, 1333),
                                            (600, 1333)],
                              'multiscale_mode': 'value',
                              'keep_ratio': True
                          }, {
                              'type': 'RandomCrop',
                              'crop_type': 'absolute_range',
                              'crop_size': (384, 600),
                              'allow_negative_crop': True
                          }, {
                              'type':
                              'Resize',
                              'img_scale': [(480, 1333), (512, 1333),
                                            (544, 1333), (576, 1333),
                                            (608, 1333), (640, 1333),
                                            (672, 1333), (704, 1333),
                                            (736, 1333), (768, 1333),
                                            (800, 1333)],
                              'multiscale_mode':
                              'value',
                              'override':
                              True,
                              'keep_ratio':
                              True
                          }]]),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ]),
    val=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        classes=[
            'speedlimit-20', 'speedlimit-30', 'speedlimit-50', 'speedlimit-60',
            'speedlimit-70', 'speedlimit-80', 'restrict-end-80',
            'speedlimit-100', 'speedlimit-120', 'no-overtake',
            'no-overtake-truck', 'priority-next-intersect', 'priority-road',
            'giveaway', 'stop', 'no-traffic-bothways', 'no-truck', 'no-entry',
            'danger', 'bend-left', 'bend-right', 'bend', 'uneven-road',
            'slippery-road', 'road-narrow', 'construction', 'traffic-signal',
            'pedestrian-crossing', 'school-crossing', 'cycle-crossing', 'snow',
            'animals', 'restriction-ends', 'go-right', 'go-left',
            'go-straight', 'go-right-straight', 'go-left-straight',
            'keep-right', 'keep-left', 'roundabout',
            'restrict-ends-overtaking', 'restrict-ends-overtaking-truck'
        ],
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        classes=[
            'speedlimit-20', 'speedlimit-30', 'speedlimit-50', 'speedlimit-60',
            'speedlimit-70', 'speedlimit-80', 'restrict-end-80',
            'speedlimit-100', 'speedlimit-120', 'no-overtake',
            'no-overtake-truck', 'priority-next-intersect', 'priority-road',
            'giveaway', 'stop', 'no-traffic-bothways', 'no-truck', 'no-entry',
            'danger', 'bend-left', 'bend-right', 'bend', 'uneven-road',
            'slippery-road', 'road-narrow', 'construction', 'traffic-signal',
            'pedestrian-crossing', 'school-crossing', 'cycle-crossing', 'snow',
            'animals', 'restriction-ends', 'go-right', 'go-left',
            'go-straight', 'go-right-straight', 'go-left-straight',
            'keep-right', 'keep-left', 'roundabout',
            'restrict-ends-overtaking', 'restrict-ends-overtaking-truck'
        ],
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(
    type='AdamW',
    lr=0.0001,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    paramwise_cfg=dict(
        custom_keys=dict(
            absolute_pos_embed=dict(decay_mult=0.0),
            relative_position_bias_table=dict(decay_mult=0.0),
            norm=dict(decay_mult=0.0))))
optimizer_config = dict(
    grad_clip=None,
    type='DistOptimizerHook',
    update_interval=1,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[27, 33])
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
fp16 = None
work_dir = './work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco'
gpu_ids = range(0, 1)
KeyError: "Mask RCNN: 'backbone.layers.0.blocks.0.attn.relative position bias table'"

How to use COCO pre-trained weights?

#4

#4 reply

The ImageNet pre-trained weights on this page work, but I wonder why using COCO pre-trained weights does not.

I'm trying to train a custom detection dataset with Swin Transformer-Object Detection. Training the custom dataset with the ImageNet pre-trained weights succeeded, but training with the COCO pre-trained weights failed.

Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.926047721682624e-98

I trained my custom data from cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py.
Because my data only has bbox annotations and no masks, I changed all with_mask options to False in the config files and removed the FCNMaskHead mask_head in cascade_mask_rcnn_swin_fpn.py so that training runs. After training for about 10 minutes, the loss becomes NaN.
2021-04-29 15:20:07,805 - mmdet - INFO - Epoch [1][1550/9370] lr: 1.000e-04, eta: 2 days, 13:28:02, time: 0.569, data_time: 0.005, memory: 6632, loss_rpn_cls: nan, loss_rpn_bbox: nan, s0.loss_cls: nan, s0.acc: 3.3333, s0.loss_bbox: nan, s1.loss_cls: nan, s1.acc: 3.3333, s1.loss_bbox: nan, s2.loss_cls: nan, s2.acc: 3.3333, s2.loss_bbox: nan, loss: nan
Is there something wrong with my training setup?

model = dict( type='CascadeRCNN', pretrained='swin_tiny_patch4_window7_224.pth', backbone=dict( type='SwinTransformer', embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2, ape=False, patch_norm=True, out_indices=(0, 1, 2, 3), use_checkpoint=False), neck=dict( type='FPN', in_channels=[96, 192, 384, 768], out_channels=256, num_outs=5), rpn_head=dict( type='RPNHead', in_channels=256, feat_channels=256, anchor_generator=dict( type='AnchorGenerator', scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), loss_bbox=dict( type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)), roi_head=dict( type='CascadeRoIHead', num_stages=3, stage_loss_weights=[1, 0.5, 0.25], bbox_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), bbox_head=[ dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=26, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='BN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)), dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=26, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='BN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)), dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=26, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='BN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)) ], mask_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32])), train_cfg=dict( rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, debug=False), rpn_proposal=dict( nms_across_levels=False, nms_pre=2000, nms_post=2000, max_per_img=2000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=[ dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, 
add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False), dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False), dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.7, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False) ]), test_cfg=dict( rpn=dict( nms_across_levels=False, nms_pre=1000, nms_post=1000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5))) dataset_type = 'CocoDataset' data_root = 'data/coco/' img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=False), dict(type='RandomFlip', flip_ratio=0.5), dict( type='AutoAugment', policies=[[{ 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }], [{ 'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True }]]), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']) ] test_pipeline = [ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(800, 400), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ] data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_train2017.json', img_prefix='data/coco/train2017/', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=False), dict(type='RandomFlip', flip_ratio=0.5), dict( type='AutoAugment', policies=[[{ 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }], [{ 'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 
1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True }]]), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']) ]), val=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_val2017.json', img_prefix='data/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(800, 400), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ]), test=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_val2017.json', img_prefix='data/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(800, 400), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ])) evaluation = dict(metric=['bbox', 'segm']) optimizer = dict( type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05, paramwise_cfg=dict( custom_keys=dict( absolute_pos_embed=dict(decay_mult=0.0), relative_position_bias_table=dict(decay_mult=0.0), norm=dict(decay_mult=0.0)))) optimizer_config = dict( grad_clip=None, type='DistOptimizerHook', update_interval=1, coalesce=True, bucket_size_mb=-1, use_fp16=True) lr_config = dict( policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001, step=[27, 33]) runner = dict(type='EpochBasedRunnerAmp', max_epochs=36) checkpoint_config = dict(interval=1) log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) custom_hooks = [dict(type='NumClassCheckHook')] dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] fp16 = None work_dir = 'work_dirs' gpu_ids = [1]
Thanks very much!
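
The posted config has use_fp16=True with grad_clip=None, so one common (though not guaranteed) way to keep the fp16 loss scale from collapsing like this is to enable gradient clipping, and possibly lower the learning rate. A sketch with illustrative values that are assumptions, not repo defaults:

# Hedged sketch: clip gradients to tame fp16 overflow; max_norm / lr values are
# illustrative assumptions, not taken from this repository.
optimizer_config = dict(
    grad_clip=dict(max_norm=35, norm_type=2),
    type='DistOptimizerHook',
    update_interval=1,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True)
optimizer = dict(type='AdamW', lr=5e-05, betas=(0.9, 0.999), weight_decay=0.05)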

The model and loaded state dict do not match exactly

Hi,

I am trying to load the Swin backbone using the config mask_rcnn_swin_small_patch4_window7_mstrain_480-800_adamw_3x_coco.py and the weights swin_small_patch4_window7_224.pth, and I get the following warnings:

mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: norm.weight, norm.bias, head.weight, head.bias, layers.0.blocks.1.attn_mask, layers.1.blocks.1.attn_mask, layers.2.blocks.1.attn_mask, layers.2.blocks.3.attn_mask, layers.2.blocks.5.attn_mask, layers.2.blocks.7.attn_mask, layers.2.blocks.9.attn_mask, layers.2.blocks.11.attn_mask, layers.2.blocks.13.attn_mask, layers.2.blocks.15.attn_mask, layers.2.blocks.17.attn_mask

missing keys in source state_dict: norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias

I understand the issue with the norm layers: the original Swin backbone has only one normalization layer at the output, while the Swin backbone used for detection has a norm layer at every output stage.
However, I am not sure why there is a problem with the attn_mask keys.

Can you please help me?

Thank you very much in advance.
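
The attn_mask entries are buffers for the shifted-window attention that the detection backbone recomputes from the actual input size at run time, and head.*/norm.* belong to the ImageNet classification head, so these warnings should be harmless. If you would rather silence them, a hedged sketch of pre-filtering the checkpoint (the output filename is made up):

# Hedged sketch: drop the classification head, final norm and attn_mask buffers
# from the ImageNet checkpoint so the "unexpected key" warnings disappear.
import torch

ckpt = torch.load('swin_small_patch4_window7_224.pth', map_location='cpu')
state_dict = ckpt.get('model', ckpt)  # the ImageNet releases store weights under 'model'

filtered = {
    k: v
    for k, v in state_dict.items()
    if not (k.startswith('head.') or k.startswith('norm.') or 'attn_mask' in k)
}
torch.save({'model': filtered}, 'swin_small_patch4_window7_224_backbone_only.pth')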

TypeError: CascadeRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'depth'

Traceback (most recent call last):
File "/home/cai/anaconda3/envs/swin/lib/python3.7/site-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
return obj_cls(**args)
TypeError: __init__() got an unexpected keyword argument 'depth'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/cai/anaconda3/envs/swin/lib/python3.7/site-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
return obj_cls(**args)
File "/home/cai/project/Swin/mmdet/models/detectors/cascade_rcnn.py", line 25, in init
pretrained=pretrained)
File "/home/cai/project/Swin/mmdet/models/detectors/two_stage.py", line 26, in init
self.backbone = build_backbone(backbone)
File "/home/cai/project/Swin/mmdet/models/builder.py", line 39, in build_backbone
return build(cfg, BACKBONES)
File "/home/cai/project/Swin/mmdet/models/builder.py", line 34, in build
return build_from_cfg(cfg, registry, default_args)
File "/home/cai/anaconda3/envs/swin/lib/python3.7/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: SwinTransformer: __init__() got an unexpected keyword argument 'depth'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "tools/train.py", line 187, in
main()
File "tools/train.py", line 161, in main
test_cfg=cfg.get('test_cfg'))
File "/home/cai/project/Swin/mmdet/models/builder.py", line 77, in build_detector
return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "/home/cai/project/Swin/mmdet/models/builder.py", line 34, in build
return build_from_cfg(cfg, registry, default_args)
File "/home/cai/anaconda3/envs/swin/lib/python3.7/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: CascadeRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'depth'


When I run the demo provided with the configs, it runs well.
However, when I use the Swin Transformer as the backbone in Cascade R-CNN, this error occurs.
Can you give me some advice? Thanks.
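
Judging from the message, the backbone config passes a keyword named depth, which SwinTransformer.__init__ does not accept; the configs on this page use the plural depths with one entry per stage. A hedged sketch of the expected backbone block, copied from the Mask R-CNN config earlier on this page:

# Hedged sketch: SwinTransformer takes `depths` (a per-stage list), not `depth`.
backbone = dict(
    type='SwinTransformer',
    embed_dim=96,
    depths=[2, 2, 6, 2],        # not depth=...
    num_heads=[3, 6, 12, 24],
    window_size=7,
    mlp_ratio=4.0,
    qkv_bias=True,
    drop_path_rate=0.2,
    ape=False,
    patch_norm=True,
    out_indices=(0, 1, 2, 3),
    use_checkpoint=False)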

How to finetune the pretrained model on other COCO-format dataset?

Hi,

First of all, thank you for your excellent work.

Now I want to fine-tune the pretrained model (trained on the COCO dataset) on another COCO-format dataset, but I get the error "KeyError: "MaskRCNN: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'"". How can I solve it?

Looking forward to your reply, and wish you a nice day!

I have a problem when using my own dataset with cascade_mask_rcnn_swin_tiny for detection

Thank you for your great work, but there is a problem when I train with my own data, and I can't figure out why it happens.
I've changed all num_classes settings to my own number of classes.
This is the traceback:

Traceback (most recent call last):
File "./tools/train.py", line 187, in
main()
File "./tools/train.py", line 183, in main
meta=meta)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/apis/train.py", line 185, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
for i, data_batch in enumerate(self.data_loader):
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in next
data = self._next_data()
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
cv2.error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/datasets/custom.py", line 193, in getitem
data = self.prepare_train_img(idx)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/datasets/custom.py", line 216, in prepare_train_img
return self.pipeline(results)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/datasets/pipelines/compose.py", line 40, in call
data = t(data)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/datasets/pipelines/transforms.py", line 534, in call
self._pad_masks(results)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/datasets/pipelines/transforms.py", line 515, in _pad_masks
results[key] = results[key].pad(pad_shape, pad_val=self.pad_val)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/core/mask/structures.py", line 305, in pad
for mask in self.masks
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/core/mask/structures.py", line 305, in
for mask in self.masks
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/image/geometric.py", line 450, in impad
value=pad_val)
cv2.error: OpenCV(4.5.1) /tmp/pip-req-build-7m_g9lbm/opencv/modules/core/src/copy.cpp:1445: error: (-215:Assertion failed) top >= 0 && bottom >= 0 && left >= 0 && right >= 0 && _src.dims() <= 2 in function 'copyMakeBorder'
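
The copyMakeBorder assertion means Pad computed a negative border, i.e. a mask (or image) is larger than the size it is being padded to; with custom COCO-format data this often comes from width/height fields in the annotation file that do not match the real image files. A hedged sanity-check sketch (the paths are placeholders for your own dataset):

# Hedged sketch: check that COCO annotation width/height match the image files;
# a mismatch can make the Pad transform compute negative borders.
import os
from PIL import Image
from pycocotools.coco import COCO

ann_file = 'data/coco/annotations/instances_train2017.json'  # placeholder path
img_prefix = 'data/coco/train2017/'                          # placeholder path

coco = COCO(ann_file)
for img_id in coco.getImgIds():
    info = coco.loadImgs(img_id)[0]
    with Image.open(os.path.join(img_prefix, info['file_name'])) as im:
        w, h = im.size
    if (w, h) != (info['width'], info['height']):
        print(f"size mismatch for {info['file_name']}: "
              f"annotation ({info['width']}, {info['height']}) vs file ({w}, {h})")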

Onnx Conversion: Integer division of tensors using div or / is no longer supported

Hello,

I'm trying to export my trained model to ONNX using pytorch2onnx.py.
When I run it, I hit this issue:
RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.

The error points to this line:
File "/Swin-Transformer-Object-Detection/mmdet/models/backbones/swin_transformer.py", line 374, in forward
    Hp = int(np.ceil(H / self.window_size)) * self.window_size

Should I convert H and W to LongTensor as suggested in this PyTorch thread?

Thanks for helping
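
Casting to LongTensor may work, but a simpler workaround (a sketch, not the repository's official fix) is to round up with integer floor division so no tensor true-division happens during tracing:

# Hedged sketch: ONNX-export-friendly replacement for
#   Hp = int(np.ceil(H / self.window_size)) * self.window_size
# Rounds H up to the next multiple of window_size using integer math only,
# which keeps working when H is a traced tensor.
def round_up(x, multiple):
    return (x + multiple - 1) // multiple * multiple

# inside SwinTransformer.forward (local names assumed from the traceback):
# Hp = round_up(H, self.window_size)
# Wp = round_up(W, self.window_size)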
