
swintransformer / swin-transformer-object-detection


This project is forked from open-mmlab/mmdetection.


This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Home Page: https://arxiv.org/abs/2103.14030

License: Apache License 2.0

Languages: Python 99.84%, Shell 0.07%, Dockerfile 0.09%
Topics: mscoco, swin-transformer, cascade, mask-rcnn, object-detection, reppoints, swin

swin-transformer-object-detection's Introduction

Swin Transformer for Object Detection

This repo contains the supported code and configuration files to reproduce object detection results of Swin Transformer. It is based on mmdetection.

Updates

05/11/2021 Models for MoBY are released

04/12/2021 Initial commits

Results and Models

Mask R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 1x | 43.7 | 39.8 | 48M | 267G | config | github/baidu | github/baidu
Swin-T | ImageNet-1K | 3x | 46.0 | 41.6 | 48M | 267G | config | github/baidu | github/baidu
Swin-S | ImageNet-1K | 3x | 48.5 | 43.3 | 69M | 359G | config | github/baidu | github/baidu

Cascade Mask R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 1x | 48.1 | 41.7 | 86M | 745G | config | github/baidu | github/baidu
Swin-T | ImageNet-1K | 3x | 50.4 | 43.7 | 86M | 745G | config | github/baidu | github/baidu
Swin-S | ImageNet-1K | 3x | 51.9 | 45.0 | 107M | 838G | config | github/baidu | github/baidu
Swin-B | ImageNet-1K | 3x | 51.9 | 45.0 | 145M | 982G | config | github/baidu | github/baidu

RepPoints V2

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 3x | 50.0 | - | 45M | 283G | config | github | github

Mask RepPoints V2

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 3x | 50.4 | 43.8 | 47M | 292G | config | github | github


Results of MoBY with Swin Transformer

Mask R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 1x | 43.6 | 39.6 | 48M | 267G | config | github/baidu | github/baidu
Swin-T | ImageNet-1K | 3x | 46.0 | 41.7 | 48M | 267G | config | github/baidu | github/baidu

Cascade Mask R-CNN

Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model
---------|----------|---------|---------|----------|---------|-------|--------|-----|------
Swin-T | ImageNet-1K | 1x | 48.1 | 41.5 | 86M | 745G | config | github/baidu | github/baidu
Swin-T | ImageNet-1K | 3x | 50.2 | 43.5 | 86M | 745G | config | github/baidu | github/baidu

Notes:

  • The drop path rate needs to be tuned for best practice.
  • MoBY pre-trained models can be downloaded from MoBY with Swin Transformer.

Usage

Installation

Please refer to get_started.md for installation and dataset preparation.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
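
For single-image inference from Python, the standard mmdetection high-level API can also be used. This is a minimal sketch, not part of the repo's own docs; the checkpoint path is a placeholder for a downloaded model file:

from mmdet.apis import init_detector, inference_detector, show_result_pyplot

config_file = 'configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py'
checkpoint_file = 'checkpoints/mask_rcnn_swin_tiny_3x.pth'  # placeholder path

# Build the model from the config and load the trained weights.
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on one image and show detections above a 0.3 score threshold.
result = inference_detector(model, 'demo/demo.jpg')
show_result_pyplot(model, 'demo/demo.jpg', result, score_thr=0.3)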

Training

To train a detector with pre-trained models, run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments] 

For example, to train a Cascade Mask R-CNN model with a Swin-T backbone and 8 GPUs, run:

tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL> 

Note: use_checkpoint is used to save GPU memory. Please refer to this page for more details.

Apex (optional):

We use apex for mixed precision training by default. To install apex, run:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

If you would like to disable apex, modify the type of runner as EpochBasedRunner and comment out the following code block in the configuration files:

# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
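
As a rough sketch, after disabling apex the corresponding config lines might look like the following (assuming the plain mmcv EpochBasedRunner and its default optimizer hook; max_epochs depends on the schedule in use):

# runner switched from the apex-aware EpochBasedRunnerAmp to the plain runner
runner = dict(type='EpochBasedRunner', max_epochs=36)

# with the DistOptimizerHook block commented out, the default mmcv
# optimizer hook takes over; fp16 remains disabled
optimizer_config = dict(grad_clip=None)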

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Other Links

Image Classification: See Swin Transformer for Image Classification.

Semantic Segmentation: See Swin Transformer for Semantic Segmentation.

Self-Supervised Learning: See MoBY with Swin Transformer.

Video Recognition: See Video Swin Transformer.

swin-transformer-object-detection's People

Contributors

aemikachow, chrisfsj2051, daavoo, erotemic, gt9505, hellock, hhaandroid, impiga, innerlee, johnson-wang, jshilong, korabelnikov, liaopeiyuan, lindahua, melikovk, mxbonn, myownskyw7, oceanpang, runningleon, ryanxli, shinya7y, thangvubk, tianyuandu, v-qjqs, wangruohui, wswday, xvjiarui, yhcao6, yuzhj, zwwwayne


swin-transformer-object-detection's Issues

index out of bounds

When I run the command
python tools/train.py ./configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py
the error is:
/opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [32,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
.....
RuntimeError: transform: failed to synchronize: cudaErrorAssert: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered

Custom Dataset Training Runtime error

Hi all,

I am getting this error while running the tools/train.py file on Google Colab.
"RuntimeError: Default process group has not been initialized, please make sure to call init_process_group."

I do not know how to approach this issue. Any help will be appreciated. Thanks!

IndexError list index out of range

File "/content/Swin-Transformer-Object-Detection/mmdet/datasets/coco.py", line 267, in _segm2json
if isinstance(segms[i]['counts'], bytes):
IndexError: list index out of range

Hi, I'm trying to train Swin on my custom dataset.

The dataset runs fine on mmdetection Mask R-CNNs, and the training runs fine on Swin for several epochs. But after a while, I get an IndexError during eval. Has anyone encountered this error?

mAP differs from the reported result

Hi, I run the config of Swin-Tiny 1x with the mask rcnn setting: configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py

And set the use_fp16=False

optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=False,
)

But my mAP differs slightly from the reported result:

bbox_mAP: 0.4320 (report is 0.437), bbox_mAP_50: 0.6590, bbox_mAP_75: 0.4700, bbox_mAP_s: 0.2790, bbox_mAP_m: 0.4630, bbox_mAP_l: 0.5680, bbox_mAP_copypaste: 0.432 0.659 0.470 0.279 0.463 0.568, 
segm_mAP: 0.3950 (report is 0.398), segm_mAP_50: 0.6270, segm_mAP_75: 0.4230, segm_mAP_s: 0.2310, segm_mAP_m: 0.4280, segm_mAP_l: 0.5460, segm_mAP_copypaste: 0.395 0.627 0.423 0.231 0.428 0.546

I wonder if this is due to run-to-run fluctuation or to use_fp16=False?
Looking forward to your reply. Thanks!

ERROR: Unexpected segmentation fault encountered in worker.

ๅœจ่ฟ่กŒไธญๅ‡บ็Žฐไปฅไธ‹้”™่ฏฏใ€‚
2021-04-21 19:29:24,344 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
ERROR: Unexpected segmentation fault encountered in worker.
Traceback (most recent call last):
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 872, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/multiprocessing/queues.py", line 104, in get
if not self._poll(timeout):
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/multiprocessing/connection.py", line 257, in poll
return self._poll(timeout)
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
r = wait([self], timeout)
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/multiprocessing/connection.py", line 920, in wait
ready = selector.select(timeout)
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
File "/home/ASRRW_1/songtengfei/software/anaconda3_swin/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 109760) is killed by signal: Segmentation fault.

Pretrained Image model in README

In the README, it says the model used ImageNet1K as the pretrained weights; however, in the paper, all detection results are using ImageNet22K.


May I know which one is correct? Thanks.

How to modify the number of training pictures

When I used a custom dataset for training, I found that there were 2100 pictures in the COCO-format train set, but only 580 pictures were actually used during training. How can I change this?
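
Not an official answer, but one common reason in mmdetection-based codebases is that training images without any annotations are filtered out by default. A hedged sketch of keeping them via the CocoDataset filter_empty_gt option (paths below are placeholders):

data = dict(
    train=dict(
        type='CocoDataset',
        filter_empty_gt=False,  # keep images that have no annotations
        ann_file='path/to/train.json',        # placeholder
        img_prefix='path/to/train_images/'))  # placeholder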

Why apex instead of mmdet's fp16?

Thank you for publishing the great work!

We use apex for mixed precision training by default.
# do not use mmdet version fp16

Why does the code use apex instead of mmdet's fp16?
Are there any differences in AP, training speed, or inference speed?

SyntaxError: invalid syntax 'dataset_type': 'CocoDataset' lib2to3.pgen2.parse.ParseError: bad input: type=3, value="'dataset_type'", context=('\n', (148, 0))

Traceback (most recent call last):
File "/opt/python3.7/lib/python3.7/site-packages/yapf/yapflib/pytree_utils.py", line 122, in ParseCodeToTree
tree = parser_driver.parse_string(code, debug=False)
File "/opt/python3.7/lib/python3.7/lib2to3/pgen2/driver.py", line 104, in parse_string
return self.parse_tokens(tokens, debug)
File "/opt/python3.7/lib/python3.7/lib2to3/pgen2/driver.py", line 72, in parse_tokens
if p.addtoken(type, value, (prefix, start)):
File "/opt/python3.7/lib/python3.7/lib2to3/pgen2/parse.py", line 159, in addtoken
raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=3, value="'dataset_type'", context=('\n', (148, 0))

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tools/train.py", line 188, in
main()
File "tools/train.py", line 129, in main
cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config)))
File "/opt/python3.7/lib/python3.7/site-packages/mmcv/utils/config.py", line 458, in dump
f.write(self.pretty_text)
File "/opt/python3.7/lib/python3.7/site-packages/mmcv/utils/config.py", line 413, in pretty_text
text, _ = FormatCode(text, style_config=yapf_style, verify=True)
File "/opt/python3.7/lib/python3.7/site-packages/yapf/yapflib/yapf_api.py", line 147, in FormatCode
tree = pytree_utils.ParseCodeToTree(unformatted_source)
File "/opt/python3.7/lib/python3.7/site-packages/yapf/yapflib/pytree_utils.py", line 128, in ParseCodeToTree
raise e
File "/opt/python3.7/lib/python3.7/site-packages/yapf/yapflib/pytree_utils.py", line 126, in ParseCodeToTree
ast.parse(code)
File "/opt/python3.7/lib/python3.7/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "", line 148
'dataset_type': 'CocoDataset'
^
SyntaxError: invalid syntax

I guess this is a problem with my environment setup. It is my first time using mmdetection and PyTorch, so I am not very familiar with them. Could someone who has been through this give me some pointers? Thanks.

convergence problem for Cascade Mask RCNN

Thanks for sharing your impressive work.

Recently, I trained a Swin-based Cascade Mask R-CNN and Mask R-CNN on my own dataset of about 1.5k single-class images. The training of Mask R-CNN is relatively stable. However, the loss of Cascade Mask R-CNN only converges to around 1.6 and doesn't decrease any further. Its validation performance, meanwhile, is lower than the Mask R-CNN counterpart.

They are trained with the same settings; could you give me some suggestions for fine-tuning the Cascade Mask R-CNN?

The testing results of the whole dataset is empty

Hello, I get the following errors in both training and testing, with both 2 GPUs and a single GPU.
How can I solve this problem? (After 36 epochs of training, the same errors are displayed.) The following is an example of the error during testing:

(mm290) rth1@lab412-rth1:~/lws/Swin-Transformer-Object-Detection-master$ python tools/test.py configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py work_dirs/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco/epoch_36.pth --eval bbox segm
loading annotations into memory...
Done (t=0.64s)
creating index...
index created!
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 5.1 task/s, elapsed: 978s, ETA: 0s
Evaluating bbox...
Loading and preparing results...
The testing results of the whole dataset is empty.

about swin pre-trained model

2021-06-01 01:32:06,714 - mmdet - INFO - load model from: checkpoints/cascade_mask_rcnn_swin_base_patch4_window7.pth
Traceback (most recent call last):
File "tools/train.py", line 187, in
main()
File "tools/train.py", line 161, in main
test_cfg=cfg.get('test_cfg'))
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/builder.py", line 77, in build_detector
return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/builder.py", line 34, in build
return build_from_cfg(cfg, registry, default_args)
File "/usr/local/lib/python3.6/dist-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
return obj_cls(**args)
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/detectors/cascade_rcnn.py", line 25, in init
pretrained=pretrained)
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/detectors/two_stage.py", line 48, in init
self.init_weights(pretrained=pretrained)
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/detectors/two_stage.py", line 68, in init_weights
self.backbone.init_weights(pretrained=pretrained)
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmdet/models/backbones/swin_transformer.py", line 595, in init_weights
load_checkpoint(self, pretrained, strict=False, logger=logger)
File "/workspace/mnt/storage/kanghaidong/cloud_project/Swin-Transformer-Object-Detection/mmcv_custom/checkpoint.py", line 340, in load_checkpoint
table_current = model.state_dict()[table_key]
KeyError: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'

pytorch==1.7.0
Is there an error in the pre-trained weights?
Could you provide a working Swin Transformer pre-trained model? Thanks.

Checkpoint Save error in google colab

During training, when a checkpoint was about to be saved (after some epochs, not after every epoch), the following error came up:

Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
    main()
  File "tools/train.py", line 183, in main
    meta=meta)
  File "/content/drive/My Drive/Swin-Transformer-Object-Detection/mmdet/apis/train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/checkpoint.py", line 69, in after_train_epoch
    self._save_checkpoint(runner)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/dist_utils.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/checkpoint.py", line 75, in _save_checkpoint
    self.out_dir, save_optimizer=self.save_optimizer, **self.args)
  File "/content/drive/My Drive/Swin-Transformer-Object-Detection/mmcv_custom/runner/epoch_based_runner.py", line 60, in save_checkpoint
    save_checkpoint(self.model, filepath, optimizer=optimizer, meta=meta)
  File "/content/drive/My Drive/Swin-Transformer-Object-Detection/mmcv_custom/runner/checkpoint.py", line 58, in save_checkpoint
    checkpoint['amp'] = apex.amp.state_dict()
AttributeError: module 'apex' has no attribute 'amp'

How to extract roi features of Swin

I'm trying to extract features for each bbox via the roi_extractor in mmdet.
Currently, I run

import mmcv
from mmdet.models import build_roi_extractor

config_file = '../configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py'
config = mmcv.Config.fromfile(config_file)
model = build_roi_extractor(config.model)

and get KeyError: 'CascadeRCNN is not in the roi_extractor registry'

Note that I've already run python setup.py develop.

Is there any other way, or is there something I missed? Thanks.
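
For context, build_roi_extractor expects only the roi_extractor sub-config, so passing the whole model config raises that registry KeyError. A minimal sketch (assuming the mmdet 2.x API used in this repo) of building the full detector and reaching the extractor instead:

import mmcv
from mmdet.models import build_detector

config_file = '../configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py'
cfg = mmcv.Config.fromfile(config_file)

# Build the whole CascadeRCNN detector from its config.
model = build_detector(cfg.model,
                       train_cfg=cfg.get('train_cfg'),
                       test_cfg=cfg.get('test_cfg'))

# For a cascade RoI head this is typically a ModuleList, one extractor per stage.
bbox_roi_extractor = model.roi_head.bbox_roi_extractor
print(bbox_roi_extractor)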

Error during training in last iteration of first epoch

While training, in the last iteration (i.e. for the last batch of images), it throws this error:

Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
    main()
  File "tools/train.py", line 183, in main
    meta=meta)
  File "/content/drive/My Drive/Swin-Transformer-Object-Detection/mmdet/apis/train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    for _ in range(epochs):
TypeError: 'NoneType' object cannot be interpreted as an integer

Is it not getting any images in that batch, or is it something else?

Model usage in an FPN-architecture

Hi,

Just looking for some advice on how to use the current implementation of Swin Transformer in an FPN-based detector model. Does the current implementation work out of the box, or must some modifications be made to the model?

Thanks.

What is the error? There is only one class in CLASSES. I don't know if that's the reason. What should I do about it? Thanks!

Traceback (most recent call last):
  File "tools/train.py", line 187, in <module>
    main()
  File "tools/train.py", line 183, in main
    meta=meta)
  File "/home/server/文档/DETR2/Swin/mmdet/apis/train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/server/.local/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/server/.local/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 45, in train
    self.call_hook('before_train_epoch')
  File "/home/server/.local/lib/python3.6/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/server/文档/DETR2/Swin/mmdet/datasets/utils.py", line 150, in before_train_epoch
    self._check_head(runner)
  File "/home/server/文档/DETR2/Swin/mmdet/datasets/utils.py", line 137, in _check_head
    (f'The num_classes ({module.num_classes}) in '
AssertionError: The num_classes (1) in Shared2FCBBoxHead of MMDataParallel does not matches the length of CLASSES 5) in CocoDataset

coco.py bug for only one class

Hi,
So I am training with just one class; in coco.py I set

CLASSES = ('person')

but later on, when the class count consistency is checked,

assert module.num_classes == len(dataset.CLASSES)

len(dataset.CLASSES) = len('person') = 6

but if there is more than one class, it's fine, because dataset.CLASSES is then a tuple.
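
For reference, this is just Python tuple syntax: a one-element tuple needs a trailing comma, otherwise the parentheses are dropped and CLASSES stays a plain string whose len() counts characters:

# ('person') is just the string 'person', so len() returns 6 (its character count).
CLASSES = ('person')
print(len(CLASSES))   # 6

# The trailing comma makes it a one-element tuple, so len() returns 1.
CLASSES = ('person',)
print(len(CLASSES))   # 1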

FLOPs and FPS measurement.

Dear authors,
Thanks for your great work.

  1. How do you measure the FLOPs of the detection model, please? (See the sketch after this list for one common approach.)
  2. I use the benchmark.py provided by mmdet to measure FPS on one V100-32G, but get a much lower FPS. The same happens with the classification model: approx. 687 imgs/s for Swin-T with batch size 64. I also tried on one V100-16G with CUDA 10.2. It is faster but still lower than the paper-reported number (737 vs 755). Could you please provide any suggestions?
    Thanks a lot in advance!
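
Not an official answer, but a rough sketch of how FLOPs are commonly counted for mmdetection models, following the approach of mmdetection's FLOPs tool (assumes the detector implements forward_dummy; the 3x1280x800 input shape is only an example):

import mmcv
from mmcv.cnn import get_model_complexity_info
from mmdet.models import build_detector

cfg = mmcv.Config.fromfile(
    'configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py')
model = build_detector(cfg.model,
                       train_cfg=cfg.get('train_cfg'),
                       test_cfg=cfg.get('test_cfg')).cuda()
model.eval()

# FLOPs are counted on a dummy forward pass through backbone, neck and heads.
model.forward = model.forward_dummy
flops, params = get_model_complexity_info(model, (3, 1280, 800))
print(flops, params)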

AssertionError: Default process group is not initialized

Hello,
I want to use swin for an instance segmentation problem.
I installed mmdetection and NVIDIA apex, and cloned the git repo.
I configured everything as usual in mmdetection.
But when I ran the training API I got this error.
It's the first time I have seen this error.
Does anyone have an idea?
Thanks


KeyError: 'SwinTransformer is not in the models registry'

I am using this model for custom training on my dataset in Colab. As I started training, I got this error:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
return obj_cls(**args)
File "/content/drive/MyDrive/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 27, in __init__
init_cfg=init_cfg)
File "/content/drive/MyDrive/mmdetection/mmdet/models/detectors/two_stage.py", line 26, in __init__
self.backbone = build_backbone(backbone)
File "/content/drive/MyDrive/mmdetection/mmdet/models/builder.py", line 19, in build_backbone
return BACKBONES.build(cfg)
File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 210, in build
return self.build_func(*args, **kwargs, registry=self)
File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
f'{obj_type} is not in the {registry.name} registry')
KeyError: 'SwinTransformer is not in the models registry'

During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
 File "tools/train.py", line 187, in <module>
 main()
  File "tools/train.py", line 161, in main
test_cfg=cfg.get('test_cfg'))
File "/content/drive/MyDrive/mmdetection/mmdet/models/builder.py", line 58, in build_detector
cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 210, in build
return self.build_func(*args, **kwargs, registry=self)
File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
 raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "CascadeRCNN: 'SwinTransformer is not in the models registry'"

Here is my config file:

2021-05-13 12:30:00,473 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.10 (default, May  3 2021, 02:48:31) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla P100-PCIE-16GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1+cu101
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.1, CUDNN_VERSION=7.6.3, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.9.1+cu101
OpenCV: 4.1.2
MMCV: 1.3.3
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.0
MMDetection: 2.12.0+41bb93f
------------------------------------------------------------

2021-05-13 12:30:04,393 - mmdet - INFO - Distributed training: False
2021-05-13 12:30:08,323 - mmdet - INFO - Config:
model = dict(
    type='CascadeRCNN',
    pretrained='./moby_cascade_mask_rcnn_swin_tiny_patch4_window7_3x.pth',
    backbone=dict(
        type='SwinTransformer',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4.0,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.2,
        ape=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        use_checkpoint=False),
    neck=dict(
        type='FPN',
        in_channels=[96, 192, 384, 768],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='CascadeRoIHead',
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0))
        ],
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=80,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_across_levels=False,
            nms_pre=2000,
            nms_post=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_across_levels=False,
            nms_pre=1000,
            nms_post=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100,
            mask_thr_binary=0.5)))
dataset_type = 'COCODataset'
data_root = '/content/drive/MyDrive/layout/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='AutoAugment',
        policies=[[{
            'type':
            'Resize',
            'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                          (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                          (736, 1333), (768, 1333), (800, 1333)],
            'multiscale_mode':
            'value',
            'keep_ratio':
            True
        }],
                  [{
                      'type': 'Resize',
                      'img_scale': [(400, 1333), (500, 1333), (600, 1333)],
                      'multiscale_mode': 'value',
                      'keep_ratio': True
                  }, {
                      'type': 'RandomCrop',
                      'crop_type': 'absolute_range',
                      'crop_size': (384, 600),
                      'allow_negative_crop': True
                  }, {
                      'type':
                      'Resize',
                      'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                    (576, 1333), (608, 1333), (640, 1333),
                                    (672, 1333), (704, 1333), (736, 1333),
                                    (768, 1333), (800, 1333)],
                      'multiscale_mode':
                      'value',
                      'override':
                      True,
                      'keep_ratio':
                      True
                  }]]),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='COCODataset',
        ann_file='/content/drive/MyDrive/layout/train.json',
        img_prefix='/content/drive/MyDrive/layout/train/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='AutoAugment',
                policies=[[{
                    'type':
                    'Resize',
                    'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                  (576, 1333), (608, 1333), (640, 1333),
                                  (672, 1333), (704, 1333), (736, 1333),
                                  (768, 1333), (800, 1333)],
                    'multiscale_mode':
                    'value',
                    'keep_ratio':
                    True
                }],
                          [{
                              'type': 'Resize',
                              'img_scale': [(400, 1333), (500, 1333),
                                            (600, 1333)],
                              'multiscale_mode': 'value',
                              'keep_ratio': True
                          }, {
                              'type': 'RandomCrop',
                              'crop_type': 'absolute_range',
                              'crop_size': (384, 600),
                              'allow_negative_crop': True
                          }, {
                              'type':
                              'Resize',
                              'img_scale': [(480, 1333), (512, 1333),
                                            (544, 1333), (576, 1333),
                                            (608, 1333), (640, 1333),
                                            (672, 1333), (704, 1333),
                                            (736, 1333), (768, 1333),
                                            (800, 1333)],
                              'multiscale_mode':
                              'value',
                              'override':
                              True,
                              'keep_ratio':
                              True
                          }]]),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ]),
    val=dict(
        type='COCODataset',
        ann_file='/content/drive/MyDrive/layout/valid.json',
        img_prefix='/content/drive/MyDrive/layout/valid/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='COCODataset',
        ann_file='/content/drive/MyDrive/layout/valid.json',
        img_prefix='/content/drive/MyDrive/layout/valid/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(metric=['bbox', 'segm'])
optimizer = dict(
    type='AdamW',
    lr=0.0001,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    paramwise_cfg=dict(
        custom_keys=dict(
            absolute_pos_embed=dict(decay_mult=0.0),
            relative_position_bias_table=dict(decay_mult=0.0),
            norm=dict(decay_mult=0.0))))
optimizer_config = dict(
    grad_clip=None,
    type='DistOptimizerHook',
    update_interval=1,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[27, 33])
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
checkpoint_config = dict(interval=5)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = '/content/drive/MyDrive/Swin-Transformer-Object-Detection/moby_cascade_mask_rcnn_swin_tiny_patch4_window7_3x.pth'
resume_from = None
workflow = [('train', 1)]
fp16 = None
work_dir = './work_dirs/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco'
gpu_ids = range(0, 1)

Comment out the configuration related to the mask

Hello, when I was training on my dataset, I found that I need to comment out the mask-related settings in the following files. What exactly should I comment out in these files?
1. cascade_mask_rcnn_swin_small_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x.py (the config I used)
2. config/base/model/cascade_mask_rcnn_swin_fpn.py

using the config and pretrained files for detection

Thanks for your amazing work!
I'm sorry if this is a very basic question but I just got started with mmdet. How can I train the Swin-Transformer given in the configs and models for only the detection task (that is, without using the mask part)?
Again, sorry for the basic question and thanks for sharing this code with all of us!

missing keys in source state_dict

patch_embed.proj.weight, patch_embed.proj.bias, patch_embed.norm.weight, patch_embed.norm.bias, layers.0.blocks.0.norm1.weight, layers.0.blocks.0.norm1.bias, layers.0.blocks.0.attn.relative_position_bias_table, layers.0.blocks.0.attn.relative_position_index, layers.0.blocks.0.attn.qkv.weight, layers.0.blocks.0.attn.qkv.bias, layers.0.blocks.0.attn.proj.weight, layers.0.blocks.0.attn.proj.bias, layers.0.blocks.0.norm2.weight, layers.0.blocks.0.norm2.bias, layers.0.blocks.0.mlp.fc1.weight, layers.0.blocks.0.mlp.fc1.bias, layers.0.blocks.0.mlp.fc2.weight, layers.0.blocks.0.mlp.fc2.bias, layers.0.blocks.1.norm1.weight, layers.0.blocks.1.norm1.bias, layers.0.blocks.1.attn.relative_position_bias_table, layers.0.blocks.1.attn.relative_position_index, layers.0.blocks.1.attn.qkv.weight, layers.0.blocks.1.attn.qkv.bias, layers.0.blocks.1.attn.proj.weight, layers.0.blocks.1.attn.proj.bias, layers.0.blocks.1.norm2.weight, layers.0.blocks.1.norm2.bias, layers.0.blocks.1.mlp.fc1.weight, layers.0.blocks.1.mlp.fc1.bias, layers.0.blocks.1.mlp.fc2.weight, layers.0.blocks.1.mlp.fc2.bias, layers.0.downsample.reduction.weight, layers.0.downsample.norm.weight, layers.0.downsample.norm.bias, layers.1.blocks.0.norm1.weight, layers.1.blocks.0.norm1.bias, layers.1.blocks.0.attn.relative_position_bias_table, layers.1.blocks.0.attn.relative_position_index, layers.1.blocks.0.attn.qkv.weight, layers.1.blocks.0.attn.qkv.bias, layers.1.blocks.0.attn.proj.weight, layers.1.blocks.0.attn.proj.bias, layers.1.blocks.0.norm2.weight, layers.1.blocks.0.norm2.bias, layers.1.blocks.0.mlp.fc1.weight, layers.1.blocks.0.mlp.fc1.bias, layers.1.blocks.0.mlp.fc2.weight, layers.1.blocks.0.mlp.fc2.bias, layers.1.blocks.1.norm1.weight, layers.1.blocks.1.norm1.bias, layers.1.blocks.1.attn.relative_position_bias_table, layers.1.blocks.1.attn.relative_position_index, layers.1.blocks.1.attn.qkv.weight, layers.1.blocks.1.attn.qkv.bias, layers.1.blocks.1.attn.proj.weight, layers.1.blocks.1.attn.proj.bias, layers.1.blocks.1.norm2.weight, layers.1.blocks.1.norm2.bias, layers.1.blocks.1.mlp.fc1.weight, layers.1.blocks.1.mlp.fc1.bias, layers.1.blocks.1.mlp.fc2.weight, layers.1.blocks.1.mlp.fc2.bias, layers.1.downsample.reduction.weight, layers.1.downsample.norm.weight, layers.1.downsample.norm.bias, layers.2.blocks.0.norm1.weight, layers.2.blocks.0.norm1.bias, layers.2.blocks.0.attn.relative_position_bias_table, layers.2.blocks.0.attn.relative_position_index, layers.2.blocks.0.attn.qkv.weight, layers.2.blocks.0.attn.qkv.bias, layers.2.blocks.0.attn.proj.weight, layers.2.blocks.0.attn.proj.bias, layers.2.blocks.0.norm2.weight, layers.2.blocks.0.norm2.bias, layers.2.blocks.0.mlp.fc1.weight, layers.2.blocks.0.mlp.fc1.bias, layers.2.blocks.0.mlp.fc2.weight, layers.2.blocks.0.mlp.fc2.bias, layers.2.blocks.1.norm1.weight, layers.2.blocks.1.norm1.bias, layers.2.blocks.1.attn.relative_position_bias_table, layers.2.blocks.1.attn.relative_position_index, layers.2.blocks.1.attn.qkv.weight, layers.2.blocks.1.attn.qkv.bias, layers.2.blocks.1.attn.proj.weight, layers.2.blocks.1.attn.proj.bias, layers.2.blocks.1.norm2.weight, layers.2.blocks.1.norm2.bias, layers.2.blocks.1.mlp.fc1.weight, layers.2.blocks.1.mlp.fc1.bias, layers.2.blocks.1.mlp.fc2.weight, layers.2.blocks.1.mlp.fc2.bias, layers.2.blocks.2.norm1.weight, layers.2.blocks.2.norm1.bias, layers.2.blocks.2.attn.relative_position_bias_table, layers.2.blocks.2.attn.relative_position_index, layers.2.blocks.2.attn.qkv.weight, layers.2.blocks.2.attn.qkv.bias, layers.2.blocks.2.attn.proj.weight, 
layers.2.blocks.2.attn.proj.bias, layers.2.blocks.2.norm2.weight, layers.2.blocks.2.norm2.bias, layers.2.blocks.2.mlp.fc1.weight, layers.2.blocks.2.mlp.fc1.bias, layers.2.blocks.2.mlp.fc2.weight, layers.2.blocks.2.mlp.fc2.bias, layers.2.blocks.3.norm1.weight, layers.2.blocks.3.norm1.bias, layers.2.blocks.3.attn.relative_position_bias_table, layers.2.blocks.3.attn.relative_position_index, layers.2.blocks.3.attn.qkv.weight, layers.2.blocks.3.attn.qkv.bias, layers.2.blocks.3.attn.proj.weight, layers.2.blocks.3.attn.proj.bias, layers.2.blocks.3.norm2.weight, layers.2.blocks.3.norm2.bias, layers.2.blocks.3.mlp.fc1.weight, layers.2.blocks.3.mlp.fc1.bias, layers.2.blocks.3.mlp.fc2.weight, layers.2.blocks.3.mlp.fc2.bias, layers.2.blocks.4.norm1.weight, layers.2.blocks.4.norm1.bias, layers.2.blocks.4.attn.relative_position_bias_table, layers.2.blocks.4.attn.relative_position_index, layers.2.blocks.4.attn.qkv.weight, layers.2.blocks.4.attn.qkv.bias, layers.2.blocks.4.attn.proj.weight, layers.2.blocks.4.attn.proj.bias, layers.2.blocks.4.norm2.weight, layers.2.blocks.4.norm2.bias, layers.2.blocks.4.mlp.fc1.weight, layers.2.blocks.4.mlp.fc1.bias, layers.2.blocks.4.mlp.fc2.weight, layers.2.blocks.4.mlp.fc2.bias, layers.2.blocks.5.norm1.weight, layers.2.blocks.5.norm1.bias, layers.2.blocks.5.attn.relative_position_bias_table, layers.2.blocks.5.attn.relative_position_index, layers.2.blocks.5.attn.qkv.weight, layers.2.blocks.5.attn.qkv.bias, layers.2.blocks.5.attn.proj.weight, layers.2.blocks.5.attn.proj.bias, layers.2.blocks.5.norm2.weight, layers.2.blocks.5.norm2.bias, layers.2.blocks.5.mlp.fc1.weight, layers.2.blocks.5.mlp.fc1.bias, layers.2.blocks.5.mlp.fc2.weight, layers.2.blocks.5.mlp.fc2.bias, layers.2.downsample.reduction.weight, layers.2.downsample.norm.weight, layers.2.downsample.norm.bias, layers.3.blocks.0.norm1.weight, layers.3.blocks.0.norm1.bias, layers.3.blocks.0.attn.relative_position_bias_table, layers.3.blocks.0.attn.relative_position_index, layers.3.blocks.0.attn.qkv.weight, layers.3.blocks.0.attn.qkv.bias, layers.3.blocks.0.attn.proj.weight, layers.3.blocks.0.attn.proj.bias, layers.3.blocks.0.norm2.weight, layers.3.blocks.0.norm2.bias, layers.3.blocks.0.mlp.fc1.weight, layers.3.blocks.0.mlp.fc1.bias, layers.3.blocks.0.mlp.fc2.weight, layers.3.blocks.0.mlp.fc2.bias, layers.3.blocks.1.norm1.weight, layers.3.blocks.1.norm1.bias, layers.3.blocks.1.attn.relative_position_bias_table, layers.3.blocks.1.attn.relative_position_index, layers.3.blocks.1.attn.qkv.weight, layers.3.blocks.1.attn.qkv.bias, layers.3.blocks.1.attn.proj.weight, layers.3.blocks.1.attn.proj.bias, layers.3.blocks.1.norm2.weight, layers.3.blocks.1.norm2.bias, layers.3.blocks.1.mlp.fc1.weight, layers.3.blocks.1.mlp.fc1.bias, layers.3.blocks.1.mlp.fc2.weight, layers.3.blocks.1.mlp.fc2.bias, norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias

model = dict(
    pretrained='/storage/wjb/AlignPS/pretrained/swin_tiny_patch4_window7_224.pth',
    backbone=dict(
        type='SwinTransformer',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4.,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.2,
        ape=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        use_checkpoint=False),

How to train a custom dataset?

Hi, as I checked train.py, I can't see a train-dir argument to specify my training directory. Is it possible to train on a custom dataset, and should the dataset follow the COCO format?
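
For context (not an official answer): mmdetection-style repos take the dataset paths from the config file rather than from a command-line directory flag. A hedged sketch of a config override for a hypothetical custom COCO-format dataset (all paths and class names below are placeholders):

_base_ = './mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py'

# Placeholder class names for the custom dataset.
classes = ('class_a', 'class_b')

data = dict(
    train=dict(
        classes=classes,
        ann_file='path/to/train.json',
        img_prefix='path/to/train_images/'),
    val=dict(
        classes=classes,
        ann_file='path/to/val.json',
        img_prefix='path/to/val_images/'),
    test=dict(
        classes=classes,
        ann_file='path/to/val.json',
        img_prefix='path/to/val_images/'))

# The head's num_classes (e.g. model.roi_head.bbox_head.num_classes) also has
# to match len(classes); it can be set here or via --cfg-options.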

python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/apis/train.py", line 75, in <listcomp> seed=cfg.seed) for ds in dataset TypeError: object of type 'int' has no len()

(swin-detection) wangxiao@wx:~/Documents/Swin-Transformer-Object-Detection$ python tools/train.py ./configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py
apex is not installed
apex is not installed
apex is not installed
apex is not installed
2021-05-16 20:27:11,937 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
CUDA available: True
GPU 0,1: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda-10.0
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.0
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.5.2
MMCV: 1.3.4
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 10.0
MMDetection: 2.11.0+02baa30

2021-05-16 20:27:14,467 - mmdet - INFO - Distributed training: False
2021-05-16 20:27:16,458 - mmdet - INFO - Config:
model = dict(
type='MaskRCNN',
pretrained=None,
backbone=dict(
type='SwinTransformer',
embed_dim=96,
depths=[2, 2, 6, 2],
num_heads=[3, 6, 12, 24],
window_size=7,
mlp_ratio=4.0,
qkv_bias=True,
qk_scale=None,
drop_rate=0.0,
attn_drop_rate=0.0,
drop_path_rate=0.2,
ape=False,
patch_norm=True,
out_indices=(0, 1, 2, 3),
use_checkpoint=True),
neck=dict(
type='FPN',
in_channels=[96, 192, 384, 768],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0.0, 0.0, 0.0, 0.0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0.0, 0.0, 0.0, 0.0],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
mask_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
mask_head=dict(
type='FCNMaskHead',
num_convs=4,
in_channels=256,
conv_out_channels=256,
num_classes=80,
loss_mask=dict(
type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=-1,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100,
mask_thr_binary=0.5)))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='AutoAugment',
policies=[[{
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
'multiscale_mode':
'value',
'keep_ratio':
True
}],
[{
'type': 'Resize',
'img_scale': [(400, 1333), (500, 1333), (600, 1333)],
'multiscale_mode': 'value',
'keep_ratio': True
}, {
'type': 'RandomCrop',
'crop_type': 'absolute_range',
'crop_size': (384, 600),
'allow_negative_crop': True
}, {
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
'multiscale_mode':
'value',
'override':
True,
'keep_ratio':
True
}]]),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type='CocoDataset',
ann_file='data/coco/annotations/instances_train2017.json',
img_prefix='data/coco/train2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='AutoAugment',
policies=[[{
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
'multiscale_mode':
'value',
'keep_ratio':
True
}],
[{
'type': 'Resize',
'img_scale': [(400, 1333), (500, 1333),
(600, 1333)],
'multiscale_mode': 'value',
'keep_ratio': True
}, {
'type': 'RandomCrop',
'crop_type': 'absolute_range',
'crop_size': (384, 600),
'allow_negative_crop': True
}, {
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333),
(544, 1333), (576, 1333),
(608, 1333), (640, 1333),
(672, 1333), (704, 1333),
(736, 1333), (768, 1333),
(800, 1333)],
'multiscale_mode':
'value',
'override':
True,
'keep_ratio':
True
}]]),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]),
val=dict(
type='CocoDataset',
ann_file='data/coco/annotations/instances_val2017.json',
img_prefix='data/coco/val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]),
test=dict(
type='CocoDataset',
ann_file='data/coco/annotations/instances_val2017.json',
img_prefix='data/coco/val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]))
evaluation = dict(metric=['bbox', 'segm'])
optimizer = dict(
type='AdamW',
lr=0.0001,
betas=(0.9, 0.999),
weight_decay=0.05,
paramwise_cfg=dict(
custom_keys=dict(
absolute_pos_embed=dict(decay_mult=0.0),
relative_position_bias_table=dict(decay_mult=0.0),
norm=dict(decay_mult=0.0))))
optimizer_config = dict(
grad_clip=None,
type='DistOptimizerHook',
update_interval=1,
coalesce=True,
bucket_size_mb=-1,
use_fp16=True)
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[27, 33])
runner = dict(type='EpochBasedRunner', max_epochs=36)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
fp16 = None
work_dir = './work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco'
gpu_ids = 0

loading annotations into memory...
Done (t=9.13s)
creating index...
index created!
Traceback (most recent call last):
File "tools/train.py", line 159, in
main()
File "tools/train.py", line 155, in main
meta=meta)
File "/home/wangxiao/anaconda3/envs/swin-detection/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/apis/train.py", line 75, in train_detector
seed=cfg.seed) for ds in dataset
File "/home/wangxiao/anaconda3/envs/swin-detection/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/apis/train.py", line 75, in
seed=cfg.seed) for ds in dataset
TypeError: object of type 'int' has no len()

Hi, I ran into the error shown above. How can I solve it? Thanks.
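
One detail worth checking in the posted config is gpu_ids = 0, whereas the second config further down this page has gpu_ids = range(0, 1). In mmdet 2.11, train_detector passes len(cfg.gpu_ids) to build_dataloader inside the very list comprehension where this traceback ends, so a bare int there would raise exactly this TypeError. A minimal sketch of the difference, using the values from the two configs on this page (treat this as a guess based on the posted config, not a confirmed diagnosis):

# Hedged sketch: cfg.gpu_ids must be an iterable of device ids, not a bare int,
# because mmdet's train_detector calls len(cfg.gpu_ids) when building data loaders.
# gpu_ids = 0           # as in the config above: len(0) -> "object of type 'int' has no len()"
gpu_ids = range(0, 1)   # as in the second config on this page: len(gpu_ids) == 1

num_gpus = len(gpu_ids)
print(f'Training on {num_gpus} GPU(s): {list(gpu_ids)}')

If this is the cause, letting tools/train.py set the field from its --gpu-ids argument, rather than hard-coding an int in the config, should avoid the error.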

Mask RCNN pretrained error

I have already tried the tiny and small versions; neither of them runs.
Training without pretrained weights works fine, but the training time is a problem.
The following is the config:

model = dict(
    type='MaskRCNN',
    pretrained='mask_rcnn_swin_tiny_patch4_window7.pth',
    backbone=dict(
        type='SwinTransformer',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4.0,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.2,
        ape=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        use_checkpoint=False),
    neck=dict(
        type='FPN',
        in_channels=[96, 192, 384, 768],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=43,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=43,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            mask_size=28,
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100,
            mask_thr_binary=0.5)))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
customed = [
    'speedlimit-20', 'speedlimit-30', 'speedlimit-50', 'speedlimit-60',
    'speedlimit-70', 'speedlimit-80', 'restrict-end-80', 'speedlimit-100',
    'speedlimit-120', 'no-overtake', 'no-overtake-truck',
    'priority-next-intersect', 'priority-road', 'giveaway', 'stop',
    'no-traffic-bothways', 'no-truck', 'no-entry', 'danger', 'bend-left',
    'bend-right', 'bend', 'uneven-road', 'slippery-road', 'road-narrow',
    'construction', 'traffic-signal', 'pedestrian-crossing', 'school-crossing',
    'cycle-crossing', 'snow', 'animals', 'restriction-ends', 'go-right',
    'go-left', 'go-straight', 'go-right-straight', 'go-left-straight',
    'keep-right', 'keep-left', 'roundabout', 'restrict-ends-overtaking',
    'restrict-ends-overtaking-truck'
]
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='AutoAugment',
        policies=[[{
            'type':
            'Resize',
            'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                          (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                          (736, 1333), (768, 1333), (800, 1333)],
            'multiscale_mode':
            'value',
            'keep_ratio':
            True
        }],
                  [{
                      'type': 'Resize',
                      'img_scale': [(400, 1333), (500, 1333), (600, 1333)],
                      'multiscale_mode': 'value',
                      'keep_ratio': True
                  }, {
                      'type': 'RandomCrop',
                      'crop_type': 'absolute_range',
                      'crop_size': (384, 600),
                      'allow_negative_crop': True
                  }, {
                      'type':
                      'Resize',
                      'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                    (576, 1333), (608, 1333), (640, 1333),
                                    (672, 1333), (704, 1333), (736, 1333),
                                    (768, 1333), (800, 1333)],
                      'multiscale_mode':
                      'value',
                      'override':
                      True,
                      'keep_ratio':
                      True
                  }]]),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_train2017.json',
        img_prefix='data/coco/train2017/',
        classes=[
            'speedlimit-20', 'speedlimit-30', 'speedlimit-50', 'speedlimit-60',
            'speedlimit-70', 'speedlimit-80', 'restrict-end-80',
            'speedlimit-100', 'speedlimit-120', 'no-overtake',
            'no-overtake-truck', 'priority-next-intersect', 'priority-road',
            'giveaway', 'stop', 'no-traffic-bothways', 'no-truck', 'no-entry',
            'danger', 'bend-left', 'bend-right', 'bend', 'uneven-road',
            'slippery-road', 'road-narrow', 'construction', 'traffic-signal',
            'pedestrian-crossing', 'school-crossing', 'cycle-crossing', 'snow',
            'animals', 'restriction-ends', 'go-right', 'go-left',
            'go-straight', 'go-right-straight', 'go-left-straight',
            'keep-right', 'keep-left', 'roundabout',
            'restrict-ends-overtaking', 'restrict-ends-overtaking-truck'
        ],
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='AutoAugment',
                policies=[[{
                    'type':
                    'Resize',
                    'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                  (576, 1333), (608, 1333), (640, 1333),
                                  (672, 1333), (704, 1333), (736, 1333),
                                  (768, 1333), (800, 1333)],
                    'multiscale_mode':
                    'value',
                    'keep_ratio':
                    True
                }],
                          [{
                              'type': 'Resize',
                              'img_scale': [(400, 1333), (500, 1333),
                                            (600, 1333)],
                              'multiscale_mode': 'value',
                              'keep_ratio': True
                          }, {
                              'type': 'RandomCrop',
                              'crop_type': 'absolute_range',
                              'crop_size': (384, 600),
                              'allow_negative_crop': True
                          }, {
                              'type':
                              'Resize',
                              'img_scale': [(480, 1333), (512, 1333),
                                            (544, 1333), (576, 1333),
                                            (608, 1333), (640, 1333),
                                            (672, 1333), (704, 1333),
                                            (736, 1333), (768, 1333),
                                            (800, 1333)],
                              'multiscale_mode':
                              'value',
                              'override':
                              True,
                              'keep_ratio':
                              True
                          }]]),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ]),
    val=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        classes=[
            'speedlimit-20', 'speedlimit-30', 'speedlimit-50', 'speedlimit-60',
            'speedlimit-70', 'speedlimit-80', 'restrict-end-80',
            'speedlimit-100', 'speedlimit-120', 'no-overtake',
            'no-overtake-truck', 'priority-next-intersect', 'priority-road',
            'giveaway', 'stop', 'no-traffic-bothways', 'no-truck', 'no-entry',
            'danger', 'bend-left', 'bend-right', 'bend', 'uneven-road',
            'slippery-road', 'road-narrow', 'construction', 'traffic-signal',
            'pedestrian-crossing', 'school-crossing', 'cycle-crossing', 'snow',
            'animals', 'restriction-ends', 'go-right', 'go-left',
            'go-straight', 'go-right-straight', 'go-left-straight',
            'keep-right', 'keep-left', 'roundabout',
            'restrict-ends-overtaking', 'restrict-ends-overtaking-truck'
        ],
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        classes=[
            'speedlimit-20', 'speedlimit-30', 'speedlimit-50', 'speedlimit-60',
            'speedlimit-70', 'speedlimit-80', 'restrict-end-80',
            'speedlimit-100', 'speedlimit-120', 'no-overtake',
            'no-overtake-truck', 'priority-next-intersect', 'priority-road',
            'giveaway', 'stop', 'no-traffic-bothways', 'no-truck', 'no-entry',
            'danger', 'bend-left', 'bend-right', 'bend', 'uneven-road',
            'slippery-road', 'road-narrow', 'construction', 'traffic-signal',
            'pedestrian-crossing', 'school-crossing', 'cycle-crossing', 'snow',
            'animals', 'restriction-ends', 'go-right', 'go-left',
            'go-straight', 'go-right-straight', 'go-left-straight',
            'keep-right', 'keep-left', 'roundabout',
            'restrict-ends-overtaking', 'restrict-ends-overtaking-truck'
        ],
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(
    type='AdamW',
    lr=0.0001,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    paramwise_cfg=dict(
        custom_keys=dict(
            absolute_pos_embed=dict(decay_mult=0.0),
            relative_position_bias_table=dict(decay_mult=0.0),
            norm=dict(decay_mult=0.0))))
optimizer_config = dict(
    grad_clip=None,
    type='DistOptimizerHook',
    update_interval=1,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[27, 33])
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
fp16 = None
work_dir = './work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco'
gpu_ids = range(0, 1)
KeyError: "Mask RCNN: 'backbone.layers.0.blocks.0.attn.relative position bias table'"

How to use COCO pre-trained weights?

#4

#4 reply

The ImageNet pre-trained weights on this page work, but I wonder why using COCO pre-trained weights does not.

I'm trying to train a custom detection dataset with Swin Transformer-Object Detection. Training the custom dataset with the ImageNet pre-trained weights succeeded, but training with the COCO pre-trained weights failed.

Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.926047721682624e-98

I trained my custom data from cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py.
Because my data only has bbox annotations and no masks, I changed all with_mask options to False in the config files and removed the FCNMaskHead mask_head in cascade_mask_rcnn_swin_fpn.py so that training runs. After training for about 10 minutes, the loss becomes NaN.
2021-04-29 15:20:07,805 - mmdet - INFO - Epoch [1][1550/9370] lr: 1.000e-04, eta: 2 days, 13:28:02, time: 0.569, data_time: 0.005, memory: 6632, loss_rpn_cls: nan, loss_rpn_bbox: nan, s0.loss_cls: nan, s0.acc: 3.3333, s0.loss_bbox: nan, s1.loss_cls: nan, s1.acc: 3.3333, s1.loss_bbox: nan, s2.loss_cls: nan, s2.acc: 3.3333, s2.loss_bbox: nan, loss: nan
Is there something wrong with my training setup?

model = dict( type='CascadeRCNN', pretrained='swin_tiny_patch4_window7_224.pth', backbone=dict( type='SwinTransformer', embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2, ape=False, patch_norm=True, out_indices=(0, 1, 2, 3), use_checkpoint=False), neck=dict( type='FPN', in_channels=[96, 192, 384, 768], out_channels=256, num_outs=5), rpn_head=dict( type='RPNHead', in_channels=256, feat_channels=256, anchor_generator=dict( type='AnchorGenerator', scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), loss_bbox=dict( type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)), roi_head=dict( type='CascadeRoIHead', num_stages=3, stage_loss_weights=[1, 0.5, 0.25], bbox_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), bbox_head=[ dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=26, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='BN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)), dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=26, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='BN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)), dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=26, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='BN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)) ], mask_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32])), train_cfg=dict( rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, debug=False), rpn_proposal=dict( nms_across_levels=False, nms_pre=2000, nms_post=2000, max_per_img=2000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=[ dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, 
add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False), dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False), dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.7, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False) ]), test_cfg=dict( rpn=dict( nms_across_levels=False, nms_pre=1000, nms_post=1000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5))) dataset_type = 'CocoDataset' data_root = 'data/coco/' img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=False), dict(type='RandomFlip', flip_ratio=0.5), dict( type='AutoAugment', policies=[[{ 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }], [{ 'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True }]]), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']) ] test_pipeline = [ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(800, 400), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ] data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_train2017.json', img_prefix='data/coco/train2017/', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=False), dict(type='RandomFlip', flip_ratio=0.5), dict( type='AutoAugment', policies=[[{ 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }], [{ 'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 
1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True }]]), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']) ]), val=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_val2017.json', img_prefix='data/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(800, 400), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ]), test=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_val2017.json', img_prefix='data/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(800, 400), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ])) evaluation = dict(metric=['bbox', 'segm']) optimizer = dict( type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05, paramwise_cfg=dict( custom_keys=dict( absolute_pos_embed=dict(decay_mult=0.0), relative_position_bias_table=dict(decay_mult=0.0), norm=dict(decay_mult=0.0)))) optimizer_config = dict( grad_clip=None, type='DistOptimizerHook', update_interval=1, coalesce=True, bucket_size_mb=-1, use_fp16=True) lr_config = dict( policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001, step=[27, 33]) runner = dict(type='EpochBasedRunnerAmp', max_epochs=36) checkpoint_config = dict(interval=1) log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) custom_hooks = [dict(type='NumClassCheckHook')] dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] fp16 = None work_dir = 'work_dirs' gpu_ids = [1]
Thanks very much!
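
The posted config has use_fp16=True with grad_clip=None, so one common (though not guaranteed) way to keep the fp16 loss scale from collapsing like this is to enable gradient clipping, and possibly lower the learning rate. A sketch with illustrative values that are assumptions, not repo defaults:

# Hedged sketch: clip gradients to tame fp16 overflow; max_norm / lr values are
# illustrative assumptions, not taken from this repository.
optimizer_config = dict(
    grad_clip=dict(max_norm=35, norm_type=2),
    type='DistOptimizerHook',
    update_interval=1,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True)
optimizer = dict(type='AdamW', lr=5e-05, betas=(0.9, 0.999), weight_decay=0.05)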

The model and loaded state dict do not match exactly

Hi,

I am trying to load the Swin backbone using the config mask_rcnn_swin_small_patch4_window7_mstrain_480-800_adamw_3x_coco.py and the weights swin_small_patch4_window7_224.pth, and I get the following warnings:

mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: norm.weight, norm.bias, head.weight, head.bias, layers.0.blocks.1.attn_mask, layers.1.blocks.1.attn_mask, layers.2.blocks.1.attn_mask, layers.2.blocks.3.attn_mask, layers.2.blocks.5.attn_mask, layers.2.blocks.7.attn_mask, layers.2.blocks.9.attn_mask, layers.2.blocks.11.attn_mask, layers.2.blocks.13.attn_mask, layers.2.blocks.15.attn_mask, layers.2.blocks.17.attn_mask

missing keys in source state_dict: norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias

I understand the issue with the norm layers: the original Swin backbone has only one normalization layer at the output, while the Swin backbone used for detection has a norm layer at every output stage.
However, I am not sure why there is a problem with the attn_mask keys.

Can you please help me?

Thank you very much in advance.
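
The attn_mask entries are buffers for the shifted-window attention that the detection backbone recomputes from the actual input size at run time, and head.*/norm.* belong to the ImageNet classification head, so these warnings should be harmless. If you would rather silence them, a hedged sketch of pre-filtering the checkpoint (the output filename is made up):

# Hedged sketch: drop the classification head, final norm and attn_mask buffers
# from the ImageNet checkpoint so the "unexpected key" warnings disappear.
import torch

ckpt = torch.load('swin_small_patch4_window7_224.pth', map_location='cpu')
state_dict = ckpt.get('model', ckpt)  # the ImageNet releases store weights under 'model'

filtered = {
    k: v
    for k, v in state_dict.items()
    if not (k.startswith('head.') or k.startswith('norm.') or 'attn_mask' in k)
}
torch.save({'model': filtered}, 'swin_small_patch4_window7_224_backbone_only.pth')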

TypeError: CascadeRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'depth'

Traceback (most recent call last):
File "/home/cai/anaconda3/envs/swin/lib/python3.7/site-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
return obj_cls(**args)
TypeError: __init__() got an unexpected keyword argument 'depth'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/cai/anaconda3/envs/swin/lib/python3.7/site-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
return obj_cls(**args)
File "/home/cai/project/Swin/mmdet/models/detectors/cascade_rcnn.py", line 25, in init
pretrained=pretrained)
File "/home/cai/project/Swin/mmdet/models/detectors/two_stage.py", line 26, in init
self.backbone = build_backbone(backbone)
File "/home/cai/project/Swin/mmdet/models/builder.py", line 39, in build_backbone
return build(cfg, BACKBONES)
File "/home/cai/project/Swin/mmdet/models/builder.py", line 34, in build
return build_from_cfg(cfg, registry, default_args)
File "/home/cai/anaconda3/envs/swin/lib/python3.7/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: SwinTransformer: __init__() got an unexpected keyword argument 'depth'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "tools/train.py", line 187, in
main()
File "tools/train.py", line 161, in main
test_cfg=cfg.get('test_cfg'))
File "/home/cai/project/Swin/mmdet/models/builder.py", line 77, in build_detector
return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "/home/cai/project/Swin/mmdet/models/builder.py", line 34, in build
return build_from_cfg(cfg, registry, default_args)
File "/home/cai/anaconda3/envs/swin/lib/python3.7/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: CascadeRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'depth'


When I run the demo provided with the configs, it runs well.
However, when I use the Swin Transformer as the backbone in Cascade R-CNN, this error occurs.
Can you give me some advice? Thanks.
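
Judging from the message, the backbone config passes a keyword named depth, which SwinTransformer.__init__ does not accept; the configs on this page use the plural depths with one entry per stage. A hedged sketch of the expected backbone block, copied from the Mask R-CNN config earlier on this page:

# Hedged sketch: SwinTransformer takes `depths` (a per-stage list), not `depth`.
backbone = dict(
    type='SwinTransformer',
    embed_dim=96,
    depths=[2, 2, 6, 2],        # not depth=...
    num_heads=[3, 6, 12, 24],
    window_size=7,
    mlp_ratio=4.0,
    qkv_bias=True,
    drop_path_rate=0.2,
    ape=False,
    patch_norm=True,
    out_indices=(0, 1, 2, 3),
    use_checkpoint=False)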

How to finetune the pretrained model on other COCO-format dataset?

Hi,

First of all, thank you for your excellent work.

Now I want to fine-tune the pretrained model (trained on the COCO dataset) on another COCO-format dataset, but I get the error "KeyError: "MaskRCNN: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'"". How can I solve it?

Looking forward to your reply, and wish you a nice day!

I have a problem when using my own dataset with cascade_mask_rcnn_swin_tiny for detection

Thank you for your great work, but there is a problem when I train with my own data, and I can't figure out why it happens.
I've changed all num_classes settings to my own number of classes.
This is the traceback:

Traceback (most recent call last):
File "./tools/train.py", line 187, in
main()
File "./tools/train.py", line 183, in main
meta=meta)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/apis/train.py", line 185, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
for i, data_batch in enumerate(self.data_loader):
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in next
data = self._next_data()
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
cv2.error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/datasets/custom.py", line 193, in getitem
data = self.prepare_train_img(idx)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/datasets/custom.py", line 216, in prepare_train_img
return self.pipeline(results)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/datasets/pipelines/compose.py", line 40, in call
data = t(data)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/datasets/pipelines/transforms.py", line 534, in call
self._pad_masks(results)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/datasets/pipelines/transforms.py", line 515, in _pad_masks
results[key] = results[key].pad(pad_shape, pad_val=self.pad_val)
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/core/mask/structures.py", line 305, in pad
for mask in self.masks
File "/home/ding/chenTY/Swin-Transformer-Object-Detection/mmdet/core/mask/structures.py", line 305, in
for mask in self.masks
File "/home/ding/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/image/geometric.py", line 450, in impad
value=pad_val)
cv2.error: OpenCV(4.5.1) /tmp/pip-req-build-7m_g9lbm/opencv/modules/core/src/copy.cpp:1445: error: (-215:Assertion failed) top >= 0 && bottom >= 0 && left >= 0 && right >= 0 && _src.dims() <= 2 in function 'copyMakeBorder'
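
The copyMakeBorder assertion means Pad computed a negative border, i.e. a mask (or image) is larger than the size it is being padded to; with custom COCO-format data this often comes from width/height fields in the annotation file that do not match the real image files. A hedged sanity-check sketch (the paths are placeholders for your own dataset):

# Hedged sketch: check that COCO annotation width/height match the image files;
# a mismatch can make the Pad transform compute negative borders.
import os
from PIL import Image
from pycocotools.coco import COCO

ann_file = 'data/coco/annotations/instances_train2017.json'  # placeholder path
img_prefix = 'data/coco/train2017/'                          # placeholder path

coco = COCO(ann_file)
for img_id in coco.getImgIds():
    info = coco.loadImgs(img_id)[0]
    with Image.open(os.path.join(img_prefix, info['file_name'])) as im:
        w, h = im.size
    if (w, h) != (info['width'], info['height']):
        print(f"size mismatch for {info['file_name']}: "
              f"annotation ({info['width']}, {info['height']}) vs file ({w}, {h})")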

Onnx Conversion: Integer division of tensors using div or / is no longer supported

Hello,

I'm trying to export my trained model to ONNX using pytorch2onnx.py.
When I run it, I hit this issue:
RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.

The error points to this line:
File "/Swin-Transformer-Object-Detection/mmdet/models/backbones/swin_transformer.py", line 374, in forward
    Hp = int(np.ceil(H / self.window_size)) * self.window_size

Should I convert H and W to LongTensor as suggested in this PyTorch thread?

Thanks for helping
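
Casting to LongTensor may work, but a simpler workaround (a sketch, not the repository's official fix) is to round up with integer floor division so no tensor true-division happens during tracing:

# Hedged sketch: ONNX-export-friendly replacement for
#   Hp = int(np.ceil(H / self.window_size)) * self.window_size
# Rounds H up to the next multiple of window_size using integer math only,
# which keeps working when H is a traced tensor.
def round_up(x, multiple):
    return (x + multiple - 1) // multiple * multiple

# inside SwinTransformer.forward (local names assumed from the traceback):
# Hp = round_up(H, self.window_size)
# Wp = round_up(W, self.window_size)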
