fudan-zvg / setr

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

License: MIT License

Python 99.91% Dockerfile 0.01% Shell 0.07%

setr's Issues

when will you release the code?

Auxiliary loss

Thank you for your great work. Auxiliary segmentation loss helps the model training and may lead to significant performance improvement. Can you provide some related results w/o auxiliary loss?

GPU memory

Hello, thanks for your code.
How much GPU memory is needed to train SETR?
I have two P40 GPUs, but I can't start training because of out-of-memory (OOM) errors.
Looking forward to your reply.

21k Pretrain weight

Could you please release the ResNet101 weights pretrained on ImageNet-21K, or tell us where we can find them?
Thank you!

About the position embeddings for patches

Since the patches come from a 2D image, the position information has two components: the x-axis and y-axis indices. This differs from the 1D sequence case. How do you implement the position embedding? Could you share the details, since the code has not been released yet?

Test issue

Hi authors,
Where can I download the pretrained model? I'd like to test your model, but I cannot download the checkpoint, for instance SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth.

typo in Dockerfile

Hi, a "t" is missing in the Dockerfile. It should be "mmsegmentation"; otherwise git clone won't work.

could you share the demo of SETR?

The demo in your repo is written for PSPNet, so could you share a demo for SETR?

How to use a custom dataset?

Hello, this is my first time using MMSeg. Can you help me train on my own dataset?

Test model on my testset

Hi,
Thanks for sharing the code. I want to use your model on my own dataset. How can I do that? I see that you provide a separate Python file for each dataset.

AttributeError: 'DataContainer' object has no attribute 'shape'

Hello, thank you for your contribution. When I run tools/train.py with the Cityscapes dataset, I get the following error:

Traceback (most recent call last):
File "/home/caiweixin/Downloads/SETR-main/tools/train.py", line 161, in <module>
main()
File "/home/caiweixin/Downloads/SETR-main/tools/train.py", line 150, in main
train_segmentor(
File "/home/caiweixin/Downloads/SETR-main/mmseg/apis/train.py", line 105, in train_segmentor
runner.run(data_loaders, cfg.workflow, cfg.total_iters)
File "/home/caiweixin/anaconda3/envs/SETR-main/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 130, in run
iter_runner(iter_loaders[i], **kwargs)
File "/home/caiweixin/anaconda3/envs/SETR-main/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/home/caiweixin/Downloads/SETR-main/mmseg/models/segmentors/base.py", line 152, in train_step
losses = self(**data_batch)
File "/home/caiweixin/anaconda3/envs/SETR-main/lib/python3.9/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/caiweixin/anaconda3/envs/SETR-main/lib/python3.9/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
return old_func(*args, **kwargs)
File "/home/caiweixin/Downloads/SETR-main/mmseg/models/segmentors/base.py", line 122, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/caiweixin/Downloads/SETR-main/mmseg/models/segmentors/encoder_decoder.py", line 153, in forward_train
x = self.extract_feat(img)
File "/home/caiweixin/Downloads/SETR-main/mmseg/models/segmentors/encoder_decoder.py", line 79, in extract_feat
x = self.backbone(img)
File "/home/caiweixin/anaconda3/envs/SETR-main/lib/python3.9/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/caiweixin/Downloads/SETR-main/mmseg/models/backbones/vit.py", line 393, in forward
B = x.shape[0]
AttributeError: 'DataContainer' object has no attribute 'shape'

I don't know how to solve it. I hope you can give me some suggestions. Thank you very much.

About selection of gpu

How do I select which GPUs to use when training with multiple GPUs? Thanks a lot.
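
One common approach, not specific to this repository, is to set the `CUDA_VISIBLE_DEVICES` environment variable before CUDA is initialized, so that only the chosen devices are visible to PyTorch. A minimal sketch follows; the helper name `select_gpus` is hypothetical.

```python
import os


def select_gpus(gpu_ids):
    """Expose only the given physical GPU ids to CUDA.

    This must run before torch (or any other CUDA library) initializes CUDA,
    otherwise the setting has no effect on the current process.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Example: physical GPUs 2 and 3 will appear to PyTorch as cuda:0 and cuda:1.
visible = select_gpus([2, 3])
print(visible)
```

The same variable can also be set in the shell in front of the launch command. The GPU count passed to the launcher should then match the number of visible devices.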

AssertionError: Default process group is not initialized

Hi, authors,

I got the following error after executing command: python tools/train.py configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py

2021-04-08 08:03:22,265 - mmseg - INFO - Loaded 2975 images
2021-04-08 08:03:24,275 - mmseg - INFO - Loaded 500 images
2021-04-08 08:03:24,276 - mmseg - INFO - Start running, host: root@milton-LabPC, work_dir: /media/root/mdata/data/code13/SETR/work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8
2021-04-08 08:03:24,276 - mmseg - INFO - workflow: [('train', 1)], max: 40000 iters
Traceback (most recent call last):
  File "tools/train.py", line 161, in <module>
    main()
  File "tools/train.py", line 150, in main
    train_segmentor(
  File "/media/root/mdata/data/code13/SETR/mmseg/apis/train.py", line 106, in train_segmentor
    runner.run(data_loaders, cfg.workflow, cfg.total_iters)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 130, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/media/root/mdata/data/code13/SETR/mmseg/models/segmentors/base.py", line 152, in train_step
    losses = self(**data_batch)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/media/root/mdata/data/code13/SETR/mmseg/models/segmentors/base.py", line 122, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/media/root/mdata/data/code13/SETR/mmseg/models/segmentors/encoder_decoder.py", line 157, in forward_train
    loss_decode = self._decode_head_forward_train(x, img_metas,
  File "/media/root/mdata/data/code13/SETR/mmseg/models/segmentors/encoder_decoder.py", line 100, in _decode_head_forward_train
    loss_decode = self.decode_head.forward_train(x, img_metas,
  File "/media/root/mdata/data/code13/SETR/mmseg/models/decode_heads/decode_head.py", line 185, in forward_train
    seg_logits = self.forward(inputs)
  File "/media/root/mdata/data/code13/SETR/mmseg/models/decode_heads/vit_up_head.py", line 93, in forward
    x = self.syncbn_fc_0(x)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 519, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 625, in get_world_size
    return _get_group_size(group)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
    _check_default_pg()
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
    assert _default_pg is not None, \
AssertionError: Default process group is not initialized
(pytorch1.7.0) root@milton-LabPC:/data/code13/SETR

Since I am training on a single GPU, the error seems to be related to distributed training. Any hints on how to solve this issue?

THX!
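
The traceback ends inside PyTorch's synchronized batch norm (`syncbn_fc_0` calling `torch.distributed.get_world_size`), which needs an initialized process group. A workaround often used with MMSegmentation-style configs is to switch the norm type from `SyncBN` to `BN` for non-distributed runs. The sketch below assumes the norm layers are built from `norm_cfg` entries and operates on plain dicts; the helper `replace_syncbn` and the head name `ExampleHead` are hypothetical.

```python
def replace_syncbn(cfg):
    """Return a copy of a nested config with every norm type 'SyncBN' set to 'BN'."""
    if isinstance(cfg, dict):
        return {
            key: "BN" if (key == "type" and value == "SyncBN") else replace_syncbn(value)
            for key, value in cfg.items()
        }
    if isinstance(cfg, (list, tuple)):
        return type(cfg)(replace_syncbn(item) for item in cfg)
    return cfg


# Hypothetical model config; "ExampleHead" is a placeholder, not a real class name.
model_cfg = dict(
    backbone=dict(type="VisionTransformer", norm_cfg=dict(type="SyncBN", requires_grad=True)),
    decode_head=dict(type="ExampleHead", norm_cfg=dict(type="SyncBN", requires_grad=True)),
)
single_gpu_cfg = replace_syncbn(model_cfg)
print(single_gpu_cfg["decode_head"]["norm_cfg"])
```

Another option is to start the run through the distributed launcher with a GPU count of 1, which initializes the process group that `SyncBN` expects.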

Could you release your best checkpoints?

According to another post, this code needs a lot of GPU resources, which makes it really difficult for me to reproduce your results. Could you please release your best checkpoints?

SETR-Naive-Base model

Hi, do you have a Google Drive link for the models with T-Base referenced in the paper (such as SETR-Naive-Base), as well as the corresponding configuration files?

Alternatively, what configuration can I use to train the model if it is not readily available? I tried changing the depth in SETR/configs/base/models/setr_naive_pup.py to 12, but that fails with "RuntimeError: shape '[2, 1025, 3, 12, 85]' is invalid for input of size 6297600" when using the ADE20K configuration file (https://github.com/fudan-zvg/SETR/blob/main/configs/SETR/SETR_Naive_512x512_160k_ade20k_bs_16.py) for training. Changing the embedding dimension in this file from 1024 also results in many shape mismatches with the pretrained ImageNet-21K model. The default training with the T-Large depth and embedding dimension works for me with the same file.

Thanks for your help.
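
The numbers in the error message are consistent with an embedding dimension of 1024 being split across 12 attention heads, which does not divide evenly. The sketch below assumes the usual ViT attention reshape of the fused QKV tensor to `(B, N, 3, num_heads, C // num_heads)`; the function name is hypothetical.

```python
def attention_reshape_check(batch, tokens, embed_dim, num_heads):
    """Compare the QKV tensor size with the size the attention reshape asks for."""
    head_dim = embed_dim // num_heads            # integer division drops the remainder
    available = batch * tokens * 3 * embed_dim   # elements produced by the QKV projection
    requested = batch * tokens * 3 * num_heads * head_dim
    return available, requested, available == requested


# 512x512 input with 16x16 patches: 32 * 32 = 1024 patch tokens + 1 class token = 1025.
available, requested, ok = attention_reshape_check(2, 1025, 1024, 12)
print(available, requested, ok)   # 6297600 6273000 False

# Sixteen heads divide 1024 evenly, and so do 12 heads with a 768-dim embedding.
ok_large = attention_reshape_check(2, 1025, 1024, 16)[2]
ok_base = attention_reshape_check(2, 1025, 768, 12)[2]
print(ok_large, ok_base)          # True True
```

Under that assumption, a Base-sized model needs the head count and embedding dimension changed together (12 heads with 768 dimensions), and pretrained weights of the matching size.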

The images produced at test time are overlaid on the original image. How can I keep only the semantic segmentation map?

AttributeError: module 'torch.distributed' has no attribute 'group'

Hi, thanks for providing the code. When I use SETR_MLA_480x480_80k_pascal_context_bs_8.py and SETR_MLA_pascal_context_b8_80k.pth, I get this error: "AttributeError: module 'torch.distributed' has no attribute 'group'". How can I solve this on Windows?
My environment is:
pytorch 1.6.0
mmcv 1.2.6
mmcv-full 1.1.5

Speed test on those models?

Question about optimizer config.

"paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)})"

Hi, first of all, thank you for open-sourcing your code. I have a question about the optimizer configuration.
I found that your model contains "decode_head", not the "head" used in 'custom_keys'. Will 'lr_mult=10' take effect when training the model?

Thanks~
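
As far as I know, mmcv's default optimizer constructor treats each `custom_keys` entry as a substring of the full parameter name, so `'head'` would also match `decode_head` and `auxiliary_head` parameters. This should be verified against the installed mmcv version. The sketch below only imitates that matching rule; the function, the parameter names, and the base learning rate are illustrative assumptions.

```python
def effective_lr(param_name, base_lr, custom_keys):
    """Imitate substring-based matching of custom_keys against parameter names."""
    # Try longer keys first so the most specific match wins.
    for key in sorted(custom_keys, key=len, reverse=True):
        if key in param_name:  # substring test, not an exact match
            return base_lr * custom_keys[key].get("lr_mult", 1.0)
    return base_lr


custom_keys = {"head": dict(lr_mult=10.0)}
base_lr = 0.01  # hypothetical value, for illustration only
names = [
    "backbone.blocks.0.attn.qkv.weight",   # hypothetical parameter names
    "decode_head.conv_seg.weight",
    "auxiliary_head.conv_seg.weight",
]
lrs = {name: effective_lr(name, base_lr, custom_keys) for name in names}
for name, lr in lrs.items():
    print(name, lr)
```

Under this rule, both head parameters get ten times the base learning rate while the backbone keeps the base value.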

about single GPU

I ran into this error. How should I deal with it? Thanks a lot.
RuntimeError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 24.00 GiB total capacity; 18.99 GiB already allocated; 18.86 MiB free; 19.31 GiB reserved in total by PyTorch)

I think either the batch size is too big or the problem lies in the GPU settings in the code, but I could not find where the batch size is set.
Can this code only run on multiple GPUs?
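
In MMSegmentation-style configs the per-GPU batch size is normally controlled by `samples_per_gpu` in the `data` section, so the total batch size is that value times the number of GPUs. The fragment below is a hypothetical override, not the repository's shipped configuration; the values are placeholders.

```python
# Hypothetical override of the `data` section of an MMSegmentation-style config.
data = dict(
    samples_per_gpu=1,   # images processed per GPU in each iteration
    workers_per_gpu=2,   # data-loading worker processes per GPU
)

num_gpus = 1  # assumption: a single-GPU run
effective_batch_size = data["samples_per_gpu"] * num_gpus
print(effective_batch_size)
```

If memory is still insufficient at one image per GPU, a smaller crop size or a smaller backbone is the remaining lever, since ViT-Large activations dominate memory use.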

ZeroDivisionError: integer division or modulo by zero

Hi, thanks for sharing the code. I ran into a problem; please help me.
I have only one GPU.
My mmcv and PyTorch versions are the same as in the readme.md.

(base) root@Pub:/mnt/c/Users/Pub/SETR# ./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth 1 --eval mIoU
Traceback (most recent call last):
File "./tools/test.py", line 144, in <module>
main()
File "./tools/test.py", line 100, in main
init_dist(args.launcher, **cfg.dist_params)
File "/root/miniconda3/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 20, in init_dist
_init_dist_pytorch(backend, **kwargs)
File "/root/miniconda3/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 33, in _init_dist_pytorch
torch.cuda.set_device(rank % num_gpus)
ZeroDivisionError: integer division or modulo by zero
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/root/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/root/miniconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/bin/python', '-u', './tools/test.py', '--local_rank=0', 'configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py', 'work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth', '--launcher', 'pytorch', '--eval', 'mIoU']' returned non-zero exit status 1.
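
The failing line is `torch.cuda.set_device(rank % num_gpus)`, so the modulo by zero means `num_gpus` was 0. Assuming that value comes from `torch.cuda.device_count()`, PyTorch did not see any CUDA device in this environment. The sketch below shows the same arithmetic with an explicit guard; the function name is hypothetical.

```python
def local_device_index(rank, num_gpus):
    """Map a process rank to a local GPU index, failing clearly when no GPU is visible."""
    if num_gpus <= 0:
        raise RuntimeError(
            "No CUDA device is visible to this process; check the driver, "
            "the CUDA build of PyTorch, and CUDA_VISIBLE_DEVICES."
        )
    return rank % num_gpus


idx = local_device_index(rank=3, num_gpus=2)
print(idx)

try:
    local_device_index(rank=0, num_gpus=0)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)
```

A quick check is to print `torch.cuda.is_available()` in the same shell before launching the test script.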

Question about the method of handling the multi-patch inputs

After reading your paper, I am confused about how you handle the multi-patch (256) inputs in the encoder. It seems that the encoder fuses the 256 patches and learns one feature map (of size (H/16, W/16, D)) for the whole original image rather than for each patch, and then decodes this feature map to generate the segmentation map. How are the 256 patches processed and fused in the encoder?
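
As I read the paper, there is no separate fusion step: each 16x16 patch becomes one token, self-attention lets every token attend to all others, and the output sequence is reshaped back into an H/16 x W/16 grid. The shape bookkeeping can be sketched as follows; the function names are hypothetical, and integer ids stand in for D-dimensional token vectors.

```python
def patch_grid(height, width, patch_size=16):
    """Return (rows, cols, sequence_length) for non-overlapping square patches."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    rows, cols = height // patch_size, width // patch_size
    return rows, cols, rows * cols


def tokens_to_grid(tokens, rows, cols):
    """Reshape a flat, row-major token sequence back into a rows x cols grid."""
    if len(tokens) != rows * cols:
        raise ValueError("token count does not match the grid size")
    return [tokens[r * cols:(r + 1) * cols] for r in range(rows)]


# A 256x256 image gives a 16x16 grid, i.e. a sequence of 256 tokens.
rows, cols, seq_len = patch_grid(256, 256)
grid = tokens_to_grid(list(range(seq_len)), rows, cols)
print(rows, cols, seq_len)   # 16 16 256
print(grid[1][0])            # 16: first token of the second row

# A 480x480 image gives a 30x30 grid, i.e. 900 tokens.
print(patch_grid(480, 480))  # (30, 30, 900)
```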

pascal_context dataset

SETR/configs/_base_/datasets/pascal_context.py

Line 30 in 23f8fde

dict(type='Resize', keep_ratio=True, dataset='pascal_context'),

There is no definition of 'dataset' in class Resize.

num classes on pascal_context

According to the results of "mmsegmentation" and "HRNet-segmentation", evaluating with 60 classes (including the background) leads to a large decrease in mIoU. Can you provide the evaluation results for 59 classes?

Can you share the iter_40000.pth file?

I am following the steps for single-scale testing, but there is an error with the checkpoint file. Can you share this file?

How to predict for an image after training?

Could you please share the trained model?

Thank you for sharing.
I only have one 1080 GPU and cannot train your model (CUDA error: an illegal memory access was encountered). When I try to download your model, the page returns a 404 error. Could you please share the trained model with me by email?
Thank you very much.

when can we have the honour to read your code!

The model efficiency and speed

@lzrobots The paper seems promising, but some questions about efficiency remain unanswered:

For CPU-only inference, how much memory is required for a 1024 x 1024 image?
For CPU-only inference, what is the frame rate (fps) for 1024 x 1024 images?
What are the FLOPs and the number of parameters?

Questions about feature visualization (Fig. 5 & Fig. 9)

Given the Vit-Large-Patches16 encoder and an input size of 3x480x480, the output feature maps of any layer Z should be 1024x30x30 (after reshaping). How are these 1024 features mapped to RGB space for visualization? Are the feature maps directly upsampled to the original image size during visualization?

I didn't find any related code in this repo.

KeyError: "EncoderDecoder: 'VisionTransformer is not in the backbone registry'"

Hi, authors,

I got this error: KeyError: "EncoderDecoder: 'VisionTransformer is not in the backbone registry'" when running the command: python tools/train.py configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py. The full error is listed here: https://gist.github.com/amiltonwong/476e04e81e33b3cf8c4cb3f28ee01ddd

Any hints to solve this issue?

THX!

Where is the code?

When are you going to release the code?

Request for diverting flow to MMSegmentation.

Hi, congratulations on the acceptance at CVPR 2021.

I am a member of OpenMMLab. Our vision is to provide a wide range of models in our codebase so that researchers can easily make fair and effective comparisons in computer vision. Having ready-made baselines can in turn bring more citations to the original works.

As the first transformer model for semantic segmentation, SETR has attracted a lot of attention in the field.
That is why my colleagues re-implemented SETR in MMSeg over the last several months. Please check our link: https://github.com/open-mmlab/mmsegmentation/tree/master/configs/setr.

Could you please add our link to your original GitHub repository? We hope more people can use this well-known model.

Looking forward to your reply! We wish you more great work in the future.

Best,

torch.hub has no attribute 'get_dir'

Pretrained model page does not exist

When I open the pretrained model page, there are no pretrained models.

error for using dist_train.sh

Excuse me,
I'm training with multiple GPUs using, for example: ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]. I am trying to use 2 GPUs and get the following error:
Traceback (most recent call last):
File "/home/anaconda3/envs/py37_torch1.6/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/anaconda3/envs/py37_torch1.6/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/anaconda3/envs/py37_torch1.6/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/home/anaconda3/envs/py37_torch1.6/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/anaconda3/envs/py37_torch1.6/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/SETR/SETR_Naive_768x768_40k_cityscapes_bs_8.py', '--launcher', 'pytorch', '--load-from=./pth/jx_vit_large_p16_384-b3be5167.pth']' returned non-zero exit status 1.

Thanks for your answer!

DataLoader error.

Hello, I installed everything according to the documentation and started training as described, but an error was reported during training. Is the code incompatible with MMSeg? Thanks for your answer.
Here are some details about the error:

2021-07-13 11:57:07,774 - mmseg - INFO - Loaded 2975 images
2021-07-13 11:57:12,063 - mmseg - INFO - Loaded 500 images
2021-07-13 11:57:12,064 - mmseg - INFO - Start running, host: well@admin01, work_dir: /home/well/SETR/work_dirs/SETR_Naive_768x768_40k_cityscapes_bs_8
2021-07-13 11:57:12,064 - mmseg - INFO - workflow: [('train', 1)], max: 40000 iters

...

FileNotFoundError: [Errno 2] No such file or directory: '/home/well/SETR/data/cityscapes/gtFine/train/cologne/cologne_000019_000019_gtFine_labelTrainIds.png'

I have made sure the file path is correct, but the training process still cannot start.
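
The missing file ends in `_gtFine_labelTrainIds.png`. These files are not part of the Cityscapes download; they are generated from the annotations, for example with cityscapesScripts or MMSegmentation's `tools/convert_datasets/cityscapes.py`. Assuming the standard `leftImg8bit/<split>/<city>/` and `gtFine/<split>/<city>/` layout, the sketch below lists which label files are still missing; the helper name is hypothetical.

```python
import tempfile
from pathlib import Path

IMG_SUFFIX = "_leftImg8bit.png"
LABEL_SUFFIX = "_gtFine_labelTrainIds.png"


def missing_train_id_labels(data_root, split="train"):
    """List the labelTrainIds files expected for each image but absent on disk."""
    data_root = Path(data_root)
    img_dir = data_root / "leftImg8bit" / split
    ann_dir = data_root / "gtFine" / split
    missing = []
    for img in sorted(img_dir.glob("*/*" + IMG_SUFFIX)):
        stem = img.name[: -len(IMG_SUFFIX)]
        label = ann_dir / img.parent.name / (stem + LABEL_SUFFIX)
        if not label.exists():
            missing.append(label)
    return missing


# Demo on a throwaway directory tree instead of a real dataset.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    img = root / "leftImg8bit" / "train" / "cologne" / ("cologne_000019_000019" + IMG_SUFFIX)
    img.parent.mkdir(parents=True)
    img.touch()
    before = [p.name for p in missing_train_id_labels(root)]

    label = root / "gtFine" / "train" / "cologne" / ("cologne_000019_000019" + LABEL_SUFFIX)
    label.parent.mkdir(parents=True)
    label.touch()
    after = missing_train_id_labels(root)

print(before, after)
```

If the list is non-empty for the real dataset root, running the conversion step should create the missing files.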

The input size problem

Thank you for your great work. The size of my images is (256, 832). How should I handle this input size? Please give me more details. Thanks.

2D interpolation of the pre-trained position embeddings

How do you perform the 2D interpolation of the pre-trained position embeddings? Thanks!
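
A common way to do this, which I have not confirmed to be the authors' exact method, is to set the class-token embedding aside, reshape the remaining embeddings into their 2D patch grid, resize that grid, and flatten it again. The dependency-free sketch below uses bilinear interpolation with aligned corners; the function names are hypothetical. In practice one would typically call `torch.nn.functional.interpolate` on the reshaped tensor.

```python
def bilinear_resize_grid(grid, new_h, new_w):
    """Bilinearly resize a grid[row][col] of vectors (corners of old and new grids coincide)."""
    old_h, old_w = len(grid), len(grid[0])
    dim = len(grid[0][0])

    def src_coord(i, new, old):
        return 0.0 if new == 1 else i * (old - 1) / (new - 1)

    out = []
    for i in range(new_h):
        y = src_coord(i, new_h, old_h)
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0
        row = []
        for j in range(new_w):
            x = src_coord(j, new_w, old_w)
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0
            row.append([
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ])
        out.append(row)
    return out


def resize_pos_embed(pos_embed, old_hw, new_hw):
    """Resize [cls, patch_0, ..., patch_{h*w-1}] embeddings to a new patch grid."""
    cls_tok, patch_tok = pos_embed[:1], pos_embed[1:]
    old_h, old_w = old_hw
    if len(patch_tok) != old_h * old_w:
        raise ValueError("embedding count does not match the old grid size")
    grid = [patch_tok[r * old_w:(r + 1) * old_w] for r in range(old_h)]
    resized = bilinear_resize_grid(grid, new_hw[0], new_hw[1])
    return cls_tok + [vec for row in resized for vec in row]


# Toy example: one class token plus a 2x2 grid of 1-dimensional embeddings, resized to 3x3.
toy = [[9.0], [0.0], [1.0], [2.0], [3.0]]
resized = resize_pos_embed(toy, (2, 2), (3, 3))
print(resized)
```

For a ViT with 16x16 patches pretrained at 384x384, the old grid would be 24x24; a 768x768 crop would need a 48x48 grid.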

image size doesn't match model

Hello,
When testing on the pascal_context dataset, the output is:
AssertionError: Input image size (389*480) doesn't match model (480*480).

always CUDA out of memory

@lzrobots
@VictorLlu
@sixiaozheng
Hi, thank you for sharing.
However, when I run "./tools/dist_test.sh configs/SETR/SETR_PUP_512x512_160k_ade20k_bs_16_MS.py", I get the error: CUDA out of memory. I have 2 NVIDIA Tesla P100 GPUs with about 16 GB each.
Could you please tell me what is wrong?
Thank you.

Question about Figure 8 in paper

Thank you for your nice work!
I am wondering how to get the attention map of the picked point. Could you give a brief explanation?

Release pretrained-models

Hi @sixiaozheng,
Could you please release the R101 model pretrained on ImageNet-21K?

When are you going to release the code?


What are the results using DeiT on ADE?

Hi! Thanks for open-sourcing the code. I wonder what the results of SETR using DeiT on ADE are.

Can't achieve the best mIoU when batch size = 4

I cannot reach the best mIoU reported in the original paper, even though I did not change any hyperparameter except reducing the batch size from 8 to 4. How can I achieve the best mIoU? Is it related to the batch size?
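
A common heuristic when the batch size changes, which I have not verified for this repository, is to scale the learning rate linearly with it. The sketch below shows the arithmetic; the reference learning rate of 0.01 is a placeholder, not a value taken from the repository's configs.

```python
def scale_lr(reference_lr, reference_batch_size, new_batch_size):
    """Linear scaling rule: keep the ratio of learning rate to batch size constant."""
    return reference_lr * new_batch_size / reference_batch_size


# Placeholder reference learning rate; read the real value from the config in use.
new_lr = scale_lr(reference_lr=0.01, reference_batch_size=8, new_batch_size=4)
print(new_lr)
```

Even with an adjusted learning rate, a halved batch size may not fully reproduce the reported numbers, since batch statistics and the number of samples seen per iteration also change.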

Which checkpoint file is the best?

Hi, thank you for your awesome work.

I trained the SETR-MLA model on my own dataset, and there are many checkpoint files from different iterations. How can I know which one is the best to use for testing?

Thanks in advance.
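
One simple approach is to evaluate each saved checkpoint on a validation split and keep the one with the highest mIoU. The sketch below assumes the `iter_<N>.pth` naming seen elsewhere in these issues; the helper name and the mIoU values are made up for illustration.

```python
def best_checkpoint(val_miou_by_iter):
    """Return the checkpoint file name and score with the highest validation mIoU."""
    best_iter = max(val_miou_by_iter, key=val_miou_by_iter.get)
    return f"iter_{best_iter}.pth", val_miou_by_iter[best_iter]


# Made-up validation scores keyed by training iteration.
val_miou_by_iter = {8000: 61.2, 16000: 66.8, 24000: 68.1, 32000: 67.9}
name, score = best_checkpoint(val_miou_by_iter)
print(name, score)
```

The selection should use a validation split that is separate from the final test data, so that the reported test score is not biased by the choice of checkpoint.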

MMCV Error(mmcv-full 1.2.2 torch1.6 python3.7)

Can you help me solve this problem? I use the dataset in VOC format.

Traceback (most recent call last):
File "tools/train.py", line 163, in <module>
main()
File "tools/train.py", line 159, in main
meta=meta)
File "/home/ubuntu/disk1/user/SETR-main/mmseg/apis/train.py", line 91, in train_segmentor
val_dataset = build_dataset(cfg.data.val, dict(test_mode=True))
File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/builder.py", line 73, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/home/ubuntu/anaconda3/envs/setr/lib/python3.7/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
return obj_cls(**args)
File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/pascal_context.py", line 53, in __init__
**kwargs)
File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/custom.py", line 86, in __init__
self.pipeline = Compose(pipeline)
File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/pipelines/compose.py", line 22, in __init__
transform = build_from_cfg(transform, PIPELINES)
File "/home/ubuntu/anaconda3/envs/setr/lib/python3.7/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
return obj_cls(**args)
File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/pipelines/test_time_aug.py", line 59, in __init__
self.transforms = Compose(transforms)
File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/pipelines/compose.py", line 22, in __init__
transform = build_from_cfg(transform, PIPELINES)
File "/home/ubuntu/anaconda3/envs/setr/lib/python3.7/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
return obj_cls(**args)
TypeError: __init__() got an unexpected keyword argument 'dataset'

could you make the code available?

mmcv error, Python 3.7.0, PyTorch 1.9.1, Windows

pip install mmcv
python setup.py develop
and run train.py

ImportError: DLL load failed: The specified module could not be found.

Can you give me some advice?
Thanks.