
SegFormer's Introduction

NVIDIA Source Code License · Python 3.8

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Figure 1: Performance of SegFormer-B0 to SegFormer-B5.

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.
Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, and Ping Luo.
NeurIPS 2021.

This repository contains the official PyTorch implementation of the training & evaluation code and the pretrained models for SegFormer.

SegFormer is a simple, efficient and powerful semantic segmentation method, as shown in Figure 1.

We use MMSegmentation v0.13.0 as the codebase.

🔥🔥 SegFormer is on MMSegmentation. 🔥🔥

Installation

For install and data preparation, please refer to the guidelines in MMSegmentation v0.13.0.

Other requirements: pip install timm==0.3.2

An example (works for me): CUDA 10.1 and PyTorch 1.7.1

pip install torchvision==0.8.2
pip install timm==0.3.2
pip install mmcv-full==1.2.7
pip install opencv-python==4.5.1.48
cd SegFormer && pip install -e . --user

Evaluation

Download trained weights. ( google drive | onedrive )

Example: evaluate SegFormer-B1 on ADE20K:

# Single-gpu testing
python tools/test.py local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file

# Multi-gpu testing
./tools/dist_test.sh local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file <GPU_NUM>

# Multi-gpu, multi-scale testing
./tools/dist_test.sh local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file <GPU_NUM> --aug-test

Training

Download weights ( google drive | onedrive ) pretrained on ImageNet-1K, and put them in a folder pretrained/.

Example: train SegFormer-B1 on ADE20K:

# Single-gpu training
python tools/train.py local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py 

# Multi-gpu training
./tools/dist_train.sh local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py <GPU_NUM>

Visualize

Here is a demo script to test a single image. For more details, refer to MMSegmentation's documentation.

python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${DEVICE_NAME}] [--palette ${PALETTE}]

Example: visualize SegFormer-B1 on Cityscapes:

python demo/image_demo.py demo/demo.png local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py \
/path/to/checkpoint_file --device cuda:0 --palette cityscapes

License

Please check the LICENSE file. SegFormer may be used non-commercially, meaning for research or evaluation purposes only. For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

Citation

@inproceedings{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2021}
}

SegFormer's People

Contributors

chrisding, cyrilzakka, ddonatien, xieenze


SegFormer's Issues

Unable to train from the pretrained model

Pretrained model path:
./SegFormer/checkpoints/pretrained/mit_b1.pth
Command:
cd SegFormer

python ./tools/train.py ./local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py
Error message:
Traceback (most recent call last):
File "c:\xxxx\github\mmcv\mmcv\utils\registry.py", line 51, in build_from_cfg
return obj_cls(**args)
File "C:\xxxx\github\torch_env\lib\site-packages\mmseg\models\segmentors\encoder_decoder.py", line 30, in init
self.backbone = builder.build_backbone(backbone)
File "C:\xxxx\github\torch_env\lib\site-packages\mmseg\models\builder.py", line 17, in build_backbone
return BACKBONES.build(cfg)
File "c:\xxxx\github\mmcv\mmcv\utils\registry.py", line 210, in build
return self.build_func(*args, **kwargs, registry=self)
File "c:\xxxx\github\mmcv\mmcv\cnn\builder.py", line 26, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "c:\xxxx\github\mmcv\mmcv\utils\registry.py", line 44, in build_from_cfg
f'{obj_type} is not in the {registry.name} registry')
KeyError: 'mit_b1 is not in the models registry'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./tools/train.py", line 166, in
main()
File "./tools/train.py", line 135, in main
test_cfg=cfg.get('test_cfg'))
File "C:\xxxx\github\torch_env\lib\site-packages\mmseg\models\builder.py", line 46, in build_segmentor
cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "c:\xxxx\github\mmcv\mmcv\utils\registry.py", line 210, in build
return self.build_func(*args, **kwargs, registry=self)
File "c:\xxxx\github\mmcv\mmcv\cnn\builder.py", line 26, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "c:\xxxx\github\mmcv\mmcv\utils\registry.py", line 54, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "EncoderDecoder: 'mit_b1 is not in the models registry'"

How can this problem be solved?
Do mmcv-full, mmsegmentation, and your SegFormer code need to be pinned to specific, mutually compatible versions?

Pretraining segformer on ImageNet-22K

The Swin Transformer release includes a large model pretrained on ImageNet-22K for semantic segmentation, and it achieves good results. I wonder if you are interested in improving SegFormer in a similar way? Thanks!

Question on Mapillary pretraining when evaluating on the Cityscapes (val) dataset

I ran into a problem when training on Mapillary and evaluating on Cityscapes: the "wall" class gets IoU = 0.0.
Could you please provide the training log for the Mapillary pretraining and evaluation (preferably for model B2)? Thanks a lot!

+---------------+-------+-------+
|     Class     |  IoU  |  Acc  |
+---------------+-------+-------+
|      road     | 96.93 | 98.09 |
|    sidewalk   | 76.57 | 90.46 |
|    building   | 89.02 | 95.67 |
|      wall     |  0.0  |  0.0  |
|     fence     | 35.52 | 59.93 |
|      pole     | 52.85 | 63.03 |
| traffic light | 59.63 | 71.81 |
|  traffic sign | 68.11 | 77.09 |
|   vegetation  | 89.89 | 96.67 |
|    terrain    |  26.0 |  26.5 |
|      sky      | 90.97 | 93.58 |
|     person    | 72.78 | 87.27 |
|     rider     | 33.21 |  41.0 |
|      car      | 91.25 | 97.25 |
|     truck     |  61.8 | 64.37 |
|      bus      | 66.93 | 71.56 |
|     train     | 62.85 | 65.31 |
|   motorcycle  | 47.68 | 65.62 |
|    bicycle    | 67.57 | 74.03 |
+---------------+-------+-------+
2021-06-21 16:06:43,150 - mmseg - INFO - Summary:
2021-06-21 16:06:43,150 - mmseg - INFO - 
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 93.66 | 62.61 | 70.49 |
+-------+-------+-------+

Why not directly use an MLP to predict the mask on the concatenated features?

Thanks for your work! I have a question. In your decoder, you first use MLP layers on the multi-level features to unify the channel dimension, resize them to the same spatial size, and concatenate them; then you apply another MLP layer to reduce the channels from 4C to C, and a final MLP to predict the segmentation mask. My question is: why not apply the prediction MLP directly to the concatenated features? Have you tested the performance of this two-MLP decoder?
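
For readers following along, here is a minimal sketch of the two-step decoder being described; the class name, channel sizes, and layer choices are illustrative assumptions, not the repository's exact implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStageMLPDecoder(nn.Module):
    """Illustrative fuse-then-predict decoder; names and sizes are assumptions."""

    def __init__(self, in_channels=(32, 64, 160, 256), embed_dim=256, num_classes=150):
        super().__init__()
        # Step 1: per-level linear layers unify every feature map to embed_dim channels.
        self.linear_layers = nn.ModuleList(nn.Linear(c, embed_dim) for c in in_channels)
        # Step 2: a 1x1 conv ("MLP" over channels) reduces 4*embed_dim -> embed_dim after concatenation.
        self.fuse = nn.Conv2d(4 * embed_dim, embed_dim, kernel_size=1)
        # Final prediction layer producing per-pixel class logits.
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, feats):
        # feats: list of 4 multi-scale maps [B, C_i, H_i, W_i], highest resolution first.
        target_size = feats[0].shape[2:]
        unified = []
        for f, linear in zip(feats, self.linear_layers):
            b, c, h, w = f.shape
            f = linear(f.flatten(2).transpose(1, 2))            # [B, H*W, embed_dim]
            f = f.transpose(1, 2).reshape(b, -1, h, w)           # back to [B, embed_dim, H, W]
            f = F.interpolate(f, size=target_size, mode='bilinear', align_corners=False)
            unified.append(f)
        fused = self.fuse(torch.cat(unified, dim=1))             # the 4C -> C step the question asks about
        return self.classifier(fused)                            # C -> num_classes

The alternative raised in the question would simply drop self.fuse and apply self.classifier (with 4 * embed_dim input channels) directly to the concatenated features.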

SegFormer on edge devices

Hi there,
Have you considered:

  1. running inference with SegFormer-B0 on an edge device, such as a Raspberry Pi
  2. pruning the B0 model to reduce its FLOPs and size

Are these functions used in training?

Hi, I'm not familiar with the mmsegmentation training pipeline. I want to know whether the functions reset_drop_path, freeze_patch_emb, and no_weight_decay in mix_transformer.py are used during training? Thanks for the nice project.

KeyError: 'AlignedResize is not in the pipeline registry'

Hi,

I have a similar error to #2. I've just forked the repo to add a print statement, so fix #1 is included. When running python tools/test.py, I'm getting the following:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/pipelines/test_time_aug.py", line 59, in __init__
    self.transforms = Compose(transforms)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/pipelines/compose.py", line 22, in __init__
    transform = build_from_cfg(transform, PIPELINES)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
    f'{obj_type} is not in the {registry.name} registry')
KeyError: 'AlignedResize is not in the pipeline registry'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/ade.py", line 91, in __init__
    **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/custom.py", line 88, in __init__
    self.pipeline = Compose(pipeline)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/pipelines/compose.py", line 22, in __init__
    transform = build_from_cfg(transform, PIPELINES)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "MultiScaleFlipAug: 'AlignedResize is not in the pipeline registry'"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/test.py", line 170, in <module>
    main()
  File "tools/test.py", line 122, in main
    dataset = build_dataset(cfg.data.test)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/builder.py", line 73, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: 'ADE20KDataset: "MultiScaleFlipAug: \'AlignedResize is not in the pipeline registry\'"'

I've made a Google Colab to reproduce: https://colab.research.google.com/drive/1-t_lj5K2ZEFemxn88DSfcy9W7RTvklsz?usp=sharing

Inference speed of the model

Hello
How are you?
Thanks for contributing to this project.
Which device did you test your models on?

[attached image]

You did NOT explain the device specification in the paper.

About the efficient attention module

Hi,

I would like to ask a question about the efficient attention module, please:
I see that you use a reduction ratio R to decrease the spatial size of the input sequence; normally this operation would produce an output sequence of spatial size N/R. But according to your Table 6 it doesn't: the output spatial size is still N. Where do you upsample the sequence from N/R back to N in the attention module after the reduced QKV multiplication?

Thank you!
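
For reference, a sketch of why no upsampling is needed: the reduction ratio is applied only to the keys and values, while the queries keep all N tokens, so the attention output already has length N. Shapes and names below are illustrative, not the repository's exact code:

import torch
import torch.nn as nn

class ReducedSelfAttention(nn.Module):
    """Illustrative efficient self-attention: keys/values are spatially reduced, queries are not."""

    def __init__(self, dim=64, sr_ratio=4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        # Spatial reduction: a strided conv shrinks the token grid by sr_ratio along each axis,
        # i.e. the sequence length goes from N to N / sr_ratio**2 (the paper folds this into one ratio R).
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.scale = dim ** -0.5

    def forward(self, x, h, w):
        # x: [B, N, dim] with N = h * w
        b, n, c = x.shape
        q = self.q(x)                                           # queries keep all N tokens: [B, N, dim]
        x_ = x.transpose(1, 2).reshape(b, c, h, w)
        x_ = self.sr(x_).reshape(b, c, -1).transpose(1, 2)      # reduced tokens: [B, N/R, dim]
        k, v = self.kv(x_).chunk(2, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale           # [B, N, N/R]
        attn = attn.softmax(dim=-1)
        return self.proj(attn @ v)                              # [B, N, dim]: length N is set by Q, not K/V

out = ReducedSelfAttention()(torch.randn(2, 16 * 16, 64), h=16, w=16)   # out.shape == (2, 256, 64)

So no explicit upsampling is required; the row dimension of the attention map comes from the un-reduced queries.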

Error: AlignedResize is not in the pipeline registry

Currently trying to run the evaluation script:
python tools/test.py local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file

results in the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
  File "/content/mmsegmentation/mmseg/datasets/pipelines/test_time_aug.py", line 59, in __init__
    self.transforms = Compose(transforms)
  File "/content/mmsegmentation/mmseg/datasets/pipelines/compose.py", line 22, in __init__
    transform = build_from_cfg(transform, PIPELINES)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
    f'{obj_type} is not in the {registry.name} registry')
KeyError: 'AlignedResize is not in the pipeline registry'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
  File "/content/mmsegmentation/mmseg/datasets/ade.py", line 91, in __init__
    **kwargs)
  File "/content/mmsegmentation/mmseg/datasets/custom.py", line 88, in __init__
    self.pipeline = Compose(pipeline)
  File "/content/mmsegmentation/mmseg/datasets/pipelines/compose.py", line 22, in __init__
    transform = build_from_cfg(transform, PIPELINES)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "MultiScaleFlipAug: 'AlignedResize is not in the pipeline registry'"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/test.py", line 166, in <module>
    main()
  File "tools/test.py", line 122, in main
    dataset = build_dataset(cfg.data.test)
  File "/content/mmsegmentation/mmseg/datasets/builder.py", line 73, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: 'ADE20KDataset: "MultiScaleFlipAug: \'AlignedResize is not in the pipeline registry\'"'

The cause seems to be that AlignedResize is missing an import in SegFormer/mmseg/datasets/pipelines/__init__.py. This pull request adds the import accordingly.
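
A sketch of the kind of one-line change the pull request describes, assuming AlignedResize is defined in transforms.py next to the other pipeline transforms (the surrounding imports and __all__ entries are abbreviated):

# SegFormer/mmseg/datasets/pipelines/__init__.py (abbreviated, illustrative excerpt)
from .transforms import Normalize, Pad, RandomCrop, RandomFlip, Resize, AlignedResize  # add AlignedResize

__all__ = [
    'Normalize', 'Pad', 'RandomCrop', 'RandomFlip', 'Resize',
    'AlignedResize',  # exported so the transform registered in transforms.py is importable by name
    # ... the remaining pipeline entries are unchanged ...
]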

Is it possible to use SegFormer for salient object detection (SOD)?

Hi @xieenze, thanks for your great work. I am wondering whether you have tested ViT-based methods on SOD tasks, which normally have a higher standard for output mask quality. For example, when segmenting a human portrait, the hair (a very thin structure) should also be segmented out precisely. When I check the SegFormer demo, I notice that the mask quality, especially along edges, is not that good. Do you think this is caused by the labelled data or by the network resolution itself?

Question about normalization (mean/std) values differing from the Swin pretrained backbone

Thanks for your great work on making transformer models work so well on semantic segmentation. I have a question regarding the normalization values for mean and std (I have also observed this in MaskFormer, which is why I feel confused).

For training Swin Transformer, the original repo imports the std and mean from timm with the following values:
[screenshot of the timm mean/std values]

In your work, mean and std have been set to the following values:
[screenshot of the mean/std values in this repo's config]

I would really appreciate any information you could give on this! Thanks!
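
For context, a hedged comparison of the two conventions in question; the numbers below are the widely used ImageNet statistics and are stated here as an assumption rather than a dump of either repo's config:

# timm-style statistics on 0-1 scaled images, as imported in the Swin code per the question above.
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)

# mmseg-style img_norm_cfg on 0-255 images, as typically found in segmentation configs.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53],   # = 255 * IMAGENET_DEFAULT_MEAN
    std=[58.395, 57.12, 57.375],      # = 255 * IMAGENET_DEFAULT_STD
    to_rgb=True,
)

# Sanity check: the two parameterizations describe the same normalization, just on different pixel scales.
assert all(abs(m * 255 - m255) < 1e-6 for m, m255 in zip(IMAGENET_DEFAULT_MEAN, img_norm_cfg['mean']))
assert all(abs(s * 255 - s255) < 1e-6 for s, s255 in zip(IMAGENET_DEFAULT_STD, img_norm_cfg['std']))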

Performance on ImageNet?

I see that every segmentation config needs a backbone pretrained on ImageNet, so can you provide the ImageNet performance of B0-B5?

Porting SegFormer to HuggingFace Transformers

Hi guys,

First of all thanks for this impressive (and simple) model!

I'd like to port this model to HuggingFace Transformers, which, as you might know, is a library that includes a lot of Transformer-based models (mostly NLP models like BERT and RoBERTa, but recently I've added the Vision Transformer (ViT), DeiT and DETR to the library, so I think SegFormer definitely deserves its place there too!).

The API I had in mind could look something like this (very similar to ViT):

from transformers import SegFormerFeatureExtractor, SegFormerForImageSegmentation
from PIL import Image
import requests

feature_extractor = SegFormerFeatureExtractor.from_pretrained("nvidia/segformer-b0-fine-tuned-ade-512-512")
model = SegFormerForImageSegmentation.from_pretrained("nvidia/segformer-b0-fine-tuned-ade-512-512")

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits # shape (batch_size, num_labels, height/4, width/4)

The main advantage would be that people could train the SegFormer model within a Colab notebook with ease, using a native PyTorch training loop or frameworks like PyTorch Lightning, HuggingFace Accelerate, etc., and also perform inference very easily as shown above. No scripts required!

The feature extractor should not be a fully-fledged preprocessor; it would probably just need to resize + normalize images so that they can be fed to the model. I guess resizing to 512x512 is a good default option. I would perhaps include a post_process method that converts the model's logits into an actual semantic-segmentation image.

All model checkpoints can be hosted for free on the hub, under the NVIDIA namespace (which currently includes models like Megatron-GPT-2).

Are you interested in helping me finish up this model? My main questions would be:

  • what are the most basic image + mask transformations needed to perform inference and fine-tuning on a custom dataset? What should the image size be for each of the checkpoints? It seems that for the 512x512 ADE model, the shortest side is 512?
  • I guess that if the feature extractor resizes (rescales) images to 512x512, the corresponding masks also need to be resized. But since the model predicts masks at resolution 128x128, does the feature extractor need to resize them to that resolution?
  • how is the loss defined? Is it just the cross-entropy loss between the predicted mask and the ground-truth mask? (See the sketch below.)
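
On the loss question, a minimal sketch of a common semantic-segmentation setup (upsample the coarse logits to the label size and use cross-entropy with an ignore index); this is an illustration of the usual convention, not a statement of how the port was eventually implemented:

import torch
import torch.nn.functional as F

def segmentation_loss(logits, labels, ignore_index=255):
    """Illustrative loss: cross-entropy between predicted masks and ground truth.

    logits: [B, num_classes, H/4, W/4] as produced by the decoder.
    labels: [B, H, W] integer class indices, with 255 marking ignored pixels.
    """
    # Upsample the coarse logits to the label resolution rather than shrinking the labels,
    # so thin structures in the ground truth are not destroyed by resizing.
    logits = F.interpolate(logits, size=labels.shape[-2:], mode='bilinear', align_corners=False)
    return F.cross_entropy(logits, labels, ignore_index=ignore_index)

# Example shapes matching the 512x512 ADE checkpoints discussed above.
loss = segmentation_loss(torch.randn(2, 150, 128, 128), torch.randint(0, 150, (2, 512, 512)))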

Training details

Hi, I'm trying to reproduce SegFormer on the PASCAL VOC dataset. When using the code in this repo, I get ~77% mIoU (without multi-scale testing). However, I only get ~75% mIoU with my reproduced code. Here are my training details.

I have reproduced the training and validation data pipelines, including random scaling, random horizontal flipping, random cropping, etc. For the model, I used the code in this repo and the pre-trained weights. I also used an AdamW optimizer with a warmup scheduler. The other optimizer settings are the same as in this repo.

Therefore, I'm wondering if there are any extra training details in SegFormer or in mmseg itself. I would really appreciate your reply.
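
For reference, a hedged sketch of what a typical mmseg-style training pipeline of the kind described above looks like; the type names follow the mmseg pipeline registry, but the exact values are assumptions rather than a copy of this repo's config:

# Illustrative mmseg-style training pipeline; values are common defaults, not copied from this repo.
crop_size = (512, 512)
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),  # random scaling
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),    # random cropping
    dict(type='RandomFlip', prob=0.5),                                   # random horizontal flip
    dict(type='PhotoMetricDistortion'),                                  # colour jitter
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]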

MiT-B1 Decoder Parameter Count Question

Hello!

In the paper, the decoder parameter count of the B1 model was reported to be 0.6 million parameters in Table 1.

I may be misunderstanding, but from what I can see the decoder parameter count is a function of the feature-map channels. Looking at the code and Table 6, the B1 model has exactly the same feature channel sizes as the B2-B5 models, all of which have a decoder size of 3.3 million parameters.

Am I missing something as to why the parameter count of the B1 decoder is 0.6 million parameters and not 3.3 million?

How to reduce the dataset?

In your paper, the maximum number of iterations is 160,000, and the ADE20K dataset contains 20,010 images. I chose a subset of the ADE20K images and trained with your model, but the training time is the same as with the whole dataset. Can you kindly tell me how to reduce the training time when using only part of the dataset?
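
The schedule in these configs is iteration-based rather than epoch-based, so using fewer images does not by itself shorten training; the total work is fixed by the iteration count. A sketch of the fields one would typically lower, assuming the standard mmcv/mmseg iteration-based runner:

# Illustrative schedule override; field names follow the usual mmcv/mmseg runner and hook APIs.
runner = dict(type='IterBasedRunner', max_iters=40000)    # fewer total iterations => shorter training
checkpoint_config = dict(by_epoch=False, interval=4000)
evaluation = dict(interval=4000, metric='mIoU')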

Training error: mit_b1.pth is not a checkpoint file

It's a great honor for me to study your research. When I downloaded the pretrained model into the pretrained directory, I got the error shown below. I hope you can give me some advice. Thanks for your time and kindness.
[attached screenshot of the error]

Default process group is not initialized

I'm trying to run the training code as follows:
python tools/train.py local_configs/segformer/B0/segformer.b0.512x512.ade.160k.py

I've changed the dataset to a custom dataset (the one given in the OpenMMLab Tutorial)

Getting the below error. The tutorial notebook works fine in the original OpenMMLab repo. Do you have any insight into why this might be happening and what I have to change to make it run?

Traceback (most recent call last):
  File "tools/train.py", line 181, in <module>
    main()
  File "tools/train.py", line 177, in main
    meta=meta)
  File "/content/mmsegmentation/mmseg/apis/train.py", line 115, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/iter_based_runner.py", line 131, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/content/mmsegmentation/mmseg/models/segmentors/base.py", line 152, in train_step
    losses = self(**data_batch)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/content/mmsegmentation/mmseg/models/segmentors/base.py", line 122, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/content/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 158, in forward_train
    gt_semantic_seg)
  File "/content/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 102, in _decode_head_forward_train
    self.train_cfg)
  File "/content/mmsegmentation/mmseg/models/decode_heads/decode_head.py", line 188, in forward_train
    seg_logits = self.forward(inputs)
  File "/content/mmsegmentation/mmseg/models/decode_heads/segformer_head.py", line 82, in forward
    _c = self.linear_fuse(torch.cat([_c4, _c3, _c2, _c1], dim=1))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/bricks/conv_module.py", line 195, in forward
    x = self.norm(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/batchnorm.py", line 519, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 638, in get_world_size
    return _get_group_size(group)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
    _check_default_pg()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 211, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
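
The trace ends inside a synchronized BatchNorm layer asking torch.distributed for the world size, which fails because tools/train.py was launched without a distributed process group. A common workaround for single-GPU runs, assuming the usual mmseg norm_cfg convention (this is a sketch, not an official fix from this repo), is to switch the affected norm layers to plain BN:

# Illustrative single-GPU override: use ordinary BN instead of synchronized BN so that
# no torch.distributed process group is required.
norm_cfg = dict(type='BN', requires_grad=True)   # instead of dict(type='SyncBN', ...)
model = dict(
    decode_head=dict(norm_cfg=norm_cfg),
)

Alternatively, launching through ./tools/dist_train.sh with a single GPU initializes the process group and avoids the assertion.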

Evaluation fails with an error

Command:
python ./tools/test.py ./local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py ./checkpoints/trained/segformer.b1.512x512.ade.160k.pth
Error message:
Traceback (most recent call last):
File "c:\xxxx\github\mmcv\mmcv\utils\registry.py", line 51, in build_from_cfg
return obj_cls(**args)
File "C:\xxxx\github\torch_env\lib\site-packages\mmseg\datasets\pipelines\test_time_aug.py", line 59, in init
self.transforms = Compose(transforms)
File "C:\xxxx\github\torch_env\lib\site-packages\mmseg\datasets\pipelines\compose.py", line 22, in init
transform = build_from_cfg(transform, PIPELINES)
File "c:\xxxx\github\mmcv\mmcv\utils\registry.py", line 44, in build_from_cfg
f'{obj_type} is not in the {registry.name} registry')
KeyError: 'AlignedResize is not in the pipeline registry'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "c:\xxxx\github\mmcv\mmcv\utils\registry.py", line 51, in build_from_cfg
return obj_cls(**args)
File "C:\xxxx\github\torch_env\lib\site-packages\mmseg\datasets\ade.py", line 84, in init
**kwargs)
File "C:\xxxx\github\torch_env\lib\site-packages\mmseg\datasets\custom.py", line 88, in init
self.pipeline = Compose(pipeline)
File "C:\xxxx\github\torch_env\lib\site-packages\mmseg\datasets\pipelines\compose.py", line 22, in init
transform = build_from_cfg(transform, PIPELINES)
File "c:\xxxx\github\mmcv\mmcv\utils\registry.py", line 54, in build_from_cfg
raise type(e)(f'{obj_cls.name}: {e}')
KeyError: "MultiScaleFlipAug: 'AlignedResize is not in the pipeline registry'"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./tools/test.py", line 166, in
main()
File "./tools/test.py", line 122, in main
dataset = build_dataset(cfg.data.test)
File "C:\xxxx\github\torch_env\lib\site-packages\mmseg\datasets\builder.py", line 73, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "c:\xxxx\github\mmcv\mmcv\utils\registry.py", line 54, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: 'ADE20KDataset: "MultiScaleFlipAug: \'AlignedResize is not in the pipeline registry\'"'

How to pre-train on Imagenet?

Hello,

First, thank you for your excellent work and code!

May I know how SegFormer is pre-trained on ImageNet?

Maybe I missed some details in the paper, but I didn't find the process of generating a classification token or pooling the feature maps, as other works do when training for the classification task.

Thank you!

Issue in the beginning

Hi, guys. I ran into an issue at the very beginning.
You can see it below.
Any advice?

Traceback (most recent call last):
File "tools/test.py", line 10, in <module>
from mmseg.apis import multi_gpu_test, single_gpu_test
File "D:\Conda\envs\torch\lib\site-packages\mmseg\apis\__init__.py", line 1, in <module>
from .inference import inference_segmentor, init_segmentor, show_result_pyplot
File "D:\Conda\envs\torch\lib\site-packages\mmseg\apis\inference.py", line 8, in <module>
from mmseg.models import build_segmentor
File "D:\Conda\envs\torch\lib\site-packages\mmseg\models\__init__.py", line 1, in <module>
from .backbones import * # noqa: F401,F403
File "D:\Conda\envs\torch\lib\site-packages\mmseg\models\backbones\__init__.py", line 2, in <module>
from .fast_scnn import FastSCNN
File "D:\Conda\envs\torch\lib\site-packages\mmseg\models\backbones\fast_scnn.py", line 6, in <module>
from mmseg.models.decode_heads.psp_head import PPM
File "D:\Conda\envs\torch\lib\site-packages\mmseg\models\decode_heads\__init__.py", line 4, in <module>
from .cc_head import CCHead
File "D:\Conda\envs\torch\lib\site-packages\mmseg\models\decode_heads\cc_head.py", line 7, in <module>
from mmcv.ops import CrissCrossAttention
File "D:\Conda\envs\torch\lib\site-packages\mmcv\ops\__init__.py", line 1, in <module>
from .bbox import bbox_overlaps
File "D:\Conda\envs\torch\lib\site-packages\mmcv\ops\bbox.py", line 3, in <module>
ext_module = ext_loader.load_ext('ext', ['bbox_overlaps'])
File "D:\Conda\envs\torch\lib\site-packages\mmcv\utils\ext_loader.py", line 12, in load_ext
ext = importlib.import_module('mmcv.' + name)
File "D:\Conda\envs\torch\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: DLL load failed

Share Google Colab notebook

Hello, I am wondering if you, or anyone else here, has this implemented in a Google Colab notebook that you can share. I am very new to this and would appreciate any assistance setting it up in Google Colab so that I can fine-tune on a small custom dataset (for school purposes). Thank you very much.

How to change checkpoint saving frequency

Hi, first of all, thank you for your research and code.

I see that during training, the model is saved every 4000 iterations. Where can I change this setting, so that my model is saved every, let's say, 1000 iterations?

Thank you
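
A sketch of the config field that typically controls this, assuming the standard mmcv checkpoint hook:

# Illustrative override: save a checkpoint every 1000 iterations instead of every 4000.
checkpoint_config = dict(by_epoch=False, interval=1000)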

Fp16OptimizerHook breaks training

First off, thanks for your awesome work.

I have no trouble getting SegFormer to train normally, but if I configure it with Fp16OptimizerHook, then a short way into training (10-20k iterations) suddenly no classes are predicted:

optimizer_config=dict(type="Fp16OptimizerHook"),
fp16=dict(),

Is this expected? Has anyone got Fp16OptimizerHook to work with this repo?

About the pretrained model for B5

Hi, thanks for the great work! May I know if the pre-trained weights for B5 are trained with Mapillary Vistas or only with ImageNet-1K?

Training error: unrecognized arguments

When I use your script command 'python tools/train.py local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py', it reports unrecognized arguments, as shown below. Can you tell me how to solve it?
[attached screenshot of the error]

license

I was looking to port this to a website but saw the license restriction, so I decided not to:
"3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use
non-commercially. Notwithstanding the foregoing, NVIDIA and its affiliates may use the Work and any derivative
works commercially. As used herein, “non-commercially” means for research or evaluation purposes only."
But I see that the authors are OK with it being ported to other websites (#20),
so what does "for research or evaluation purposes only" cover under the license?

Simple SegFormer network class

Hello
How are you?
Thanks for contributing to this project.
It is difficult for us to use this project because it contains many other scripts.
Did you check https://github.com/lucidrains/segformer-pytorch, which is a third-party implementation of SegFormer?
That project contains ONLY a simple SegFormer network class, so it is easy to use.
But the number of parameters of the MiT-B0 network in that implementation is 7M.
I know that the number of parameters of MiT-B0 is 3.6M in the paper.
Could you briefly check https://github.com/lucidrains/segformer-pytorch?
If that is difficult, could you provide a SegFormer network class like the above implementation?
Thanks

Is this memory usage normal?

I trained with batch size = 2, image size (1216, 1216), and mit_b1. It uses almost 22 GB of GPU memory; is this normal?
