openseg-group / openseg.pytorch

The official PyTorch implementation of the OCNet series and SegFix.

License: MIT License

Languages: Python 65.07%, Shell 22.17%, Cuda 6.82%, C 3.68%, C++ 2.14%, Cython 0.12%

openseg.pytorch's Issues

Cannot find the paper "SegFix: Model-Agnostic Boundary Refinement for Segmentation"

Hi,
I cannot download this paper. Could you provide it in PDF format, or a link? Thanks a lot.

Difference between "openseg.pytorch" and "HRNet-Semantic-Segmentation"

Dear Authors,

Could you clarify the difference between the "HRNet-OCR" at (1. https://github.com/openseg-group/openseg.pytorch)
and the "HRNet-OCR" at (2. https://github.com/HRNet/HRNet-Semantic-Segmentation/tree/HRNet-OCR)?

Is the difference very minor and can be ignored?
Or is the difference just the PyTorch version (e.g., version 0.4.1 for 1. and version 1.1 for 2. when training on the Cityscapes dataset)?
What is your advice (which one to start with) for beginners?

Many thanks!

When I use H_SEGFIX.json to train on the Cityscapes dataset, I get the following error:

In loss_helper.py:
In the loss computation, the inputs are two tensors of shape [1,8,128,128] and [1,2,128,128], while the corresponding labels are a list of three tensors, each of shape [1,512,512].

targets=targets_.clone().unsqueeze(1).float()
AttributeError:'list' object has no attribute 'clone'
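The traceback suggests that the loss received a Python list of three label maps where it expected a single tensor: clone() is a tensor method, and lists do not have it. Below is a minimal sketch of a defensive unpacking step, written with NumPy in place of PyTorch so it runs anywhere. as_label_batch is a hypothetical helper, not part of loss_helper.py, and treating list element 0 as the segmentation label is an assumption.

```python
import numpy as np


def as_label_batch(targets, index=0):
    """Return a float label batch of shape [N, 1, H, W].

    Hypothetical helper: mirrors targets_.clone().unsqueeze(1).float()
    from the traceback, but first unpacks a list/tuple of targets.
    Which element holds the segmentation label (index=0) is an assumption.
    """
    if isinstance(targets, (list, tuple)):
        targets = targets[index]
    labels = np.array(targets, copy=True)             # like .clone()
    return labels[:, np.newaxis].astype(np.float32)   # like .unsqueeze(1).float()


targets_ = [np.zeros((1, 512, 512), dtype=np.int64) for _ in range(3)]
batch = as_label_batch(targets_)
print(batch.shape, batch.dtype)  # (1, 1, 512, 512) float32
```

If the loss really needs all three targets (for example a label map plus auxiliary maps), the right fix is a loss that consumes the list, not picking one element.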

Guideline on how to use the SegFix method?

Could you please improve the documentation on how to use the SegFix method? I'm a little confused about how to generate the offset files.
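As a rough illustration of what such offset targets encode, here is a simplified sketch of the SegFix idea of deriving a boundary map and a direction map from the distance transform of a label map. It is not the repository's dt_offset_generator.py: the brute-force distance computation, the function name segfix_targets, and the defaults max_distance=5 and num_directions=8 are assumptions for illustration only.

```python
import numpy as np


def segfix_targets(labels, max_distance=5, num_directions=8):
    """Simplified SegFix-style targets from an integer label map [H, W].

    Returns (distance, boundary, direction):
      distance  - Euclidean distance to the nearest pixel of another class
      boundary  - True where distance <= max_distance
      direction - direction of increasing distance (towards the segment
                  interior), quantized into num_directions bins
    Brute force O((HW)^2): only meant for small toy inputs.
    """
    h, w = labels.shape
    coords = np.indices((h, w)).reshape(2, -1).T.astype(np.float64)  # [HW, 2]
    flat = labels.reshape(-1)
    # Pairwise distances between all pixel coordinates.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    # Only pixels with a different label count as "the other side".
    dist[flat[:, None] == flat[None, :]] = np.inf
    distance = dist.min(axis=1).reshape(h, w)
    if np.isinf(distance).any():
        raise ValueError("label map needs at least two classes")

    boundary = distance <= max_distance
    gy, gx = np.gradient(distance)
    angle = np.arctan2(gy, gx)                          # in [-pi, pi]
    bin_width = 2 * np.pi / num_directions
    direction = np.floor((angle + np.pi) / bin_width).astype(int) % num_directions
    return distance, boundary, direction


labels = np.zeros((8, 8), dtype=int)
labels[2:6, 2:6] = 1                                    # a square object
distance, boundary, direction = segfix_targets(labels)
```

Real data needs a proper distance transform (for example an exact EDT) instead of the all-pairs computation used here.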

Any plan to release higher mIOU checkpoints?

Is the released model the same as the one that achieved 84.5 mIoU?

questions/issues on training segfix with own data

I was excited to try segfix training on my own data.

I could produce the mat files for train and val data.
Training works with run_h_48_d_4_segfix.sh and the loss converges. But on validation, the IoU is more or less random (I have 2 classes):

2020-08-20 10:47:41,932 INFO [base.py, 32] Result for mask
2020-08-20 10:47:41,932 INFO [base.py, 48] Mean IOU: 0.7853758111568029
2020-08-20 10:47:41,933 INFO [base.py, 49] Pixel ACC: 0.9692584678389714
2020-08-20 10:47:41,933 INFO [base.py, 54] F1 Score: 0.7523384841507573 Precision: 0.7928424176432377 Recall: 0.7157718538603068
2020-08-20 10:47:41,933 INFO [base.py, 32] Result for dir (mask)
2020-08-20 10:47:41,933 INFO [base.py, 48] Mean IOU: 0.5390945167184129
2020-08-20 10:47:41,933 INFO [base.py, 49] Pixel ACC: 0.7248566725097775
2020-08-20 10:47:41,933 INFO [base.py, 32] Result for dir (GT)
2020-08-20 10:47:41,934 INFO [base.py, 48] Mean IOU: 0.41990305666871003
2020-08-20 10:47:41,934 INFO [base.py, 49] Pixel ACC: 0.6007717101395131

To investigate the issue further, I tried to analyse the predicted .mat files with
bash scripts/cityscapes/segfix/run_h_48_d_4_segfix.sh segfix_pred_val 1

With "input_size": [640, 480], this exception is raised:
File "/home/rsa-key-20190908/openseg.pytorch/lib/datasets/tools/collate.py", line 108, in collate
assert pad_height >= 0 and pad_width >= 0
After more or less fixing it, I got results similar to the validation during training.
The generated .mat files were around 3 KB instead of ~70 KB.
By the way, it took the "input_size": [640, 480] setting from the "test": { section instead of the "val": { section.
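The assertion fires when the collate step would need negative padding, that is, when an image is larger than the configured input_size. A minimal sketch of that check follows. It assumes input_size is given as [width, height] (as the Cityscapes value [2048, 1024] suggests) and that the image is padded up to that size; padding_for is a hypothetical name, not the code in collate.py.

```python
def padding_for(image_hw, input_size):
    """Return (pad_height, pad_width) needed to pad an image up to input_size.

    image_hw:   (height, width) of the image.
    input_size: [width, height], as in the JSON config (assumed order).
    Raises ValueError where collate.py would fail its assertion.
    """
    target_w, target_h = input_size
    height, width = image_hw
    pad_height, pad_width = target_h - height, target_w - width
    if pad_height < 0 or pad_width < 0:
        raise ValueError(
            f"image {width}x{height} exceeds input_size {target_w}x{target_h}"
        )
    return pad_height, pad_width


print(padding_for((480, 640), [640, 480]))   # (0, 0)
print(padding_for((400, 600), [640, 480]))   # (80, 40)
```

Under these assumptions, a full-size 2048x1024 image with "input_size": [640, 480] can never be padded, which would explain why the [2048, 1024] setting works.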

Is it possible that validation only works with "input_size": [2048, 1024]?
Can you give me any hints on how to manually verify the .mat files for correctness? Currently I'm diving into 2007.04269.pdf and the code of dt_offset_generator.py to get an understanding.
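One way to sanity-check the .mat files is to load them and compare the shape, dtype and value range of every stored array between a released Cityscapes offset file and one of your own. Very small files often mean mostly constant arrays. A minimal sketch: scipy.io.loadmat is a real SciPy function, but the variable names stored inside the files are not assumed here, so the code just iterates over whatever keys it finds. summarize_mat and summarize_mat_file are hypothetical helper names.

```python
import numpy as np


def summarize_mat(contents):
    """Summarize a dict as returned by scipy.io.loadmat.

    Keys starting with '__' (header, version, globals) are metadata and skipped.
    Returns {key: (shape, dtype, min, max, number_of_unique_values)}.
    """
    summary = {}
    for key, value in contents.items():
        if key.startswith("__"):
            continue
        arr = np.asarray(value)
        if arr.size == 0:
            summary[key] = (arr.shape, str(arr.dtype), None, None, 0)
            continue
        summary[key] = (arr.shape, str(arr.dtype), arr.min(), arr.max(),
                        np.unique(arr).size)
    return summary


def summarize_mat_file(path):
    from scipy.io import loadmat  # imported lazily; requires SciPy
    return summarize_mat(loadmat(path))
```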

Problem with OCR similarity map

Thanks for sharing this wonderful work with us!

I have a question about how the similarity map is computed in the OCR module.
At line 131 of lib/models/seg_hrnet_ocr.py:
sim_map = (self.key_channels**-.5) * sim_map
Why multiply sim_map by a small value (self.key_channels**-.5) before the softmax?

During validation, I printed the final sim_map and found that all its values are very close to 0.0526 (= 1/19), which means the probabilities of pixel i belonging to the different classes k are almost equal.
Does this contradict the assumption that the similarity map should represent the relation between the i-th pixel and the k-th object region?

#######################

Your former answer:

Multiplying by this small value follows the original self-attention scheme; please refer to the last paragraph of Section 3.2.1 of the paper "Attention Is All You Need". However, we find that this small factor does not influence the segmentation performance.
As for the final sim_map, we do not understand why all the values are almost the same in your case. Which checkpoint are you testing? What is the performance of that checkpoint? Please provide more information so that we can help you.

#########################

Thanks a lot for your reply!
I used the checkpoint posted on HRNet-OCR. The segmentation performance is good, and the mIoU is 81.6, too.

During inference, I printed 10 random rows of sim_map, like below:

All values in this map are very close to 0.0526 (= 1/19).
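For context, 1/19 ≈ 0.0526 is exactly what a softmax over K = 19 object regions returns when the scaled logits for a pixel are (nearly) equal, so the printout means the pixel-region similarities have a very small spread after scaling. Below is a minimal NumPy sketch of the scaled dot-product similarity discussed above (scale key_channels ** -0.5, then a softmax over the K regions). The function name, shapes and random features are illustrative assumptions, not values from the checkpoint.

```python
import numpy as np


def object_context_similarity(pixel_feats, region_feats, key_channels):
    """Scaled dot-product similarity between pixels and object regions.

    pixel_feats:  [N, C] query features (one row per pixel)
    region_feats: [K, C] key features (one row per object region)
    Returns [N, K] probabilities; each row sums to 1 over the K regions.
    """
    logits = pixel_feats @ region_feats.T * key_channels ** -0.5
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
K, C = 19, 256
# Tiny features give almost identical logits, so every row is close to 1/19.
tiny = object_context_similarity(1e-3 * rng.standard_normal((4, C)),
                                 1e-3 * rng.standard_normal((K, C)), C)
```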

Pretrained SegFix models on ADE20K

Why not release SegFix weights pretrained on the ADE20K dataset?

I can't find it in the MODEL_ZOO page.

Training gets stuck every time when using 8 GPUs

Does the current framework support training with 8 GPUs? PyTorch version == 0.4.1

SegFix offsets

Hi, thanks for releasing the code; the first thing I did was try it myself!

Everything worked very well: I successfully trained and validated with my own images and my own label files.
But what I don't understand is how to generate my own *.mat files to run SegFix. You only provided .mat files for Cityscapes, but how do I generate them for my own dataset?

When starting validation with SegFix (scripts/cityscapes/hrnet/run_h_48_d_4_ocr.sh segfix 3 val), I receive:

FileNotFoundError: [Errno 2] No such file or directory: openseg.pytorch/data/cityscapes/val/offset_pred/semantic/offset_hrnext/is-03-08-2019-normal-98-001089.mat'
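The FileNotFoundError means SegFix found no precomputed offset file for that image. Before running SegFix on your own data, a quick pre-flight check can list which images lack a matching .mat file. A minimal sketch, assuming (hypothetically) that each offset file is named after the image stem with a .mat extension, as the path in the error suggests. missing_offset_files is an illustrative helper, and the directories you pass are up to you.

```python
from pathlib import Path


def missing_offset_files(image_dir, offset_dir,
                         image_exts=(".png", ".jpg", ".jpeg")):
    """Return sorted image stems that have no <stem>.mat in offset_dir."""
    offset_dir = Path(offset_dir)
    missing = []
    for image in Path(image_dir).iterdir():
        if image.suffix.lower() in image_exts:
            if not (offset_dir / f"{image.stem}.mat").exists():
                missing.append(image.stem)
    return sorted(missing)
```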

Is it possible to train SegFix and OCRNet in an end-to-end (multi-task) way?

Question on the implementation of spatial_ocr_block.py

Hi, I noticed that spatial_ocr_block.py differs from your HRNet+OCR implementation in the sub-modules f_object, f_pixel, f_down, f_up and so on. All convolution layers in those sub-modules have a bias in this implementation, while the 'bias' option is set to False in all convolution layers of those sub-modules in your HRNet+OCR implementation. What is the motivation for the two implementations, and do they behave differently?
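One fact relevant to this comparison: when a convolution is immediately followed by batch normalization, a per-channel bias added by the convolution is removed again by the mean subtraction in training mode, so bias=False mainly saves parameters. A small NumPy check of that identity, with BN written out by hand and without its learnable affine terms (batch_norm here is an illustrative helper, not a library function):

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Training-mode BN over axes (N, H, W) for x of shape [N, C, H, W]."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
conv_out = rng.standard_normal((2, 4, 5, 5))   # stand-in for a conv output
bias = rng.standard_normal((1, 4, 1, 1))       # per-channel conv bias
print(np.allclose(batch_norm(conv_out), batch_norm(conv_out + bias)))  # True
```

In eval mode BN uses running statistics instead, which absorb the bias during training, so the two variants should end up essentially equivalent.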

Error in loading the checkpoint for validation (Cityscapes, both ResNet-101 and HRNet-W48)

Dear Author,

I am trying to use the pretrained models (ResNet-101 or HRNet-W48 backbones) in my work, but similar errors are reported for both backbones.

checkpoint names:
checkpoints/cityscapes/hrnet_w48_ocr_1_latest.pth
checkpoints/cityscapes/spatial_ocrnet_deepbase_resnet101_dilated8_1_latest.pth
commands:
(for HRNet-W48:)
python -u main.py --configs configs/cityscapes/H_48_D_4.json --drop_last y --backbone hrnet48 --model_name hrnet_w48_ocr --checkpoints_name hrnet_w48_ocr_1 --phase test --gpu 0 --resume ./checkpoints/cityscapes/hrnet_w48_ocr_1_latest.pth --loss_type fs_auxce_loss --test_dir input_images --out_dir output_images
(for ResNet101:)
python -u main.py --configs configs/cityscapes/R_101_D_8.json --drop_last y --backbone deepbase_resnet101_dilated8 --model_name spatial_ocrnet --checkpoints_name spatial_ocrnet_deepbase_resnet101_dilated8_1 --phase test --gpu 0 --resume ./checkpoints/cityscapes/spatial_ocrnet_deepbase_resnet101_dilated8_1_latest.pth --loss_type fs_auxce_loss --test_dir input_images --out_dir output_images
environments:
python 3.7.3 h33d41f4_1 conda-forge
pytorch 1.1.0 py3.7_cuda10.0.130_cudnn7.5.1_0 PyTorch
torchcontrib 0.0.2
torchvision 0.3.0 py37_cu10.0.130_1 pytorch
gcc (GCC) 7.2.0
cuda 10.0
Error messages:
RuntimeError: unexpected key in source state_dict: conv_3x3.1.weight, conv_3x3.1.bias, conv_3x3.1.running_mean, conv_3x3.1.running_var, spatial_ocr_head.object_context_block.f_pixel.1.weight, spatial_ocr_head.object_context_block.f_pixel.1.bias, spatial_ocr_head.object_context_block.f_pixel.1.running_mean, spatial_ocr_head.object_context_block.f_pixel.1.running_var, spatial_ocr_head.object_context_block.f_pixel.3.weight, spatial_ocr_head.object_context_block.f_pixel.3.bias, spatial_ocr_head.object_context_block.f_pixel.3.running_mean, spatial_ocr_head.object_context_block.f_pixel.3.running_var, spatial_ocr_head.object_context_block.f_object.1.weight, spatial_ocr_head.object_context_block.f_object.1.bias, spatial_ocr_head.object_context_block.f_object.1.running_mean, spatial_ocr_head.object_context_block.f_object.1.running_var, spatial_ocr_head.object_context_block.f_object.3.weight, spatial_ocr_head.object_context_block.f_object.3.bias, spatial_ocr_head.object_context_block.f_object.3.running_mean, spatial_ocr_head.object_context_block.f_object.3.running_var, spatial_ocr_head.object_context_block.f_down.1.weight, spatial_ocr_head.object_context_block.f_down.1.bias, spatial_ocr_head.object_context_block.f_down.1.running_mean, spatial_ocr_head.object_context_block.f_down.1.running_var, spatial_ocr_head.object_context_block.f_up.1.weight, spatial_ocr_head.object_context_block.f_up.1.bias, spatial_ocr_head.object_context_block.f_up.1.running_mean, spatial_ocr_head.object_context_block.f_up.1.running_var, spatial_ocr_head.conv_bn_dropout.1.weight, spatial_ocr_head.conv_bn_dropout.1.bias, spatial_ocr_head.conv_bn_dropout.1.running_mean, spatial_ocr_head.conv_bn_dropout.1.running_var, dsn_head.1.weight, dsn_head.1.bias, dsn_head.1.running_mean, dsn_head.1.running_var

It looks like the checkpoint and the running model do not match at some layers. Could you take a look, please? Thank you very much!
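When load_state_dict reports unexpected keys, it helps to diff the key names of the checkpoint against those of the freshly built model. A minimal sketch that works on plain key lists, so it runs without PyTorch. With PyTorch you would pass the checkpoint's state-dict keys (where they are stored depends on how the file was saved) and model.state_dict().keys(). Stripping a 'module.' prefix only matters for checkpoints saved from a DataParallel or DistributedDataParallel wrapper. The key names in the example are toy values, not the real model's.

```python
def diff_state_dict_keys(checkpoint_keys, model_keys, strip_prefix="module."):
    """Compare checkpoint and model parameter names.

    Returns (missing, unexpected): keys the model expects but the checkpoint
    lacks, and keys present in the checkpoint but unknown to the model.
    """
    def normalize(key):
        return key[len(strip_prefix):] if key.startswith(strip_prefix) else key

    ckpt = {normalize(k) for k in checkpoint_keys}
    model = {normalize(k) for k in model_keys}
    return sorted(model - ckpt), sorted(ckpt - model)


missing, unexpected = diff_state_dict_keys(
    ["module.conv_3x3.0.weight", "module.conv_3x3.1.weight",
     "module.conv_3x3.1.running_mean"],
    ["conv_3x3.0.weight", "conv_3x3.1.0.weight", "conv_3x3.1.0.running_mean"],
)
print(unexpected)  # ['conv_3x3.1.running_mean', 'conv_3x3.1.weight']
```

In the error above, every unexpected key belongs to a normalization layer (weight, bias, running_mean, running_var), which hints that the BN type used to build the model differs from the one used when the checkpoint was saved.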

About the ISA code

PyTorch Version

Hello,
requirements.txt recommends the

torch==0.4.1
torchvision==0.2.1

versions. But are newer PyTorch versions with CUDA 10 support also supported?

What is the "SegFix" scheme?

.pth files don't match the .sh scripts and raise a RuntimeError in load_state_dict

The .pth files don't match the .sh scripts and raise a RuntimeError in load_state_dict. For example:

ocr/Cityscapes/hrnet_w48_ocr_1_latest.pth does not match checkpoints/cityscapes,
and raises a RuntimeError in load_state_dict in segmentor/tools/module_runner.py#L156
https://github.com/openseg-group/openseg.pytorch/blob/master/segmentor/tools/module_runner.py#L156

Which script should be used to reproduce the results on the LIP dataset? Please help!

"ce2p_auxce_loss" is not defined in loss/loss_manager.py

It is required by LIP/R_101_D_16.json.

Can SegFix be used on my own dataset?

Is SegFix only usable on Cityscapes? My own dataset does not have the *.mat offset files.

Is the Mapillary dataset the research edition or the commercial edition?

As the title.
And any config details or suggestions about pretraining on Mapillary will be appreciated!

Thanks!

Guidelines on how to train the model on your own dataset

Could you please improve the documentation on how to use the library with a pre-trained model?

I would like to use it on my own dataset if possible.
Thanks

RuntimeError: Ninja is required to load C++ extensions

I get "RuntimeError: Ninja is required to load C++ extensions" when running the program. Could you please help me solve it?
Thanks a lot!
P.S. I have installed ninja; its version is 1.10.1.
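A frequent cause of this message is that the ninja executable is installed but not visible on the PATH of the Python process that builds the extension, for example because it was installed into a different environment. A minimal check using only the standard library; it follows the same idea of running ninja --version but is not PyTorch's own check.

```python
import shutil
import subprocess


def ninja_version():
    """Return the ninja version string visible to this process, or None."""
    exe = shutil.which("ninja")
    if exe is None:
        return None
    try:
        out = subprocess.run([exe, "--version"], capture_output=True,
                             text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


print(ninja_version())  # the version string, or None if ninja is not on PATH
```

Run it with the same interpreter and environment that launches training; if it prints None there, ninja is not reachable from that process.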

SegFix paper link

Hi!

Thanks for your nice work. It is really impressive. I'm interested in the SegFix algorithm.
Could you send a copy of the paper "SegFix: Model-Agnostic Boundary Refinement for Segmentation", since I cannot find it on arXiv.

Best,
David

Out of memory during validation in training

I run into an OOM error when validating during the training phase.
Here is the log:
World size: 4
['--configs', 'configs/cityscapes/R_101_D_8.json', '--drop_last', 'y', '--phase', 'train', '--gathered', 'n', '--loss_balance', 'y', '--log_to_file', 'n', '--backbone', 'deepbase_resnet101_dilated8', '--model_name', 'base_ocnet', '--gpu', '0', '1', '2', '3', '--distributed', '--data_dir', './dataset/cityscapes', '--loss_type', 'fs_auxce_loss', '--max_iters', '40000', '--checkpoints_name', 'base_ocnet_deepbase_resnet101_dilated8_20201029', '--pretrained', './pretrained_model/resnet101-imagenet.pth']

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

[offset_helper.py, 54] engery/max-distance: 5 engery/min-distance: 0
[offset_helper.py, 61] direction/num_classes: 8 scale: 1
[offset_helper.py, 66] c4 align axis: False
[offset_helper.py, 54] engery/max-distance: 5 engery/min-distance: 0
[offset_helper.py, 61] direction/num_classes: 8 scale: 1
[offset_helper.py, 66] c4 align axis: False
[module_runner.py, 44] BN Type is inplace_abn.
[init.py, 17] Using evaluator: StandardEvaluator
[module_runner.py, 44] BN Type is inplace_abn.
[init.py, 17] Using evaluator: StandardEvaluator
[module_helper.py, 136] Loading pretrained model:./pretrained_model/resnet101-imagenet.pth
[module_helper.py, 136] Loading pretrained model:./pretrained_model/resnet101-imagenet.pth
[offset_helper.py, 54] engery/max-distance: 5 engery/min-distance: 0
[offset_helper.py, 61] direction/num_classes: 8 scale: 1
[offset_helper.py, 66] c4 align axis: False
[offset_helper.py, 54] engery/max-distance: 5 engery/min-distance: 0
[offset_helper.py, 61] direction/num_classes: 8 scale: 1
[offset_helper.py, 66] c4 align axis: False
[module_runner.py, 44] BN Type is inplace_abn.
[init.py, 17] Using evaluator: StandardEvaluator
[module_runner.py, 44] BN Type is inplace_abn.
[init.py, 17] Using evaluator: StandardEvaluator
[module_helper.py, 136] Loading pretrained model:./pretrained_model/resnet101-imagenet.pth
[module_helper.py, 136] Loading pretrained model:./pretrained_model/resnet101-imagenet.pth
[trainer.py, 78] Params Group Method: None
[trainer.py, 78] Params Group Method: None
[trainer.py, 78] Params Group Method: None
[trainer.py, 78] Params Group Method: None
[optim_scheduler.py, 66] Use lambda_poly policy with default power 0.9
[data_loader.py, 131] use the DefaultLoader for train...
[optim_scheduler.py, 66] Use lambda_poly policy with default power 0.9
[optim_scheduler.py, 66] Use lambda_poly policy with default power 0.9
[data_loader.py, 131] use the DefaultLoader for train...
[optim_scheduler.py, 66] Use lambda_poly policy with default power 0.9
[data_loader.py, 131] use the DefaultLoader for train...
[data_loader.py, 131] use the DefaultLoader for train...
[data_loader.py, 164] use DefaultLoader for val ...
[data_loader.py, 164] use DefaultLoader for val ...
[data_loader.py, 164] use DefaultLoader for val ...
[data_loader.py, 164] use DefaultLoader for val ...
[loss_manager.py, 54] use loss: fs_auxce_loss.
[loss_manager.py, 54] use loss: fs_auxce_loss.
[loss_manager.py, 39] use distributed loss
[loss_manager.py, 39] use distributed loss
[loss_manager.py, 54] use loss: fs_auxce_loss.
[loss_manager.py, 54] use loss: fs_auxce_loss.
[loss_manager.py, 39] use distributed loss
[loss_manager.py, 39] use distributed loss
[data_helper.py, 119] Input keys: ['img']
[data_helper.py, 120] Target keys: ['labelmap']
[data_helper.py, 119] Input keys: ['img']
[data_helper.py, 120] Target keys: ['labelmap']
[data_helper.py, 119] Input keys: ['img']
[data_helper.py, 120] Target keys: ['labelmap']
[data_helper.py, 119] Input keys: ['img']
[data_helper.py, 120] Target keys: ['labelmap']
[trainer.py, 219] Train Epoch: 0 Train Iteration: 10 Time 11.147s / 10iters, (1.115) Forward Time 3.996s / 10iters, (0.400) Backward Time 6.918s / 10iters, (0.692) Loss Time 0.029s / 10iters, (0.003) Data load 0.203s / 10iters, (0.020318)
Learning rate = [0.00999797497721687, 0.00999797497721687] Loss = 3.54437590 (ave = 3.62284539)

2020-10-30 23:24:18,808 INFO [trainer.py, 259] 0 images processed

2020-10-30 23:24:19,588 INFO [trainer.py, 259] 0 images processed

2020-10-30 23:24:19,825 INFO [trainer.py, 259] 0 images processed

2020-10-30 23:24:19,840 INFO [trainer.py, 259] 0 images processed

Traceback (most recent call last):
File "/home/kururu/github/openseg.pytorch/main.py", line 227, in <module>
model.train()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 365, in train
self.__train()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 240, in __train
self.__val()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 308, in __val
outputs = self.seg_net(*inputs)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/nets/ocnet.py", line 58, in forward
x = self.oc_module(x)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 153, in forward
priors = [stage(feats) for stage in self.stages]
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 153, in <listcomp>
priors = [stage(feats) for stage in self.stages]
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 89, in forward
sim_map = (self.key_channels**-.5) * sim_map
RuntimeError: CUDA out of memory. Tried to allocate 4.10 GiB (GPU 0; 10.91 GiB total capacity; 5.16 GiB already allocated; 3.28 GiB free; 229.69 MiB cached)
Traceback (most recent call last):
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/distributed/launch.py", line 253, in <module>
main()
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/distributed/launch.py", line 249, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/kururu/anaconda3/envs/kururudev-torch-1-0/bin/python', '-u', '/home/kururu/github/openseg.pytorch/main.py', '--local_rank=3', '--configs', 'configs/cityscapes/R_101_D_8.json', '--drop_last', 'y', '--phase', 'train', '--gathered', 'n', '--loss_balance', 'y', '--log_to_file', 'n', '--backbone', 'deepbase_resnet101_dilated8', '--model_name', 'base_ocnet', '--gpu', '0', '1', '2', '3', '--distributed', '--data_dir', './dataset/cityscapes', '--loss_type', 'fs_auxce_loss', '--max_iters', '40000', '--checkpoints_name', 'base_ocnet_deepbase_resnet101_dilated8_20201029', '--pretrained', './pretrained_model/resnet101-imagenet.pth']' returned non-zero exit status 1.
Traceback (most recent call last):
File "main.py", line 178, in <module>
handle_distributed(args_parser, os.path.expanduser(os.path.abspath(__file__)))
File "/home/kururu/github/openseg.pytorch/lib/utils/distributed.py", line 56, in handle_distributed
cmd=command_args)
subprocess.CalledProcessError: Command '['/home/kururu/anaconda3/envs/kururudev-torch-1-0/bin/python', '-u', '-m', 'torch.distributed.launch', '--nproc_per_node', '4', '/home/kururu/github/openseg.pytorch/main.py', '--configs', 'configs/cityscapes/R_101_D_8.json', '--drop_last', 'y', '--phase', 'train', '--gathered', 'n', '--loss_balance', 'y', '--log_to_file', 'n', '--backbone', 'deepbase_resnet101_dilated8', '--model_name', 'base_ocnet', '--gpu', '0', '1', '2', '3', '--distributed', '--data_dir', './dataset/cityscapes', '--loss_type', 'fs_auxce_loss', '--max_iters', '40000', '--checkpoints_name', 'base_ocnet_deepbase_resnet101_dilated8_20201029', '--pretrained', './pretrained_model/resnet101-imagenet.pth']' returned non-zero exit status 1.
Traceback (most recent call last):
File "/home/kururu/github/openseg.pytorch/main.py", line 227, in <module>
model.train()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 365, in train
self.__train()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 240, in __train
self.__val()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 308, in __val
outputs = self.seg_net(*inputs)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/nets/ocnet.py", line 58, in forward
x = self.oc_module(x)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 153, in forward
priors = [stage(feats) for stage in self.stages]
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 153, in <listcomp>
priors = [stage(feats) for stage in self.stages]
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 89, in forward
sim_map = (self.key_channels**-.5) * sim_map
RuntimeError: CUDA out of memory. Tried to allocate 4.10 GiB (GPU 3; 10.92 GiB total capacity; 5.46 GiB already allocated; 1019.50 MiB free; 3.89 GiB cached)
Traceback (most recent call last):
File "/home/kururu/github/openseg.pytorch/main.py", line 227, in <module>
model.train()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 365, in train
self.__train()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 240, in __train
self.__val()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 308, in __val
outputs = self.seg_net(*inputs)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/nets/ocnet.py", line 58, in forward
x = self.oc_module(x)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 153, in forward
priors = [stage(feats) for stage in self.stages]
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 153, in <listcomp>
priors = [stage(feats) for stage in self.stages]
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 89, in forward
sim_map = (self.key_channels**-.5) * sim_map
RuntimeError: CUDA out of memory. Tried to allocate 4.10 GiB (GPU 2; 10.92 GiB total capacity; 5.46 GiB already allocated; 1023.50 MiB free; 3.89 GiB cached)
Traceback (most recent call last):
File "/home/kururu/github/openseg.pytorch/main.py", line 227, in <module>
model.train()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 365, in train
self.__train()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 240, in __train
self.__val()
File "/home/kururu/github/openseg.pytorch/segmentor/trainer.py", line 308, in __val
outputs = self.seg_net(*inputs)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/nets/ocnet.py", line 58, in forward
x = self.oc_module(x)
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 153, in forward
priors = [stage(feats) for stage in self.stages]
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 153, in <listcomp>
priors = [stage(feats) for stage in self.stages]
File "/home/kururu/anaconda3/envs/kururudev-torch-1-0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/kururu/github/openseg.pytorch/lib/models/modules/base_oc_block.py", line 89, in forward
sim_map = (self.key_channels**-.5) * sim_map
RuntimeError: CUDA out of memory. Tried to allocate 4.10 GiB (GPU 1; 10.92 GiB total capacity; 5.46 GiB already allocated; 523.50 MiB free; 4.38 GiB cached)
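The failing allocation is the pixel-to-pixel similarity map of the OC block, whose size grows with the square of the number of feature-map positions. Validation on full-resolution Cityscapes images therefore needs far more memory than training on smaller crops. A back-of-the-envelope sketch, assuming a dense HW × HW float32 map per image at output stride 8 and one image per GPU. attention_map_gib is a hypothetical helper name.

```python
def attention_map_gib(height, width, output_stride=8, batch=1, bytes_per_el=4):
    """Memory (GiB) of one dense [HW x HW] similarity map per batch."""
    positions = (height // output_stride) * (width // output_stride)
    return batch * positions ** 2 * bytes_per_el / 2 ** 30


print(attention_map_gib(1024, 2048))  # 4.0  (full 2048x1024 Cityscapes image)
print(attention_map_gib(512, 1024))   # 0.25 (a 1024x512 crop)
```

This is in line with the 4.10 GiB the log tries to allocate. The slightly larger figure could come from a feature map a little larger than 128 × 256.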

softmax dim of spatial_ocr_block

openseg.pytorch/lib/models/modules/spatial_ocr_block.py

Line 64 in 0acca42

probs = F.softmax(self.scale * probs, dim=2)  # batch x k x hw

Quick question: dim should be 1 or 2? In my opinion, k represents the number of object classes. Maybe I misunderstood some detailed parts of the proposed method.
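The answer depends on what the normalized weights are used for. Assume probs has the shape batch × K × HW given in the comment and is then used to pool pixel features into K object-region vectors. A softmax over dim=2 then makes each region a weighted average over pixels, while dim=1 would instead make each pixel's K scores sum to one. A NumPy sketch of the dim=2 gather; the function names and shapes are illustrative.

```python
import numpy as np


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gather_object_regions(probs, feats, scale=1.0):
    """probs: [B, K, HW] class scores; feats: [B, C, HW] pixel features.

    Softmax over the HW axis (dim=2) gives, for every class k, spatial
    weights summing to 1; the result is [B, K, C] region representations.
    """
    weights = softmax(scale * probs, axis=2)        # [B, K, HW]
    return weights @ feats.transpose(0, 2, 1)       # [B, K, C]


rng = np.random.default_rng(0)
probs = rng.standard_normal((2, 19, 64))
feats = rng.standard_normal((2, 32, 64))
regions = gather_object_regions(probs, feats)
print(regions.shape)  # (2, 19, 32)
```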

Why can't I open the pretrained model .pth file links? (404)

I can't open the model links in the model zoo,
like this:

Only the ISA ResNet-101 link can be opened and downloaded.
Could you provide the links again? Thank you.

Best pre-trained weights for HRNet + OCR

Hello,

Thanks for making the code and the pre-trained models available!

I would like to know how to reproduce your results on the Cityscapes test set (mIoU 84.5/84.2 with/without SegFix) from your provided pre-trained model.

Should I take the 80000-iteration OCR HRNet-W48 weights that you listed in the model zoo?

Thank you in advance for your response.

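Question about the ISA module

Hello. Looking at the code, in the two steps (long-range and short-range), the spatial size of the input becomes smaller at each step, but the batch size becomes larger. So how does this reduce the computation and the number of parameters? I really don't understand this point; could you please explain?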
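A worked count of attention entries helps here; this sketches the argument and is not the authors' analysis. With N = H × W positions, dense self-attention compares every pair of positions: N² similarities. If the long-range step splits the positions into P groups and the short-range step into Q groups, each step computes attention only within groups, so the cost is P · (N/P)² = N²/P plus Q · (N/Q)² = N²/Q. Moving the groups into the batch dimension only changes how the work is laid out; the total number of pairs still drops. This counts computation only, not parameters. The group counts below are illustrative.

```python
def dense_pairs(n):
    """Similarity entries computed by full self-attention over n positions."""
    return n * n


def interlaced_pairs(n, p, q):
    """Entries for a long-range step with p groups plus a short-range step with q groups.

    Each group (size n // p or n // q) attends only within itself.
    """
    return p * (n // p) ** 2 + q * (n // q) ** 2


n = 128 * 128                         # e.g. a 128x128 feature map
print(dense_pairs(n))                 # 268435456
print(interlaced_pairs(n, 128, 128))  # 4194304
```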

Test sets results

For comparison in our paper, we are looking for the detailed test set results (class IoUs) of these prediction files that you shared: https://drive.google.com/drive/folders/156vMABydr7btdPDBU6b9J-e0jJHuPI73
Do you happen to have a snapshot of the submission results obtained with these predictions?
Thank you for your consideration.

Calling SegFix

Hello, I would like to call SegFix directly to refine my results. Which program should I run?
Looking forward to your reply. Thank you!

Problem occurred in hrnet_backbone.py

Dear Author,

Thank you for your excellent work, but I get an error when building the backbone.

checkpoint names:
checkpoints/cityscapes/hrnet_w48_ocr_1_latest.pth


commands:
(for HRNet-W48:)
python -u main.py --configs configs/cityscapes/H_48_D_4.json --drop_last y --backbone hrnet48 --model_name hrnet_w48_ocr --checkpoints_name hrnet_w48_ocr_1 --phase test --gpu 0 --resume ./checkpoints/cityscapes/hrnet_w48_ocr_1_latest.pth --loss_type fs_auxce_loss --test_dir input_images --out_dir output_images

Error messages:

2020-07-15 21:00:10,470 INFO [module_runner.py, 44] BN Type is inplace_abn.
Traceback (most recent call last):
File "main.py", line 214, in <module>
model = Tester(configer)
File "/home/dai/code/semantic_segmentation/9/openseg.pytorch-master/segmentor/tester.py", line 69, in __init__
self._init_model()
File "/home/dai/code/semantic_segmentation/9/openseg.pytorch-master/segmentor/tester.py", line 72, in _init_model
self.seg_net = self.model_manager.semantic_segmentor()
File "/home/dai/code/semantic_segmentation/9/openseg.pytorch-master/lib/models/model_manager.py", line 81, in semantic_segmentor
model = SEG_MODEL_DICT[model_name](self.configer)
File "/home/dai/code/semantic_segmentation/9/openseg.pytorch-master/lib/models/nets/hrnet.py", line 105, in __init__
self.backbone = BackboneSelector(configer).get_backbone()
File "/home/dai/code/semantic_segmentation/9/openseg.pytorch-master/lib/models/backbones/backbone_selector.py", line 34, in get_backbone
model = HRNetBackbone(self.configer)(**params)
File "/home/dai/code/semantic_segmentation/9/openseg.pytorch-master/lib/models/backbones/hrnet/hrnet_backbone.py", line 598, in __call__
bn_momentum=0.1)
File "/home/dai/code/semantic_segmentation/9/openseg.pytorch-master/lib/models/backbones/hrnet/hrnet_backbone.py", line 307, in __init__
self.bn1 = ModuleHelper.BatchNorm2d(bn_type=bn_type)(64, momentum=bn_momentum)
TypeError: 'NoneType' object is not callable

Could you please tell me what is wrong? Thank you.

How to prepare the Cityscapes data

Hello. I'm trying to reproduce your Cityscapes results for our BMVC paper.

After following the data directory format in the config.profile file and running bash ./scripts/cityscapes/hrnet/run_h_48_d_4_ocr.sh val 1, I get this error:

ERROR: Found no prediction for ground truth /home/arash/openseg.pytorch/dataset/cityscapes/val/label/munster_000027_000019_gtFine_labelIds.png

Could you explain how you prepared the data?
Thanks
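The message looks like it comes from the Cityscapes evaluation scripts (cityscapesscripts), which, as far as I recall, match a prediction to each ground-truth file by its `{city}_{sequence}_{frame}` prefix. The sketch below approximates that matching so you can list which ground-truth files have no prediction before running the evaluation. The helper names are hypothetical, and the exact pattern used by cityscapesscripts should be checked against its source.

```python
import fnmatch
from typing import Iterable, List


def frame_pattern(gt_name: str) -> str:
    """'munster_000027_000019_gtFine_labelIds.png' -> 'munster_000027_000019*.png'."""
    city, sequence, frame = gt_name.split("_")[:3]
    return f"{city}_{sequence}_{frame}*.png"


def missing_predictions(gt_names: Iterable[str], pred_names: Iterable[str]) -> List[str]:
    """Return ground-truth names for which no prediction file matches the frame prefix."""
    preds = list(pred_names)
    return [gt for gt in gt_names if not fnmatch.filter(preds, frame_pattern(gt))]
```

For example, pass `sorted(os.listdir(".../val/label"))` and `os.listdir(<prediction dir>)` to `missing_predictions`. If every ground-truth file is reported, one possible explanation is that the predictions were written to a different directory from the one the evaluation reads.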

Details about HRNetV2-W48 with OCR?

The paper explains how you combine ResNet with OCR: the output stride of the (dilated) ResNet is 8, and you use the last two stages as inputs to OCR.

However, HRNetV2 has outputs at 4 different scales (output stride = [4, 8, 16, 32]). Can you explain how you combine them?

In addition, Section 3.2 of the paper states that the output size of stages 3 and 4 is H × W. Is this 1/8 of the original input image size, since the output stride is 8?
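For reference, the HRNetV2 representation described in the HRNet papers bilinearly upsamples the three lower-resolution branches to the stride-4 branch and concatenates all four. The arithmetic below shows the resulting shapes for HRNet-W48 (branch widths 48, 96, 192, 384). Whether openseg feeds exactly this concatenation to OCR is an assumption here, not something confirmed in the issue. Under that assumption, the OCR input would be at 1/4 rather than 1/8 of the input size.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)

HRNET_W48_WIDTHS = (48, 96, 192, 384)  # channels of the four HRNet-W48 branches
BRANCH_STRIDES = (4, 8, 16, 32)        # output stride of each branch


def hrnetv2_shapes(height: int, width: int,
                   widths=HRNET_W48_WIDTHS,
                   strides=BRANCH_STRIDES) -> Tuple[List[Shape], Shape]:
    """Shapes of the four branch outputs and of their concatenation."""
    branches = [(c, height // s, width // s) for c, s in zip(widths, strides)]
    # Every branch is upsampled to the highest-resolution (stride-4) grid,
    # so only the channel counts add up in the concatenation.
    _, out_h, out_w = branches[0]
    return branches, (sum(widths), out_h, out_w)


if __name__ == "__main__":
    branches, fused = hrnetv2_shapes(1024, 2048)
    print(branches)  # [(48, 256, 512), (96, 128, 256), (192, 64, 128), (384, 32, 64)]
    print(fused)     # (720, 256, 512)
```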

Source Code?

Hello, would you be willing to share your source code?

OCR

When will the OCR module be released?

SegFix and OCRNet questions

Dear Author

Hello. Thank you for sharing the code.
Your code gives me some insight into solving my problem.
I have some questions about your code.
First, in ocrnet.py, you apply the feature network, then the OCR block, and then call F.interpolate twice. I am wondering why the model returns both the first and the second interpolation results, since a typical segmentation network uses only the last prediction as the final segmentation result.
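A likely reason, stated as an assumption: OCR-style models commonly return an auxiliary (coarse, soft-object-region) prediction alongside the final one so that both can be supervised during training, while only the final prediction is used at inference. The BDCI config posted in another issue on this page pairs `fs_auxce_loss` with `"aux_loss": 0.4` and `"seg_loss": 1.0`, which fits that reading. The NumPy sketch below shows such a weighted two-term cross-entropy; the function names are hypothetical and this is not the repository's loss implementation.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, labels: np.ndarray, ignore_index: int = -1) -> float:
    """Mean pixel-wise cross-entropy. logits: (C, H, W); labels: (H, W) ints."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    valid = labels != ignore_index
    if not valid.any():
        return 0.0
    safe = np.where(valid, labels, 0)  # placeholder class for ignored pixels
    picked = np.take_along_axis(log_probs, safe[None], axis=0)[0]
    return float(-picked[valid].mean())


def aux_ce_loss(aux_logits, seg_logits, labels,
                aux_weight=0.4, seg_weight=1.0, ignore_index=-1):
    """Weighted sum of the auxiliary and final cross-entropy terms.

    Both logit maps are assumed to be upsampled to the label resolution already,
    which is what the two interpolation calls would provide.
    """
    return (aux_weight * cross_entropy(aux_logits, labels, ignore_index)
            + seg_weight * cross_entropy(seg_logits, labels, ignore_index))
```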

Second, I would like to apply your SegFix algorithm to my problem. However, I cannot find a standalone implementation of it, and I cannot find the paper either. The README only mentions that it is similar to the PointRend scheme. Is it a CNN-based approach, or does it rely on something else?
Thank you.
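On the question of whether SegFix is a CNN approach: as I understand the SegFix paper, a network predicts a boundary map and a discretized direction map. The directions are turned into offsets for boundary pixels only, and those offsets are then used to re-label the boundary pixels. The sketch below covers only the direction-to-offset conversion. The 8-direction ordering in `DIRECTION_TO_OFFSET` and the 0.5 boundary threshold are hypothetical choices for illustration, not the paper's or the repository's exact convention.

```python
import numpy as np

# Hypothetical ordering: class k -> unit step (dy, dx), counter-clockwise
# on screen starting from "right" (y grows downward in image coordinates).
DIRECTION_TO_OFFSET = np.array([
    [0, 1], [-1, 1], [-1, 0], [-1, -1],
    [0, -1], [1, -1], [1, 0], [1, 1],
])


def direction_to_offset(boundary_prob: np.ndarray, direction_cls: np.ndarray,
                        threshold: float = 0.5) -> np.ndarray:
    """Build a (2, H, W) integer offset map.

    boundary_prob: (H, W) predicted probability that a pixel lies on a boundary.
    direction_cls: (H, W) predicted direction class in [0, 8).
    Non-boundary pixels get a zero offset, so their labels stay unchanged.
    """
    steps = DIRECTION_TO_OFFSET[direction_cls]          # (H, W, 2)
    is_boundary = boundary_prob > threshold             # (H, W) bool
    offsets = np.where(is_boundary[..., None], steps, 0)
    return offsets.transpose(2, 0, 1)                   # -> (2, H, W)
```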

inplace_abn

Thanks for your work!
elif torch_ver == '1.2':
    from inplace_abn import InPlaceABNSync
    return InPlaceABNSync(num_features, **kwargs)
I can't find the inplace_abn module for torch_ver == 1.2.
I would also like to know whether I need to install it first in order to use it.
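On the install question: `from inplace_abn import InPlaceABNSync` imports an external package, so it presumably has to be installed separately (the mapillary `inplace_abn` project is published on PyPI as `inplace-abn`). Separately, a helper that only has branches for specific version strings such as `'1.2'` returns `None` on any other version. That is one way to end up with the `TypeError: 'NoneType' object is not callable` in the hrnet_backbone.py traceback on this page. The sketch below is a hypothetical dispatcher, not the repository's `ModuleHelper`. It parses the version numerically and raises a clear error instead of falling through. The `bn_type` names other than `inplace_abn` and the ">= 1.2" policy are assumptions, and the function returns class names as strings so it runs without PyTorch installed.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """'1.10.0+cu113' -> (1, 10); slicing the string, e.g. version[:3], would give '1.1'."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def pick_bn_backend(bn_type: str, torch_version: str) -> str:
    """Return the dotted name of the BatchNorm class to use, or raise."""
    if bn_type == "torchbn":            # hypothetical name for plain BatchNorm
        return "torch.nn.BatchNorm2d"
    if bn_type == "torchsyncbn":        # hypothetical name for synchronized BN
        return "torch.nn.SyncBatchNorm"
    if bn_type == "inplace_abn":
        # Assumed policy: use the separately installed inplace_abn package on torch >= 1.2.
        if parse_version(torch_version) >= (1, 2):
            return "inplace_abn.InPlaceABNSync"
        raise RuntimeError(f"no inplace_abn backend configured for torch {torch_version}")
    # Fail loudly instead of implicitly returning None.
    raise ValueError(f"unknown bn_type: {bn_type!r}")
```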

Any plan to port SegFix to PyTorch 1.x?

PyTorch 0.4 is a very old version and is inconvenient to use on a new machine. Is there any plan to port SegFix to PyTorch 1.x?

When will the implementation of OCR be released?

resnet50 pretrained model

Hi,

Would you consider releasing the resnet50 pretrained model?

Could you please provide the dataset preprocessing script for the COCO-Stuff dataset?

Thank you for your excellent algorithm.
Could you please provide the script that converts the original COCO-Stuff dataset into the training format (train/image, train/label, val/image, val/label)? I only found scripts for other datasets (e.g., Cityscapes/LIP).
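Until an official script is available, a minimal sketch like the following could produce the requested layout. It assumes, based on the official COCO-Stuff 164K release as I recall it, that images are `.jpg` files and the pixel-level label maps are `.png` files with the same stem, stored in separate directories per split. It only copies files and does not remap label values; whether the labels need remapping for this repository's loader is left open. The function names and any paths you pass are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Tuple


def pair_by_stem(image_names: Iterable[str], label_names: Iterable[str],
                 image_ext: str = ".jpg", label_ext: str = ".png") -> List[Tuple[str, str]]:
    """Match files that share a stem, e.g. 000000000009.jpg <-> 000000000009.png."""
    label_stems = {n[: -len(label_ext)] for n in label_names if n.endswith(label_ext)}
    pairs = []
    for name in sorted(image_names):
        if name.endswith(image_ext) and name[: -len(image_ext)] in label_stems:
            pairs.append((name, name[: -len(image_ext)] + label_ext))
    return pairs


def export_split(image_dir: str, label_dir: str, out_root: str, split: str) -> int:
    """Copy matched pairs into <out_root>/<split>/image and <out_root>/<split>/label."""
    src_img, src_lbl = Path(image_dir), Path(label_dir)
    dst_img = Path(out_root) / split / "image"
    dst_lbl = Path(out_root) / split / "label"
    dst_img.mkdir(parents=True, exist_ok=True)
    dst_lbl.mkdir(parents=True, exist_ok=True)
    pairs = pair_by_stem((p.name for p in src_img.iterdir()),
                         (p.name for p in src_lbl.iterdir()))
    for img_name, lbl_name in pairs:
        shutil.copy2(src_img / img_name, dst_img / img_name)
        shutil.copy2(src_lbl / lbl_name, dst_lbl / lbl_name)
    return len(pairs)
```

Calling `export_split` once for `train` and once for `val` with your own source directories would produce the train/image, train/label, val/image, val/label structure. Images without a matching label map are skipped rather than copied.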

Have any checkpoints been released?

Hi, have you released any checkpoints for OCNet?

Cityscapes Instance offsets

Under the Cityscapes Semantic Segmentation section of the model zoo, the following is written:
To apply SegFix, you should first download the offset files offset_instance.zip to $DATA_ROOT/cityscapes, and then extract the archive.

However, the offset_instance.zip link points to offset_semantic.zip.

Have you released the instance offsets with a wrong link, or is the file name just a typo?
If it is a typo, could you provide the link for the instance offsets?

Where can I find hrnet48.json?

About the JSON file: what should the input size and crop size be based on?

My dataset's images are 256 × 256, and I don't know how to modify the JSON file.

{
    "dataset": "BDCI",
    "method": "fcn_segmentor",
    "data": {
      "image_tool": "cv2",
      "input_mode": "BGR",
      "num_classes": 7,
      "label_list": [0, 1, 2, 3, 4, 5, 6, 255],
      "data_dir": "~/DataSet/BDCI",
      "workers": 8
    },
   "train": {
      "batch_size": 16,
      "data_transformer": {
        "size_mode": "fix_size",
        "input_size": [256, 256],
        "align_method": "only_pad",
        "pad_mode": "random"
      }
    },
    "val": {
      "batch_size": 4,
      "mode": "ss_test",
      "data_transformer": {
        "size_mode": "fix_size",
        "input_size": [256, 256],
        "align_method": "only_pad"
      }
    },
    "test": {
      "batch_size": 4,
      "mode": "ss_test",
      "out_dir": "~/DataSet/BDCI/seg_result/BDCI",
      "data_transformer": {
        "size_mode": "fix_size",
        "input_size": [256, 256],
        "align_method": "only_pad"
      }
    },
    "train_trans": {
      "trans_seq": ["random_resize", "random_crop", "random_hflip", "random_brightness"],
      "random_brightness": {
        "ratio": 1.0,
        "shift_value": 10
      },
      "random_hflip": {
        "ratio": 0.5,
        "swap_pair": []
      },
      "random_resize": {
        "ratio": 1.0,
        "method": "random",
        "scale_range": [0.5, 2.0],
        "aspect_range": [0.9, 1.1]
      },
      "random_crop":{
        "ratio": 1.0,
        "crop_size": [256, 256],
        "method": "random",
        "allow_outside_center": false
      }
    },
    "val_trans": {
      "trans_seq": []
    },
    "normalize": {
      "div_value": 255.0,
      "mean_value": [0.485, 0.456, 0.406],
      "mean": [0.485, 0.456, 0.406],
      "std": [0.229, 0.224, 0.225]
    },
    "checkpoints": {
      "checkpoints_name": "fs_baseocnet_BDCI_seg",
      "checkpoints_dir": "./checkpoints/BDCI",
      "save_iters": 500
    },
    "network":{
      "backbone": "deepbase_resnet101_dilated8",
      "multi_grid": [1, 1, 1],
      "model_name": "base_ocnet",
      "bn_type": "inplace_abn",
      "stride": 8,
      "factors": [[8, 8]],
      "loss_weights": {
        "corr_loss": 0.01,
        "aux_loss": 0.4,
        "seg_loss": 1.0
      }
    },
    "logging": {
      "logfile_level": "info",
      "stdout_level": "info",
      "log_file": "./log/BDCI/fs_baseocnet_BDCI_seg.log",
      "log_format": "%(asctime)s %(levelname)-7s %(message)s",
      "rewrite": true
    },
    "lr": {
      "base_lr": 0.01,
      "metric": "iters",
      "lr_policy": "lambda_poly",
      "step": {
        "gamma": 0.5,
        "step_size": 100
      }
    },
    "solver": {
      "display_iter": 10,
      "test_interval": 1000,
      "max_iters": 40000
    },
    "optim": {
      "optim_method": "sgd",
      "adam": {
        "betas": [0.9, 0.999],
        "eps": 1e-08,
        "weight_decay": 0.0001
      },
      "sgd": {
        "weight_decay": 0.0005,
        "momentum": 0.9,
        "nesterov": false
      }
    },
    "loss": {
      "loss_type": "fs_auxce_loss",
      "params": {
        "ce_weight": [0.8373, 0.9180, 0.8660, 1.0345, 1.0166, 0.9969, 0.9754,
                      1.0489, 0.8786, 1.0023, 0.9539, 0.9843, 1.1116, 0.9037,
                      1.0865, 1.0955, 1.0865, 1.1529, 1.0507],
        "ce_reduction": "elementwise_mean",
        "ce_ignore_index": -1,
        "ohem_minkeep": 100000,
        "ohem_thresh": 0.9
      }
    }
}

Here is my JSON file. When I try to train on my dataset, I get a size mismatch error like:

and so on.
The environment requirements should be satisfied:

This is my validation error:

And the config.profile:

This is my log file screenshot:
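The screenshots are not included here, so the actual error can't be confirmed, but one inconsistency is visible in the JSON itself: `"num_classes"` is 7 while `"ce_weight"` lists 19 values, which looks like a per-class weight vector carried over from a 19-class dataset. If the loss passes these weights to a cross-entropy over 7 classes, PyTorch would reject the size mismatch. A small hypothetical check like the one below can catch this before training; the key paths follow the JSON posted above.

```python
from typing import List


def config_problems(cfg: dict) -> List[str]:
    """Report inconsistencies between the class count and per-class loss weights."""
    problems = []
    num_classes = cfg["data"]["num_classes"]
    weights = cfg.get("loss", {}).get("params", {}).get("ce_weight")
    if weights is not None and len(weights) != num_classes:
        problems.append(
            f"ce_weight has {len(weights)} entries but num_classes is {num_classes}"
        )
    return problems
```

Loading the posted config with `json.load` and passing it to `config_problems` would report `ce_weight has 19 entries but num_classes is 7`. If the loss does read `ce_weight`, either remove it or supply 7 values computed for your own classes.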

"Default process group has not been initialized" when run scripts in model zoo

Running "bash scripts/cityscapes/hrnet/run_h_48_d_4_ocr.sh val 1" produces the following error:

The error is raised in segmentor/tester.py#L243 (https://github.com/openseg-group/openseg.pytorch/blob/master/segmentor/tester.py#L243):

"Default process group has not been initialized"
AssertionError: Default process group is not initialized
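This assertion means a `torch.distributed` function ran before `torch.distributed.init_process_group` was called, which typically happens when code written for a distributed launcher is run as a single plain process. Whether tester.py expects a launcher is not stated here. With the `env://` initialization method, PyTorch reads `MASTER_ADDR`, `MASTER_PORT`, `RANK` and `WORLD_SIZE` from the environment (unless rank and world size are passed explicitly), and launchers such as `torchrun` set them. The hypothetical helper below only reports which of these variables are missing, as a quick check of how the process was started; it does not initialize anything.

```python
import os
from typing import List, Mapping, Optional

# Variables read by torch.distributed's env:// init method.
REQUIRED_ENV_VARS = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")


def missing_dist_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List the env:// variables that are not set in the given (or current) environment."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if name not in env]


if __name__ == "__main__":
    missing = missing_dist_env()
    if missing:
        print("not launched for env:// init; missing:", ", ".join(missing))
```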

How to draw pictures

Sorry, my English is not very good. I would like to know how these figures were made: the Coarse Label Map, Offset Map, Refined Label Map, Distance Map, Direction Map, and the last one. Which ones were drawn with drawing software, and what is that software? Which ones were generated by a program, and can that program be open-sourced? I want to apply Figure 2 and Figure 3 to my own grayscale maps. If the code can be open-sourced, could that happen in the near future? Thank you very much.
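The plotting tool used for the paper figures isn't stated in this issue. As a hedged starting point for computing comparable maps on your own mask, the NumPy sketch below uses one plausible definition: for each pixel inside a binary mask, the distance to the nearest pixel outside the mask, and the angle of the vector pointing from that outside pixel toward the current pixel, i.e. into the object. It is not the authors' code, and it is brute force, so it only suits small masks (a few thousand pixels). For larger images, `scipy.ndimage.distance_transform_edt` with `return_indices=True` gives the nearest background pixel more efficiently. Colouring, such as mapping the angle to hue, is left to whatever plotting library you use.

```python
import numpy as np


def distance_and_direction(mask: np.ndarray):
    """Return (distance, angle) maps for a binary mask.

    distance[y, x]: Euclidean distance from a foreground pixel to the nearest
                    background pixel (0 for background pixels).
    angle[y, x]:    direction in radians (atan2 of dy, dx) pointing from that
                    nearest background pixel toward the foreground pixel.
    """
    mask = mask.astype(bool)
    fg = np.argwhere(mask)        # (N, 2) foreground coordinates
    bg = np.argwhere(~mask)       # (M, 2) background coordinates
    distance = np.zeros(mask.shape, dtype=float)
    angle = np.zeros(mask.shape, dtype=float)
    if len(fg) == 0 or len(bg) == 0:
        return distance, angle
    diff = fg[:, None, :] - bg[None, :, :]        # (N, M, 2) vectors bg -> fg
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # (N, M) pairwise distances
    nearest = dist.argmin(axis=1)                 # closest background pixel per fg pixel
    rows = np.arange(len(fg))
    vec = diff[rows, nearest]                     # (N, 2) as (dy, dx)
    distance[fg[:, 0], fg[:, 1]] = dist[rows, nearest]
    angle[fg[:, 0], fg[:, 1]] = np.arctan2(vec[:, 0], vec[:, 1])
    return distance, angle
```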

How can I train HRNet + OCR + SegFix?

Thank you.

How to use SegFix directly

Hi, your work is awesome! Congratulations!
I would like to apply SegFix to my own model to improve its predictions. Could you provide a simple tutorial on using the SegFix code directly?
Thanks for your help!