
ademxapp

Visual applications by the University of Adelaide

In designing our Model A, we did not over-optimize its structure for efficiency unless it was necessary, which led to a high-performance model without non-trivial building blocks. Moreover, we anticipate that this model and its trivial variants will perform well when fine-tuned for new tasks, given their better spatial efficiency and larger model sizes compared to conventional ResNet models.

In this work, we try to find a proper depth for ResNets, without grid-searching the whole space, especially when it is too costly to do so, e.g., on the ILSVRC 2012 classification dataset. For more details, refer to our report: Wider or Deeper: Revisiting the ResNet Model for Visual Recognition.

This code is a refactored version of the one that we used in the competition, and has not yet been tested extensively, so feel free to open an issue if you find any problem.

To use, first install MXNet.
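
As a quick sanity check (a minimal sketch, assuming a CUDA build of MXNet and at least one visible GPU), you can verify the installation from Python:

    import mxnet as mx

    print(mx.__version__)                  # MXNet version string
    a = mx.nd.ones((2, 3), ctx=mx.gpu(0))  # fails here if no usable GPU is found
    print((a * 2).asnumpy())               # round-trip an array through the GPU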

Updates

  • Recent updates
    • Model A1 trained on Cityscapes
    • Model A1 trained on VOC
    • Training code for semantic image segmentation
    • Training code for image classification on ILSVRC 2012 (Still needs to be evaluated.)
  • History
    • Results on VOC using COCO for pre-training
    • Fixed a bug in testing that resulted from changing the EPS in BatchNorm layers
    • Model A1 for ADE20K trained using the train set with testing code
    • Segmentation results with multi-scale testing on VOC and Cityscapes
    • Model A and Model A1 for ILSVRC with testing code
    • Segmentation results with single-scale testing on VOC and Cityscapes

Image classification

Pre-trained models

  1. Download the ILSVRC 2012 classification val set (6.3 GB), and put the extracted images into the directory:

    data/ilsvrc12/ILSVRC2012_val/
    
  2. Download the models as below, and put them into the directory:

    models/
    
  3. Check the classification performance of pre-trained models on the ILSVRC 2012 val set:

    python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights models/ilsvrc-cls_rna-a_cls1000_ep-0001.params --split val --test-scales 320 --gpus 0 --no-choose-interp-method --pool-top-infer-style caffe
    
    python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --split val --test-scales 320 --gpus 0 --no-choose-interp-method

Results on the ILSVRC 2012 val set tested with a single scale (320, without flipping):

model|top-1 error (%)|top-5 error (%)|download
:---:|:---:|:---:|:---:
[Model A](https://cdn.rawgit.com/itijyou/ademxapp/master/misc/ilsvrc_model_a.pdf)|19.20|4.73|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/V7dncO4H0ijzeRj)
[Model A1](https://cdn.rawgit.com/itijyou/ademxapp/master/misc/ilsvrc_model_a1.pdf)|19.54|4.75|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/NOPhJ247fhVDnZH)
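
These figures are standard top-k errors. For reference, a minimal NumPy sketch of the metric (illustrative only; iclass/ilsvrc.py does its own bookkeeping):

    import numpy as np

    def topk_error(scores, labels, k):
        # scores: (N, 1000) class scores; labels: (N,) ground-truth indices
        topk = np.argsort(-scores, axis=1)[:, :k]
        correct = (topk == labels[:, None]).any(axis=1)
        return 1.0 - correct.mean()  # k=1 gives top-1 error, k=5 gives top-5 error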

Note: Due to a change in MXNet's padding at pooling layers, some of the computed feature maps in Model A will have sizes different from those stated in our report. However, this has no effect on Model A1, which always uses convolution layers (instead of pooling layers) for down-sampling. So, in most cases, just use Model A1, which was initialized from Model A and tuned for 45k extra iterations.
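
The released weights are plain MXNet NDArray dictionaries. A rough sketch of inspecting one (the 'arg:'/'aux:' key prefixes are MXNet's checkpoint convention for weights and auxiliary states such as BatchNorm statistics):

    import mxnet as mx

    save_dict = mx.nd.load('models/ilsvrc-cls_rna-a_cls1000_ep-0001.params')
    arg_params = {k[4:]: v for k, v in save_dict.items() if k.startswith('arg:')}
    aux_params = {k[4:]: v for k, v in save_dict.items() if k.startswith('aux:')}
    print(len(arg_params), 'weight arrays;', len(aux_params), 'auxiliary arrays')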

New models

  1. Find a machine with 4 devices, each with at least 11 GB of memory.

  2. Download the ILSVRC 2012 classification train set (138 GB), and put the extracted images into the directory:

    data/ilsvrc12/ILSVRC2012_train/
    

    with the following structure (a quick layout sanity check follows this list):

    ILSVRC2012_train
    |-- n01440764
    |-- n01443537
    |-- ...
    `-- n15075141
    
  3. Train a new Model A from scratch, and check its performance:

    python iclass/ilsvrc.py --gpus 0,1,2,3 --data-root data/ilsvrc12 --output output --model ilsvrc-cls_rna-a_cls1000 --batch-images 256 --crop-size 224 --lr-type linear --base-lr 0.1 --to-epoch 90 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror
    
    python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights output/ilsvrc-cls_rna-a_cls1000_ep-0090.params --split val --test-scales 320 --gpus 0
  4. Tune a Model A1 from our released Model A, and check its performance:

    python iclass/ilsvrc.py --gpus 0,1,2,3 --data-root data/ilsvrc12 --output output --model ilsvrc-cls_rna-a1_cls1000_from-a --batch-images 256 --crop-size 224 --weights models/ilsvrc-cls_rna-a_cls1000_ep-0001.params --lr-type linear --base-lr 0.01 --to-epoch 9 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror
    
    python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights output/ilsvrc-cls_rna-a1_cls1000_from-a_ep-0009.params --split val --test-scales 320 --gpus 0
  5. Or train a new Model A1 from scratch, and check its performance:

    python iclass/ilsvrc.py --gpus 0,1,2,3 --data-root data/ilsvrc12 --output output --model ilsvrc-cls_rna-a1_cls1000 --batch-images 256 --crop-size 224 --lr-type linear --base-lr 0.1 --to-epoch 90 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror
    
    python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights output/ilsvrc-cls_rna-a1_cls1000_ep-0090.params --split val --test-scales 320 --gpus 0
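
Before launching a multi-day run, it may be worth sanity-checking the extracted train set layout; a minimal sketch (a hypothetical helper, not part of the repository):

    import os

    root = 'data/ilsvrc12/ILSVRC2012_train'
    synsets = [d for d in os.listdir(root)
               if d.startswith('n') and os.path.isdir(os.path.join(root, d))]
    # ILSVRC 2012 has exactly 1000 synset directories (n01440764 ... n15075141)
    assert len(synsets) == 1000, 'expected 1000 class directories, got %d' % len(synsets)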

Training took more than 40 days on our workstation with 4 Maxwell GTX Titan cards. So, be patient, or try smaller models as described in our report.

Note: The best setting (prefetch-threads and prefetcher) for efficiency can vary depending on the circumstances (the available CPUs, GPUs, and filesystem).

Note: This code may not accurately reproduce our reported results, since there are subtle differences in implementation, e.g., different cropping strategies, interpolation methods, and padding strategies.
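
For intuition about the --lr-type linear flag: a linear schedule typically anneals the base rate toward zero over the full run. A hedged sketch (the repository's exact schedule may differ):

    def linear_lr(base_lr, epoch, to_epoch):
        # linearly anneal from base_lr at epoch 0 down to ~0 at to_epoch
        return base_lr * max(0.0, 1.0 - float(epoch) / to_epoch)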

Semantic image segmentation

We show the effectiveness of our models (as pre-trained features) in semantic image segmentation, using plain dilated FCNs initialized from them. Several Model A1 variants tuned on the train sets of PASCAL VOC, Cityscapes, and ADE20K are available.
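
The basic ingredient of such a dilated FCN is a convolution whose kernel is expanded by a dilation factor, so the feature stride can stay at 8 instead of 32. A minimal sketch in MXNet's symbol API (layer name and sizes are illustrative, not the repository's):

    import mxnet as mx

    data = mx.sym.Variable('data')
    # a 3x3 convolution with dilation 2: the receptive field grows to 5x5
    # while the spatial resolution is preserved (stride 1, pad 2)
    conv = mx.sym.Convolution(data=data, num_filter=512, kernel=(3, 3),
                              dilate=(2, 2), pad=(2, 2), no_bias=True,
                              name='dilated_conv')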

  • To use, download and put them into the directory:

    models/
    

PASCAL VOC 2012:

  1. Download the PASCAL VOC 2012 dataset (2 GB), and put the extracted images into the directory:

    data/VOCdevkit/VOC2012
    

    with the following structure:

    VOC2012
    |-- JPEGImages
    |-- SegmentationClass
    `-- ...
    
  2. Check the performance of the pre-trained models:

    python issegm/voc.py --data-root data/VOCdevkit --output output --phase val --weights models/voc_rna-a1_cls21_s8_ep-0001.params --split val --test-scales 500 --test-flipping --gpus 0
    
    python issegm/voc.py --data-root data/VOCdevkit --output output --phase val --weights models/voc_rna-a1_cls21_s8_coco_ep-0001.params --split val --test-scales 500 --test-flipping --gpus 0

Results on the val set:

model|training data|testing scale|mean IoU (%)|download
:---|:---:|:---:|:---:|:---:
Model A1, 2 conv.|VOC; SBD|500|80.84|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/YqNptRcboMD44Kd)
Model A1, 2 conv.|VOC; SBD; COCO|500|82.86|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/JKWePbLPlpfRDW4)

Results on the test set:

model|training data|testing scale|mean IoU (%)
:---|:---:|:---:|:---:
Model A1, 2 conv.|VOC; SBD|500|[82.5](http://host.robots.ox.ac.uk:8080/anonymous/H0KLZK.html)
Model A1, 2 conv.|VOC; SBD|multiple|[83.1](http://host.robots.ox.ac.uk:8080/anonymous/BEWE9S.html)
Model A1, 2 conv.|VOC; SBD; COCO|multiple|[84.9](http://host.robots.ox.ac.uk:8080/anonymous/JU1PXP.html)
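
For reference, mean IoU is computed from a pixel-level confusion matrix; a generic NumPy sketch (illustrative only, not the repository's evaluation code):

    import numpy as np

    def mean_iou(conf):
        # conf[i, j]: number of pixels of ground-truth class i predicted as class j
        inter = np.diag(conf).astype(np.float64)
        union = conf.sum(axis=1) + conf.sum(axis=0) - inter
        iou = inter / np.maximum(union, 1)
        return iou.mean(), iou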

Cityscapes:

  1. Download the Cityscapes dataset, and put the extracted images into the directory:

    data/cityscapes
    

    with the following structure:

    cityscapes
    |-- gtFine
    `-- leftImg8bit
    
  2. Clone the official Cityscapes toolkit (used in the label-mapping sketch after this list):

    git clone https://github.com/mcordts/cityscapesScripts.git data/cityscapesScripts
  3. Check the performance of the pre-trained model:

    python issegm/voc.py --data-root data/cityscapes --output output --phase val --weights models/cityscapes_rna-a1_cls19_s8_ep-0001.params --split val --test-scales 2048 --test-flipping --gpus 0
  4. Tune a Model A1, and check its performance:

    python issegm/voc.py --gpus 0,1,2,3 --split train --data-root data/cityscapes --output output --model cityscapes_rna-a1_cls19_s8 --batch-images 16 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 8 --prefetcher process --cache-images 0 --backward-do-mirror
    
    python issegm/voc.py --gpus 0,1,2,3 --split train --data-root data/cityscapes --output output --model cityscapes_rna-a1_cls19_s8_x1-140 --batch-images 16 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights output/cityscapes_rna-a1_cls19_s8_ep-0140.params --lr-type linear --base-lr 0.0008 --to-epoch 64 --kvstore local --prefetch-threads 8 --prefetcher process --cache-images 0 --backward-do-mirror
    
    python issegm/voc.py --data-root data/cityscapes --output output --phase val --weights output/cityscapes_rna-a1_cls19_s8_x1-140_ep-0064.params --split val --test-scales 2048 --test-flipping --gpus 0
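
For training, the raw Cityscapes labelIds are usually remapped to the 19 train classes. A sketch of building that lookup with the toolkit's own label table (assumes cityscapesScripts was cloned as above; this is not code from this repository):

    import sys
    import numpy as np

    sys.path.insert(0, 'data/cityscapesScripts/cityscapesscripts/helpers')
    from labels import labels  # the toolkit's label definitions

    id2train = np.full(256, 255, dtype=np.uint8)  # 255 = ignore
    for l in labels:
        if 0 <= l.id < 256:
            id2train[l.id] = l.trainId if 0 <= l.trainId < 19 else 255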

Results on the val set:

model|training data|testing scale|mean IoU (%)|download
:---|:---:|:---:|:---:|:---:
Model A1, 2 conv.|fine|1024x2048|78.08|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/2hbvpro6J4XKVIu)

Results on the test set:

model|training data|testing scale|class IoU (%)|class iIoU (%)|category IoU (%)|category iIoU (%)
:---|:---:|:---:|:---:|:---:|:---:|:---:
Model A2, 2 conv.|fine|1024x2048|78.4|59.1|90.9|81.1
Model A2, 2 conv.|fine|multiple|79.4|58.0|91.0|80.1
Model A2, 2 conv.|fine; coarse|1024x2048|79.9|59.7|91.2|80.8
Model A2, 2 conv.|fine; coarse|multiple|80.6|57.8|91.0|79.1

For more information, refer to the official leaderboard.
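
The "multiple" testing-scale entries average predictions over several resized copies of each image and their horizontal flips. A hypothetical sketch of that fusion (predict_fn is an assumed callable returning per-class probabilities; this is not the repository's exact procedure):

    import cv2
    import numpy as np

    def multiscale_flip_predict(predict_fn, image, scales=(0.75, 1.0, 1.25)):
        # predict_fn: (H, W, 3) image -> (H', W', C) class probabilities
        h, w = image.shape[:2]
        acc = None
        for s in scales:
            scaled = cv2.resize(image, (int(w * s), int(h * s)))
            for flip in (False, True):
                inp = np.ascontiguousarray(scaled[:, ::-1]) if flip else scaled
                prob = predict_fn(inp)
                if flip:
                    prob = np.ascontiguousarray(prob[:, ::-1])  # undo the flip
                prob = cv2.resize(prob, (w, h))  # back to the original resolution
                acc = prob if acc is None else acc + prob
        return acc.argmax(axis=2)  # fused label map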

Note: Model A2 was initialized from Model A, and tuned for 45k extra iterations using the Places data in ILSVRC 2016.

MIT Scene Parsing Benchmark (ADE20K):

  1. Download the MIT Scene Parsing dataset, and put the extracted images into the directory:

    data/ade20k/
    

    with the following structure:

    ade20k
    |-- annotations
    |   |-- training
    |   `-- validation
    `-- images
        |-- testing
        |-- training
        `-- validation
    
  2. Check the performance of the pre-trained model:

    python issegm/voc.py --data-root data/ade20k --output output --phase val --weights models/ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 500 --test-flipping --test-steps 2 --gpus 0

Results on the val set:

model|testing scale|pixel accuracy (%)|mean IoU (%)|download
:---|:---:|:---:|:---:|:---:
[Model A1, 2 conv.](https://cdn.rawgit.com/itijyou/ademxapp/master/misc/ade20k_model_a1.pdf)|500|80.55|43.34|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/E4JeZpmssK50kpn)
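
The pixel accuracy column is the global fraction of correctly labeled pixels, computable from the same confusion matrix as mean IoU (again, an illustrative sketch only):

    import numpy as np

    def pixel_accuracy(conf):
        # conf[i, j]: pixels of ground-truth class i predicted as class j
        return np.diag(conf).sum() / float(conf.sum())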

Citation

If you use this code or these models in your research, please cite:

@Misc{word.zifeng.2016,
    author = {Zifeng Wu and Chunhua Shen and Anton van den Hengel},
    title = {Wider or Deeper: {R}evisiting the ResNet Model for Visual Recognition},
    year = {2016},
    howpublished = {arXiv:1611.10080}
}

License

This code is for academic purposes only. For commercial use, please contact us.

Acknowledgement

This work is supported with supercomputing resources provided by the PSG cluster at NVIDIA and the Phoenix HPC service at the University of Adelaide.


ademxapp's Issues

very low accuracy rates for ADE20K dataset??

Here are the results for two images, ADE_val_00001977.png and ADE_val_00000041.png. Any suggestions?

c:\ademxapp>python issegm/voc.py --data-root data\ade20k --output output --phase val --weight models\ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 504 --test-flipping --test-steps 2 --gpus 0
2017-01-13 10:47:14,700 Host start with arguments Namespace(backward_do_mirror=False, base_lr=None, cache_images=None, check_start=1, check_step=4, data_root='data\ade20k', dataset=None, debug=False, from_epoch=1, from_model='models\ade20k_rna-a1_cls150_s8_ep', gpus='0', kvstore='device', log_file='ade20k_rna-a1_cls150_s8_ep-0001.log', model='ade20k_rna-a1_cls150_s8', output='output', phase='val', prefetch_threads=1, prefetcher='thread', save_predictions=False, save_results=True, split='val', stop_epoch=None, test_flipping=True, test_scales='504', test_steps=2, to_epoch=None, weights='models\ade20k_rna-a1_cls150_s8_ep-0001.params')
2017-01-13 10:47:14,716 Host and model specs {'classes': 150, 'net_type': 'rna', 'net_name': 'a1', 'feat_stride': 8, 'dataset': 'ade20k'}
Level 0
[(64L, 3L, 224L, 224L)]
Level 1
[(64L, 64L, 224L, 224L)]
Level 2
First block on level 2, stride: 2, dilate: 1
[(64L, 128L, 112L, 112L)]
Level 3
First block on level 3, stride: 2, dilate: 1
[(64L, 256L, 56L, 56L)]
Level 4
First block on level 4, stride: 2, dilate: 1
[(64L, 512L, 28L, 28L)]
Level 5
First block on level 5, stride: 2, dilate: 1
[(64L, 1024L, 28L, 28L)]
Level 6
First block on level 6, stride: 2, dilate: 2
[(64L, 2048L, 28L, 28L)]
Level 7
[(64L, 4096L, 28L, 28L)]
2017-01-13 10:47:51,694 Host Done 1/2 with speed: 0.04/s
2017-01-13 10:47:51,694 Host pixel acc: 0.00%, mean acc: 0.00%, mean iou: 0.00%
2017-01-13 10:47:51,694 Host
[per-class accuracies: all 0.00 across the 150 classes]
2017-01-13 10:47:51,694 Host
[per-class IoUs: all 0.00 across the 150 classes]
2017-01-13 10:48:04,592 Host Done 2/2 with speed: 0.05/s
2017-01-13 10:48:04,592 Host pixel acc: 25.81%, mean acc: 0.67%, mean iou: 0.17%
2017-01-13 10:48:04,592 Host
[per-class accuracies: 100.00 for class 0, 0.00 for all other classes]
2017-01-13 10:48:04,592 Host
[per-class IoUs: 25.81 for class 0, 0.00 for all other classes]
2017-01-13 10:48:04,608 Host Done in 40.04 s.

ilsvrc-cls_rna-a1_cls1000_ep-symbol.json

I want to use mx.model.FeedForward.load to load the pretrained model, but it needs ilsvrc-cls_rna-a1_cls1000_ep-symbol.json.
Could you share ilsvrc-cls_rna-a1_cls1000_ep-symbol.json if you have one available?

Thanks,

retrain VOC data got very low IOU

I tuned a model from the released Model A1 (using the weights voc_rna-a1_cls21_s8_coco_ep-0001.params) on VOC data. The training parameters follow:

python issegm/voc.py --gpus 3 --split train --data-root data/VOCdevkit --output output --model voc_rna-a1_cls21 --batch-images 20 --crop-size 224 --origin-size 500 --scale-rate-range 0.7,1.3 --weights models/voc_rna-a1_cls21_s8_coco_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 50 --prefetch-threads 4 --prefetcher thread --backward-do-mirror

And the Train-fcn_valid curve is: (screenshot omitted)

Then I used my training result voc_rna-a1_cls21_ep-0048.params to check its performance, with the following val parameters:

python issegm/voc.py --data-root data/VOCdevkit --output output --phase val --weights models/voc_rna-a1_cls21_ep-0048.params --split val --test-scales 500 --test-flipping --gpus 3

But the resulting IoU is only 56%, not 82.86% (the paper's IoU). It's so weird.

2017-03-16 10:14:52,776 Host Done 1448/1449 with speed: 1.09/s
2017-03-16 10:14:52,776 Host pixel acc: 89.40%, mean acc: 67.73%, mean iou: 56.08%
2017-03-16 10:14:52,777 Host
[95.81 79.59 59.37 71.93 60.30 68.02 90.73 77.83 83.10 37.17 48.63 53.65
 61.79 68.01 77.06 89.20 40.15 65.97 45.22 69.16 79.67]
2017-03-16 10:14:52,777 Host
[89.65 64.67 38.20 58.13 51.16 54.65 71.36 70.44 68.06 26.75 45.69 47.13
 55.05 55.99 66.96 71.79 35.09 54.07 38.45 63.58 50.79]
2017-03-16 10:14:53,781 Host Done 1449/1449 with speed: 1.09/s
2017-03-16 10:14:53,782 Host pixel acc: 89.39%, mean acc: 67.70%, mean iou: 56.07%
2017-03-16 10:14:53,782 Host
[95.81 79.59 59.37 71.93 60.30 67.39 90.73 77.83 83.10 37.17 48.63 53.65
 61.79 68.01 77.06 89.20 40.15 65.97 45.22 69.16 79.67]
2017-03-16 10:14:53,783 Host
[89.64 64.67 38.20 58.13 51.07 55.00 71.01 70.44 68.06 26.75 45.69 47.08
 55.05 55.99 66.96 71.79 35.09 54.07 38.45 63.58 50.79]
2017-03-16 10:14:53,783 Host Done in 1331.76 s.

Are my training parameters incorrect, or is something else set up wrong?

I guess the crop-size and origin-size in training and the test-scales in validation affect the results, but I'm not sure. Does anyone know?

Building a model with 3D arrays

I have some data consisting of 3D arrays, and I am trying to modify the network to accept it.
Which functions should I change? I have changed data.py to load my data correctly. Furthermore, transformer.py has been changed to appropriately resample and resize the data. Right now the program processes arrays of size (224, 224, 224), but it returns NaNs for each epoch despite a reasonable training error for each batch.

Any ideas about what is going on?

Run errors

Trying to boot up this project, I've come across a couple of errors that I was hoping you could point me in the right direction on. First, the ADE20K dataset download doesn't provide an annotations/ directory like you show, only an images/ one. Are you unpacking this somehow?

Second, when running the example command:
python issegm/voc.py --data-root data/ade20k --output output --phase val --weight models/ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 504 --test-flipping --test-steps 2 --gpus 0

This fails on "from util import transformer as ts" because it is being executed from within the issegm/ directory. If you move that file up, it then fails on loading other files. How are you running this currently?

how to get coco20 png annotations

I tried to re-implement the semantic segmentation experiment, but found that COCO images with .png annotations are included in train++.lst.
So how can I get the .png annotations?

Simple_bind error

@itijyou
Hello, I was running the program on the VOC2012 dataset, but I found an error related to MXNet itself.
Following the example code provided in the README, I ran:

python issegm/voc.py --data-root /data/workteam/leyang/py-faster-rcnn-master/data/VOCdevkit2012/ --output output --phase val --weights /data/workteam/leyang/ademxapp/models/voc_rna-a1_cls21_s8_ep-0001.params --split val --test-scales 500 --test-flipping --gpus 0

All paths here are verified to be listed correctly. However, I received the following error messages:

Traceback (most recent call last):
  File "issegm/voc.py", line 716, in <module>
    _val_impl(args, model_specs, logger)
  File "issegm/voc.py", line 639, in _val_impl
    mod.bind(data_shapes = dataiter.provide_data, label_shapes = dataiter.provide_label)
  File "/data/workteam/leyang/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/module/module.py", line 400, in bind
    state_names=self._state_names)
  File "/data/workteam/leyang/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/module/executor_group.py", line 214, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/data/workteam/leyang/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/module/executor_group.py", line 310, in bind_exec
    shared_group))
  File "/data/workteam/leyang/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/module/executor_group.py", line 586, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/data/workteam/leyang/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/symbol.py", line 1462, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments: data: (1, 3L, 504L, 504L) softmax_label: (1, 3969L)
src/storage/storage.cc:95: Compile with USE_CUDA=1 to enable GPU usage

Can anyone help me please? Thanks in advance!

my own data set

Hi,
I want to train on my own data, so I have some questions.

  1. What do --crop-size and --origin-size mean? If my images have different sizes, what should origin-size be, and why is a maximum size needed? Also, why do we crop: because of RAM, or for the algorithm?
  2. My labels are grayscale: 0 = background, 100 = my class, 255 = ignore label. Is there another format I need for my labels, and what id_to_label and label_to_id mappings do I need to create?
    Thanks a lot.

Some questions about training which could be helpful for all

Hello,

I would appreciate it if you could answer these questions; at least nobody will need to ask them in the future.

  1. I have created a small dataset in VOC format, and I want to train on it using a pre-trained model. I should mention that the number of classes is two. What should I do, step by step?

  2. I have a single 6 GB GPU; can fine-tuning be done on it?

  3. How can I test the new model on an image, a video, or a video stream (webcam or similar)?

MxNet, Python, CUDA, CUDNN versions

Hi,

Could you please share the versions of MXNet, Python, CUDA, and cuDNN that you used to train your models?

And maybe add them to the README as well, since it eases the task and they are necessary information.

Thank you.

Training with new database

Hi,

I want to train the pre-trained model with other databases.

I just want to train a model with the script below:

python issegm/voc.py --gpus 0,1,2,3 --split train --data-root ${New_database} --output output --model ${New_database}_rna-a1_cls19 --batch-images 16 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 4 --prefetcher thread --backward-do-mirror

What parts of the code do I have to modify? It is not easy to understand the whole codebase.

And do I have to use the same crop and origin sizes (--crop-size 500 --origin-size 2048) in order to use the pretrained weights?

Could you please explain it for me?

Thanks.

Expected training time on cityscape dataset

I am trying to train the Cityscapes model.

However, it looks like my training gets stuck. I am using one K40c (12 GB) and one Titan X (12 GB), with batch size 200 and crop size 100.
The last print from the terminal was:
2017-02-10 00:27:06,021 Host Epoch[0] Batch [4] Speed: 24.24 samples/sec Train-fcn_valid=0.270395

Does anyone know the expected training time on this data?

Errors in full image test on cityscapes val dataset

I use the following command to segment the Cityscapes val dataset.

python issegm/voc.py --data-root data/cityscapes --output output --phase val --weights models/cityscapes_rna-a1_cls19_s8_ep-0001.params --split val --test-scales 2048 --test-flipping --gpus 0

But I get the errors given below. However, if I decrease --test-scales to 1800, everything runs smoothly without trouble. I am sure it's not a GPU memory issue (I use a Titan X 12 GB GPU). Any hints why this happens?


[22:28:43] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:304: [22:28:43] src/operator/./cudnn_convolution-inl.h:572: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1a64e8f) [0x7f5dd613ae8f]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x21e123) [0x7f5dd48f4123]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7f5dd52545bc]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

[22:28:43] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:304: [22:28:43] src/engine/./threaded_engine.h:329: [22:28:43] src/operator/./cudnn_convolution-inl.h:572: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1a64e8f) [0x7f5dd613ae8f]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x21e123) [0x7f5dd48f4123]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7f5dd52545bc]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 6 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e84f) [0x7f5dd525484f]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

terminate called after throwing an instance of 'dmlc::Error'
what(): [22:28:43] src/engine/./threaded_engine.h:329: [22:28:43] src/operator/./cudnn_convolution-inl.h:572: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1a64e8f) [0x7f5dd613ae8f]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x21e123) [0x7f5dd48f4123]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7f5dd52545bc]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 6 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e84f) [0x7f5dd525484f]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

Get very low accuracy rates on cityscapes dataset

I use MXNet 0.9.3 and the scripts below:

python issegm/voc.py --gpus 0,1 --split train --data-root /cityscapes --output output --model cityscapes_rna-a1_cls19_s8 --batch-images 10 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 8 --prefetcher process --cache-images 0 --backward-do-mirror

python issegm/voc.py --gpus 0,1 --split train --data-root /cityscapes --output output --model cityscapes_rna-a1_cls19_s8_x1-140 --batch-images 10 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights output/cityscapes_rna-a1_cls19_s8_ep-0140.params --lr-type linear --base-lr 0.0008 --to-epoch 64 --kvstore local --prefetch-threads 8 --prefetcher process --cache-images 0 --backward-do-mirror

python issegm/voc.py --data-root /cityscapes --output output --phase val --weights output/cityscapes_rna-a1_cls19_s8_x1-140_ep-0064.params --split val --test-scales 1024 --test-flipping --gpus 1

Host pixel acc: 37.65%, mean acc: 5.26%, mean iou: 1.98%

I know this differs from the original command, which uses --test-scales 2048, but I cannot fit that in 12 GB of memory, so I reduced this parameter.

I downloaded the model and checked the MD5; everything is OK.

is a GPU necessary?

Hello, I got a run problem when running the semantic segmentation code on my laptop.
The error is "Operator _zeros cannot be run; requires at least one of FCompute.......".
So I wonder whether a GPU is necessary?
My laptop has only CPUs; am I right? Thank you.

How to run the semantic segmentation on my own images?

hi,
so I did successfully run the perf test on Pascal VOC with:
python issegm/voc.py --data-root data/VOCdevkit --output output --phase val --weights models/voc_rna-a1_cls21_s8_ep-0001.params --split val --test-scales 500 --test-flipping --gpus 0

Now I only want to use the pre-trained model and test it on non-VOC images, i.e., on my own images, but it seems that voc.py is highly specific to the VOC dataset.

Can I use voc.py with some flag to run it on my own images, or do I need to heavily modify the code so that it does not look for SegmentationClass mask images from VOC, etc.?

thanks

Run Errors, help

python issegm/voc.py --data-root data/ade20k --output output --phase val --weight models/ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 504 --test-flipping --test-steps 2 --gpus 0

and I get an error at "from util import transformer as ts" because it is being executed from within the issegm/ directory.

Then I added sys.path.append(os.getcwd()) at issegm/voc.py line 16, and reran:

python issegm/voc.py --data-root data/ade20k --output output --phase val --weight models/ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 504 --test-flipping --test-steps 2 --gpus 0

and I get:

Traceback (most recent call last):
  File "issegm/voc.py", line 578, in <module>
    _val_impl(args, model_specs, logger)
  File "issegm/voc.py", line 435, in _val_impl
    _, net_args, net_auxs = util.load_params(args.from_model, args.from_epoch)
AttributeError: 'module' object has no attribute 'load_params'

So, what should I do next? Thanks very much!

Scale rates of multiscale test in cityscapes

When I use the given ResNet 38-A1 model pretrained on Cityscapes and do single-scale testing, I get 78.08% mIoU on the val dataset.
As shown on the ResNet 38 GitHub page, multi-scale testing should boost the mIoU by about 1%. However, when I do multi-scale testing (scales: 0.75, 0.875, 1), I get 77.06% mIoU, which is lower than single-scale testing. Could anyone tell me the scale rates that boost the performance dramatically?

the predicted image is all black using trained model on VOC

Hello, I used the following command to train the segmentation model on PASCAL VOC 2012:
/home/server6/Segmentation/Resnet/ademxapp-master/venv/bin/python2.7 issegm/voc.py --gpu 0,1 --split train --data-root /home/server6/xly/SSENet_self_supervised --output output --model voc_rna-a1_cls21 --batch-image 4 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights /home/server6/xly/SSENet_self_supervised/weights/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror

But when I used the trained model to predict images in the val set, the results are all black. Can you give me some advice? Thanks~

On the architecture of your model

Hello, when I refer to the file ilsvrc_model_a.pdf, I find some units like those in the following picture (not shown). They do not seem to be residual units (there is no identity mapping), and I failed to find any description in your paper. Could you give a clue? Thank you in advance!!

Where to get pretrained city scape model (cityscapes_rna-a1_cls19_s8_ep-0001.params)?

I can download the Cityscapes dataset, but there is no model. The models directory is empty; furthermore, the Cityscapes website has no downloadable pre-trained nets, just data. I opened the checksum.md5 file and I see the following:

1faf29850bfa194678f0b8e1cbbffa98 ade20k_rna-a1_cls150_s8_ep-0001.params
226b3e861a6be7d0dc84e537f4eab154 cityscapes_rna-a1_cls19_s8_ep-0001.params
ff21f45d6bf03284100dcbec571edfad ilsvrc-cls_rna-a1_cls1000_ep-0001.params
2421c1945b6797cecd3f89db14ca73f6 ilsvrc-cls_rna-a_cls1000_ep-0001.params
328c0eca0c45b6345ada2f95edce68d4 voc_rna-a1_cls21_s8_coco_ep-0001.params
a34628a63d5f62dcb98c29c4e281f332 voc_rna-a1_cls21_s8_ep-0001.params

Where can I get these param files?

Thanks

Experimental setting for training ADE20K?

Hi @itijyou ,

Thanks for sharing this great work!
I'm trying to reproduce the result on ADE20K. Could you share the hyper-parameters you used when training on ADE20K, namely crop-size, origin-size, etc.? Besides, if it's convenient for you, could you also share the tricks you used in the testing stage?
Thanks!

Compatibility with latest mxnet (0.9.3)

Hi,
Thanks for sharing your work.

I'm a newbie with MXNet,
so I just followed the instructions from the official MXNet repo (installing 0.9.3).
But I guess your code is not compatible with MXNet 0.9.3:
with MXNet 0.9.3, a runtime error occurred at voc.py.
https://github.com/itijyou/ademxapp/blob/master/issegm/voc.py#L639

src/operator/./cudnnconvolution-inl.h:517: Check failed: cudnnFindConvolutionForwardAlgorithm(s->dnn_handle_, indesc_, filter_desc_, conv_desc_, out_desc_, kMaxAlgos, &nalgo, fwd_algo) == CUDNN_STATUS_SUCCESS (4 vs. 0)

And how to install the MXNet 0.8 you used is not clear to me.
Could you give any comments?
Could you give any comment?

Thanks in advance.

Parameters for Pascal VOC segmentation

Hello,

I see you shared the training code for semantic segmentation and the parameters used for tuning from A1 on the Cityscapes dataset.

Could you please also share which settings were used for training voc_rna-a1_cls21_s8_ep-0001.params and voc_rna-a1_cls21_s8_coco_ep-0001.params from A1? Was it also with random cropping (or a simple resize like at test time), and is the number of epochs similar to that for Cityscapes?

Thank you

Only got 76.90% over Pascal VOC2012 val set

Hi,

I tried the Pascal VOC2012 trained model provided in the repository. However, I only got 76.90%, instead of the 80.84% reported in the README. I used the latest MXNet (v0.11.0). Do you have any idea?

Here is my log for the last three images.

2017-11-20 16:25:04,536 Host Done 1447/1449 with speed: 1.64/s
2017-11-20 16:25:04,537 Host pixel acc: 94.11%, mean acc: 82.53%, mean iou: 76.05%
2017-11-20 16:25:04,537 Host
[97.46 90.25 55.71 89.61 72.90 84.55 95.42 90.71 94.92 54.97 86.99 61.03
90.37 90.42 89.59 92.72 69.21 88.79 61.88 89.36 86.20]
2017-11-20 16:25:04,537 Host
[93.12 85.17 53.60 86.79 65.93 75.40 92.33 85.52 90.12 44.63 81.11 53.32
84.97 81.70 82.75 85.10 63.58 85.11 47.66 82.99 76.17]
2017-11-20 16:25:05,090 Host Done 1448/1449 with speed: 1.64/s
2017-11-20 16:25:05,091 Host pixel acc: 94.11%, mean acc: 82.53%, mean iou: 76.06%
2017-11-20 16:25:05,091 Host
[97.46 90.25 55.71 89.61 72.90 84.55 95.42 90.71 94.92 54.97 86.99 61.03
90.39 90.42 89.59 92.72 69.21 88.79 61.95 89.36 86.20]
2017-11-20 16:25:05,091 Host
[93.11 85.17 53.60 86.79 65.93 75.40 92.33 85.52 90.12 44.63 81.11 53.32
84.98 81.70 82.75 85.10 63.58 85.11 47.88 82.99 76.17]
2017-11-20 16:25:05,788 Host Done 1449/1449 with speed: 1.64/s
2017-11-20 16:25:05,788 Host pixel acc: 94.11%, mean acc: 82.53%, mean iou: 76.09%
2017-11-20 16:25:05,789 Host
[97.46 90.25 55.71 89.61 72.90 84.62 95.42 90.71 94.92 54.97 86.99 61.03
90.39 90.42 89.59 92.72 69.21 88.79 61.95 89.36 86.20]
2017-11-20 16:25:05,789 Host
[93.10 85.17 53.60 86.79 65.93 76.00 92.33 85.52 90.12 44.63 81.11 53.32
84.98 81.70 82.75 85.10 63.58 85.11 47.88 82.99 76.17]
2017-11-20 16:25:05,789 Host Done in 882.66 s.

MXNet error related to CUDA

Hi @itijyou,

On executing either the validation-set or training-set test, I get the following error:

Check failed: e == cudaSuccess CUDA: invalid device ordinal

I have been unsuccessful at resolving this error. The MNIST example from MXNet runs okay with the GPU.
It seems like this error comes from setting a wrong device_id for the GPU.
Any help in resolving this would be appreciated.

I am trying to run it on a machine with a single Nvidia GTX 1080.

Where is the download link?

The README provides some instructions on how to evaluate the pre-trained models,
but I can't find any download links for them.
I only found checksum.md5 in the models directory.

How to add one segmentation class based on VOC?

Hi,

I came across a problem when training a model with a new database using the following script:

python issegm/voc.py --gpus 0,1 --split train --data-root data/VOCdevkit --output /media/bnrc/67d2526a-0bd3-46d4-bc05-5c67714aca9c/output/ --model pascal-context_rna-a1_cls22 --batch-images 16 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights models/voc_rna-a1_cls21_s8_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 600 --kvstore local --prefetch-threads 4 --prefetcher thread --backward-do-mirror

I added a class on top of the original 21 classes, labeled the new class with RGB(222,222,222) in the relevant pictures, and trained with the above script. But, confusingly, the new class does not appear in the predicted pictures. Should I do something else?

Does anyone know the answer to this question? Many thanks.

Dixin

pre-trained model in ade20k

Hi, I want to know where I can download "ade20k_rna-a1_cls150_s8_ep-0001.params". I read through the README in detail, but still didn't find the link for it. Can you give me the link? Thank you very much!!

Question on training process on CityScape dataset

As described in the article, the authors say: "we first resize an image by a ratio randomly sampled from [0.7, 1.3], and then generate a sample by cropping one 500×500 subwindow at a randomly selected location." If the input image is scaled, the ground truth should also be resized. Is it reasonable to resize a label map? I know the authors provide different interpolation methods in util.py, but I don't know if it is reasonable.

Caffe port

Hello,

Has anybody tried to create a train.prototxt for Caffe?

Training semantic segmentation on greyscale images

I am using this format to train the network:
python issegm/voc.py --gpus 1 --split train --data-root ${New_database} --output output --model ${New_database}_rna-a1_cls${Number_of_classes} --batch-images 4 --crop-size 500 --origin-size 512 --scale-rate-range 0.7,1.3 --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 4 --prefetcher thread --backward-do-mirror

I have also prepared the split files and saved them into issegm/data/${New_database}.

I have also edited voc.py:

    elif dataset == 'New_database':
        num_classes = model_specs.get('classes', 2)
        valid_labels = range(num_classes)
        #
        max_shape = np.array((512, 512))

Unfortunately, I receive the following error:
issegm/voc.py:356: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 32 but corresponding boolean dimension is 512
pred_label = pred.argmax(1).ravel()[valid_flag]
Traceback (most recent call last):
File "issegm/voc.py", line 729, in
_train_impl(args, model_specs, logger)
File "issegm/voc.py", line 482, in _train_impl
num_epoch=args.stop_epoch,
File "/home/s123656/mxnet/python/mxnet/module/base_module.py", line 412, in fit
self.update_metric(eval_metric, data_batch.label)
File "/home/s123656/mxnet/python/mxnet/module/module.py", line 556, in update_metric
self._exec_group.update_metric(eval_metric, labels)
File "/home/s123656/mxnet/python/mxnet/module/executor_group.py", line 470, in update_metric
eval_metric.update(labels_slice, texec.outputs)
File "/home/s123656/mxnet/python/mxnet/metric.py", line 395, in update
reval = self._feval(label, pred)
File "issegm/voc.py", line 356, in _eval_func
pred_label = pred.argmax(1).ravel()[valid_flag]
IndexError: index 33 is out of bounds for axis 1 with size 32

Does this error have to do with the fact that I use greyscale images?
My images contain only binary labels: {1, 2}.
How do you define the label images?

in the paper why the crop size is larger than the input size?

"For networks trained with 224×224 inputs, the testing crop size is 320×320, following the setting used by He et al. [13]. For those with 112×112 and 56×56 inputs, we use 160×160 and 80×80 crops respectively."

As written in the paper: if we take 224×224 as the input size, why is the testing crop size 320×320 (I think it should also be 224×224)?

Cityscapes Models Missing

Hi!

I am trying to test your method trained on the Cityscapes dataset on another dataset, but couldn't find your model to download. Is it available?

Thanks!

Training a model with two classes only possible?

I tried to train a model with 2 classes only. As the mask, I am using a PNG with valid values {0, 1}; there are no other values in the mask. After training the model, class 1 has a higher probability than class 0 at every pixel in every image, so class 1 always "wins". Furthermore, all pixels then have the same value per class.

Question: does the current implementation work for binary classification in general?

Training on VOC from Scratch

Hi,

I am attempting to train this network on VOC from scratch, essentially trying to recreate the pre-trained weights available for download; however, after 70+ epochs, my model is still just predicting background, for an mIoU of 3.49%. Here is the command I am running to train:

python issegm/voc.py --gpus 1,2,3 --split train --data-root data/VOCdevkit/ --output train_out/ --model voc_rna-a1_cls21 --batch-images 12 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 4 --prefetcher thread --backward-do-mirror

Inside data/VOCdevkit/VOC2012 I have the original download of JPEGImages and SegmentationClass, which provides the full-color segmentation images. Any help would be much appreciated.

Here's a snippet of output that may or may not help, showing fcn_valid moving around a lot. I'm not entirely sure what the output means, so any explanation of it could be useful.

2019-04-11 15:00:09,073 Host Epoch[78] Batch [66-67] Speed: 11.93 samples/sec fcn_valid=0.623302
2019-04-11 15:00:10,056 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:10,058 Host Labels: 0 0.6 -1.0
Waited for 2.59876251221e-05 seconds
2019-04-11 15:00:10,075 Host Epoch[78] Batch [67-68] Speed: 11.98 samples/sec fcn_valid=0.644102
2019-04-11 15:00:10,076 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:11,055 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:11,056 Host Labels: 0 0.6 -1.0
Waited for 3.50475311279e-05 seconds
2019-04-11 15:00:11,074 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:11,077 Host Epoch[78] Batch [68-69] Speed: 11.98 samples/sec fcn_valid=0.632405
2019-04-11 15:00:12,056 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:12,058 Host Labels: 0 0.6 -1.0
Waited for 2.50339508057e-05 seconds
2019-04-11 15:00:12,074 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:12,077 Host Epoch[78] Batch [69-70] Speed: 12.00 samples/sec fcn_valid=0.775874
2019-04-11 15:00:13,057 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:13,058 Host Labels: 0 0.6 -1.0
Waited for 2.59876251221e-05 seconds
2019-04-11 15:00:13,074 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:13,077 Host Epoch[78] Batch [70-71] Speed: 12.01 samples/sec fcn_valid=0.562744
2019-04-11 15:00:14,056 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:14,058 Host Labels: 0 0.6 -1.0
Waited for 0.000184059143066 seconds
2019-04-11 15:00:14,074 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:14,075 Host Epoch[78] Batch [71-72] Speed: 12.03 samples/sec fcn_valid=0.552027

define rn or rna

What is the difference between net_type = 'rn' and net_type = 'rna'?

(regarding models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params:)

    if model_specs['net_type'] == 'rn':
        return -1, np.array([123.68, 116.779, 103.939]).reshape((1, 1, 3)), None
    if model_specs['net_type'] in ('rna',):
        return (1.0/255,
                np.array([0.485, 0.456, 0.406]).reshape((1, 1, 3)),
                np.array([0.229, 0.224, 0.225]).reshape((1, 1, 3)))
    return None, None, None
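
Reading the snippet above (an observation, not an official answer): 'rn' appears to subtract Caffe-style per-channel pixel means from raw 0-255 inputs, while 'rna' rescales inputs to [0, 1] and normalizes them with the ImageNet per-channel means and standard deviations.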

version of MXNet

I have tested three models: ade20k_rna-a1_cls150_s8_ep-0001, voc_rna-a1_cls21_s8_coco_ep-0001, and voc_rna-a1_cls21_s8_ep-0001. All of them produced very low mean IoUs. Please let me know the version of MXNet you are currently using; I would like to re-test these models. Thanks!
