
mnc's Introduction

Instance-aware Semantic Segmentation via Multi-task Network Cascades

By Jifeng Dai, Kaiming He, Jian Sun

This Python version was re-implemented by Haozhi Qi during his internship at Microsoft Research.

Introduction

MNC is an instance-aware semantic segmentation system based on deep convolutional networks. It won first place in the 2015 COCO segmentation challenge and runs at a fraction of a second per image. We decompose the task of instance-aware semantic segmentation into related sub-tasks, which are solved by multi-task network cascades (MNC) with shared features. The entire MNC network is trained end-to-end with error gradients propagated across the cascaded stages.

(Figure: example instance segmentation results.)
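
Concretely, the cascade decomposes as follows: stage 1 proposes class-agnostic boxes, stage 2 regresses a class-agnostic mask for each box, and stage 3 categorizes each masked instance, with all stages reading the same shared convolutional features; the 5-stage variant roughly runs a second round of the mask and classification stages on the boxes refined in stage 3. Below is a minimal sketch of this forward pass in plain Python; the function and argument names are illustrative only and do not correspond to classes in this repository.

def mnc_forward(image, backbone, rpn, roi_warp, mask_head, cls_head):
    # Shared convolutional features, computed once per image (e.g. a VGG-16 conv feature map).
    feat = backbone(image)
    # Stage 1: class-agnostic box proposals (region proposal network).
    boxes = rpn(feat)
    # Differentiable RoI warping: crop each box into a fixed-size feature patch.
    roi_feat = roi_warp(feat, boxes)
    # Stage 2: class-agnostic instance mask regressed for every proposal.
    masks = mask_head(roi_feat)
    # Stage 3: categorize each (box, mask) pair from the masked features.
    scores = cls_head(roi_feat, masks)
    return boxes, masks, scores

Because all stages share the same feature map, the later stages add little computation on top of the backbone, which is what allows the sub-second test time.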

MNC was initially described in a CVPR 2016 oral paper.

This repository contains a Python implementation of MNC, which is ~10% slower than the original MATLAB implementation.

This repository includes a bilinear RoI warping layer, which enables gradient back-propagation with respect to RoI coordinates.
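
For intuition, here is a tiny NumPy sketch of bilinear sampling at one continuous location. The four interpolation weights are smooth functions of the sampling coordinates, and the sampling grid is in turn an affine function of the RoI corners, which is what makes the gradient with respect to the box coordinates well defined. This is only an illustration, not the repository's CUDA implementation (roi_warping_layer.cu in caffe-mnc).

import numpy as np

def bilinear_sample(feat, y, x):
    # Interpolate a 2-D feature map at a continuous (y, x) position.
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # The weights (1-dy)(1-dx), (1-dy)dx, dy(1-dx), dy*dx are differentiable in y and x.
    return ((1 - dy) * (1 - dx) * feat[y0, x0] +
            (1 - dy) * dx * feat[y0, x1] +
            dy * (1 - dx) * feat[y1, x0] +
            dy * dx * feat[y1, x1])

# Example: sample a 4x4 feature map between grid points.
print(bilinear_sample(np.arange(16.0).reshape(4, 4), 1.5, 2.25))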

Misc.

This code has been tested on Linux (Ubuntu 14.04), using K40/Titan X GPUs.

The code is built based on py-faster-rcnn.

MNC is released under the MIT License (refer to the LICENSE file for details).

Citing MNC

If you find MNC useful in your research, please consider citing:

@inproceedings{dai2016instance,
    title={Instance-aware Semantic Segmentation via Multi-task Network Cascades},
    author={Dai, Jifeng and He, Kaiming and Sun, Jian},
    booktitle={CVPR},
    year={2016}
}

Main Results

|             | training data | test data  | mAP^r@0.5 | mAP^r@0.7 | time (K40)   | time (Titan X) |
|-------------|---------------|------------|-----------|-----------|--------------|----------------|
| MNC, VGG-16 | VOC 12 train  | VOC 12 val | 65.0%     | 46.3%     | 0.42 sec/img | 0.33 sec/img   |

Installation guide

  1. Clone the MNC repository:

# Make sure to clone with --recursive
git clone --recursive https://github.com/daijifeng001/MNC.git

  2. Install Python packages: numpy, scipy, cython, python-opencv, easydict, yaml.

  3. Build the Cython modules and the gpu_nms, gpu_mask_voting modules:

cd $MNC_ROOT/lib
make

  4. Install Caffe and pycaffe dependencies (see the official Caffe installation instructions).

Note: Caffe must be built with support for Python layers!

# In your Makefile.config, make sure to have this line uncommented
WITH_PYTHON_LAYER := 1
# cuDNN is recommended in building to reduce memory footprint
USE_CUDNN := 1

  5. Build Caffe and pycaffe:

    cd $MNC_ROOT/caffe-mnc
    # If you have all of the requirements installed
    # and your Makefile.config in place, then simply do:
    make -j8 && make pycaffe

Demo

First, download the trained MNC model.

./data/scripts/fetch_mnc_model.sh

Run the demo:

cd $MNC_ROOT
./tools/demo.py

Result demo images will be stored to data/demo/.

The demo performs instance-aware semantic segmentation with a trained MNC model (using the VGG-16 net). The model is pre-trained on ImageNet and fine-tuned on the VOC 2012 train set with additional annotations from SBD. The mAP^r of the model is 65.0% on the VOC 2012 validation set. The test speed per image is ~0.33 sec on a Titan X and ~0.42 sec on a K40.
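
For reference, a rough sketch of what the demo amounts to, using the standard pycaffe API; see tools/demo.py for the actual code. The model path is the one the demo expects after running fetch_mnc_model.sh, while the prototxt path is a guess based on the repository layout.

import caffe

caffe.set_mode_gpu()
caffe.set_device(0)
net = caffe.Net('models/VGG16/mnc_5stage/test.prototxt',    # assumed test prototxt
                'data/mnc_model/mnc_model.caffemodel.h5',   # downloaded demo model
                caffe.TEST)
# tools/demo.py then runs its detection/segmentation helper on every image in
# data/demo/ and writes the visualized masks back to data/demo/.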

Training

This repository contains code to train MNC end-to-end for instance-aware semantic segmentation, with error gradients propagated across the cascaded stages during training.

Preparation:

  1. Run ./data/scripts/fetch_imagenet_models.sh to download the ImageNet pre-trained VGG-16 net.
  2. Download the VOC 2007 dataset to ./data/VOCdevkit2007
  3. Run ./data/scripts/fetch_sbd_data.sh to download the VOC 2012 dataset together with the additional segmentation annotations in SBD to ./data/VOCdevkitSDS.

1. End-to-end training of MNC for instance-aware semantic segmentation

To end-to-end train a 5-stage MNC model (on VOC 2012 train), use experiments/scripts/mnc_5stage.sh. Final mAP^r@0.5 should be ~65.0% (mAP^r@0.7 should be ~46.3%) on VOC 2012 validation.

cd $MNC_ROOT
./experiments/scripts/mnc_5stage.sh [GPU_ID] VGG16 [--set ...]
# GPU_ID is the GPU you want to train on
# --set ... allows you to specify fast_rcnn.config options, e.g.
#   --set EXP_DIR seed_rng1701 RNG_SEED 1701

2. Training of CFM for instance-aware semantic segmentation

The code also includes an entry point to train a convolutional feature masking (CFM) model for instance-aware semantic segmentation.

@inproceedings{dai2015convolutional,
    title={Convolutional Feature Masking for Joint Object and Stuff Segmentation},
    author={Dai, Jifeng and He, Kaiming and Sun, Jian},
    booktitle={CVPR},
    year={2015}
}
2.1. Download pre-computed MCG proposals

Download and process the pre-computed MCG proposals.

cd $MNC_ROOT
./data/scripts/fetch_mcg_data.sh
python ./tools/prepare_mcg_maskdb.py --para_job 24 --db train --output data/cache/voc_2012_train_mcg_maskdb/
python ./tools/prepare_mcg_maskdb.py --para_job 24 --db val --output data/cache/voc_2012_val_mcg_maskdb/

The resulting proposals will be located in the folder data/MCG/.

2.2. Train the model

Run experiments/scripts/cfm.sh to train on the VOC 2012 train set. Final mAP^r@0.5 should be ~60.5% (mAP^r@0.7 should be ~42.6%) on VOC 2012 validation.

cd $MNC_ROOT
./experiments/scripts/cfm.sh [GPU_ID] VGG16 [--set ...]
# GPU_ID is the GPU you want to train on
# --set ... allows you to specify fast_rcnn.config options, e.g.
#   --set EXP_DIR seed_rng1701 RNG_SEED 1701

3. End-to-end training of Faster-RCNN for object detection

Faster-RCNN can be viewed as a 2-stage cascade composed of a region proposal network (RPN) and an object detection network. Run the script experiments/scripts/faster_rcnn_end2end.sh to train a Faster-RCNN model on VOC 2007 trainval. Final mAP^b should be ~69.1% on VOC 2007 test.

cd $MNC_ROOT
./experiments/scripts/faster_rcnn_end2end.sh [GPU_ID] VGG16 [--set ...]
# GPU_ID is the GPU you want to train on
# --set ... allows you to specify fast_rcnn.config options, e.g.
#   --set EXP_DIR seed_rng1701 RNG_SEED 1701

mnc's People

Contributors

daijifeng001, haozhiqi, leduckhc, yuwenxiong


mnc's Issues

Faster R-CNN with ResNet code available?

Hello,

Thank you for sharing your code! Currently I am working with my own dataset and trying to use Faster R-CNN for segmentation. In your paper, you mentioned that Faster R-CNN worked better with ResNet. Do you also have that code in Python?

Thank you in advance.

Error when building the Cython modules

Hi,

I followed the installation guide and got the following error with make:

python setup.py build_ext --inplace
running build_ext
building 'nms.cpu_nms' extension
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/nms
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/software/python-2.7-modules/lib/python2.7/site-packages/numpy/core/include -I/opt/python27/include/python2.7 -c nms/cpu_nms.c -o build/temp.linux-x86_64-2.7/nms/cpu_nms.o -Wno-cpp -Wno-unused-function
gcc: error: nms/cpu_nms.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
error: command 'gcc' failed with exit status 1
make: *** [all] Error 1

I was wondering if you have any thoughts on how I can fix it. Thanks!

The gradient derivation in ROIWarping Layer

@oh233, I can't quite understand this code:
if (coordinate_index == 1) {
  // \par f / \par x1
  weight = 0.5 * dxc - dw;
} else if (coordinate_index == 2) {
  // \par f / \par y1
  weight = 0.5 * dyc - dh;
} else if (coordinate_index == 3) {
  // \par f / \par w
  weight = 0.5 * dxc + dw;
} else if (coordinate_index == 4) {
  // \par f / \par h
  weight = 0.5 * dyc + dh;
}
in the function get_coordinate_gradient in roi_warping_layer.cu (https://github.com/daijifeng001/caffe-mnc/blob/mnc/src/caffe/layers/roi_warping_layer.cu). Would you, or anyone else, kindly give me a detailed explanation? I know this is a partial derivative, but I really can't figure out where the 0.5 comes from.
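
Edit: one possible reading, in case it helps anyone else. If the warped sampling grid is parameterized through the box center and size, $x_c = (x_1 + x_2)/2$ and $w = x_2 - x_1$, then the chain rule gives

$\partial f/\partial x_1 = 0.5\,\partial f/\partial x_c - \partial f/\partial w$ and $\partial f/\partial x_2 = 0.5\,\partial f/\partial x_c + \partial f/\partial w$,

since $\partial x_c/\partial x_1 = \partial x_c/\partial x_2 = 0.5$, $\partial w/\partial x_1 = -1$ and $\partial w/\partial x_2 = +1$ (and analogously for $y_1$, $y_2$ with $y_c$ and $h$). That would explain both the 0.5 and the signs in the four branches above, but I have not verified it against the rest of the layer, so please treat it as a guess.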

ResNet runs out of memory in MNC

Hi guys,

I have been working on modifying the prototxt files to use the ResNet network instead of VGG16. Initially I wanted to try ResNet50, but if that works I was hoping to expand to ResNet101/ResNet152. However, ResNet50 already appears to run out of memory. I made two variants of ResNet50, a 3-stage and a 5-stage variant.

The 3-stage ResNet50 train prototxt:
http://ethereon.github.io/netscope/#/gist/c3912c84e77f3da958933d2e76be841f
https://gist.github.com/hgaiser/c3912c84e77f3da958933d2e76be841f

The 5-stage ResNet50 train prototxt:
http://ethereon.github.io/netscope/#/gist/93d50b8f889a1701c8b93cc91e6ff8a1
https://gist.github.com/hgaiser/93d50b8f889a1701c8b93cc91e6ff8a1

ResNet50 3-stage takes 8.7Gb on my GTX Titan X, ResNet50 5-stage is crashing on start because it requires more than the 12Gb I have available. I tried making all convolutional layers of ResNet fixed (by setting param { lr_mult: 0 }) but it still crashes due to memory. Is there anything I can do to reduce the memory usage? Am I doing something wrong perhaps?

Also something that struck me as weird, ResNet50 3-stage took 8.7Gb during training, but 10Gb during testing. I would assume it needs less memory during testing because it doesn't need to perform backpropagation.. What is the reason for this? When training VGG16 5-stage it takes 5.6Gb and indeed when testing it uses less memory, 3.2Gb.

For clarity, the ResNet50 3-stage network I trained did seem to work, but it simply required a lot of memory. I did run into something where I wasn't sure what to do with it. The roi_interpolate_conv5 layer output shape is 14x14. roi_interpolate_conv5_box processes this output with stride 2, so this reduces the size to 7x7. res5a_branch2a and res5a_branch1 are connected to roi_interpolate_conv5_box and also stride with size 2. This results in an output of 4x4. The pool5 layer a bit further down the network pools with a kernel size of 7x7, but because of the previous layers it receives an input of 4x4 (which causes an error). To fix this, I changed the stride of roi_interpolate_conv5_box to 1, but I am unsure if this is the correct fix. This issue is perhaps related.
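
For reference, a quick size check with the standard floor-based output-size formula reproduces the numbers above (this is only arithmetic, not a fix):

def conv_out(size, kernel, stride, pad=0):
    # Floor-based spatial output size of a convolution/pooling layer.
    return (size + 2 * pad - kernel) // stride + 1

roi = 14                                                    # roi_interpolate_conv5: 14x14
after_box_pool = conv_out(roi, kernel=2, stride=2)          # stride-2 pooling -> 7x7
after_res5a = conv_out(after_box_pool, kernel=1, stride=2)  # stride-2 1x1 conv -> 4x4
print(after_box_pool, after_res5a)                          # 7 4, so a 7x7 pool5 kernel no longer fits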

I am using cudnn 5 and CUDA 7.5 installed on arch, using an upstream version of Caffe with the changes from caffe-mnc from @oh233 . I don't think this significantly affected my results, but thought it might be worth mentioning.

@oh233 , can you help me out with this one? It would be greatly appreciated.

Best regards,
Hans

VisibleDeprecationWarning and train failure

Hello, I have a problem when training MNC using ./experiments/scripts/mnc_5stage.sh. Can anyone help me? Thanks in advance.

I0320 15:46:29.860514  2121 net.cpp:270] This network produces output seg_cls_loss
I0320 15:46:29.860517  2121 net.cpp:270] This network produces output seg_cls_loss_ext
I0320 15:46:29.862728  2121 net.cpp:283] Network initialization done.
I0320 15:46:29.862998  2121 solver.cpp:60] Solver scaffolding done.
Loading pretrained model weights from data/imagenet_models/VGG16.mask.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 1024780411
I0320 15:46:30.187852  2121 net.cpp:810] Ignoring source layer rpn_conv/3x3
I0320 15:46:30.187872  2121 net.cpp:810] Ignoring source layer rpn_relu/3x3
I0320 15:46:30.187875  2121 net.cpp:810] Ignoring source layer rpn/output_rpn_relu/3x3_0_split
I0320 15:46:30.244598  2121 net.cpp:810] Ignoring source layer drop6
I0320 15:46:30.253931  2121 net.cpp:810] Ignoring source layer drop7
I0320 15:46:30.310539  2121 net.cpp:810] Ignoring source layer drop6_mask
I0320 15:46:30.319871  2121 net.cpp:810] Ignoring source layer drop7_mask
Solving...
/MNC/tools/../lib/pylayer/proposal_target_layer.py:152: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  cur_inds = npr.choice(cur_inds, size=cur_rois_this_image, replace=False)
/MNC/tools/../lib/transform/bbox_transform.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
/MNC/tools/../lib/transform/bbox_transform.py:202: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
/MNC/tools/../lib/pylayer/proposal_target_layer.py:190: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_box = scaled_gt_boxes[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:193: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_masks[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:194: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask_info = mask_info[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:195: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_mask[0:gt_mask_info[0], 0:gt_mask_info[1]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  top_mask_info[i, 0] = gt_assignment[val]
F0320 15:46:43.415727  2121 smooth_L1_loss_layer.cpp:54] Not Implemented Yet
*** Check failure stack trace: ***
./experiments/scripts/mnc_5stage.sh: line 35:  2121 Aborted                 (core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${NET}/mnc_5stage/solver.prototxt --weights ${NET_INIT} --imdb ${DATASET_TRAIN} --iters ${ITERS} --cfg experiments/cfgs/${NET}/mnc_5stage.yml ${EXTRA_ARGS}

Question: ResNet-101 and MS COCO

Hi daijifeng001,

thank you for sharing this wonderful library.

Do you intend to release the (pre-trained) network, which won the 1st place in the COCO segmentation track? (i.e. the MNC using ResNet-101, COCO trainval images, context modeling and multi-scale testing)

Carrying out these steps looks quite time-consuming and complex. I can imagine that the community would really benefit from publishing this pre-trained network.

Best regards,
Sebastian

get empty predicted boxes when running 'tools/demo.py'

Hi @daijifeng001, when I run tools/demo.py and print pred_dict['boxes'], I get empty results as below:

Demo for data/demo/2008_000533.jpg
forward time 0.124833
pred_dict['boxes']:
[]

Demo for data/demo/2008_000910.jpg
forward time 0.194351
pred_dict['boxes']:
[]

Demo for data/demo/2008_001602.jpg
forward time 0.176894
pred_dict['boxes']:
[]

Demo for data/demo/2008_001717.jpg
forward time 0.171917
pred_dict['boxes']:
[]

Demo for data/demo/2008_008093.jpg
forward time 0.194658
pred_dict['boxes']:
[]

Can you do me a favor? How can I fix this?
Thanks.

cannot download caffemodel

As OneDrive is not accessible, './data/scripts/fetch_mnc_model.sh' and './data/scripts/fetch_imagenet_models.sh' cannot download the caffemodels. Is there another way to download them?

VOC2012 training failure, Memory error

Hello, I have another problem when running ./experiments/scripts/mnc_5stage.sh 0 VGG16, and I can't figure out why it happens. Thanks to anyone who looks into this issue!

I0322 10:32:43.313863 29651 net.cpp:270] This network produces output seg_cls_loss_ext
I0322 10:32:43.400629 29651 net.cpp:283] Network initialization done.
I0322 10:32:43.400925 29651 solver.cpp:60] Solver scaffolding done.
Loading pretrained model weights from data/imagenet_models/VGG16.mask.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 1024780411
I0322 10:32:43.722609 29651 net.cpp:810] Ignoring source layer rpn_conv/3x3
I0322 10:32:43.722632 29651 net.cpp:810] Ignoring source layer rpn_relu/3x3
I0322 10:32:43.722635 29651 net.cpp:810] Ignoring source layer rpn/output_rpn_relu/3x3_0_split
I0322 10:32:43.781419 29651 net.cpp:810] Ignoring source layer drop6
I0322 10:32:43.791087 29651 net.cpp:810] Ignoring source layer drop7
I0322 10:32:43.849704 29651 net.cpp:810] Ignoring source layer drop6_mask
I0322 10:32:43.859282 29651 net.cpp:810] Ignoring source layer drop7_mask
Solving...
/MNC/tools/../lib/pylayer/proposal_target_layer.py:152: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  cur_inds = npr.choice(cur_inds, size=cur_rois_this_image, replace=False)
/MNC/tools/../lib/transform/bbox_transform.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
/MNC/tools/../lib/transform/bbox_transform.py:202: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
/MNC/tools/../lib/pylayer/proposal_target_layer.py:190: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_box = scaled_gt_boxes[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:193: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_masks[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:194: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask_info = mask_info[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:195: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_mask[0:gt_mask_info[0], 0:gt_mask_info[1]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  top_mask_info[i, 0] = gt_assignment[val]
/MNC/tools/../lib/pylayer/mask_layer.py:75: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_masks[info[0]][0:info[1], 0:info[2]]
/MNC/tools/../lib/pylayer/stage_bridge_layer.py:224: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_mask[0:gt_mask_info[0], 0:gt_mask_info[1]]
Traceback (most recent call last):
  File "./tools/train_net.py", line 96, in <module>
    _solver.train_model(args.max_iters)
  File "/MNC/tools/../lib/caffeWrapper/SolverWrapper.py", line 127, in train_model
    self.solver.step(1)
MemoryError

Bad memory management?

Hello,

When training a model, the amount of RAM used increases over time. With my RAM size I can't train for more than 1.6M iterations without starting to fill the swap memory.

Am I the only one experiencing this condition? Normally I train for 1.5M iterations and then train again using the generated model as pre-weights.

I don't know if it matters but I'm training a network with 5 convolutions on a dataset of my own. Everything is working fine (except for this memory issue).

Thank you very much.

F0102 20:50:54.342540 4460 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory

I0102 20:50:53.544600 4460 net.cpp:865] Ignoring source layer rpn_loss_bbox
I0102 20:50:53.544636 4460 net.cpp:865] Ignoring source layer rpn_loss_cls
I0102 20:50:53.544839 4460 net.cpp:865] Ignoring source layer seg_cls_score_ext_seg_cls_score_ext_0_split
I0102 20:50:53.544852 4460 net.cpp:865] Ignoring source layer seg_cls_score_seg_cls_score_0_split

Demo for data/demo/2008_000533.jpg
F0102 20:50:54.342540  4460 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
I am using a GPU with 8 GB of RAM (GTX 1070), without cuDNN, because cuDNN v5 is not supported.
The problem was solved by using cuDNN v4.

F0221 18:59:03.658958 3906 blob.cpp:115] Check failed: data_

Hi, could you help me please?
When I fine-tune the model with:
experiments/scripts/mnc_5stage.sh
there is an error after network initialization:
F0221 18:59:03.658958 3906 blob.cpp:115] Check failed: data_

Error messages

I0221 18:58:46.251495 3906 net.cpp:283] Network initialization done.
I0221 18:58:46.252101 3906 solver.cpp:60] Solver scaffolding done.
Loading pretrained model weights from data/imagenet_models/VGG16.mask.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:605] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:82] The total number of bytes read was 1024780411
I0221 18:59:01.514374 3906 upgrade_proto.cpp:67] Attempting to upgrade input file specified using deprecated input fields: data/imagenet_models/VGG16.mask.caffemodel
I0221 18:59:01.514454 3906 upgrade_proto.cpp:70] Successfully upgraded file specified using deprecated input fields.
W0221 18:59:01.514470 3906 upgrade_proto.cpp:72] Note that future Caffe releases will only support input layers and not input fields.
I0221 18:59:01.537618 3906 net.cpp:761] Ignoring source layer rpn_conv/3x3
I0221 18:59:01.537744 3906 net.cpp:761] Ignoring source layer rpn_relu/3x3
I0221 18:59:01.537762 3906 net.cpp:761] Ignoring source layer rpn/output_rpn_relu/3x3_0_split
I0221 18:59:01.667882 3906 net.cpp:761] Ignoring source layer drop6
I0221 18:59:01.680392 3906 net.cpp:761] Ignoring source layer drop7
I0221 18:59:01.761777 3906 net.cpp:761] Ignoring source layer drop6_mask
I0221 18:59:01.777510 3906 net.cpp:761] Ignoring source layer drop7_mask
Solving...
/media/G/yangshu/MNC/tools/../lib/pylayer/proposal_target_layer.py:152: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  cur_inds = npr.choice(cur_inds, size=cur_rois_this_image, replace=False)
/media/G/yangshu/MNC/tools/../lib/transform/bbox_transform.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
/media/G/yangshu/MNC/tools/../lib/transform/bbox_transform.py:202: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
/media/G/yangshu/MNC/tools/../lib/pylayer/proposal_target_layer.py:190: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_box = scaled_gt_boxes[gt_assignment[val]]
/media/G/yangshu/MNC/tools/../lib/pylayer/proposal_target_layer.py:193: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_masks[gt_assignment[val]]
/media/G/yangshu/MNC/tools/../lib/pylayer/proposal_target_layer.py:194: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask_info = mask_info[gt_assignment[val]]
/media/G/yangshu/MNC/tools/../lib/pylayer/proposal_target_layer.py:195: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_mask[0:gt_mask_info[0], 0:gt_mask_info[1]]
/media/G/yangshu/MNC/tools/../lib/pylayer/proposal_target_layer.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  top_mask_info[i, 0] = gt_assignment[val]
F0221 18:59:03.658958 3906 blob.cpp:115] Check failed: data_
*** Check failure stack trace: ***
./experiments/scripts/mnc_5stage.sh: line 35: 3906 Aborted (core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${NET}/mnc_5stage/solver.prototxt --weights ${NET_INIT} --imdb ${DATASET_TRAIN} --iters ${ITERS} --cfg experiments/cfgs/${NET}/mnc_5stage.yml ${EXTRA_ARGS}

Errors when Building the Cython modules and the gpu_nms, gpu_mask_voting modules

I can't figure out what's wrong. Any idea?

yihuihe ~ $ cd MNC/lib/
yihuihe (master) lib $ make
python setup.py build_ext --inplace
running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
building 'utils.cython_bbox' extension
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/utils
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c utils/bbox.c -o build/temp.linux-x86_64-2.7/utils/bbox.o -Wno-cpp -Wno-unused-function
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/utils/bbox.o -o /home/yihuihe/MNC/lib/utils/cython_bbox.so
cythoning nms/cpu_nms.pyx to nms/cpu_nms.c
building 'nms.cpu_nms' extension
creating build/temp.linux-x86_64-2.7/nms
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c nms/cpu_nms.c -o build/temp.linux-x86_64-2.7/nms/cpu_nms.o -Wno-cpp -Wno-unused-function
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/nms/cpu_nms.o -o /home/yihuihe/MNC/lib/nms/cpu_nms.so
cythoning nms/gpu_nms.pyx to nms/gpu_nms.cpp
building 'nms.gpu_nms' extension
/usr/local/cuda/bin/nvcc -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/local/cuda/include -I/usr/include/python2.7 -c nms/nms_kernel.cu -o build/temp.linux-x86_64-2.7/nms/nms_kernel.o -arch=sm_35 --ptxas-options=-v -c --compiler-options '-fPIC'
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z10nms_kernelifPKfPy' for 'sm_35'
ptxas info    : Function properties for _Z10nms_kernelifPKfPy
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 25 registers, 1280 bytes smem, 344 bytes cmem[0], 8 bytes cmem[2]
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/local/cuda/include -I/usr/include/python2.7 -c nms/gpu_nms.cpp -o build/temp.linux-x86_64-2.7/nms/gpu_nms.o -Wno-unused-function
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
In file included from /usr/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarraytypes.h:1761:0,
                 from /usr/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarrayobject.h:17,
                 from /usr/lib/python2.7/dist-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from nms/gpu_nms.cpp:352:
/usr/lib/python2.7/dist-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^
c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/nms/nms_kernel.o build/temp.linux-x86_64-2.7/nms/gpu_nms.o -L/usr/local/cuda/lib64 -Wl,-R/usr/local/cuda/lib64 -lcudart -o /home/yihuihe/MNC/lib/nms/gpu_nms.so
cythoning nms/gpu_mv.pyx to nms/gpu_mv.cpp

Error compiling Cython file:
------------------------------------------------------------
...
cimport numpy as np

assert sizeof(int) == sizeof(np.int32_t)

cdef extern from "gpu_mv.hpp":
    void _mv(np.float32_t* all_boxes, np.float32_t* all_masks, np.int32_t all_boxes_num, np.int32_t* candidate_inds, np.int32_t* candidate_start, np.float32_t* candidate_weights, np.int32_t candidate_num, np.int32_t image_height, np.int32_t image_width, np.int32_t box_dim, np.int32_t mask_size, np.int32_t result_num, np.float32_t* result_mask, np.int32_t* result_box, np.int32_t device_id);
                                                                                                                                                                                                                                                                                                                                                                                                      ^
------------------------------------------------------------

nms/gpu_mv.pyx:8:391: Syntax error in C variable declaration
building 'nms.mv' extension
/usr/local/cuda/bin/nvcc -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/local/cuda/include -I/usr/include/python2.7 -c nms/mv_kernel.cu -o build/temp.linux-x86_64-2.7/nms/mv_kernel.o -arch=sm_35 --ptxas-options=-v -c --compiler-options '-fPIC'
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z15reduce_mask_rowiPKfiiPb' for 'sm_35'
ptxas info    : Function properties for _Z15reduce_mask_rowiPKfiiPb
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 12 registers, 352 bytes cmem[0], 4 bytes cmem[2]
ptxas info    : Compiling entry function '_Z14mask_aggregateiPKfPfPKiS3_S0_ii' for 'sm_35'
ptxas info    : Function properties for _Z14mask_aggregateiPKfPfPKiS3_S0_ii
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 31 registers, 376 bytes cmem[0]
ptxas info    : Compiling entry function '_Z17reduce_bounding_yiPKbPii' for 'sm_35'
ptxas info    : Function properties for _Z17reduce_bounding_yiPKbPii
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 15 registers, 348 bytes cmem[0]
ptxas info    : Compiling entry function '_Z15reduce_mask_coliPKfiiPb' for 'sm_35'
ptxas info    : Function properties for _Z15reduce_mask_coliPKfiiPb
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 14 registers, 352 bytes cmem[0], 4 bytes cmem[2]
ptxas info    : Compiling entry function '_Z11mask_resizeiPKfPKiS2_Pfiii' for 'sm_35'
ptxas info    : Function properties for _Z11mask_resizeiPKfPKiS2_Pfiii
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 35 registers, 372 bytes cmem[0], 4 bytes cmem[2]
ptxas info    : Compiling entry function '_Z11mask_renderiPKfS0_iiiiPf' for 'sm_35'
ptxas info    : Function properties for _Z11mask_renderiPKfS0_iiiiPf
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 37 registers, 368 bytes cmem[0], 4 bytes cmem[2]
ptxas info    : Compiling entry function '_Z17reduce_bounding_xiPKbPii' for 'sm_35'
ptxas info    : Function properties for _Z17reduce_bounding_xiPKbPii
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 15 registers, 348 bytes cmem[0]
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/local/cuda/include -I/usr/include/python2.7 -c nms/gpu_mv.cpp -o build/temp.linux-x86_64-2.7/nms/gpu_mv.o -Wno-unused-function
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
nms/gpu_mv.cpp:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
 #error Do not use this file, it is the result of a failed Cython compilation.
  ^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
make: *** [all] Error 1

Speed benchmark

Hi @daijifeng001,

How fast is MNC at inference per image, compared with Faster R-CNN, in terms of box-level detection?

Thanks

speed

It is quite slow to process one frame, around 300-400 ms/frame on a Titan X card.
R-FCN is faster according to the paper, but I am not able to get it working because I lack MATLAB.
Just wondering how much the speed would improve if I used AlexNet or SqueezeNet instead of VGG16.

Thanks,

Overflow occurs when training MNC with the VGG16 net

Hi everyone.

I'm trying to train the default VGG16 implementation of MNC with the command
./experiments/scripts/mnc_5stage.sh 0 VGG16

However, after some iterations I run into an overflow error:

Error messages

/home/juliano/MNC/tools/../lib/pylayer/stage_bridge_layer.py:107: RuntimeWarning: overflow encountered in exp
  bottom[0].diff[i, 3] = dfdw[ind] * (delta_x + np.exp(delta_w))
/home/juliano/MNC/tools/../lib/pylayer/proposal_layer.py:213: RuntimeWarning: invalid value encountered in multiply
  dfdxc * anchor_w * weight_out_proposal * weight_out_anchor
/home/juliano/MNC/tools/../lib/pylayer/proposal_layer.py:217: RuntimeWarning: invalid value encountered in multiply
  dfdw * np.exp(bottom[1].data[0, 4*c+2, h, w]) * anchor_w * weight_out_proposal * weight_out_anchor
/home/juliano/MNC/tools/../lib/pylayer/stage_bridge_layer.py:107: RuntimeWarning: invalid value encountered in float_scalars
  bottom[0].diff[i, 3] = dfdw[ind] * (delta_x + np.exp(delta_w))
/home/juliano/MNC/tools/../lib/pylayer/proposal_layer.py:183: RuntimeWarning: invalid value encountered in greater
  top_non_zero_ind = np.unique(np.where(abs(top[0].diff[:, :]) > 0)[0])
/home/juliano/MNC/tools/../lib/transform/bbox_transform.py:86: RuntimeWarning: overflow encountered in exp
  pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/juliano/MNC/tools/../lib/transform/bbox_transform.py:129: RuntimeWarning: invalid value encountered in greater_equal
  keep = np.where((ws >= min_size) & (hs >= min_size))[0]
./experiments/scripts/mnc_5stage.sh: line 35: 22873 Floating point exception (core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${NET}/mnc_5stage/solver.prototxt --weights ${NET_INIT} --imdb ${DATASET_TRAIN} --iters ${ITERS} --cfg experiments/cfgs/${NET}/mnc_5stage.yml ${EXTRA_ARGS}


I saw in issue #22 that user @brisker experienced the same error when trying to train MNC with his own dataset. The advice given there was to lower the learning rate. Lowering it also helped in my case, but even at 1/10th of the original learning rate the same problem occurs, only later in the training process. User @souryuu mentioned that he needed to use a learning rate 100x smaller to avoid this problem, which led to poorer performance of the resulting net (possibly because he ran for the same number of iterations, not 100 times longer).

Was anyone able to run the training with the default learning rate provided by the creators without running into overflow problems? I'm simply trying to train the default implementation of the network with the default dataset. I'm guessing this means it should be possible to use the default learning rate, no?

Couldn't open ./data/mnc_model/mnc_model.caffemodel.h5

I am very new to Caffe and MNC, and was trying to run the demo, but received the error below
(only the error log after "Network initialization done" was copied). I was wondering if it has anything to do with the GPU, since I modified the code and ran it in CPU mode, but I'm not sure... really new to this. I'd really appreciate it if anyone could help me out with this!

I1204 16:42:29.921504 32654 net.cpp:283] Network initialization done.
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 140144062981952:
#000: ../../../src/H5F.c line 1586 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#1: ../../../src/H5F.c line 1377 in H5F_open(): unable to read superblock
major: File accessibilty
minor: Read failed
#2: ../../../src/H5Fsuper.c line 334 in H5F_super_read(): unable to find file signature
major: File accessibilty
minor: Not an HDF5 file
#3: ../../../src/H5Fsuper.c line 155 in H5F_locate_signature(): unable to find a valid file signature
major: Low-level I/O
minor: Unable to initialize object
F1204 16:42:29.922904 32654 net.cpp:858] Check failed: file_hid >= 0 (-1 vs. 0) Couldn't open ./data/mnc_model/mnc_model.caffemodel.h5
*** Check failure stack trace: ***
Aborted (core dumped)

change the number of classes

I0325 12:54:42.795264 6120 solver.cpp:245] Train net output #11: rpn_loss_bbox = 0.0137722 (* 1 = 0.0137722 loss)
I0325 12:54:42.795266 6120 solver.cpp:245] Train net output #12: seg_cls_loss = 0.137927 (* 1 = 0.137927 loss)
I0325 12:54:42.795269 6120 solver.cpp:245] Train net output #13: seg_cls_loss_ext = 0.541152 (* 1 = 0.541152 loss)
I0325 12:54:42.795272 6120 sgd_solver.cpp:106] Iteration 10, lr = 0.0001
Traceback (most recent call last):
  File "/home/dl/work/MNC/tools/train_net.py", line 97, in <module>
    _solver.train_model(args.max_iters)
  File "/home/dl/work/MNC/tools/../lib/caffeWrapper/SolverWrapper.py", line 138, in train_model
    self.snapshot()
  File "/home/dl/work/MNC/tools/../lib/caffeWrapper/SolverWrapper.py", line 92, in snapshot
    self.bbox_stds[:, np.newaxis])
ValueError: operands could not be broadcast together with shapes (84,8192) (8,1)

out of memory when training on my own data

Hi all!
I am using MNC to train my model on my own data.
Every image in my data has between 1 and 200 masks.
When I run it on a GPU with about 10 GB of memory,
I get this error (I print mask_list in lib/pylayer/mnc_data_layer.py, line 150):

image_size = (406, 438)
mask_list:2
image_size = (406, 438)
mask_list:1
image_size = (406, 438)
mask_list:42
image_size = (406, 439)
mask_list:21
image_size = (406, 438)
mask_list:2
image_size = (406, 438)
mask_list:3
image_size = (406, 439)
mask_list:71
Traceback (most recent call last):
  File "./tools/train_net.py", line 96, in <module>
    _solver.train_model(args.max_iters)
  File "/data2/qinhaifang/MNC/tools/../lib/caffeWrapper/SolverWrapper.py", line 128, in train_model
    self.solver.step(1)
MemoryError

Thank you in advance for your help!

training error with my own datasets with RuntimeWarning, Help?

The error looks like this:
I1019 20:25:01.929584 24937 solver.cpp:245] Train net output #12: seg_cls_loss = 3.18091 (* 1 = 3.18091 loss)
I1019 20:25:01.929592 24937 solver.cpp:245] Train net output #13: seg_cls_loss_ext = 3.28305 (* 1 = 3.28305 loss)
I1019 20:25:01.929605 24937 sgd_solver.cpp:106] Iteration 0, lr = 0.001
/home/sjtu/code/MNC-master/tools/../lib/pylayer/stage_bridge_layer.py:106: RuntimeWarning: overflow encountered in exp
bottom[0].diff[i, 3] = dfdw[ind] * (delta_x + np.exp(delta_w))
/home/sjtu/code/MNC-master/tools/../lib/pylayer/stage_bridge_layer.py:106: RuntimeWarning: invalid value encountered in float_scalars
bottom[0].diff[i, 3] = dfdw[ind] * (delta_x + np.exp(delta_w))
/home/sjtu/code/MNC-master/tools/../lib/pylayer/stage_bridge_layer.py:107: RuntimeWarning: overflow encountered in exp
bottom[0].diff[i, 4] = dfdh[ind] * (delta_y + np.exp(delta_h))
/home/sjtu/code/MNC-master/tools/../lib/pylayer/stage_bridge_layer.py:107: RuntimeWarning: invalid value encountered in float_scalars
bottom[0].diff[i, 4] = dfdh[ind] * (delta_y + np.exp(delta_h))
/home/sjtu/code/MNC-master/tools/../lib/pylayer/proposal_layer.py:183: RuntimeWarning: invalid value encountered in greater
top_non_zero_ind = np.unique(np.where(abs(top[0].diff[:, :]) > 0)[0])
/home/sjtu/code/MNC-master/tools/../lib/transform/bbox_transform.py:129: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
./experiments/scripts/mnc_5stage.sh: line 35: 24937 Floating point exception (core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${NET}/mnc_5stage/solver.prototxt --weights ${NET_INIT} --imdb ${DATASET_TRAIN} --iters ${ITERS} --cfg experiments/cfgs/${NET}/mnc_5stage.yml ${EXTRA_ARGS}

Out-of-memory on TitanX (maxwell) w/ VGG16

Hi,

I have just tried out MNC w/ VGG16 end-to-end training using pretty much out-of-the-box configuration (I believe).

I followed the instructions, except that I skipped running the "Demo" section of the README.
I went straight into "Training" after running the 3 preparation steps to prepare the data.
Then I ran the command suggested:
cd $MNC_ROOT
./experiments/scripts/mnc_5stage.sh 0 VGG16
The code runs up to a point; in particular, it constructs the net fine.
But when it starts printing the various task loss values at iteration 0, it gets an out-of-memory error.
I attach the full log file, but here are the last lines that might be useful for a quick look.

I1205 12:16:01.066789 22451 sgd_solver.cpp:106] Iteration 0, lr = 0.001
F1205 12:16:01.084414 22451 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
./experiments/scripts/mnc_5stage.sh: line 35: 22451 Aborted (core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${NET}/mnc_5stage/solver.prototxt --weights ${NET_INIT} --imdb ${DATASET_TRAIN} --iters ${ITERS} --cfg experiments/cfgs/${NET}/mnc_5stage.yml ${EXTRA_ARGS}

Question on Custom Data Training

Hi,

I am trying to train my own models using my own image data. I am pretty much copying and modifying your mnc_5stage code for Pascal VOC dataset.

  1. In the yml file, should MASK_SIZE be the same as the number of classes?

  2. I use 3000 image files and it took roughly 3 days to do 4000 iterations. Does this sound right? I was originally using 5000 images and 45000 iterations, but it was taking too long.

  3. My model at this moment has only one type of object, so I believe 3000 images are enough. But would you say 4000 iterations are enough as well?

  4. In the log, the accuracy_det and accuracy_det_ext appear to be over 95%, but when I manually test it in an IPython Notebook using your demo code, it seems to detect only one or two instances, and more often zero, even on the training dataset, which the model should know well. And it doesn't seem to detect instances correctly on the testing dataset. Could you give me some tips on improving its performance?

  5. Also, along the same lines as accuracy, it seems to put bounding boxes a lot smaller than the actual objects. Which threshold can I play with to control the bounding boxes?

This is a lot of questions, but I appreciate your patience and response. :)

IndexError: index 4 is out of bounds for axis 1 with size 4 while running demo.py

Hi, @oh233 and @daijifeng001

Thanks a lot for sharing your great work.

I have trained the mnc-5stage network using my own training dataset (which only contains one type of object) and got an AP value on my testing data. But when I ran demo.py to visualize the output, I got this error. Can you help me please? Thanks in advance.

Traceback (most recent call last):
  File "./tools/demo.py", line 151, in <module>
    100, im.shape[1], im.shape[0])
  File "/home/cewu/MNC/tools/../lib/transform/mask_transform.py", line 234, in gpu_mask_voting
    inds = nms(dets, cfg.TEST.MASK_MERGE_NMS_THRESH)
  File "/home/cewu/MNC/tools/../lib/nms/nms_wrapper.py", line 19, in nms
    return gpu_nms(dets, thresh, device_id=cfg.GPU_ID)
  File "nms/gpu_nms.pyx", line 24, in nms.gpu_nms.gpu_nms (nms/gpu_nms.cpp:1754)
IndexError: index 4 is out of bounds for axis 1 with size 4

Error when running make

It seems like this MNC project has some conflict with the latest version of cuDNN (5.1), because if I turn off "USE_CUDNN", there is no error.

What version of cuDNN does this MNC project use, please?
Thank you!

make: *** [.build_release/src/caffe/solvers/adagrad_solver.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/net.hpp:10,
                 from ./include/caffe/solver.hpp:7,
                 from ./include/caffe/sgd_solvers.hpp:7,
                 from src/caffe/solvers/adam_solver.cpp:3:
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::createPoolingDesc(cudnnPoolingStruct**, caffe::PoolingParameter_PoolMethod, cudnnPoolingMode_t*, int, int, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:127:41: error: too few arguments to function ‘cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)’
         pad_h, pad_w, stride_h, stride_w));
                                         ^
./include/caffe/util/cudnn.hpp:15:28: note: in definition of macro ‘CUDNN_CHECK’
     cudnnStatus_t status = condition; \
                            ^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
                 from ./include/caffe/util/device_alternate.hpp:40,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/net.hpp:10,
                 from ./include/caffe/solver.hpp:7,
                 from ./include/caffe/sgd_solvers.hpp:7,
                 from src/caffe/solvers/adam_solver.cpp:3:
/usr/local/cuda/include/cudnn.h:803:27: note: declared here
 cudnnStatus_t CUDNNWINAPI cudnnSetPooling2dDescriptor(
                           ^
make: *** [.build_release/src/caffe/solvers/adam_solver.o] Error 1
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/net.hpp:10,
                 from ./include/caffe/solver.hpp:7,
                 from ./include/caffe/sgd_solvers.hpp:7,
                 from src/caffe/solvers/rmsprop_solver.cpp:3:
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::createPoolingDesc(cudnnPoolingStruct**, caffe::PoolingParameter_PoolMethod, cudnnPoolingMode_t*, int, int, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:127:41: error: too few arguments to function ‘cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)’
         pad_h, pad_w, stride_h, stride_w));

KeyError: 'make_proposal'

Hi, running demo.py leads to the following error:

  File "/home/timo/MNC/tools/demo.py", line 139, in <module>
    _, _, _ = im_detect(im, net)
  File "/home/timo/MNC/tools/demo.py", line 90, in im_detect
    masks_phase1 = net.blobs['make_proposal'].data[...]
KeyError: 'make_proposal'

ROI Warping output width and height incorrect

I'm not sure whether the paper or the model here on GitHub is correct, but there is a discrepancy between the two. The code says the pooled width and height should be 14x14, whereas the paper says they should be 28x28:

We expect the RoI warping layer to produce a sufficiently
fine resolution, which is set as W' × H' = 28 × 28 in this
paper. A max pooling layer is then applied to produce a
lower-resolution output, e.g., 7×7 for VGG-16.

Am I interpreting something wrong, or is there a reason for this?

Help? Training my own dataset is done, test on images error : Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered *** Check failure stack trace: ***

I1020 12:30:46.939435 28978 net.cpp:865] Ignoring source layer rpn_loss_bbox
I1020 12:30:46.939491 28978 net.cpp:865] Ignoring source layer rpn_loss_cls
I1020 12:30:46.939963 28978 net.cpp:865] Ignoring source layer seg_cls_score_ext_seg_cls_score_ext_0_split
I1020 12:30:46.939996 28978 net.cpp:865] Ignoring source layer seg_cls_score_seg_cls_score_0_split
F1020 12:30:47.198374 28978 syncedmem.hpp:18] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
*** Check failure stack trace: ***

Docker file

Would you consider creating a docker (or nvidia-docker) file so the compilation gets easier?

F1205 03:06:39.782464 14338 roi_warping_layer.cpp:47] Not Implemented Yet in cpu_only mode

I'm running in CPU-only mode and have commented out/modified the GPU-related files in nms, transform, setup.py and mnc_config.py, but I am getting the error below. In closed issue #23 it is mentioned that this could be an out-of-memory problem. I was wondering whether there is any other configuration to modify to minimize runtime RAM, or could this be a different problem? Thanks a lot.

I1205 03:06:16.793164 14338 net.cpp:283] Network initialization done.
I1205 03:06:16.798741 14338 net.cpp:865] Ignoring source layer accuracy_det
I1205 03:06:16.798768 14338 net.cpp:865] Ignoring source layer accuracy_det_ext
I1205 03:06:16.798775 14338 net.cpp:865] Ignoring source layer accuracy_seg
I1205 03:06:16.798782 14338 net.cpp:865] Ignoring source layer accuracy_seg_ext
I1205 03:06:16.799073 14338 hdf5.cpp:32] Datatype class: H5T_FLOAT
I1205 03:06:16.805341 14338 net.cpp:865] Ignoring source layer bbox_pred_bbox_pred_0_split
I1205 03:06:16.808097 14338 net.cpp:865] Ignoring source layer cls_score_cls_score_0_split
I1205 03:06:16.808130 14338 net.cpp:865] Ignoring source layer cls_score_ext_cls_score_ext_0_split
I1205 03:06:18.953465 14338 net.cpp:865] Ignoring source layer gt_boxes_input-data_2_split
I1205 03:06:18.953495 14338 net.cpp:865] Ignoring source layer gt_masks_input-data_3_split
I1205 03:06:18.953505 14338 net.cpp:865] Ignoring source layer im_info_input-data_1_split
I1205 03:06:18.953514 14338 net.cpp:865] Ignoring source layer input-data
I1205 03:06:18.953582 14338 net.cpp:865] Ignoring source layer labels_ext_stage_bridge_1_split
I1205 03:06:18.953594 14338 net.cpp:865] Ignoring source layer labels_roi-data_1_split
I1205 03:06:18.953604 14338 net.cpp:865] Ignoring source layer loss_bbox
I1205 03:06:18.953613 14338 net.cpp:865] Ignoring source layer loss_bbox_ext
I1205 03:06:18.953624 14338 net.cpp:865] Ignoring source layer loss_cls
I1205 03:06:18.953632 14338 net.cpp:865] Ignoring source layer loss_cls_ext
I1205 03:06:18.953642 14338 net.cpp:865] Ignoring source layer loss_mask
I1205 03:06:18.953651 14338 net.cpp:865] Ignoring source layer loss_mask_ext
I1205 03:06:18.953660 14338 net.cpp:865] Ignoring source layer loss_seg_cls
I1205 03:06:18.953670 14338 net.cpp:865] Ignoring source layer loss_seg_cls_ext
I1205 03:06:18.953680 14338 net.cpp:865] Ignoring source layer mask_info_input-data_4_split
I1205 03:06:18.954733 14338 net.cpp:865] Ignoring source layer mask_pred_ext_mask_pred_ext_0_split
I1205 03:06:18.954753 14338 net.cpp:865] Ignoring source layer mask_pred_mask_pred_0_split
I1205 03:06:18.954789 14338 net.cpp:865] Ignoring source layer mask_proposal_label_ext_mask_proposal_ext_1_split
I1205 03:06:18.954802 14338 net.cpp:865] Ignoring source layer mask_proposal_label_mask_proposal_1_split
I1205 03:06:18.955157 14338 net.cpp:865] Ignoring source layer roi-data
I1205 03:06:18.955226 14338 net.cpp:865] Ignoring source layer roi_interpolate_conv5_ext_premax
I1205 03:06:18.955307 14338 net.cpp:865] Ignoring source layer rois_roi-data_0_split
I1205 03:06:18.955320 14338 net.cpp:865] Ignoring source layer rpn-data
I1205 03:06:18.956171 14338 net.cpp:865] Ignoring source layer rpn_bbox_pred_rpn_bbox_pred_0_split
I1205 03:06:18.956997 14338 net.cpp:865] Ignoring source layer rpn_cls_score_reshape_rpn_cls_score_reshape_0_split
I1205 03:06:18.957015 14338 net.cpp:865] Ignoring source layer rpn_cls_score_rpn_cls_score_0_split
I1205 03:06:18.975651 14338 net.cpp:865] Ignoring source layer rpn_loss_bbox
I1205 03:06:18.975675 14338 net.cpp:865] Ignoring source layer rpn_loss_cls
I1205 03:06:18.977620 14338 net.cpp:865] Ignoring source layer seg_cls_score_ext_seg_cls_score_ext_0_split
I1205 03:06:18.977633 14338 net.cpp:865] Ignoring source layer seg_cls_score_seg_cls_score_0_split
F1205 03:06:39.782464 14338 roi_warping_layer.cpp:47] Not Implemented Yet
*** Check failure stack trace: ***
Aborted (core dumped)

F0221 20:15:47.362769 11176 roi_warping_layer.cu:121] Check failed: error == cudaSuccess (9 vs. 0) invalid configuration argument

F0221 20:15:47.362769 11176 roi_warping_layer.cu:121] Check failed: error == cudaSuccess (9 vs. 0) invalid configuration argument
*** Check failure stack trace: ***
./experiments/scripts/mnc_5stage.sh: line 46: 11176 Aborted (core dumped) ./tools/test_net.py --gpu ${GPU_ID} --def models/${NET}/mnc_5stage/test.prototxt --net ${NET_FINAL} --imdb ${DATASET_TEST} --cfg experiments/cfgs/${NET}/mnc_5stage.yml --task seg

This error came when I ran mnc_5stage.sh.
I ran the shell script as the example suggests:
./experiments/scripts/mnc_5stage.sh 0 VGG16 --set EXP_DIR foobar RNG_SEED 42 TRAIN.SCALES "[400,500,600,700]"
Training seems to go well, but this error occurs during the testing phase.

Resnet50 out of memory

Hi~ I am running your code on my machine, which has a 12 GB Titan X; the dataset is COCO. When I train ResNet-50 + 3-stage it does not work, it runs out of memory. Could you please tell me how you trained ResNet-101 + 5-stage on a 12 GB Titan X?
Thank you for your help in advance!

Training on different dataset - how should I change the number of classes?

Hi,

I'm trying to train on a different dataset. I've built it similar to Pascal VOC2012, but I only have 3 classes (or 4 with background).

When trying to change the number of classes in pascal_voc_det.py by changing self._classes, I'm getting this error when trying to save the trained network snapshot -

File "/home/ubuntu/MNC/tools/../lib/caffeWrapper/SolverWrapper.py", line 91, in snapshot
self.bbox_stds[:, np.newaxis])
ValueError: operands could not be broadcast together with shapes (84,8192) (16,1)

When trying to change "num_classes" in train.prototxt from 21 to 4, I'm getting this error when solving the network -

F0211 22:44:32.050034 3286 smooth_L1_loss_layer.cpp:28] Check failed: bottom[0]->channels() == bottom[1]->channels() (84 vs. 16)

Can you say which parameters I need to change in order to change the number of classes I classify?

Thanks!!

Yotam
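
Edit: for what it's worth, the shapes in both errors are consistent with the usual Fast R-CNN-style parameterization of 4 bounding-box regression targets per class (background included), which suggests the class count has to be consistent everywhere: in the Python imdb and in every num_classes-dependent layer (cls_score, bbox_pred, and the seg/mask heads) in train.prototxt. The snippet below only explains the numbers; it is not a patch.

def bbox_pred_channels(num_classes):
    # 4 box-regression targets (dx, dy, dw, dh) per class, background included.
    return 4 * num_classes

print(bbox_pred_channels(21))  # 84: the VOC default still present in train.prototxt
print(bbox_pred_channels(4))   # 16: 3 classes + background, as in this dataset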

The question about solverstate

Hello, when using the 5-stage model, I want to save my model's solverstate, not just the caffemodel.h5. However, I cannot find any function in pycaffe to save a solverstate that is bigger than 2 GB. What can I do? Thanks.

nms.mv missing - Running MNC under CPU mode

I have been trying to run MNC under CPU mode. Followed all steps mentioned in the following post:
rbgirshick/py-faster-rcnn#123

However, I run into an issue which states that there is a missing module named 'mv' in the nms folder.

Here is a snapshot of the error:

(A screenshot of the error was attached here as an image.)

It would be great if anyone could give me some insights on this issue.

Thanks in advance!

Training MNC on MS COCO dataset

Dear @daijifeng001 @oh233 @liyi14

Thanks a lot for sharing your marvelous project.
I just started to learn deep learning several weeks ago.
I have some problem about training MNC on MS COCO dataset.

First, regarding preparing the dataset, I found I should modify the code that provides the data (MNC_ROOT/lib/datasets/*). Besides the datasets, is there any other code that needs to be re-implemented?

Second, there are some functions for evaluation in MNC_ROOT/lib/datasets/pascal_voc_seg.py. How can I do evaluation on MS COCO for instance-aware semantic segmentation? I can't find an API to evaluate segmentation (only an API for evaluating detection in the COCO API).

Third, could you please provide your data provider for MS COCO? There are few examples of providing data for instance-aware semantic segmentation and it is quite hard for me to do this.

Sorry for asking so many questions. I am quite new to deep learning and caffe, so there are many details I don't know.

Best regards

conv5_3 layers contains NANs causing SIGFPE

Hi. I was examining the cause of SIGFPE 8 when fine-tuning the network for PASCAL VOC SDS from the pre-trained VGG16 model. For reasons not yet obvious to me, in the 3rd iteration the weights of conv5_3 (and almost all the layers below it: conv5_{1,2}, conv4_{1,2,3}, conv3_{1,2,3}, etc.) contain NaNs. This causes unpredictable results for the RPN, RoI warping, and ProposalLayer. It may also be the reason for the SIGFPE 8, np.exp overflows, etc.

I examined by looking at

print {k: v.data for k, v in self.solver.net.blobs.items()}
print {k: v[0].data for k, v in self.solver.net.params.items()}
# v[0] are weights, v[1] are biases

These might also be related: #41, #22, #52.

MNC doesn't recognize any object

I tried to run the demo with the suggested trained MNC model and the original 5 images, but the output images produced by the algorithm are black, without any recognized objects. I noticed that the im_detect function returns very low scores, lower than 0.1.
I didn't change anything in the code; what could be the problem?

Standard Caffe compatibility

Is your MNC-caffe a direct replacement for the standard Caffe? If I remove the standard Caffe from the system and install/compile yours, will other programs still work as usual (in particular, the Python Caffe interface)?
