
pytorch-mask-rcnn's Issues

Error while training using coco.py

Traceback (most recent call last):
 File "coco.py", line 501, in <module>
   layers='heads')
 File "/mnt/data/repos/pytorch-mask-rcnn/model.py", line 1803, in train_model
   loss, loss_rpn_class, loss_rpn_bbox, loss_mrcnn_class, loss_mrcnn_bbox, loss_mrcnn_mask = self.train_epoch(train_generator, optimizer, self.config.STEPS_PER_EPOCH)
 File "/mnt/data/repos/pytorch-mask-rcnn/model.py", line 1865, in train_epoch
   self.predict([images, image_metas, gt_class_ids, gt_boxes, gt_masks], mode='training')
 File "/mnt/data/repos/pytorch-mask-rcnn/model.py", line 1719, in predict
   detection_target_layer(rpn_rois, gt_class_ids, gt_boxes, gt_masks, self.config)
 File "/mnt/data/repos/pytorch-mask-rcnn/model.py", line 569, in detection_target_layer
   crowd_ix = torch.nonzero(gt_class_ids < 0)[:, 0]
IndexError: too many indices for tensor of dimension 1
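
A hedged sketch of the kind of guard that avoids indexing an empty torch.nonzero() result (the tensor name follows the traceback above; the actual fix for the repo may differ):

    import torch

    def crowd_indices(gt_class_ids):
        # torch.nonzero() returns an (N, 1) tensor for 1-D input; when nothing
        # matches, N is 0 (and on older PyTorch builds the result can even be
        # 0-dimensional), so indexing [:, 0] blindly raises IndexError.
        crowd = torch.nonzero(gt_class_ids < 0)
        if crowd.numel() == 0:
            return torch.empty(0, dtype=torch.long, device=gt_class_ids.device)
        return crowd[:, 0]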

Value '[sm_61]' is not defined for option 'gpu-architecture'

Hi, @lasseha,
when I input the command "nvcc -c -o nms_kernel.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC -arch=[sm_61]", it reported the error "nvcc fatal : Value '[sm_61]' is not defined for option 'gpu-architecture'". My CUDA version is 9.0 and my GPU is a GTX 1080. I have installed nvcc and added it to the PATH. Thanks!
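
As a hedged aside, the bracketed [sm_61] appears to be placeholder syntax from the README and should be replaced by a bare value such as -arch=sm_61. One way to look up the right value is to query the device's compute capability (assuming PyTorch with CUDA support is installed):

    import torch

    # A GTX 1080 reports compute capability (6, 1), which corresponds to the
    # nvcc flag -arch=sm_61 (written without the brackets).
    major, minor = torch.cuda.get_device_capability(0)
    print("-arch=sm_%d%d" % (major, minor))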

KeyError: 'unexpected key "conv1.weight" in state_dict'

I am using the ImageNet pretrained model, and when I run training I get this error. How do I fix it? Thanks. My PyTorch is 0.4.0, Python 3.6.
The pretrained model came from torchvision (the official pretrained model). I guess we have to use the Keras pretrained model to fix it. However, how could we use the official pretrained model from https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py?

Loading weights  /home/john/pytorch-mask-rcnn/resnet101-5d3b4d8f.pth
Traceback (most recent call last):
  File "train.py", line 28, in <module>
    model.load_weights(model_path)
  File "/home/john/pytorch-mask-rcnn/model.py", line 1563, in load_weights
    self.load_state_dict(torch.load(filepath))
  File "/home/john/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 538, in load_state_dict
    .format(name))
KeyError: 'unexpected key "conv1.weight" in state_dict'
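
As a hedged sketch, a common workaround is to copy over only the keys that already exist in the model's own state dict with matching shapes; whatever gets skipped tells you which torchvision keys still need renaming (this is generic PyTorch, not the repo's own load_weights):

    import torch

    def load_matching_weights(model, filepath):
        # Keep only pretrained entries whose name and shape match the model;
        # torchvision's bare keys like "conv1.weight" are skipped rather than
        # raising KeyError, and returned so they can be remapped by hand.
        pretrained = torch.load(filepath)
        own_state = model.state_dict()
        matched = {k: v for k, v in pretrained.items()
                   if k in own_state and v.size() == own_state[k].size()}
        own_state.update(matched)
        model.load_state_dict(own_state)
        return sorted(set(pretrained) - set(matched))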

A little point about the comment in model.py

In model.py, line 884, the comment is
# Reshape to [batch, 2, anchors]
I think it should be [batch, anchors, 2].
That mismatch is why I was confused about the Softmax layer with dim=2.

Anyway, the code is pretty good.
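
A hedged aside: a minimal sketch of why dim=2 is consistent with a [batch, anchors, 2] shape (toy shapes, not the repo's actual tensors):

    import torch
    import torch.nn as nn

    logits = torch.randn(1, 6, 2)       # [batch, anchors, 2] background/foreground scores
    probs = nn.Softmax(dim=2)(logits)   # normalize over the two class scores of each anchor
    assert torch.allclose(probs.sum(dim=2), torch.ones(1, 6))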

What does [arch] mean?

When I run 'nvcc -c -o nms_kernel.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC -arch=[arch]', it shows the following:
nvcc fatal : Value '[arch]' is not defined for option 'gpu-architecture'
I don't know what [arch] is. What can I do about this?

Linux 16.04 shuts down directly during evaluation

When I run this code for evaluation, the system shuts down directly. I tried to debug it: monitoring the GPU memory, I see it soar as soon as execution enters the forward() function of the Classifier class, and then the machine shuts down. When I delete the conv1-bn1-relu-conv2-bn2-relu block in forward() and keep just roi_align, evaluation runs fine. But I can't figure out what is wrong with the Classifier class.
It's an urgent problem, SOS! Please help! @lasseha

def forward(self, x, rois):
    x = pyramid_roi_align([rois]+x, self.pool_size, self.image_shape)
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.conv2(x)
    x = self.bn2(x)
    x = self.relu(x)

    x = x.view(-1,1024)
    mrcnn_class_logits = self.linear_class(x)
    mrcnn_probs = self.softmax(mrcnn_class_logits)

    mrcnn_bbox = self.linear_bbox(x)
    mrcnn_bbox = mrcnn_bbox.view(mrcnn_bbox.size()[0], -1, 4)

    return [mrcnn_class_logits, mrcnn_probs, mrcnn_bbox]

I wrote a multi-GPU version based on your repo

Hi @lasseha, I wrote a multi-GPU version based on your repo. I changed a lot of the overall code framework, so I won't open a pull request here. It is still in beta. I will let you know (possibly by opening a brand new repo) when it is available (around the end of May, I guess). Please have a (preliminary) look here.

Thanks for your hard work! Right now the new multi-GPU version achieves around 36% mAP on COCO, without any additional data augmentation (except the one you use: horizontal flip) or tricks. Trained on 8 Titan X GPUs in around 4 days, not fully optimized; I expect the training phase to get shorter.

I have a few questions. Hopefully you can respond in a prompt manner.

  • COCO pretrained model: how did you get this? There are weights for mask.conv1.bias (size torch.Size([256])), rpn, and classifier. Did you train a Mask R-CNN detection model and use it as the pretrained model? Isn't it supposed to be a classification model?

  • Batch norm: learnable or not? In the line here (https://github.com/multimodallearning/pytorch-mask-rcnn/blob/master/model.py#L1461), you init the BN layers with requires_grad=False, so when training is performed in stages (heads, 4+, all), the BN layers are never trained at all, right? However, when you set up the optimizer, weight decay is explicitly removed for the BN layers. Why? Any experimental evidence? (A sketch of the usual BN-freezing pattern is included below.)

Thanks!
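
For reference, a hedged sketch of the usual pattern for freezing BatchNorm layers during staged training (generic PyTorch, not necessarily what this repo does):

    import torch.nn as nn

    def freeze_batchnorm(model):
        # Freeze both the affine parameters and the running statistics:
        # requires_grad=False stops gradient updates, and eval() stops the
        # running mean/var from being updated by new batches.
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.eval()
                for p in m.parameters():
                    p.requires_grad = False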

Error when running "python build.py" in "nms"

`C:\Users\dell\Anaconda3\python.exe C:/Users/dell/Desktop/Mask-RCNN-master/Mask-RCNN-master/nms/build.py
Including CUDA code.
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms
generating C:\Users\dell\AppData\Local\Temp\tmp29alsaqj_nms.c
setting the current directory to 'C:\Users\dell\AppData\Local\Temp\tmp29alsaqj'
running build_ext
building '_nms' extension
creating Release
creating Release\Users
creating Release\Users\dell
creating Release\Users\dell\Desktop
creating Release\Users\dell\Desktop\Mask-RCNN-master
creating Release\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master
creating Release\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms
creating Release\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DWITH_CUDA -IC:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi....\lib\include -IC:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi....\lib\include\TH -IC:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi....\lib\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0/include" -IC:\Users\dell\Anaconda3\include -IC:\Users\dell\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\winrt" /Tc_nms.c /Fo.\Release_nms.obj
_nms.c
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DWITH_CUDA -IC:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi....\lib\include -IC:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi....\lib\include\TH -IC:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi....\lib\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0/include" -IC:\Users\dell\Anaconda3\include -IC:\Users\dell\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\winrt" /TcC:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms.c /Fo.\Release\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms.obj
nms.c
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms.c(7): warning C4133: 'function': incompatible types - from 'THFloatTensor *' to 'const THLongTensor *'
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms.c(9): warning C4133: 'function': incompatible types - from 'THFloatTensor *' to 'const THLongTensor *'
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms.c(11): warning C4244: 'initializing': conversion from 'int64_t' to 'long', possible loss of data
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms.c(12): warning C4244: 'initializing': conversion from 'int64_t' to 'long', possible loss of data
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms.c(14): warning C4133: 'initializing': incompatible types - from 'int64_t *' to 'long *'
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms.c(16): warning C4133: 'initializing': incompatible types - from 'int64_t *' to 'long *'
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms.c(65): warning C4133: 'initializing': incompatible types - from 'int64_t *' to 'long *'
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DWITH_CUDA -IC:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi....\lib\include -IC:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi....\lib\include\TH -IC:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi....\lib\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0/include" -IC:\Users\dell\Anaconda3\include -IC:\Users\dell\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\winrt" /TcC:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms_cuda.c /Fo.\Release\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms_cuda.obj
nms_cuda.c
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms_cuda.c(22): warning C4244: 'initializing': conversion from 'int64_t' to 'int', possible loss of data
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms_cuda.c(23): warning C4244: 'initializing': conversion from 'int64_t' to 'int', possible loss of data
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms_cuda.c(43): warning C4133: 'initializing': incompatible types - from 'int64_t *' to 'long *'
C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms_cuda.c(60): warning C4133: 'initializing': incompatible types - from 'int64_t *' to 'long *'
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0/lib/x64" /LIBPATH:C:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi....\lib /LIBPATH:C:\Users\dell\Anaconda3\libs /LIBPATH:C:\Users\dell\Anaconda3\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\LIB\amd64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\um\x64" /EXPORT:PyInit__nms .\Release_nms.obj .\Release\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms.obj .\Release\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src\nms_cuda.obj C:\Users\dell\Desktop\Mask-RCNN-master\Mask-RCNN-master\nms\src/cuda/nms_kernel.cu.o /OUT:._nms.pyd /IMPLIB:.\Release_nms.lib
_nms.obj : warning LNK4197: export 'PyInit__nms' specified multiple times; using first specification
Creating library .\Release_nms.lib and object .\Release_nms.exp
LINK : warning LNK4098: defaultlib 'LIBCMT' conflicts with use of other libs; use /NODEFAULTLIB:library
nms.obj : error LNK2001: unresolved external symbol __imp_THFloatTensor_data
nms.obj : error LNK2001: unresolved external symbol __imp_THByteTensor_fill
nms.obj : error LNK2001: unresolved external symbol __imp_THByteTensor_data
nms.obj : error LNK2001: unresolved external symbol __imp_THByteTensor_free
nms.obj : error LNK2001: unresolved external symbol __imp__THArgCheck
nms.obj : error LNK2001: unresolved external symbol __imp_THByteTensor_newWithSize1d
nms.obj : error LNK2001: unresolved external symbol __imp_THLongTensor_isContiguous
nms.obj : error LNK2001: unresolved external symbol __imp_THLongTensor_data
nms.obj : error LNK2001: unresolved external symbol __imp_THFloatTensor_size
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THLongTensor_free
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THLongTensor_newWithSize2d
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THCudaLongTensor_newWithSize2d
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THLongTensor_copyCuda
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THLongTensor_newWithSize1d
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THLongTensor_fill
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THCudaTensor_data
nms_cuda.obj : error LNK2001: unresolved external symbol state
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THCudaTensor_isContiguous
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THCudaLongTensor_data
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THCudaTensor_size
nms_cuda.obj : error LNK2001: unresolved external symbol __imp_THCudaLongTensor_free
nms_kernel.cu.o : error LNK2001: unresolved external symbol cudaConfigureCall
nms_kernel.cu.o : error LNK2001: unresolved external symbol cudaSetupArgument
nms_kernel.cu.o : error LNK2001: unresolved external symbol cudaLaunch
nms_kernel.cu.o : error LNK2001: unresolved external symbol __cudaRegisterFatBinary
nms_kernel.cu.o : error LNK2001: unresolved external symbol __cudaUnregisterFatBinary
nms_kernel.cu.o : error LNK2001: unresolved external symbol __cudaRegisterFunction
._nms.pyd : fatal error LNK1120: 27 unresolved externals
Traceback (most recent call last):
File "C:\Users\dell\Anaconda3\lib\distutils_msvccompiler.py", line 519, in link
self.spawn([self.linker] + ld_args)
File "C:\Users\dell\Anaconda3\lib\distutils_msvccompiler.py", line 542, in spawn
return super().spawn(cmd)
File "C:\Users\dell\Anaconda3\lib\distutils\ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "C:\Users\dell\Anaconda3\lib\distutils\spawn.py", line 38, in spawn
_spawn_nt(cmd, search_path, dry_run=dry_run)
File "C:\Users\dell\Anaconda3\lib\distutils\spawn.py", line 81, in _spawn_nt
"command %r failed with exit status %d" % (cmd, rc))
distutils.errors.DistutilsExecError: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\link.exe' failed with exit status 1120

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\dell\Anaconda3\lib\site-packages\cffi\ffiplatform.py", line 51, in _build
dist.run_command('build_ext')
File "C:\Users\dell\Anaconda3\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "C:\Users\dell\Anaconda3\lib\site-packages\setuptools\command\build_ext.py", line 75, in run
_build_ext.run(self)
File "C:\Users\dell\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "C:\Users\dell\Anaconda3\lib\distutils\command\build_ext.py", line 339, in run
self.build_extensions()
File "C:\Users\dell\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 194, in build_extensions
self.build_extension(ext)
File "C:\Users\dell\Anaconda3\lib\site-packages\setuptools\command\build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "C:\Users\dell\Anaconda3\lib\distutils\command\build_ext.py", line 558, in build_extension
target_lang=language)
File "C:\Users\dell\Anaconda3\lib\distutils\ccompiler.py", line 717, in link_shared_object
extra_preargs, extra_postargs, build_temp, target_lang)
File "C:\Users\dell\Anaconda3\lib\distutils_msvccompiler.py", line 522, in link
raise LinkError(msg)
distutils.errors.LinkError: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\link.exe' failed with exit status 1120

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/dell/Desktop/Mask-RCNN-master/Mask-RCNN-master/nms/build.py", line 34, in
ffi.build()
File "C:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi_init_.py", line 184, in build
build_extension(ffi, cffi_wrapper_name, target_dir, verbose)
File "C:\Users\dell\Anaconda3\lib\site-packages\torch\utils\ffi_init
.py", line 108, in _build_extension
outfile = ffi.compile(tmpdir=tmpdir, verbose=verbose, target=libname)
File "C:\Users\dell\Anaconda3\lib\site-packages\cffi\api.py", line 690, in compile
compiler_verbose=verbose, debug=debug, **kwds)
File "C:\Users\dell\Anaconda3\lib\site-packages\cffi\recompiler.py", line 1515, in recompile
compiler_verbose, debug)
File "C:\Users\dell\Anaconda3\lib\site-packages\cffi\ffiplatform.py", line 22, in compile
outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
File "C:\Users\dell\Anaconda3\lib\site-packages\cffi\ffiplatform.py", line 58, in _build
raise VerificationError('%s: %s' % (e.__class__.__name__, e))
cffi.error.VerificationError: LinkError: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\link.exe' failed with exit status 1120

Process finished with exit code 1
`

Help me look at this problem. Thank you!

My fork of pytorch-mask-rcnn: PyTorch 0.4, batch size > 1, Windows compatibility, etc

I have made a fork of pytorch-mask-rcnn here:

https://github.com/Britefury/pytorch-mask-rcnn/tree/refactor (note the refactor branch; not master)

It fixes a number of issues:

  • batch size can be > 1
  • compatible with PyTorch 0.4
  • extensions (NMS etc) compile and run on Windows
  • multi-GPU training possible (although may not work well)
  • significant refactor; Mask R-CNN inherits Faster R-CNN inherits RPN, so you can choose which bits of the model you want
  • Detectron compatibility; I dropped compatibility with Matterport weights and focused on compatibility with Detectron instead. You can load Detectron weights and have it generate results that are not quite correct, yet...
  • quite a few other things I forget....

Unfortunately it's not very well documented, and the original COCO-based examples are unlikely to work. There are some new examples, though they need work.

Basically, I'm wondering whether to treat it as a new project going forward, or.... I would be interested to hear @lasseha's thoughts :)

cffi.error.VerificationError: LinkError: command 'gcc' failed with exit status 1

When I run 'python build.py', I get the following error. How can I solve it?
ce-code/pytorch-mask-rcnn/nms/src/nms.o /home/miaoshuyu/source-code/pytorch-mask-rcnn/nms/src/cuda/nms_kernel.cu.o -o ./_nms.so
gcc: error: /home/miaoshuyu/source-code/pytorch-mask-rcnn/nms/src/cuda/nms_kernel.cu.o: No such file or directory
Traceback (most recent call last):
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/unixccompiler.py", line 197, in link
self.spawn(linker + ld_args)
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/miaoshuyu/anaconda3/lib/python3.6/site-packages/cffi/ffiplatform.py", line 51, in _build
dist.run_command('build_ext')
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/command/build_ext.py", line 558, in build_extension
target_lang=language)
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/ccompiler.py", line 717, in link_shared_object
extra_preargs, extra_postargs, build_temp, target_lang)
File "/home/miaoshuyu/anaconda3/lib/python3.6/distutils/unixccompiler.py", line 199, in link
raise LinkError(msg)
distutils.errors.LinkError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "build.py", line 34, in
ffi.build()
File "/home/miaoshuyu/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/init.py", line 189, in build
_build_extension(ffi, cffi_wrapper_name, target_dir, verbose)
File "/home/miaoshuyu/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/init.py", line 111, in _build_extension
outfile = ffi.compile(tmpdir=tmpdir, verbose=verbose, target=libname)
File "/home/miaoshuyu/anaconda3/lib/python3.6/site-packages/cffi/api.py", line 697, in compile
compiler_verbose=verbose, debug=debug, **kwds)
File "/home/miaoshuyu/anaconda3/lib/python3.6/site-packages/cffi/recompiler.py", line 1520, in recompile
compiler_verbose, debug)
File "/home/miaoshuyu/anaconda3/lib/python3.6/site-packages/cffi/ffiplatform.py", line 22, in compile
outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
File "/home/miaoshuyu/anaconda3/lib/python3.6/site-packages/cffi/ffiplatform.py", line 58, in _build
raise VerificationError('%s: %s' % (e.__class__.__name__, e))
cffi.error.VerificationError: LinkError: command 'gcc' failed with exit status 1

roi_level = 4 + log2(torch.sqrt(h*w)/(224.0/torch.sqrt(image_area)))

I still cannot figure out what this line means. Can anyone explain it to me? Much appreciated!
roi_level = 4 + log2(torch.sqrt(h*w)/(224.0/torch.sqrt(image_area)))
Equation 1 in the Feature Pyramid Networks paper. Account for
the fact that our coordinates are normalized here.
e.g. a 224x224 ROI (in pixels) maps to P4
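
For reference, this is Equation 1 of the Feature Pyramid Networks paper with k_0 = 4; as a hedged reading of the code line, the sqrt(image_area) factor appears because the box height and width here are normalized:

    % FPN Eq. 1, with box width/height w_{px}, h_{px} in pixels and k_0 = 4:
    k = \left\lfloor k_0 + \log_2\!\left( \sqrt{w_{px} h_{px}} / 224 \right) \right\rfloor
    % With normalized coordinates, w_{px} h_{px} = w h \cdot \text{image\_area}, so
    \sqrt{w_{px} h_{px}} / 224 = \sqrt{w h} \Big/ \left( 224 / \sqrt{\text{image\_area}} \right)
    % which is exactly roi_level = 4 + log2(sqrt(h*w) / (224.0 / sqrt(image_area))).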

The time question

How long did you train the model? I trained it on COCO 2017; it has taken 4 days but has only reached epoch 45. Is that normal?

How to train on my own data

Hello, I want to know how to train on my own dataset. You only show how to train on the COCO dataset, but I have labeled datasets consisting of many .json files. How do I train on all of these .json files?
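
As a hedged sketch, this repo follows the Matterport-style utils.Dataset interface, so training on your own .json labels usually means subclassing it and implementing load_mask(); the class name and the build_masks_from_json helper below are hypothetical:

    import numpy as np
    import utils  # the repo's utils.py, which defines the Dataset base class

    class MyDataset(utils.Dataset):  # hypothetical name
        def load_my_data(self, annotations):
            # Register one class and one image entry per labeled .json record.
            self.add_class("my_dataset", 1, "my_object")
            for i, ann in enumerate(annotations):
                self.add_image("my_dataset", image_id=i,
                               path=ann["image_path"], annotation=ann)

        def load_mask(self, image_id):
            # Return a [height, width, num_instances] boolean mask array and a
            # [num_instances] array of class IDs built from the .json labels.
            info = self.image_info[image_id]
            masks, class_ids = build_masks_from_json(info["annotation"])  # hypothetical helper
            return masks.astype(bool), np.array(class_ids, dtype=np.int32)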

Unable to build roialign

I ran the following commands:

    cd roialign/roi_align/src/cuda/
    nvcc -c -o crop_and_resize_kernel.cu.o crop_and_resize_kernel.cu -x cu -Xcompiler -fPIC -arch=[arch]
    cd ../../
    python build.py

and they produce the following error:
----------------------------------------------------Error Shows------------------------------------------------------------
Including CUDA code.
/home/mohong/workspace/pytorch-mask-rcnn/roialign/roi_align
generating /tmp/tmpHfqKxT/_crop_and_resize.c
setting the current directory to '/tmp/tmpHfqKxT'
running build_ext
building '_crop_and_resize' extension
creating home
creating home/mohong
creating home/mohong/workspace
creating home/mohong/workspace/pytorch-mask-rcnn
creating home/mohong/workspace/pytorch-mask-rcnn/roialign
creating home/mohong/workspace/pytorch-mask-rcnn/roialign/roi_align
creating home/mohong/workspace/pytorch-mask-rcnn/roialign/roi_align/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-nbjU53/python2.7-2.7.15rc1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DWITH_CUDA -I/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python2.7 -c _crop_and_resize.c -o ./_crop_and_resize.o -std=c99 -fopenmp -std=c99
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-nbjU53/python2.7-2.7.15rc1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DWITH_CUDA -I/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python2.7 -c /home/mohong/workspace/pytorch-mask-rcnn/roialign/roi_align/src/crop_and_resize.c -o ./home/mohong/workspace/pytorch-mask-rcnn/roialign/roi_align/src/crop_and_resize.o -std=c99 -fopenmp -std=c99
/home/mohong/workspace/pytorch-mask-rcnn/roialign/roi_align/src/crop_and_resize.c: In function ‘crop_and_resize_forward’:
/home/mohong/workspace/pytorch-mask-rcnn/roialign/roi_align/src/crop_and_resize.c:124:33: error: dereferencing pointer to incomplete type ‘THTensor {aka struct THTensor}’
const int batch_size = image->size[0];
^~
Traceback (most recent call last):
File "build.py", line 40, in
ffi.build()
File "/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/init.py", line 189, in build
_build_extension(ffi, cffi_wrapper_name, target_dir, verbose)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/init.py", line 111, in _build_extension
outfile = ffi.compile(tmpdir=tmpdir, verbose=verbose, target=libname)
File "/home/mohong/.local/lib/python2.7/site-packages/cffi/api.py", line 697, in compile
compiler_verbose=verbose, debug=debug, **kwds)
File "/home/mohong/.local/lib/python2.7/site-packages/cffi/recompiler.py", line 1520, in recompile
compiler_verbose, debug)
File "/home/mohong/.local/lib/python2.7/site-packages/cffi/ffiplatform.py", line 22, in compile
outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
File "/home/mohong/.local/lib/python2.7/site-packages/cffi/ffiplatform.py", line 58, in _build
raise VerificationError('%s: %s' % (e.__class__.__name__, e))
cffi.error.VerificationError: CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1
-----------------------------------------------------What I have done----------------------------------------------------------
I have also installed the build essentials:

    sudo apt-get install python-dev build-essential libssl-dev libffi-dev libxml2-dev libxslt1-dev zlib1g-dev python-pip libblas-dev libatlas-base-dev

BUT, it still didn't work right. What's wrong with it? @tkepp @ozan-oktay @mattiaspaul @magzHL @ewoifjd

-----------------------------------------------Environment----------------------------------------------------------------
PyTorch 0.4.1, CUDA 9.0, Python 2.7, Ubuntu 18.04

bug in NMS

NMS is not working correctly, since the detections fed in are not sorted:

In nms/pth_nms.py, line 49:

nms.gpu_nms(keep, num_out, dets_temp, thresh)

the unordered 'dets_temp' are fed in instead of the ordered 'dets'.
This is solvable by either changing line 43 to :

dets_temp = dets_temp[order].contiguous()

or just getting rid of the dets_temp and feeding in dets, since NMS should be rotation invariant.
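
A hedged sketch of the ordering fix being described (generic PyTorch; the tensor layout follows the report, and the repo's actual pth_nms.py may differ in details):

    import torch

    def sort_dets_for_nms(dets):
        # dets: [N, 5] tensor of (y1, x1, y2, x2, score). GPU NMS kernels
        # typically assume the boxes arrive sorted by descending score, so
        # reorder the whole tensor before handing it to the kernel.
        order = dets[:, 4].sort(descending=True)[1]
        return dets[order].contiguous(), order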

a PTX JIT compilation failed

Hello, when I train I get an error like this:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generic/THCStorage.cu line=58 error=78 : a PTX JIT compilation failed
Traceback (most recent call last):
File "coco.py", line 374, in <module>
model_dir=args.logs)
File "/home/amax/Codes/pytorch-mask-rcnn/model.py", line 1416, in __init__
self.build(config=config)
File "/home/amax/Codes/pytorch-mask-rcnn/model.py", line 1450, in build
self.anchors = self.anchors.cuda()
File "/home/amax/anaconda2/envs/py3/lib/python3.6/site-packages/torch/autograd/variable.py", line 298, in cuda
return CudaTransfer.apply(self, device, async)
File "/home/amax/anaconda2/envs/py3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 201, in forward
return i.cuda(async=async)
File "/home/amax/anaconda2/envs/py3/lib/python3.6/site-packages/torch/_utils.py", line 69, in _cuda
return new_type(self.size()).copy_(self, async)
File "/home/amax/anaconda2/envs/py3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 387, in _lazy_new
return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: cuda runtime error (78) : a PTX JIT compilation failed at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generic/THCStorage.cu:58

How do I solve it?

Train on multiple GPUs

I want to train this model on multiple GPUs, so I added code in coco.py like this:

    model = torch.nn.DataParallel(model, device_ids=[0, 1])
    model = model.cuda()

but when I run it, it still only uses gpu0. What happened? What can I do?
Thanks!

PyTorch 0.4

Training is not working in PyTorch 0.4, mainly because some if conditions expect a zero-size tensor to be of None type, while PyTorch 0.4 now supports zero-size tensors.... After correcting these if statements, I now sometimes get boxes of size zero in pyramid_roi_align().
Is anyone else trying to make it run with PyTorch 0.4? Any ideas?
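
A hedged sketch of the kind of emptiness check that works in PyTorch 0.4, where a zero-size tensor is a real tensor rather than None (the variable name is illustrative):

    import torch

    boxes = torch.empty(0, 4)  # a zero-size tensor, not None, in PyTorch 0.4
    if boxes is None or boxes.numel() == 0:
        # Handle the "no boxes for this level" case explicitly instead of
        # relying on the old convention of passing None around.
        print("no ROIs for this pyramid level")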

error: dereferencing pointer to incomplete type

I use Python 3.5 and torch 0.4.1 on Ubuntu 16.04, with a Tesla K40 GPU, CUDA 8.0, and compute capability 3.5.

When building RoIAlign:

python3.5 build.py

I get the error dereferencing pointer to incomplete type const in the crop_and_resize.c file. The NMS extension compiled without errors. I believe this has something to do with the CUDA architecture. I compiled the RoIAlign kernel itself without errors:

nvcc -c -o crop_and_resize_kernel.cu.o crop_and_resize_kernel.cu -x cu -Xcompiler -fPIC -arch=compute_35 -code=sm_35

I don't know much about C/C++; could I fix this without amending the code?

should I finish the utils.load_mask?

If I want to use your excellent code to train Mask R-CNN on COCO, do I need to finish load_mask in utils?
As far as I can tell, the unfinished load_mask is a severe problem that prevents the trainer from taking advantage of the semantic segmentation ground truth.
Do I understand your code correctly?
Did you train your Mask R-CNN model on COCO without the true masks, which resulted in a 5-point drop from the paper?

Hard-to-solve problem when evaluating a newly trained model

I trained the model on COCO for several epochs. I load one of the models saved during training by changing model_path in coco.py, and that apparently works.
But an error happens in model.py, in refine_detections:
keep = torch.nonzero(keep_bool)[:,0]

IndexError: trying to index 2 dimensions of a 0 dimensional tensor

The problem is that all the elements in keep_bool are zero. It results in a

torch.cuda.LongTensor with no dimensions

as the output of torch.nonzero(keep_bool).

I've noticed that later code uses keep.data as a list index many times, so I guess it's really hard to patch this up here.
I want to confirm: is it true that an all-zero keep_bool should never happen in your design?

StopIteration

I am getting a StopIteration error while validating the trained model on a new dataset using Mask R-CNN. Please let me know how I could solve this. Thank you in advance.

About nvcc

Can someone provide a version of the code with the roialign and nms extensions already compiled by nvcc? My nvcc environment is broken and I have tried many ways to fix it. Thank you very much.

ERROR: RuntimeError: cuda runtime error (2) : out of memory

~/pytorch-mask-rcnn$ python3 demo.py
Configurations:
BACKBONE_SHAPES [[256 256]
[128 128]
[ 64 64]
[ 32 32]
[ 16 16]]
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
GPU_COUNT 1
IMAGENET_MODEL_PATH /home/qq4060/pytorch-mask-rcnn/resnet50_imagenet.pth
IMAGES_PER_GPU 1
IMAGE_MAX_DIM 1024
IMAGE_MIN_DIM 800
IMAGE_PADDING True
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME coco
NUM_CLASSES 81
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 1000
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001

/home/qq4060/pytorch-mask-rcnn/model.py:1474: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
nn.init.xavier_uniform(m.weight)
/usr/lib/python3/dist-packages/scipy/misc/pilutil.py:479: FutureWarning: Conversion of the second argument of issubdtype from int to np.signedinteger is deprecated. In future, it will be treated as np.int64 == np.dtype(int).type.
if issubdtype(ts, int):
/usr/lib/python3/dist-packages/scipy/misc/pilutil.py:482: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
elif issubdtype(type(size), float):
/home/qq4060/pytorch-mask-rcnn/model.py:1595: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
molded_images = Variable(molded_images, volatile=True)
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "demo.py", line 74, in
results = model.detect([image])
File "/home/qq4060/pytorch-mask-rcnn/model.py", line 1598, in detect
detections, mrcnn_mask = self.predict([molded_images, image_metas], mode='inference')
File "/home/qq4060/pytorch-mask-rcnn/model.py", line 1645, in predict
layer_outputs.append(self.rpn(p))
File "/home/qq4060/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/qq4060/pytorch-mask-rcnn/model.py", line 879, in forward
x = self.relu(self.conv_shared(self.padding(x)))
File "/home/qq4060/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/qq4060/.local/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

several issues

Hi,
first of all thanks for the implementation.
I encountered some issues which i would like to report:

  1. I believe the FPN, RPN, classifier, and mask nets are all missing a .cuda() call,
    i.e. lines 1456, 1468, 1471, 1474 in model.py. If you don't do this, an error occurs saying the weights are not in a GPU tensor. Or am I missing something?

  2. The way you do sequential batch sizing always leads to an effective batch size of one, because the gradients are zeroed before the last sample of the batch:

if (batch_count % self.config.BATCH_SIZE) == 0:
            optimizer.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm(self.parameters(), 5.0)
            if (batch_count % self.config.BATCH_SIZE) == 0:
                optimizer.step()
                batch_count = 0

If you called optimizer.zero_grad() after optimizer.step(), batch sizes > 1 would actually accumulate gradients before each update (see the sketch after this list). Again, please correct me if I am wrong.

  3. The Matterport version trains much faster and better than this implementation on my problem, especially the RPN. Have there been any changes w.r.t. Matterport that were not mentioned yet?
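
A hedged sketch of the accumulation pattern point 2 suggests, zeroing gradients only after the optimizer step (generic PyTorch; compute_loss and data_loader are placeholders passed in by the caller, not the repo's names):

    import torch

    def train_with_accumulation(model, optimizer, data_loader, compute_loss, batch_size):
        # Accumulate gradients over `batch_size` samples before each update;
        # zero_grad() runs only after step(), so no gradients are discarded.
        optimizer.zero_grad()
        for step, (images, targets) in enumerate(data_loader, start=1):
            loss = compute_loss(model, images, targets)
            loss.backward()
            if step % batch_size == 0:
                torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)
                optimizer.step()
                optimizer.zero_grad()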

Unable to build NMS

Running

 cd nms/src/cuda/
 nvcc -c -o nms_kernel.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC -arch=[arch]
 cd ../../
 python build.py

Breaks at the build.py step

/home/sidney/anaconda3/envs/ml/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h: In function ‘TH_trigammaf’:
/home/sidney/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h:278:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
   for (int i = 0; i < 6; ++i) {
   ^

got this sad error in pytorch 0.4: TypeError: Expected bytes, got str

Checkpoint Path: /home/bigtree/PycharmProjects/pytorch-mask-rcnn/logs/coco20180706T2256/mask_rcnn_coco_{:04d}.pth
Epoch 1/40.
Traceback (most recent call last):
File "coco.py", line 498, in
layers='heads')
File "/home/bigtree/PycharmProjects/pytorch-mask-rcnn/model.py", line 1793, in train_model
loss, loss_rpn_class, loss_rpn_bbox, loss_mrcnn_class, loss_mrcnn_bbox, loss_mrcnn_mask = self.train_epoch(train_generator, optimizer, self.config.STEPS_PER_EPOCH)
File "/home/bigtree/PycharmProjects/pytorch-mask-rcnn/model.py", line 1822, in train_epoch
for inputs in datagenerator:
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 286, in next
return self._process_next_batch(batch)
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/bigtree/PycharmProjects/pytorch-mask-rcnn/model.py", line 1359, in getitem
use_mini_mask=self.config.USE_MINI_MASK)
File "/home/bigtree/PycharmProjects/pytorch-mask-rcnn/model.py", line 1162, in load_image_gt
mask, class_ids = dataset.load_mask(image_id)
File "coco.py", line 245, in load_mask
image_info["width"])
File "coco.py", line 304, in annToMask
rle = self.annToRLE(ann, height, width)
File "coco.py", line 290, in annToRLE
rle = maskUtils.merge(rles)
File "pycocotools/_mask.pyx", line 145, in pycocotools._mask.merge (pycocotools/_mask.c:3173)
File "pycocotools/_mask.pyx", line 122, in pycocotools._mask._frString (pycocotools/_mask.c:2605)
TypeError: Expected bytes, got str

ImportError

Traceback (most recent call last):
File "/home/PycharmProjects/models/pytorch-mask-rcnn/demo.py", line 10, in
import coco
File "/home/PycharmProjects/models/pytorch-mask-rcnn/coco.py", line 50, in
import model as modellib
File "/home/PycharmProjects/models/pytorch-mask-rcnn/model.py", line 26, in
from nms.nms_wrapper import nms
File "/home/PycharmProjects/models/pytorch-mask-rcnn/nms/nms_wrapper.py", line 11, in
from nms.pth_nms import pth_nms
File "/home/PycharmProjects/models/pytorch-mask-rcnn/nms/pth_nms.py", line 2, in
from ._ext import nms
File "/home/PycharmProjects/models/pytorch-mask-rcnn/nms/_ext/nms/init.py", line 3, in
from ._nms import lib as _lib, ffi as _ffi
ImportError: /home/PycharmProjects/models/pytorch-mask-rcnn/nms/_ext/nms/_nms.so: undefined symbol: PyInt_FromLong

Cannot run demo.py

Hi, I have trained the model using the code and stored the weights in mask_rcnn_coco.pth. However, when I try to run:
python demo.py
the demo script fails and the following error is shown:
Traceback (most recent call last):
File "demo.py", line 75, in
results = model.detect([image])
File "/phd/pytorch-mask-rcnn/model.py", line 1598, in detect
detections, mrcnn_mask = self.predict([molded_images, image_metas], mode='inference')
File "/phd/pytorch-mask-rcnn/model.py", line 1673, in predict
detections = detection_layer(self.config, rpn_rois, mrcnn_class, mrcnn_bbox, image_metas)
File "/phd/pytorch-mask-rcnn/model.py", line 841, in detection_layer
detections = refine_detections(rois, mrcnn_class, mrcnn_bbox, window, config)
File "/phd/pytorch-mask-rcnn/model.py", line 786, in refine_detections
keep = torch.nonzero(keep_bool)[:,0]
File "/home/xyc/anaconda3/envs/map-seg-py35/lib/python3.5/site-packages/torch/autograd/variable.py", line 78, in getitem
return Index.apply(self, key)
File "/home/xyc/anaconda3/envs/map-seg-py35/lib/python3.5/site-packages/torch/autograd/_functions/tensor.py", line 89, in forward
result = i.index(ctx.index)
IndexError: trying to index 2 dimensions of a 0 dimensional tensor

Wondering if you can replicate this problem? I am running on Titan XP. The training seems all right. Thanks!

Not able to build NMS

Hi, after running python build.py inside pytorch-mask-rcnn/nms I get the following error [1]. Do you have any idea how to build nms correctly?

Cheers,
Andrzej

[1]

x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DWITH_CUDA -I/mnt/ml-tea
m/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/uti
ls/ffi/../../lib/include/TH -I/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.5m
 -I/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/include/python3.5m -c _nms.c -o ./_nms.o
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DWITH_CUDA -I/mnt/ml-tea
m/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/uti
ls/ffi/../../lib/include/TH -I/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.5m
 -I/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/include/python3.5m -c /mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms.c -o ./mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms.o
In file included from /mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/TH.h:4:0,
                 from /mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms.c:1:
/mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms.c: In function ‘cpu_nms’:
/mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms.c:7:42: warning: passing argument 1 of ‘THLongTensor_isContiguous’ from incompatible pointer type [-Wincompatible-pointer-types]
     THArgCheck(THLongTensor_isContiguous(boxes), 2, "boxes must be contiguous");
                                          ^
/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/THGeneral.h:80:35: note: in definition of macro ‘THArgCheck’
   _THArgCheck(__FILE__, __LINE__, __VA_ARGS__);                       \
                                   ^
/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/THTensor.h:8:39: note: expected ‘const THLongTensor * {aka const struct THLongTens
or *}’ but argument is of type ‘THFloatTensor * {aka struct THFloatTensor *}’
 #define THTensor_(NAME)   TH_CONCAT_4(TH,Real,Tensor_,NAME)
                                       ^
/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/THGeneral.h:118:37: note: in definition of macro ‘TH_CONCAT_4_EXPAND’
 #define TH_CONCAT_4_EXPAND(x,y,z,w) x ## y ## z ## w
                                     ^
/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/THTensor.h:8:27: note: in expansion of macro ‘TH_CONCAT_4’
 #define THTensor_(NAME)   TH_CONCAT_4(TH,Real,Tensor_,NAME)
                           ^
/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/generic/THTensor.h:115:12: note: in expansion of macro ‘THTensor_’
 TH_API int THTensor_(isContiguous)(const THTensor *self);
            ^
/mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms.c:9:42: warning: passing argument 1 of ‘THLongTensor_isContiguous’ from incompatible pointer type [-Wincompatible-pointer-types]
     THArgCheck(THLongTensor_isContiguous(areas), 4, "areas must be contiguous");
                                          ^
/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/THGeneral.h:80:35: note: in definition of macro ‘THArgCheck’
   _THArgCheck(__FILE__, __LINE__, __VA_ARGS__);                       \
                                   ^
/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/THTensor.h:8:39: note: expected ‘const THLongTensor * {aka const struct THLongTensor *}’ but argument is of type ‘THFloatTensor * {aka struct THFloatTensor *}’
 #define THTensor_(NAME)   TH_CONCAT_4(TH,Real,Tensor_,NAME)
                                       ^
/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/THGeneral.h:118:37: note: in definition of macro ‘TH_CONCAT_4_EXPAND’
 #define TH_CONCAT_4_EXPAND(x,y,z,w) x ## y ## z ## w
                                     ^
/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/THTensor.h:8:27: note: in expansion of macro ‘TH_CONCAT_4’
 #define THTensor_(NAME)   TH_CONCAT_4(TH,Real,Tensor_,NAME)
                           ^
/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH/generic/THTensor.h:115:12: note: in expansion of macro ‘THTensor_’
 TH_API int THTensor_(isContiguous)(const THTensor *self);
            ^
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DWITH_CUDA -I/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -I/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.5m -I/mnt/ml-team/homes/andrzej.pyskir/mapping_challenge/venv/include/python3.5m -c /mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms_cuda.c -o ./mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms_cuda.o
/mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms_cuda.c: In function ‘gpu_nms’:
/mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms_cuda.c:29:35: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
   unsigned long long* mask_flat = THCudaLongTensor_data(state, mask);
                                   ^
/mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms_cuda.c:37:40: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
   unsigned long long * mask_cpu_flat = THLongTensor_data(mask_cpu);
                                        ^
/mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms_cuda.c:40:39: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
   unsigned long long* remv_cpu_flat = THLongTensor_data(remv_cpu);
                                       ^
/mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms_cuda.c:23:7: warning: unused variable ‘boxes_dim’ [-Wunused-variable]
   int boxes_dim = THCudaTensor_size(state, boxes, 1);
       ^
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 ./_nms.o ./mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms.o ./mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/nms_cuda.o /mnt/ml-team/warta/andrzej/pytorch-mask-rcnn/nms/src/cuda/nms_kernel.cu.o -o ./_nms.so

Trying to Export to ONNX

Hey guys,

I was wondering if anyone has tried converting this Pytorch implementation of Mask RCNN in the ONNX format. I've been attempting to do it but keep running into errors.

possible bug in set_trainable()?

Following coco.py: at the beginning, when training only the heads, set_trainable() sets requires_grad = False on all params not belonging to the heads, which is fine. But later, when we continue training with the intention of updating all the layers, set_trainable() does not set requires_grad back to True for all params; according to the code, it only ensures that the params not intended to be trained have requires_grad = False... (see the sketch below).
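
A hedged sketch of the symmetric behaviour the report seems to ask for, setting requires_grad in both directions (the name-matching against layer_regex is simplified and hypothetical, not the repo's exact logic):

    import re

    def set_trainable(model, layer_regex):
        # Set requires_grad both ways: True for parameters whose name matches
        # the stage's regex, False for everything else, so a later stage can
        # re-enable layers that an earlier stage froze.
        for name, param in model.named_parameters():
            param.requires_grad = bool(re.fullmatch(layer_regex, name))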

GPU_COUNT and BATCH_SIZE do not work

Thanks for the great work!
The problem I ran into concerns GPU usage and batch size.

GPU_COUNT only affects the number of images processed in each epoch, but only one GPU ever actually does any work. The MaskRCNN module is not wrapped in DataParallel, and it cannot simply be parallelized across several GPUs, as I tried.

BATCH_SIZE also only affects the number of images per epoch, but in the dataloader the batch size is again set to 1. When setting it to 2 or higher, there is an error because the data shapes are not handled so as to make them batchable.

Is there any plan to fix these issues? Thank you so much!

How to use multiple GPUs with your code?

I have changed GPU_COUNT to 2 in the code, and still only one GPU works.
I want to know whether it is necessary to add another architecture to the --arch option when building.

Thanks a lot

Out of memory on a GPU with 4GB of memory

Thanks for the great work!
When running the provided demo I encounter an out-of-memory runtime error.

My machine is Linux with an Nvidia 1050 that has 4GB of memory, CUDA 9.1, cuDNN 7.1 and torch 0.4.

error output is:

/path-to/pytorch-mask-rcnn/model.py:1474: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  nn.init.xavier_uniform(m.weight)
/path-to/pytorch-mask-rcnn/model.py:1595: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  molded_images = Variable(molded_images, volatile=True)
THCudaCheck FAIL file=/path-to/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "demo.py", line 74, in <module>
    results = model.detect([image])
  File "/path-to/pytorch-mask-rcnn/model.py", line 1598, in detect
    detections, mrcnn_mask = self.predict([molded_images, image_metas], mode='inference')
  File "/path-to/pytorch-mask-rcnn/model.py", line 1636, in predict
    [p2_out, p3_out, p4_out, p5_out, p6_out] = self.fpn(molded_images)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 371, in __call__
    result = self.forward(*input, **kwargs)
  File "/path-to/pytorch-mask-rcnn/model.py", line 181, in forward
    p2_out = self.P2_conv1(c2_out) + F.upsample(p3_out, scale_factor=2)
RuntimeError: cuda runtime error (2) : out of memory at /path-to/pytorch/aten/src/THC/generic/THCStorage.cu:58

I managed to bypass the problem by wrapping the code that used the volatile flag in with torch.no_grad():, following https://discuss.pytorch.org/t/torch-no-grad/12296, but I have no idea what the repercussions of doing so are (I'm a newbie to PyTorch).
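
A hedged sketch of that torch.no_grad() replacement for the removed volatile flag (the model and inputs are placeholders): inside the context no autograd history is recorded, which is what volatile=True used to do and what keeps inference memory low.

    import torch

    def run_inference(model, molded_images):
        model.eval()                     # inference behaviour for BN/dropout
        with torch.no_grad():            # replaces Variable(..., volatile=True)
            return model(molded_images)  # no autograd buffers are stored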

RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)

Has anyone else met this problem? It comes up after training for many batches.


Traceback (most recent call last):
File "coco.py", line 498, in
layers='heads')
File "/home/dong/dong/pytorch/2-mask-rcnn/model.py", line 1796, in train_model
loss, loss_rpn_class, loss_rpn_bbox, loss_mrcnn_class, loss_mrcnn_bbox, loss_mrcnn_mask = self.train_epoch(train_generator, optimizer, self.config.STEPS_PER_EPOCH)
File "/home/dong/dong/pytorch/2-mask-rcnn/model.py", line 1858, in train_epoch
self.predict([images, image_metas, gt_class_ids, gt_boxes, gt_masks], mode='training')
File "/home/dong/dong/pytorch/2-mask-rcnn/model.py", line 1732, in predict
mrcnn_class_logits, mrcnn_class, mrcnn_bbox = self.classifier(mrcnn_feature_maps, rois)
File "/home/dong/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/home/dong/zhangdong/pytorch/2-mask-rcnn/model.py", line 929, in forward
x = pyramid_roi_align([rois]+x, self.pool_size, self.image_shape)
File "/home/dong/dong/pytorch/2-mask-rcnn/model.py", line 436, in pyramid_roi_align
y1, x1, y2, x2 = boxes.chunk(4, dim=1)
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)

The number of anchors per pixel may be wrong?

In line 1452 of model.py,

self.rpn = RPN(len(config.RPN_ANCHOR_RATIOS), config.RPN_ANCHOR_STRIDE, 256)

should be

self.rpn = RPN(len(config.RPN_ANCHOR_RATIOS)*len(config.RPN_ANCHOR_SCALES), config.RPN_ANCHOR_STRIDE, 256) ?

Error in demo.py: IndexError: trying to index 2 dimensions of a 0 dimensional tensor

I used the pretrained weights and received the above error. The error appears for some images; for others, the code runs perfectly. I have no clue how to solve it. Could someone please help me? I have used the code from the GitHub repo.

Traceback (most recent call last):
File "/home/omkar/pycharm-community-2017.3.4/helpers/pydev/pydev_run_in_console.py", line 53, in run_file
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/omkar/pycharm-community-2017.3.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/omkar/PycharmProjects/mask_rcnn/demo.py", line 149, in
result = model.detect([image])
File "/home/omkar/PycharmProjects/mask_rcnn/model.py", line 1599, in detect
detections, mrcnn_mask = self.predict([molded_images, image_metas], mode='inference')
File "/home/omkar/PycharmProjects/mask_rcnn/model.py", line 1674, in predict
detections = detection_layer(self.config, rpn_rois, mrcnn_class, mrcnn_bbox, image_metas)
File "/home/omkar/PycharmProjects/mask_rcnn/model.py", line 842, in detection_layer
detections = refine_detections(rois, mrcnn_class, mrcnn_bbox, window, config)
File "/home/omkar/PycharmProjects/mask_rcnn/model.py", line 787, in refine_detections
keep = torch.nonzero(keep_bool)[:,0]
File "/home/omkar/anaconda3/envs/mask_rcnn/lib/python3.6/site-packages/torch/autograd/variable.py", line 78, in getitem
return Index.apply(self, key)
File "/home/omkar/anaconda3/envs/mask_rcnn/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 89, in forward
result = i.index(ctx.index)
IndexError: trying to index 2 dimensions of a 0 dimensional tensor

Thanking you in anticipation.
Regards,
Omkar.

nvcc fatal

nvcc -c -o nms_kernel.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC -arch='sm_61'
nvcc fatal : Value 'sm_61' is not defined for option 'gpu-architecture'
