Hi,
I am trying to compare a Domain Adaptive (DA) model against a normally trained model (i.e. one trained without any adaptation) on my custom datasets. The DA model trains successfully, but I get the following error when I try to train the normal model (I set DOMAIN_ADAPTATION: False and passed DATASETS: my_dataset_path to do normal training). While I dig deeper, could you please provide any pointers as to what may be wrong? Thanks.
Note that I am able to train normal models using Detectron's original repo.
##################### Error log #############
[E pybind_state.h:424] Exception encountered running PythonOp function: KeyError: ('da_rois',)
At:
/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/roi_data/fast_rcnn.py(126): add_fast_rcnn_blobs
/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/ops/generate_proposal_labels.py(52): forward
[E net_async_base.cc:377] [enforce fail at pybind_state.h:425] . Exception encountered running PythonOp function: KeyError: ('da_rois',)
At:
/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/roi_data/fast_rcnn.py(126): add_fast_rcnn_blobs
/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/ops/generate_proposal_labels.py(52): forward
Error from operator:
input: "gpu_0/rpn_rois" input: "gpu_0/roidb" input: "gpu_0/im_info" output: "gpu_0/rois" output: "gpu_0/labels_int32" output: "gpu_0/bbox_targets" output: "gpu_0/bbox_inside_weights" output: "gpu_0/bbox_outside_weights" name: "GenerateProposalLabelsOp:rpn_rois,roidb,im_info" type: "Python" arg { name: "grad_input_indices" } arg { name: "token" s: "forward:1" } arg { name: "grad_output_indices" } device_option { device_type: 0 }Error from operator:
input: "gpu_0/rpn_rois" input: "gpu_0/roidb" input: "gpu_0/im_info" output: "gpu_0/rois" output: "gpu_0/labels_int32" output: "gpu_0/bbox_targets" output: "gpu_0/bbox_inside_weights" output: "gpu_0/bbox_outside_weights" name: "GenerateProposalLabelsOp:rpn_rois,roidb,im_info" type: "Python" arg { name: "grad_input_indices" } arg { name: "token" s: "forward:1" } arg { name: "grad_output_indices" } device_option { device_type: 1 device_id: 0 }frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x50 (0x7f3b8b5a8000 in /home/neo/ML/pytorch/build/lib/libc10.so)
frame #1: <unknown function> + 0x9ddfa (0x7f3b8bce1dfa in /home/neo/ML/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #2: <unknown function> + 0x9b98b (0x7f3b8bcdf98b in /home/neo/ML/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #3: <unknown function> + 0xea2b3 (0x7f3b8bd2e2b3 in /home/neo/ML/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #4: <unknown function> + 0xe8f74 (0x7f3b8bd2cf74 in /home/neo/ML/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f3b6bf88eb4 in /home/neo/ML/pytorch/build/lib/libcaffe2.so)
frame #6: <unknown function> + 0x159a0a9 (0x7f3b6bf8f0a9 in /home/neo/ML/pytorch/build/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x283 (0x7f3b6b08f183 in /home/neo/ML/pytorch/build/lib/libcaffe2.so)
frame #8: <unknown function> + 0x91c10 (0x7f3b916a2c10 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #9: <unknown function> + 0x8184 (0x7f3b981dd184 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x6d (0x7f3b97f0a37d in /lib/x86_64-linux-gnu/libc.so.6)
, op Python
[E net_async_base.cc:129] Rethrowing exception from the run of 'generalized_rcnn'
WARNING workspace.py: 205: Original python traceback for operator 38
in network generalized_rcnn
in exception above (most recent call last):
WARNING workspace.py: 210: File "tools/train_net.py", line 132, in <module>
WARNING workspace.py: 210: File "tools/train_net.py", line 114, in main
WARNING workspace.py: 210: File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/utils/train.py", line 53, in train_model
WARNING workspace.py: 210: File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/utils/train.py", line 145, in create_model
WARNING workspace.py: 210: File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/modeling/model_builder.py", line 125, in create
WARNING workspace.py: 210: File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/modeling/model_builder.py", line 90, in generalized_rcnn
WARNING workspace.py: 210: File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/modeling/model_builder.py", line 241, in build_generic_detection_model
WARNING workspace.py: 210: File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/modeling/optimizer.py", line 40, in build_data_parallel_model
WARNING workspace.py: 210: File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/modeling/optimizer.py", line 63, in _build_forward_graph
WARNING workspace.py: 210: File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/modeling/model_builder.py", line 193, in _single_gpu_build_func
WARNING workspace.py: 210: File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/modeling/rpn_heads.py", line 51, in add_generic_rpn_outputs
WARNING workspace.py: 210: File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/modeling/rpn_heads.py", line 160, in add_single_scale_rpn_losses
Traceback (most recent call last):
File "tools/train_net.py", line 132, in <module>
main()
File "tools/train_net.py", line 114, in main
checkpoints = detectron.utils.train.train_model()
File "/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/utils/train.py", line 67, in train_model
workspace.RunNet(model.net.Proto().name)
File "/home/neo/ML/pytorch/build/caffe2/python/workspace.py", line 237, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/neo/ML/pytorch/build/caffe2/python/workspace.py", line 198, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at pybind_state.h:425] . Exception encountered running PythonOp function: KeyError: ('da_rois',)
At:
/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/roi_data/fast_rcnn.py(126): add_fast_rcnn_blobs
/home/neo/ML/Detectron-DA-Faster-RCNN/detectron/ops/generate_proposal_labels.py(52): forward
Error from operator:
input: "gpu_0/rpn_rois" input: "gpu_0/roidb" input: "gpu_0/im_info" output: "gpu_0/rois" output: "gpu_0/labels_int32" output: "gpu_0/bbox_targets" output: "gpu_0/bbox_inside_weights" output: "gpu_0/bbox_outside_weights" name: "GenerateProposalLabelsOp:rpn_rois,roidb,im_info" type: "Python" arg { name: "grad_input_indices" } arg { name: "token" s: "forward:1" } arg { name: "grad_output_indices" } device_option { device_type: 0 }Error from operator:
input: "gpu_0/rpn_rois" input: "gpu_0/roidb" input: "gpu_0/im_info" output: "gpu_0/rois" output: "gpu_0/labels_int32" output: "gpu_0/bbox_targets" output: "gpu_0/bbox_inside_weights" output: "gpu_0/bbox_outside_weights" name: "GenerateProposalLabelsOp:rpn_rois,roidb,im_info" type: "Python" arg { name: "grad_input_indices" } arg { name: "token" s: "forward:1" } arg { name: "grad_output_indices" } device_option { device_type: 1 device_id: 0 }frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x50 (0x7f3b8b5a8000 in /home/neo/ML/pytorch/build/lib/libc10.so)
frame #1: <unknown function> + 0x9ddfa (0x7f3b8bce1dfa in /home/neo/ML/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #2: <unknown function> + 0x9b98b (0x7f3b8bcdf98b in /home/neo/ML/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #3: <unknown function> + 0xea2b3 (0x7f3b8bd2e2b3 in /home/neo/ML/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #4: <unknown function> + 0xe8f74 (0x7f3b8bd2cf74 in /home/neo/ML/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f3b6bf88eb4 in /home/neo/ML/pytorch/build/lib/libcaffe2.so)
frame #6: <unknown function> + 0x159a0a9 (0x7f3b6bf8f0a9 in /home/neo/ML/pytorch/build/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x283 (0x7f3b6b08f183 in /home/neo/ML/pytorch/build/lib/libcaffe2.so)
frame #8: <unknown function> + 0x91c10 (0x7f3b916a2c10 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #9: <unknown function> + 0x8184 (0x7f3b981dd184 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x6d (0x7f3b97f0a37d in /lib/x86_64-linux-gnu/libc.so.6)