
py-bottom-up-attention's Introduction

Bottom-up Attention with Detectron2

The detectron2 system with exactly the same model and weights as the Caffe VG Faster R-CNN provided in bottom-up-attention.

The original bottom-up-attention is implemented in Caffe, which is not easy to install and is inconsistent with the training code in PyTorch. Our project therefore transfers the weights and models to detectron2, which can be installed in a few lines and has a PyTorch front-end.

The features extracted from this repo are compatible with the LXMERT code and pre-trained models here. Results have been locally verified.

Installation

git clone https://github.com/airsplay/py-bottom-up-attention.git
cd py-bottom-up-attention

# Install python libraries
pip install -r requirements.txt
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

# Install detectron2
python setup.py build develop

# or if you are on macOS
# MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build develop

# or, as an alternative to `setup.py`, do
# pip install [--editable] .
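
As a quick sanity check after installation, a minimal sketch (run from the repo root; the config file is the one used by the demos below) that should load without a "Non-existent config key" error:

# Minimal installation check (run from the repo root): the VG config loads only
# if this fork's detectron2 (with its extra config keys) is the one installed.
import detectron2
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file("configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml")
print("detectron2 imported from", detectron2.__file__)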

Demos

Object Detection

demo vg detection

Feature Extraction

With Attributes:

  1. Single image: demo extraction
  2. Single image (Given boxes): demo extraction

Without Attributes:

  1. Single image: demo extraction
  2. Single image (Given boxes): demo extraction

Feature Extraction Scripts for MS COCO

Note: this script does not extract attributes. If you want to use attributes, please modify it according to the demos.

  1. For MS COCO (VQA): vqa script

Note

  1. The default weights are the same as the 'alternative pretrained model' in the original GitHub repo here, which is trained with 36 boxes per image. If you want to use the original detection model trained with 10~100 boxes, please use the following weights:
    http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe_attr_original.pkl
    
  2. The coordinates generated by the code are (x_left_corner, y_top_corner, x_right_corner, y_bottom_corner); a small cropping snippet follows this list. Here is a visualization. Suppose box = [x0, y0, x1, y1]; it annotates an RoI of:
    0-------------------------------------
    |                                     |
    y0 (box[1])  |-----------|            |
    |            |           |            |
    |            |  Object   |            |
    y1 (box[3])  |-----------|            |
    |                                     |
    H---------x0 (box[0])----x1 (box[2])--W
    
  3. If the link breaks, please use this Google Drive: https://drive.google.com/drive/folders/1ICBed8F9JaayAshptGMiGtRj78esg3m4?usp=sharing.
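
Following up on Note 2, a minimal cropping sketch (the image path and box values are placeholders):

import cv2

# Crop the RoI that a box annotates, using the (x0, y0, x1, y1) format above.
# "input.jpg" and the box values are placeholders.
img = cv2.imread("input.jpg")                  # img.shape == (H, W, 3)
box = [50.0, 30.0, 200.0, 180.0]               # [x0, y0, x1, y1] in pixels
x0, y0, x1, y1 = (int(round(v)) for v in box)
roi = img[y0:y1, x0:x1]                        # rows index y, columns index x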

External Links

  1. The original Caffe implementation https://github.com/peteanderson80/bottom-up-attention, and its docker image.
  2. bottom-up-attention.pytorch maintained by MIL-LAB.

Proof of Correctness

  1. As shown in demo

Note: You might find small differences between the Caffe features and the PyTorch features in this verification demo. This is because the verification uses the "given boxes" setup instead of "predicted boxes". If the features are extracted from scratch (i.e., with predicted boxes), they are exactly the same.

Detailed explanation: "given boxes" uses the features of the final predicted boxes (after box regression), whereas standard extraction uses the features of the proposals. This is illustrated below:

Feature extraction (using predicted boxes):

ResNet --> RPN --> RoiPooling + Res5 --> Box Regression --> BOX
                                      |-------------------> Feature --> Label
                                                                  |-> Attribute

Feature extraction (using given boxes):

ResNet --> RPN --> RoiPooling + Res5 --> Box Regression --> BOX
                                           |--> RoIPooling + Res5 --> Feature --> Label
                                                                              |-> Attribute
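
For reference, a condensed sketch of the "given boxes" path, loosely following the repo's demo notebooks (the `predictor`, `inputs`, and `given_boxes` objects are assumed to be prepared as in the demos):

import torch
from detectron2.structures import Boxes

# Sketch of the "given boxes" path: pool Res5 features for externally supplied
# boxes instead of the RPN proposals (Res5ROIHeads API of detectron2 v0.1).
# `given_boxes` are assumed already rescaled to the resized input image.
with torch.no_grad():
    images = predictor.model.preprocess_image(inputs)       # resize + normalize
    features = predictor.model.backbone(images.tensor)      # dict with the Res4 map
    boxes = Boxes(torch.as_tensor(given_boxes, device=images.tensor.device))
    feats = [features[f] for f in predictor.model.roi_heads.in_features]
    box_features = predictor.model.roi_heads._shared_roi_transform(feats, [boxes])
    pooled = box_features.mean(dim=[2, 3])                  # one 2048-d vector per box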

Acknowledgement

The Caffe-to-PyTorch conversion code (not released here) is based on Ruotian Luo's PyTorch-ResNet project. This project also drew on Ross Girshick's old py-faster-rcnn along the way.

References

Detectron2:

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}

Bottom-up Attention:

@inproceedings{Anderson2017up-down,
  author = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang},
  title = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
  booktitle={CVPR},
  year = {2018}
}

LXMERT:

@inproceedings{tan2019lxmert,
  title={LXMERT: Learning Cross-Modality Encoder Representations from Transformers},
  author={Tan, Hao and Bansal, Mohit},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  year={2019}
}


py-bottom-up-attention's Issues

How can I transfer the label number to its class name?

❓ How to use Detectron2

Edit 1: Sorry, it's from LXMERT. I was confused. But anyway, it would be very helpful if I could get an answer!

Thanks for your great work!

I got the file "vg_gqa_obj36.tsv" from the Google Drive, and among its fields there is objects_id, which I guess indicates the label of an object. How can I map this label to the real class name? Is there something like a LABEL2NAME file?

`"""
Example in obj tsv:

FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf", "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]
"""`

'
OrderedDict([('img_id', 'n116329'),
('img_h', 427),
('img_w', 640),
('objects_id',
array([177, 397, 453, 397, 453, 177, 291, 90, 397, 128, 177, 397, 299,
308, 50, 236, 397, 314, 291, 98, 364, 50, 601, 50, 50, 299,
50, 50, 209, 397, 51, 299, 50, 453, 51, 776])),
'
177 -> What?

Thank you!
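
For reference, the ids appear to index line-by-line into the object vocabulary file shipped with the repo (data/genome/1600-400-20/objects_vocab.txt, also mentioned in a later issue); a minimal sketch under that assumption:

# Map an objects_id such as 177 to its class name, assuming the ids index
# line-by-line into the repo's vocabulary file (one class name per line).
with open("data/genome/1600-400-20/objects_vocab.txt") as f:
    vg_classes = [line.strip() for line in f]

print(vg_classes[177])  # class name for label 177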

setuptools.errors

❓ How to use Detectron2

Questions like:

I had the following problem after installing requirements:
AttributeError: module 'setuptools.errors' has no attribute 'CompileError'

NOTE:

  1. If you meet any unexpected issue when using detectron2 and wish to know why,
    please use the "setuptools.errors" issue template.
  2. If the problem is my setuptools version, can you suggest a compatible version?

Load weights from original pretrained model

Thanks for your effort to migrate the slightly modified Faster-RCNN from Caffe to PyTorch!

Your README.md states that

The detectron2 system with exactly the same model and weights as the Caffe VG Faster R-CNN provided in bottom-up-attention.

The features extracted from this repo are compatible with the LXMERT code and pre-trained models here. The original bottom-up-attention is implemented in Caffe, which is not easy to install and is inconsistent with the training code in PyTorch. Our project therefore transfers the weights and models to detectron2, which can be installed in a few lines and has a PyTorch front-end.

When going through your code and extracting the diff against detectron2, I could see how you transferred the model from bottom-up-attention using the additional options:

  • CAFFE_MAXPOOL
  • PROPOSAL_GENERATOR.HID_CHANNELS
  • ROI_BOX_HEAD.RES5HALVE

see defaults.py and faster_rcnn_R_101_C4_caffemaxpool.yaml.

However, I cannot see in the code how the original weights are transferred.

Is it possible to load the weights from the original Caffe alternative pretrained model resnet101_faster_rcnn_final.caffemodel from bottom-up-attention#demo into your modified PyTorch/detectron2 model?

A plan to release a batch-based RoI feature extractor?

Thanks for your great work.
I am trying to use the demo tools you have released to extract RoI and box features.
Since it is too slow to extract features one image at a time, do you plan to release a batch-based extractor?

What is the JSON file "coco_minival_img_ids.json"?

Hi, thanks for sharing this.
I don't know what the file "coco_minival_img_ids.json" is; I came across it when extracting features from 'val2014'.

How can I get this file? Can you provide it, or how can I generate it myself?
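
In case it helps, a hedged sketch of generating such a file yourself, assuming it is simply a JSON list of image ids drawn from the val2014 annotations (paths are placeholders, and which ids belong to "minival" depends on the split definition you follow):

import json

# Build a JSON list of image ids from a COCO annotation file. This only shows
# the plausible file format; the actual minival membership may differ.
with open("annotations/instances_val2014.json") as f:
    coco = json.load(f)

img_ids = [img["id"] for img in coco["images"]]
with open("coco_minival_img_ids.json", "w") as f:
    json.dump(img_ids, f)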

Why are the feature dimensions different?

Questions like:
I use the following code to extract features based on the FPN model. The config is:
"COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"

Feature extraction code:

import cv2
import torch

img_path = "input.jpg"
img_ori = cv2.imread(img_path)
height, width = img_ori.shape[:2]
img = predictor.transform_gen.get_transform(img_ori).apply_image(img_ori)
img = torch.as_tensor(img.astype("float32").transpose(2, 0, 1))
inputs = [{"image": img, "height": height, "width": width}]
with torch.no_grad():
    imglists = predictor.model.preprocess_image(inputs)  # don't forget to preprocess
    features = predictor.model.backbone(imglists.tensor)  # dict of CNN feature maps
    proposals, _ = predictor.model.proposal_generator(imglists, features, None)  # RPN

    proposal_boxes = [x.proposal_boxes for x in proposals]
    features_list = [features[f] for f in predictor.model.roi_heads.in_features]
    proposal_rois = predictor.model.roi_heads.box_pooler(features_list, proposal_boxes)
    box_features = predictor.model.roi_heads.box_head(proposal_rois)

I use box_features as the features of the detected objects, but their dimension is 1024, which is inconsistent with the original bottom-up-attention feature dimension of 2048. Both use ResNet-101 as the backbone network, so why are the feature dimensions inconsistent?

I apologize if the answer is obvious; I am very new to object detection.

Thank you!

How to run detectron2_mscoco_proposal_maxnms.py with multiple gpus?

Hi,

I am new to detectron2. I want to extract features with multiple GPUs, but it seems detectron2_mscoco_proposal_maxnms.py only uses one GPU.

Could you give me some guidance on how to run detectron2_mscoco_proposal_maxnms.py with multiple GPUs?

Looking forward to your reply.

Warning "Config has no VERSION" and no predicted results when running demo_vg_detection.ipynb

Hi! When I run demo_vg_detection.ipynb, I get the following logs but no predicted results. Can you help me with that?

WARNING [01/11 17:21:52 d2.config.compat]: Config '/home/lvxinyu/AVVP/Others/py-bottom-up-attention/configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Modifications for VG in RPN (modeling/proposal_generator/rpn.py):
Use hidden dim 512 instead fo the same dim as Res4 (1024).

Modifications for VG in RoI heads (modeling/roi_heads/roi_heads.py):
1. Change the stride of conv1 and shortcut in Res5.Block1 from 2 to 1.
2. Modifying all conv2 with (padding: 1 --> 2) and (dilation: 1 --> 2).
For more details, please check 'https://github.com/peteanderson80/bottom-up-attention/blob/master/models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt'.

'Non-existent config key: MODEL.PROPOSAL_GENERATOR.HID_CHANNELS'

When I run the cell in demo_feature_extraction

cfg = get_cfg()
cfg.merge_from_file("../configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml")
cfg.MODEL.RPN.POST_NMS_TOPK_TEST = 300
cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.6
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.2
# VG Weight
cfg.MODEL.WEIGHTS = "http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe.pkl"
predictor = DefaultPredictor(cfg)

it fails with

Config '../configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml' has no VERSION. Assuming it to be compatible with latest v2.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-f894caa4b31a> in <module>
      1 cfg = get_cfg()
----> 2 cfg.merge_from_file("../configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml")
      3 cfg.MODEL.RPN.POST_NMS_TOPK_TEST = 300
      4 cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.6
      5 cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.2

~/anaconda3/envs/vln/lib/python3.6/site-packages/detectron2/config/config.py in merge_from_file(self, cfg_filename, allow_unsafe)
     47 
     48         if loaded_ver == self.VERSION:
---> 49             self.merge_from_other_cfg(loaded_cfg)
     50         else:
     51             # compat.py needs to import CfgNode

~/anaconda3/envs/vln/lib/python3.6/site-packages/fvcore/common/config.py in merge_from_other_cfg(self, cfg_other)
    116             BASE_KEY not in cfg_other  # pyre-ignore
    117         ), "The reserved key '{}' can only be used in files!".format(BASE_KEY)
--> 118         return super().merge_from_other_cfg(cfg_other)
    119 
    120     def merge_from_list(self, cfg_list: List[object]) -> Callable[[], None]:

~/anaconda3/envs/vln/lib/python3.6/site-packages/yacs/config.py in merge_from_other_cfg(self, cfg_other)
    215     def merge_from_other_cfg(self, cfg_other):
    216         """Merge `cfg_other` into this CfgNode."""
--> 217         _merge_a_into_b(cfg_other, self, self, [])
    218 
    219     def merge_from_list(self, cfg_list):

~/anaconda3/envs/vln/lib/python3.6/site-packages/yacs/config.py in _merge_a_into_b(a, b, root, key_list)
    462             if isinstance(v, CfgNode):
    463                 try:
--> 464                     _merge_a_into_b(v, b[k], root, key_list + [k])
    465                 except BaseException:
    466                     raise

~/anaconda3/envs/vln/lib/python3.6/site-packages/yacs/config.py in _merge_a_into_b(a, b, root, key_list)
    462             if isinstance(v, CfgNode):
    463                 try:
--> 464                     _merge_a_into_b(v, b[k], root, key_list + [k])
    465                 except BaseException:
    466                     raise

~/anaconda3/envs/vln/lib/python3.6/site-packages/yacs/config.py in _merge_a_into_b(a, b, root, key_list)
    475                 root.raise_key_rename_error(full_key)
    476             else:
--> 477                 raise KeyError("Non-existent config key: {}".format(full_key))
    478 
    479 

KeyError: 'Non-existent config key: MODEL.PROPOSAL_GENERATOR.HID_CHANNELS'

ValueError cannot reshape array into shape when loading in generated COCO features

Instructions To Reproduce the Issue

We extracted features of the COCO train2017 split with the detectron2_mscoco_proposal_maxnms.py script. This completed without errors.

Afterwards we try to read the features back from disk with the following function.

import base64
import csv

import numpy as np
from tqdm import tqdm

AIRSPLAY_FIELDNAMES = ['img_id', 'img_w', 'img_h', 'objects_id', 'objects_conf', 'attrs_id', 'attrs_conf',
                       'num_boxes', 'boxes', 'features']

def read_airsplay_tsv(infile, year='2017'):
    data = {}
    with open(infile, "r") as tsv_in_file:
        reader = csv.DictReader(tsv_in_file, delimiter='\t', fieldnames=AIRSPLAY_FIELDNAMES)
        for item in tqdm(reader):
            data_item = {}
            data_item['image_id'] = int(item['img_id']) if year == '2017' else int(item['img_id'].split('_')[-1])
            data_item['image_h'] = int(item['img_h'])
            data_item['image_w'] = int(item['img_w'])
            data_item['num_boxes'] = int(item['num_boxes'])
            for field, dtype in [('boxes', np.float32),
                                 ('features', np.float32),
                                 ('objects_id', np.int64),
                                 ('objects_conf', np.float32)]:
                feature = np.frombuffer(base64.b64decode(item[field]), dtype=dtype)
                feature = feature.reshape((data_item['num_boxes'], -1))
                data_item[field] = feature
            data[data_item['image_id']] = data_item
    return data

This gives the following error in iteration 12663:

Traceback (most recent call last):
  File "/cw/liir/NoCsBack/testliir/rubenc/miniconda3/envs/vpcfg_env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-067698b32ccd>", line 1, in <module>
    data2 = read_airsplay_tsv('/cw/liir/NoCsBack/testliir/rubenc/py-bottom-up-attention/data/mscoco_imgfeat/train2017_d2obj36_batch_2.tsv', year='2017')
  File "/cw/liir/NoCsBack/testliir/rubenc/vpcfg-dev/latent-structure-tools/common_tools/tools/read_tsv.py", line 58, in read_airsplay_tsv
    feature = feature.reshape((data_item['num_boxes'], -1))
ValueError: cannot reshape array of size 73728 into shape (35,newaxis)

The problem seems to be that 35, and not 36, boxes were extracted for this image, but the dimensions of the features do not correspond: the 'num_boxes' field equals 35 and the 'boxes' field can be reshaped to (35, 4), but the 'features' field cannot be reshaped to (35, -1); it can, however, be reshaped to (36, -1).

Reading the generated features for the val2017, val2014, and train2014 splits in the same way works without errors.

How can we solve this (i.e., how can we make the detectron2_mscoco_proposal_maxnms.py script correctly save 36 boxes per image)?
If we reshape the features to (36, -1) instead of (35, -1) and only use the first 35 rows, will these correctly correspond to the 35 saved boxes?

EDIT: same problem after running the script with MIN_BOXES = 10, MAX_BOXES = 100.
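
A defensive workaround while the root cause is unknown: infer the row count from the decoded buffer instead of trusting num_boxes (a sketch only; it does not guarantee that row i still corresponds to box i):

import base64

import numpy as np

# Infer the number of rows from the buffer size instead of trusting 'num_boxes'.
# 'item' is one TSV row as in read_airsplay_tsv above; this repo's features are
# 2048-d per box.
FEAT_DIM = 2048
feature = np.frombuffer(base64.b64decode(item["features"]), dtype=np.float32)
n_rows = feature.size // FEAT_DIM
assert feature.size == n_rows * FEAT_DIM, "feature buffer is not a multiple of FEAT_DIM"
feature = feature.reshape((n_rows, FEAT_DIM))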

Environment

Please paste the output of python -m detectron2.utils.collect_env, or use python detectron2/utils/collect_env.py if detectron2 hasn't been successfully installed.


sys.platform linux
Python 3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0]
Numpy 1.21.2
Detectron2 Compiler GCC 7.5
Detectron2 CUDA Compiler 11.4
DETECTRON2_ENV_MODULE
PyTorch 1.4.0
PyTorch Debug Build False
torchvision 0.5.0
CUDA available True
GPU 0 NVIDIA GeForce RTX 2080 Ti
GPU 1,2 NVIDIA TITAN Xp
GPU 3 NVIDIA GeForce GTX 1080 Ti
CUDA_HOME /usr/local/cuda
NVCC Build cuda_11.4.r11.4/compiler.30300941_0
Pillow 8.3.2
cv2 4.5.3


PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

Converting Detectron2 output tensors to normal PyTorch tensors

How can I convert Detectron2 output tensors to normal PyTorch tensors?

I am sharing the results with someone who can't install Detectron2 on his machine due to limitations, and I wonder how I could convert them to normal PyTorch tensors.

I have two sets of .pt files: one of them is the RoI-extracted features for 36 RoIs in the image,

and the other relates to the bounding boxes, their predictions, and class numbers.

[screenshots: detectron2_ft, detectron2_tensors]

The person is also able to use them in Google Colab, but didn't find an easy way to install your code or detectron2 in Google Colab.
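
For what it's worth, detectron2's output structures wrap plain torch tensors, so they can be unwrapped before saving; a minimal sketch, assuming `instances` and `features` come from the feature-extraction demo:

import torch

# Unwrap detectron2 structures into plain torch tensors so they can be loaded
# without detectron2 installed. 'instances' and 'features' are assumed to come
# from this repo's feature-extraction demo.
pred = instances.to("cpu")
plain = {
    "boxes": pred.pred_boxes.tensor,   # detectron2 Boxes -> plain (N, 4) tensor
    "classes": pred.pred_classes,      # already a plain tensor
    "scores": pred.scores,
    "features": features.cpu(),        # (N, 2048) RoI features
}
torch.save(plain, "detectron2_plain_tensors.pt")  # loadable with plain torch.load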

Coordinates for feature prediction given boxes

If I want to predict features for given box coordinates, following the script in your demos, should the coordinates be provided in XYXY format or in XYWH format?

Thanks!
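
The README's Note 2 describes the output boxes as (x0, y0, x1, y1); assuming the given-box input follows the same XYXY convention, a small conversion sketch for XYWH boxes:

import numpy as np

# Convert XYWH boxes to the XYXY format described in the README's Note 2.
boxes_xywh = np.array([[50.0, 30.0, 150.0, 120.0]])     # placeholder [x, y, w, h]
boxes_xyxy = boxes_xywh.copy()
boxes_xyxy[:, 2] = boxes_xywh[:, 0] + boxes_xywh[:, 2]  # x1 = x0 + w
boxes_xyxy[:, 3] = boxes_xywh[:, 1] + boxes_xywh[:, 3]  # y1 = y0 + h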

Predict 10~100 boxes in feature extraction

Thanks for your code and models!
I notice that you have provided the model trained with 10~100 boxes.
I wonder how I can use this model to predict 10~100 boxes during feature extraction.
It seems that the number of features is closely tied to the hyperparameter 'NUM_OBJECTS', and I don't know how to set it for an adaptive number of boxes.
Thanks!
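
For context, the original bottom-up-attention picks a variable number of boxes by confidence thresholding, clamped between MIN_BOXES and MAX_BOXES; a hedged sketch of that selection rule (the threshold value and the score layout are assumptions, not this repo's code):

import numpy as np

# Adaptive box count: keep boxes whose confidence clears a threshold, clamped
# to [MIN_BOXES, MAX_BOXES]. 'scores' stands in for per-box max class
# confidences, sorted descending.
MIN_BOXES, MAX_BOXES, CONF_THRESH = 10, 100, 0.2
scores = np.sort(np.random.rand(300))[::-1]          # placeholder scores
num_keep = int((scores > CONF_THRESH).sum())
num_boxes = int(np.clip(num_keep, MIN_BOXES, MAX_BOXES))
print(num_boxes)  # take the top `num_boxes` boxes by score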

Getting size mismatch message when loading model

Thanks for your amazing work. I am trying to get your repo running with a later version of detectron2 (version 0.2), specifically the demo notebook https://github.com/airsplay/py-bottom-up-attention/blob/master/demo/demo_feature_extraction_attr.ipynb. I was getting some errors, and to resolve them I had to add a few parameters to the detectron2/config/defaults.py file, based on your suggestion in another issue. I added these parameters: _C.MODEL.PROPOSAL_GENERATOR.HID_CHANNELS, _C.MODEL.ROI_BOX_HEAD.RES5HALVE, _C.MODEL.ROI_BOX_HEAD.ATTR, _C.MODEL.ROI_BOX_HEAD.NUM_ATTRS, and _C.MODEL.CAFFE_MAXPOOL.

When I run the demo now, I get a message at this line: predictor = DefaultPredictor(cfg). It seems there is a dimension mismatch between the weights and the architecture. Below is the message:

Skip loading parameter 'proposal_generator.rpn_head.conv.weight' to the model due to incompatible shapes: (512, 1024, 3, 3) in the checkpoint but (1024, 1024, 3, 3) in the model! You might want to double check if this is expected.
Skip loading parameter 'proposal_generator.rpn_head.conv.bias' to the model due to incompatible shapes: (512,) in the checkpoint but (1024,) in the model! You might want to double check if this is expected.
Skip loading parameter 'proposal_generator.rpn_head.objectness_logits.weight' to the model due to incompatible shapes: (12, 512, 1, 1) in the checkpoint but (12, 1024, 1, 1) in the model! You might want to double check if this is expected.
Skip loading parameter 'proposal_generator.rpn_head.anchor_deltas.weight' to the model due to incompatible shapes: (48, 512, 1, 1) in the checkpoint but (48, 1024, 1, 1) in the model! You might want to double check if this is expected.

Can you give me some directions on what could be causing the issue, or where to look in the code to debug this?

To give more context about the setting, below are the config parameters:

ipdb> print(cfg.dump())
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - coco_2017_val
  TRAIN:
  - coco_2017_train
GLOBAL:
  HACK: 1.0
INPUT:
  CROP:
    ENABLED: false
    SIZE:
    - 0.9
    - 0.9
    TYPE: relative_range
  FORMAT: BGR
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  MIN_SIZE_TRAIN_SAMPLING: choice
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
    - - 64
      - 128
      - 256
      - 512
  BACKBONE:
    FREEZE_AT: 2
    NAME: build_resnet_backbone
  CAFFE_MAXPOOL: true
  DEVICE: cuda
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES: []
    NORM: ''
    OUT_CHANNELS: 256
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: false
  META_ARCHITECTURE: GeneralizedRCNN
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 102.9801
  - 115.9465
  - 122.7717
  PIXEL_STD:
  - 1.0
  - 1.0
  - 1.0
  PROPOSAL_GENERATOR:
    HID_CHANNELS: 512
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 101
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res4
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: true
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 80
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    ATTR: true
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    - 10.0
    - 10.0
    - 5.0
    - 5.0
    CLS_AGNOSTIC_BBOX_REG: false
    CONV_DIM: 256
    FC_DIM: 1024
    NAME: ''
    NORM: ''
    NUM_ATTRS: 400
    NUM_CONV: 0
    NUM_FC: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIPool
    RES5HALVE: false
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: Res5ROIHeads
    NMS_THRESH_TEST: 0.6
    NUM_CLASSES: 1600
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.2
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    BOUNDARY_THRESH: -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 300
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 6000
    PRE_NMS_TOPK_TRAIN: 12000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 54
  WEIGHTS: weights/faster_rcnn_from_caffe_attr_original.pkl
OUTPUT_DIR: ./output

How can I generate label.tsv?

❓ How to use Detectron2

Questions like:

I followed GETTING_STARTED.md, but it doesn't produce a result file that includes each object's feature vector and label.

I want to get those. In this case, what can I do?

How do you train this model?

Could you maybe elaborate (i) which files you need to train a bottom-up attention model from scratch, i.e., do you need to convert VG to the COCO format, and (ii) how you train the model with said files?
Thanks!

Getting weights of attribute head from Caffe model

Hi,

Thanks for sharing the weights converted from the Caffe model. But it seems that the weights of the attribute head were removed. Is there a way to convert the weights of the attribute head from the Caffe model as well?

Really appreciate your advice.

Thanks!

Feature extraction demo (with attributes) not working ... ValueError: not enough values to unpack (expected 3, got 2)

Hi,

Thanks for sharing your work. I'm interested in using this approach to extract features with attributes. I have installed Detectron2 v0.1.1 (around March 2020) and PyTorch 1.4, and I'm using the faster_rcnn_R_101_C4_attr_caffemaxpool.yaml configuration file with http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe_attr.pkl, but I got the above error when trying to run the demo.

I don't understand why predictor.model.roi_heads.box_predictor(feature_pooled) is returning a tuple of 2 tensors. It seems the pred_attr_logits are not returned.

Sorry if I'm missing some basic point; I'm new to the Computer Vision domain.

If interested, please see the related Colab notebook here.

Thanks!

Integration with LXMERT

If I want to use this repo to extract RCNN image features to train LXMERT, how can I do that? Do I just dump the features from

# Show the boxes, labels, and features
pred = instances.to('cpu')
v = Visualizer(im[:, :, :], MetadataCatalog.get("vg"), scale=1.2)
v = v.draw_instance_predictions(pred)
showarray(v.get_image()[:, :, ::-1])
print('instances:\n', instances)
print()
print('boxes:\n', instances.pred_boxes)
print()
print('Shape of features:\n', features.shape)

(from https://github.com/airsplay/py-bottom-up-attention/blob/master/demo/demo_feature_extraction_attr.ipynb)

into a .tsv file?

Btw, what is the difference between with and without attributes? Thanks!
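
Roughly, yes: the LXMERT loader expects the TSV schema quoted in an earlier issue (FIELDNAMES), with arrays base64-encoded to match the np.frombuffer/b64decode reading code shown in the reshape issue above. A minimal writing sketch with placeholder values:

import base64
import csv

import numpy as np

FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf",
              "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]

def encode(arr):
    # Arrays are stored base64-encoded, mirroring the b64decode/np.frombuffer
    # reading code quoted in the reshape issue above.
    return base64.b64encode(arr.tobytes()).decode("ascii")

# Placeholder values standing in for one image's extracted outputs.
num_boxes = 36
row = {
    "img_id": "placeholder_img_id", "img_h": 480, "img_w": 640,
    "objects_id": encode(np.zeros(num_boxes, dtype=np.int64)),
    "objects_conf": encode(np.zeros(num_boxes, dtype=np.float32)),
    "attrs_id": encode(np.zeros(num_boxes, dtype=np.int64)),
    "attrs_conf": encode(np.zeros(num_boxes, dtype=np.float32)),
    "num_boxes": num_boxes,
    "boxes": encode(np.zeros((num_boxes, 4), dtype=np.float32)),
    "features": encode(np.zeros((num_boxes, 2048), dtype=np.float32)),
}
with open("my_imgfeat.tsv", "w") as f:   # placeholder output path
    csv.DictWriter(f, FIELDNAMES, delimiter="\t").writerow(row)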

Requires AMD ROCm to work (no hipcc found); merged from here into fb/detectron2 as a solution

Problem

Tried installing on a Linux V100 GPU machine with CUDA 10.1 and GLIBC 2.17. This repo requires AMD ROCm ('no hipcc found' was the gcc error). AMD ROCm needs multiple dependencies which require GLIBC 2.22.

In the most recent detectron2 setup.py, they have made ROCm optional.

Solution

I ported the changes in this repo to the most recent detectron2 master (dated 21st June 2020, 0de7e8c) here: https://github.com/faizanahemad/detectron2.git. This removes the ROCm requirement and also updates detectron2. See demo/demo_caffe_frcnn_feature_extraction_attr.py or demo/demo_caffe_frcnn_feature_extraction_attr.ipynb for feature extraction for LXMERT.

Install with: python -m pip install 'git+https://github.com/faizanahemad/detectron2.git'

Results different from the original bottom up attention

Since bottom-up-attention only provides GT boxes and features, I ran the model it provides to get the object category and attribute category of the GT boxes. Here is an example.
[image]
When I used the pre-trained model you provided to extract the object categories and attribute categories of this image for the given GT boxes, I found that some categories were different from those obtained with the original bottom-up-attention.
[image]
This picture is MSCOCO2014/VAL2014/COCO_val2014_00000039185.jpg. If you have time, you can verify it. I would like to know why the two are different.

How do I extract the features using an FPN based model?

❓ How to use Detectron2

Questions like:

  1. How do I extract the features with detectron2 as shown in your demo? I am using this config.
    I have a pretrained model (trained on custom data) and would like to be able to extract the features of the bounding boxes.
    I keep getting:
    AttributeError: 'StandardROIHeads' object has no attribute '_shared_roi_transform'

I apologize if the answer is obvious; I am very new to object detection.

Thank you!

EDIT

I am confused; I have detectron2 installed. Can I install this on top of my current installation and use it? I ask because it looks like there is a fork of detectron2 in this package which is different from the original detectron2. If I train a model using vanilla detectron2 and install this, can I just load the model weights and extract the features from there?

Config '../configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml' has no VERSION. Assuming it to be compatible with latest v2.

Traceback (most recent call last):
  File "/Users/mingyang.mmy/Documents/project/multimodel/detect/py-bottom-up-attention/demo/demo_feature_extraction.py", line 47, in <module>
    cfg.merge_from_file("../configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml")
  File "/Users/mingyang.mmy/opt/anaconda3/envs/python3.7/lib/python3.7/site-packages/detectron2/config/config.py", line 49, in merge_from_file
    self.merge_from_other_cfg(loaded_cfg)
  File "/Users/mingyang.mmy/opt/anaconda3/envs/python3.7/lib/python3.7/site-packages/fvcore/common/config.py", line 120, in merge_from_other_cfg
    return super().merge_from_other_cfg(cfg_other)
  File "/Users/mingyang.mmy/opt/anaconda3/envs/python3.7/lib/python3.7/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/Users/mingyang.mmy/opt/anaconda3/envs/python3.7/lib/python3.7/site-packages/yacs/config.py", line 478, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/Users/mingyang.mmy/opt/anaconda3/envs/python3.7/lib/python3.7/site-packages/yacs/config.py", line 478, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/Users/mingyang.mmy/opt/anaconda3/envs/python3.7/lib/python3.7/site-packages/yacs/config.py", line 491, in _merge_a_into_b
    raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.PROPOSAL_GENERATOR.HID_CHANNELS'
WARNING [08/18 16:15:39 d2.config.compat]: Config '../configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml' has no VERSION. Assuming it to be compatible with latest v2.
WARNING [08/18 16:15:39 d2.config.compat]: Config '../configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml' has no VERSION. Assuming it to be compatible with latest v2.

Process finished with exit code 1

How can I solve this without changing the detectron2 version?

No such file or directory: 'data/genome/1600-400-20/objects_vocab.txt'. Where is the data? Do I need to download it?

If you do not know the root cause of the problem / bug, and wish someone to help you, please
post according to this template:

Instructions To Reproduce the Issue

  1. What changes you made (git diff) or what code you wrote:
<put diff or code here>
  2. What exact command you ran:
  3. What you observed (including the full logs):
<put logs here>
  4. Please also simplify the steps as much as possible so they do not require additional resources to
    run, such as a private dataset.

Expected behavior

If there are no obvious errors in "what you observed" provided above,
please tell us the expected behavior.

If you expect the model to converge / work better, note that we do not give suggestions
on how to train your model.
We will only help with it under one of two conditions:
(1) You're unable to reproduce the results in the detectron2 model zoo.
(2) It indicates a detectron2 bug.

Environment

Please paste the output of python -m detectron2.utils.collect_env,
or use python detectron2/utils/collect_env.py
if detectron2 hasn't been successfully installed.
