
grid-feats-vqa's Introduction

In Defense of Grid Features for Visual Question Answering

Grid Feature Pre-Training Code

This is a feature pre-training code release of the paper:

@InProceedings{jiang2020defense,
  title={In Defense of Grid Features for Visual Question Answering},
  author={Jiang, Huaizu and Misra, Ishan and Rohrbach, Marcus and Learned-Miller, Erik and Chen, Xinlei},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

For more sustainable maintenance, we release code built on Detectron2 instead of mask-rcnn-benchmark, which the original code was based on. The current repository should reproduce the results reported in the paper, e.g., a single-model VQA score of ~72.5 for an X-101 backbone paired with MCAN-large.

Installation

Install Detectron2 following INSTALL.md. Since Detectron2 is actively updated, and updates can introduce breaking changes, it is highly recommended to install it via the following command:

python -m pip install 'git+https://github.com/facebookresearch/detectron2.git@ffff8ac'

Commits before or after ffff8ac might also work, but they could be risky. Then clone this repository:

git clone git@github.com:facebookresearch/grid-feats-vqa.git
cd grid-feats-vqa

Data

Visual Genome train+val splits released from the bottom-up-attention code are used for pre-training, and the test split is used for evaluating detection performance. All of them are prepared in COCO format but include an additional field for attribute prediction. We provide the .json files here, which can be directly loaded by Detectron2. As in Detectron2, the expected dataset structure under the DETECTRON2_DATASETS folder (default: ./datasets relative to your current working directory) is:

visual_genome/
  annotations/
    visual_genome_{train,val,test}.json
  images/
    # visual genome images (~108K)
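
If your copy of the data lives elsewhere, you can point Detectron2 at it through the DETECTRON2_DATASETS environment variable mentioned above before running any of the commands below, for example:

export DETECTRON2_DATASETS=/path/to/datasets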

Training

Once the dataset is set up, train a model by running (by default we use 8 GPUs):

python train_net.py --num-gpus 8 --config-file <config.yaml>

For example, to launch grid-feature pre-training with a ResNet-50 backbone on 8 GPUs, run:

python train_net.py --num-gpus 8 --config-file configs/R-50-grid.yaml

By default, the final model is saved under ./output of your current working directory once training is done. We also provide the region-feature pre-training configuration configs/R-50-updn.yaml, which trains the region features described in the bottom-up-attention paper and is a faithful re-implementation of the original in Detectron2. Note that we use an attribute loss weight of 0.2 (MODEL.ROI_ATTRIBUTE_HEAD.LOSS_WEIGHT = 0.2), which is better for downstream tasks like VQA per our analysis.
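
If you want to experiment with a different attribute loss weight, the value can be overridden on the command line using Detectron2's standard key-value syntax (shown here with the paper's 0.2 setting):

python train_net.py --num-gpus 8 --config-file configs/R-50-grid.yaml MODEL.ROI_ATTRIBUTE_HEAD.LOSS_WEIGHT 0.2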

Feature Extraction

Once the model is trained (or you can directly download our pre-trained models, see below), grid feature extraction can be done by simply running:

python extract_grid_feature.py --config-file configs/R-50-grid.yaml --dataset <dataset>

The code will load the final model from cfg.OUTPUT_DIR (which can be overridden on the command line) and start extracting features for <dataset>. Three options are provided for the dataset: coco_2014_train, coco_2014_val, and coco_2015_test, corresponding to the train, val, and test splits of the VQA dataset. The extracted features can be conveniently loaded in Pythia.

To extract features from your own dataset, you can either dump the image information into COCO .json format and register the dataset so that extract_grid_feature.py can use it, or modify extract_grid_feature.py to loop over your images directly.
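
For the first route, here is a minimal sketch of registering a custom COCO-format dataset with Detectron2; the dataset name and paths are placeholders, and you would still need to wire the new name into extract_grid_feature.py:

from detectron2.data.datasets import register_coco_instances

# Hypothetical names and paths, for illustration only -- adjust to your data.
register_coco_instances(
    "my_dataset_train",                            # name to reference from the extraction code
    {},                                            # no extra metadata
    "datasets/my_dataset/annotations/train.json",  # COCO-format annotation file
    "datasets/my_dataset/images",                  # image root
)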

Pre-Trained Models and Features

We release several pre-trained models for grid features: one with R-50 backbone, one with X-101, one with X-152, and one with additional improvements used for the 2020 VQA Challenge (see X-152-challenge.yaml). The models can be used directly to extract features. For your convenience, we also release the pre-extracted features for direct download.

Backbone AP50:95 Download
R-50 3.1 model |  metrics |  features
X-101 4.3 model |  metrics |  features
X-152 4.7 model |  metrics |  features
X-152++ 3.7 model |  metrics |  features
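
For example, to extract features directly from a downloaded checkpoint (the local path below is a placeholder), point MODEL.WEIGHTS at it:

python extract_grid_feature.py --config-file configs/R-50-grid.yaml --dataset coco_2014_val MODEL.WEIGHTS /path/to/R-50.pth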

License

The code is released under the Apache 2.0 license.

grid-feats-vqa's People

Contributors

endernewton


grid-feats-vqa's Issues

Clarifying couple of impl details

Hello - not sure if this is the right place to do so, but just wanted to clarify a couple of implementation details in the paper:

  1. Were grid features reshaped from H x W x 2048 to (H * W) x 2048 before input into Pythia? I would assume so, but just wanted to make sure.

  2. Grid features are clearly variable size in the paper, yet you use the number 608 in Fig. 3 to compare to region features. For images of size less than 600 x 1000, do you zero pad?

Thanks a lot.
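
For reference, the reshape described in question 1 would look something like this in PyTorch (a generic sketch, not necessarily how the authors feed Pythia):

import torch

feat = torch.randn(2048, 19, 29)        # one grid feature map, C x H x W
flat = feat.flatten(1).transpose(0, 1)  # (H * W) x 2048, here 551 x 2048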

How to apply it to the MCAN model

Thank you for sharing! I want to apply the features extracted by this model in the MCAN source code; how do I replace the original features?
Looking forward to your reply. Thank you!!

train_net error: detectron2 (version 0.1.1) RuntimeError

Not compiled with GPU support (ROIAlign_forward at /home/oliver/PycharmProjects/detectron2/detectron2/layers/csrc/ROIAlign/ROIAlign.h:73)
I have searched for this error on the Internet but still can't solve the problem, and I am sure my CUDA and PyTorch versions, etc. are right. Would you have any idea or suggestion to solve the problem? Thanks.

Grid Feature Size? (2048, 26, 19) vs (2048,25,19)

Hi, I am trying to run grid+MCAN via MMF.
I extracted the grid features, stored in .pth files, and each .pth has size [2048, 26, 19].
When I run the code, I get a RuntimeError: The expanded size of the tensor (25) must match the existing size (26) at non-singleton dimension 1. Target sizes: [2048, 25, 19]. Tensor sizes: [2048, 26, 19]
Could you help me with that? Thank you!

The full traceback is attached.

Traceback (most recent call last):
File "/home/cc67459/MMF2/bin/mmf_run", line 33, in
sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 118, in run
nprocs=config.distributed.world_size,
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 66, in distributed_main
main(configuration, init_distributed=True, predict=predict)
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 56, in main
trainer.train()
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/mmf_trainer.py", line 108, in train
self.training_loop()
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/core/training_loop.py", line 36, in training_loop
self.run_training_epoch()
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/core/training_loop.py", line 67, in run_training_epoch
for batch in self.train_loader:
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/common/batch_collator.py", line 24, in __call__
sample_list = SampleList(batch)
File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/common/sample.py", line 129, in __init__
self[field][idx] = self._get_data_copy(sample[field])
RuntimeError: The expanded size of the tensor (25) must match the existing size (26) at non-singleton dimension 1. Target sizes: [2048, 25, 19]. Tensor sizes: [2048, 26, 19]

Extracting grid features with smaller image size

Thanks for sharing the code!

I have a question about extracting the grid features: if I want to extract fewer grid features (using a smaller image size like 448×448, as mentioned in Table 4 of the paper), can I still use the provided pre-trained model, or should I train a new model with smaller image sizes to ensure consistency?

scriptable code

I want to script the code with TorchScript. Since the detectron2 version used in this project is not scriptable, I tried to use the latest version of D2. But I got an error:

TypeError: __init__() got an unexpected keyword argument 'train_on_pred_boxes'

@endernewton
What do you suggest?
Is there any way to use the new D2? or is there any other way for scripting?

images with no usable annotations

After downloading the json file and running the training command, one line of the output is:

Removed 963 images with no usable annotations. 102114 images left.

Is this normal? Or are some of the images damaged?

Thank you!

unable to load model from a checkpoint

I am pretraining a model using the Visual Genome dataset. My training got interrupted, but a checkpoint was created under the "output" directory. How can I resume training?
I'm using this command:

python3.6 train_net.py --num-gpus 1 --config-file configs/R-50-grid.yaml OUTPUT_DIR output --resume

but it gives me an error

Command Line Args: Namespace(config_file='configs/R-50-grid.yaml', dist_url='tcp://127.0.0.1:50170', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['OUTPUT_DIR', 'output', '--resume'], resume=True)
Traceback (most recent call last):
File "train_net.py", line 127, in
args=(args,),
File "/usr/local/lib/python3.6/dist-packages/detectron2/engine/launch.py", line 52, in launch
main_func(*args)
File "train_net.py", line 101, in main
cfg = setup(args)
File "train_net.py", line 94, in setup
cfg.merge_from_list(args.opts)
File "/usr/local/lib/python3.6/dist-packages/fvcore/common/config.py", line 135, in merge_from_list
return super().merge_from_list(cfg_list)
File "/usr/local/lib/python3.6/dist-packages/yacs/config.py", line 226, in merge_from_list
cfg_list
File "/usr/local/lib/python3.6/dist-packages/yacs/config.py", line 545, in _assert_with_logging
assert cond, msg
AssertionError: Override list has odd length: ['OUTPUT_DIR', 'output', '--resume']; it must be a list of pairs
$
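
Assuming train_net.py uses Detectron2's standard argument parser, config overrides must come last and in key-value pairs, so a likely fix is to move the --resume flag before the overrides:

python3.6 train_net.py --num-gpus 1 --config-file configs/R-50-grid.yaml --resume OUTPUT_DIR output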

R-50 VG weights give worse performance than Imagenet weights

I tried to plug the R-50 pretrained model weights into a torchvision resnet50 model using a simple pattern-matching script.
However, this does not seem to improve performance as expected, and in fact it performs worse than ImageNet weights at image size 448×448 in an MCAN-type model (custom implementation, where I replace the LSTM with BERT and don't use self-attention on top of the language features).

Am I using the weights wrong?

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Running the command `python train_net.py --num-gpus 1 --config-file R-50.pth` produces:

Bad key "text.kerning_factor" on line 4 in
/data1/lyx/anaconda3/envs/detect-grid/lib/python3.6/site-packages/matplotlib/mpl-data/stylelib/_classic_test_patch.mplstyle.
You probably need to get an updated matplotlibrc file from
https://github.com/matplotlib/matplotlib/blob/v3.1.3/matplotlibrc.template
or from the matplotlib source distribution
Command Line Args: Namespace(config_file='R-50.pth', dist_url='tcp://127.0.0.1:50171', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
Traceback (most recent call last):
File "train_net.py", line 98, in
args=(args,),
File "/data3/lyx/project/detectron2/detectron2/engine/launch.py", line 52, in launch
main_func(*args)
File "train_net.py", line 72, in main
cfg = setup(args)
File "train_net.py", line 64, in setup
cfg.merge_from_file(args.config_file)
File "/data3/lyx/project/detectron2/detectron2/config/config.py", line 26, in merge_from_file
loaded_cfg = _CfgNode.load_yaml_with_base(cfg_filename, allow_unsafe=allow_unsafe)
File "/data1/lyx/anaconda3/envs/detect-grid/lib/python3.6/site-packages/fvcore/common/config.py", line 51, in load_yaml_with_base
cfg = yaml.safe_load(f)
File "/data1/lyx/anaconda3/envs/detect-grid/lib/python3.6/site-packages/yaml/init.py", line 162, in safe_load
return load(stream, SafeLoader)
File "/data1/lyx/anaconda3/envs/detect-grid/lib/python3.6/site-packages/yaml/init.py", line 112, in load
loader = Loader(stream)
File "/data1/lyx/anaconda3/envs/detect-grid/lib/python3.6/site-packages/yaml/loader.py", line 34, in init
Reader.init(self, stream)
File "/data1/lyx/anaconda3/envs/detect-grid/lib/python3.6/site-packages/yaml/reader.py", line 85, in init
self.determine_encoding()
File "/data1/lyx/anaconda3/envs/detect-grid/lib/python3.6/site-packages/yaml/reader.py", line 124, in determine_encoding
self.update_raw()
File "/data1/lyx/anaconda3/envs/detect-grid/lib/python3.6/site-packages/yaml/reader.py", line 178, in update_raw
data = self.stream.read(size)
File "/data1/lyx/anaconda3/envs/detect-grid/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
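
The traceback shows that the checkpoint R-50.pth is being parsed as a YAML config (config_file='R-50.pth'), which is what triggers the decode error. Assuming the intent was to train with those weights, a command along these lines would be closer:

python train_net.py --num-gpus 1 --config-file configs/R-50-grid.yaml MODEL.WEIGHTS R-50.pth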

Did you try training an end-to-end model?

Did you try training an end-to-end model by composing your pretrained CNN model with some SOTA VQA model? Is the performance of this kind of model worse than just using the CNN as a feature extractor?

RuntimeError: Not compiled with GPU support

Hello,
I get the following error:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/detectron2/engine/launch.py", line 84, in _distributed_worker
main_func(*args)
File "/media/vinh/DATA_4T/emotion_sketch/grid-feats-vqa-master/train_net.py", line 120, in main
return trainer.train()
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 380, in train
super().train(self.start_iter, self.max_iter)
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/media/vinh/DATA_4T/emotion_sketch/grid-feats-vqa-master/train_net.py", line 73, in run_step
loss_dict = self.model(data)
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 130, in forward
_, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/detectron2/modeling/roi_heads/roi_heads.py", line 555, in forward
losses = self._forward_box(features, proposals)
File "/media/vinh/DATA_4T/emotion_sketch/grid-feats-vqa-master/grid_feats/roi_heads.py", line 210, in _forward_box
box_features = self.box_pooler(features, [x.proposal_boxes for x in proposals])
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/detectron2/modeling/poolers.py", line 215, in forward
return self.level_poolers[0](x[0], pooler_fmt_boxes)
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/detectron2/layers/roi_align.py", line 95, in forward
input, rois, self.output_size, self.spatial_scale, self.sampling_ratio, self.aligned
File "/home/vinh/anaconda3/envs/py37/lib/python3.7/site-packages/detectron2/layers/roi_align.py", line 20, in forward
input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned
RuntimeError: Not compiled with GPU support

Can you help?

code on VQA and Caption

Thanks for the great work!
Will you release the downstream VQA and Caption code? If so, when?
Thank you.

How do you control input image size and feature number?

I am confused about the description of input size control in the paper:

Notably, input images are resized to have a maximum shorter side of 600 pixels (longest 1000) when keeping aspect ratio fixed.

My understanding is that images (from Visual Genome) are resized and padded (is there a padding process?) to 600×1000 when training the detector. So I guess that, when training a VQA model, images (from MSCOCO) are also resized and padded to 600×1000. But I have two questions:

  • What is the specific procedure for resizing and padding? And are there actually two image sizes, i.e., 600×1000 and 1000×600, since both of them would output 608 features and could be used by a VQA model like MCAN? But when I read your code (extract_grid_feature.py), there is no sign of resizing and padding, just the normalization of pixel values (see the resize sketch after this list).
  • I found that the maximum image size in MSCOCO is 640×640, so do we actually enlarge every image in MSCOCO when training a VQA model using 608 features?

Grid+MCAN using MMF

Hi, I am trying to set up Grid+MCAN without using the MMF library because I would like to make it more flexible for future development: changing the dataset to the VizWiz dataset, changing GLOVE embeddings to BERT embeddings, and maybe changing the stacking strategy, etc.

Could you give me some guidance, or release the related code about setting up the grid+MCAN model reported in your paper without using MMF? Or do you think it is easier to make the changes using MMF?

Thank you!

Grid features extraction

1) I have some confusion regarding grid feature extraction.
For region features we select the top k bounding boxes, i.e., 36 or 100, depending on the requirements.
Grid features are extracted based on the image size, right? Do they also involve bounding boxes around objects along with object labels? Can you please explain how the object detection/recognition part works with these grid features on the COCO dataset?

2) Also, I checked the tensor of one image feature file, which has shape [1, 2048, 19, 29]. Does that mean we have 19×29 = 551 features in total? Can we select some top k features from it as well?

3) Lastly, do the .pth feature files contain only the feature tensors, or other information as well? I am unable to read the complete file and can only see a few values.

how to change the command line to give path of pretrained model

I want to load the pretrained X-152++ model and compute its grid features. I put the pretrained model X-152pp.pth in the xyz directory. How should I load it to compute the features, given that I am not loading the model from the output directory?

How do I need to change this command line?
python extract_grid_feature.py --config-file configs/X-152-grid.yaml --dataset

Also, when I write
CUDA_VISIBLE_DEVICES=0 python3.6 extract_grid_feature.py --config-file configs/X-152-grid.yaml --dataset coco_2015_test OUTPUT_DIR xyz

it downloads

[03/20 20:14:59 d2.checkpoint.catalog]: Catalog entry catalog://ImageNetPretrained/FAIR/X-152-32x8d-IN5k points to https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl

and later loads the model like this and starts calculating features for the dataset

[03/20 20:01:31 detectron2]: Full config saved to xyz/config.yaml
[03/20 20:01:31 d2.utils.env]: Using a generated random seed 31780998
[03/20 20:01:36 fvcore.common.checkpoint]: Loading checkpoint from catalog://ImageNetPretrained/FAIR/X-152-32x8d-IN5k
[03/20 20:01:36 d2.checkpoint.catalog]: Catalog entry catalog://ImageNetPretrained/FAIR/X-152-32x8d-IN5k points to https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl
[03/20 20:01:36 fvcore.common.file_io]: URL https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl cached in /home/ifrahmaqsood/.torch/fvcore_cache/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl

It doesn't take the model that is saved under the xyz folder.

Could you help me with how I can overcome this problem?

About the grid feature extraction

Dear professor,
I'm really confused about the grid feature extraction procedure. As far as I'm concerned, there should be two methods to extract grid features: one is 14×14 pooling, and the other is 1×1 pooling. But what confuses me is that the two kinds of pooling seem to apply only to region features, while the grid features are directly extracted from ResNet's C1 to C5. I don't understand how the two kinds of pooling influence the extraction of grid features. Could you please resolve my confusion? Thanks a lot.

Issues with extract_region_feature.py

Hi, Thanks for the great work.

In the up-down model, the VG objects are extracted such that every image has 10-100 boxes (called adaptive features). Yet I found no similar constraint here, and the number of output VG boxes is usually fewer than 10 (I tried images from Open Images).

Any thoughts on this?

Pretrained weight for BUTD model

Hello, thanks for sharing your implementations.

Could you please also share the weights for the pretrained BUTD model trained on visual genome?

dimension of grid features

Can you please tell me the dimension of each grid feature? You have only mentioned the total number of features per image (608).

Region feature YAML file download

I want to extract the region features but cannot find a file named X-152-region.yaml or X-101-region.yaml. Could you share the related files?

Pretrained model not generating correct detections

Hi, I'm trying to load the weights to use in a Detectron2 model. I installed as suggested in the README and used the following code:

import sys
sys.path.insert(0, "../grid-feats-vqa/")

import matplotlib.pyplot as plt
from PIL import Image

import torch
import torchvision.transforms as T

from extract_grid_feature import setup, extract_grid_feature_argument_parser
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.modeling import build_model
from detectron2.utils.visualizer import Visualizer

%matplotlib inline

args = extract_grid_feature_argument_parser().parse_args(['--config-file', '../grid-feats-vqa/configs/R-50-grid.yaml'])
cfg = setup(args)
cfg.defrost()
cfg.MODEL.WEIGHTS = "../data/R-50.pth"
cfg.MODEL.DEVICE = 'cpu'
cfg.freeze()

model = build_model(cfg)
DetectionCheckpointer(model).resume_or_load(cfg.MODEL.WEIGHTS, resume=True);

path_to_coco_img = "/Users/sebamenabar/Documents/datasets/VQA/val2014/COCO_val2014_000000000136.jpg"
img = Image.open(path_to_coco_img).convert('RGB')
model.eval()
with torch.no_grad():
    outputs = model([{"image": T.functional.to_tensor(img)}])

v_gt = Visualizer(img, None)
v_gt = v_gt.overlay_instances(boxes=outputs[0]["instances"].pred_boxes)

fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(v_gt.get_image())
plt.show()

But this generates the following result:

[attached image]

Which does not look like a correct detection.
Any chance you could help me with this?
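
One possible cause (my assumption, not a confirmed answer): Detectron2 models typically expect each "image" tensor in the configured input format, which by default is BGR with 0-255 pixel values, whereas torchvision's to_tensor produces RGB in [0, 1]. A sketch of the adjusted preprocessing, reusing the img and model variables above:

import numpy as np
import torch

img_bgr = np.asarray(img)[:, :, ::-1]                                      # PIL gives RGB; flip to BGR
arr = np.ascontiguousarray(img_bgr.transpose(2, 0, 1), dtype=np.float32)   # C x H x W, values in 0-255
with torch.no_grad():
    outputs = model([{"image": torch.as_tensor(arr), "height": img.height, "width": img.width}])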

Learning rate value causes Infinite/nan loss value

Hi,
first of all, thank you for sharing the code to reproduce your results. I was trying to train the model from scratch on VG using your configuration R-50-updn.yaml and I have found a potential problem: with the learning rate of 0.02 that you have in your configs (both the base grid and base bottom-up), the training script produces infinite/NaN loss values, causing the training to diverge.
The training works after decreasing the learning rate to 0.002.
I was wondering if there is something I am missing.
Thank you!
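
Not an official answer, but if you are training with fewer GPUs or a smaller batch than the 16-image reference setup, a common remedy is to scale the learning rate linearly with the batch size, for example:

python train_net.py --num-gpus 1 --config-file configs/R-50-updn.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025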

Flickr30K Grid feature

May I ask if you have processed the Flickr30k dataset? Directly using extract_region_feature.py does not work well. Are there any data preprocessing requirements?

Thanks

TypeError: __init__() got an unexpected keyword argument 'train_on_pred_boxes'

Hi, thank you very much for open-sourcing this.
I found a problem while executing your code.
I am running python train_net.py --config-file configs/R-50-grid.yaml
and it raised the following error.

Traceback (most recent call last):
  File "train_net.py", line 92, in <module>
    launch(
  File "/home/team/xiaonan/xiaonan_venv/detectron2/detectron2/engine/launch.py", line 57, in launch
    main_func(*args)
  File "train_net.py", line 84, in main
    trainer = Trainer(cfg)
  File "/home/team/xiaonan/xiaonan_venv/detectron2/detectron2/engine/defaults.py", line 274, in __init__
    model = self.build_model(cfg)
  File "/home/team/xiaonan/xiaonan_venv/detectron2/detectron2/engine/defaults.py", line 419, in build_model
    model = build_model(cfg)
  File "/home/team/xiaonan/xiaonan_venv/detectron2/detectron2/modeling/meta_arch/build.py", line 21, in build_model
    model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
  File "/home/team/xiaonan/xiaonan_venv/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 35, in __init__
    self.roi_heads = build_roi_heads(cfg, self.backbone.output_shape())
  File "/home/team/xiaonan/xiaonan_venv/detectron2/detectron2/modeling/roi_heads/roi_heads.py", line 43, in build_roi_heads
    return ROI_HEADS_REGISTRY.get(name)(cfg, input_shape)
  File "/home/team/xiaonan/grid_vqa/grid_feats/roi_heads.py", line 179, in __init__
    super(StandardROIHeads, self).__init__(cfg, input_shape)
  File "/home/team/xiaonan/xiaonan_venv/detectron2/detectron2/config/config.py", line 152, in wrapped
    init_func(self, **explicit_args)
TypeError: __init__() got an unexpected keyword argument 'train_on_pred_boxes'

I don't understand whether this is an internal error in detectron2 or an error in the configuration file above.

How to Calculate grid features from given pretrained X152pp model

I am trying to extract features for the test split of the COCO dataset. I used the pretrained model with the X-152++ backbone.

to extract features I used
python3.6 extract_grid_feature.py --config-file configs/X-152-challenge.yaml --dataset coco_2015_test MODEL.WEIGHTS /home/grid-feats-vqa-master/output/X-152pp.pth

However, when I compared the computed features with the features from https://github.com/facebookresearch/mmf for the movie_mcan model, I found a huge file size gap between this repository's features and MMF's,

i.e.,

The test split image features for X-152pp

The one I calculated myself is of size
3.9M 1024.pth

The one I downloaded from mmf for Movie_MCAN is of size
8.1M 1024.pth

The features that I calculated for the whole test split of VQA2 for X-152pp are approximately 320 GB in total.
Please let me know why there could be such a huge file size difference.

Saving attributes, their scores and setting parameter to extract K regions

Hello @endernewton and @vedanuj,

I was curious if you are planning to adjust the code for storing attributes and their scores as well. I am talking about the pull request #3.
As far as I can see, at the moment the code does not output attribute probabilities and attributes themselves.

Also,

maybe worth adding an option of "K" which stands for extracting the top K region features, I think this is useful for a lot of papers that choose top 36, or varying size from 36-100 (maybe another option is needed there)

Is there a quick way to set the number K of extracted regions to the desired value? In particular, 36, as it says in the comment.

Thank you.

How to extract region feature?

I didn't find extract_region_feature.py in the repo, and I found someone else asking about this file in the issues. Now I need to extract region features after training the model on 'R-50-updn.yaml' and don't know how to do it. Could anyone help me solve this problem? Thanks.

[Help] Can I use the pretrained models for Region Features

@endernewton
Hi, I landed here while searching for a pre-trained object detection model trained on Visual Genome using Detectron2.
Can I use the models you have listed and get features from, say, the FC layers of the box head?
What I actually need is to extract region features of objects, as done in most VQA models like Pythia.

About the 2 GPUs

If we pretrain the ResNet-101 model on a 2-GPU server with the open-source code, do I need to adjust the batch size?
If it is still 16, as written in Base-RCNN-grid.yaml, will it affect the performance?
