milvlg / mcan-vqa

Deep Modular Co-Attention Networks for Visual Question Answering

License: Apache License 2.0

Python 98.43% Shell 1.57%

visual-question-answering attention visual-reasoning

mcan-vqa's Issues

Can I run code with CPU?

Hello, and thanks for sharing your code.
When I tried to run it, I couldn't install spaCy 2.1.0, so I used spaCy 3.2.3 (the latest version). That meant changing the pipeline from "en_vectors_web_lg-2.1.0" to "en_core_web_lg-3.2.0". I also changed "yaml_dict = yaml.load(f)" to "yaml_dict = yaml.safe_load(f)".
In the end, I hit this error: "AssertionError: Torch not compiled with CUDA enabled".
I also tried ...GPU=="CPU", but that doesn't work either.
Please advise how to run the code on a CPU, if that is possible (my GPU doesn't meet the CUDA requirements).

Features' file loading in the code

Hi,
I am confused about how the "bbox" information is handled in the code. I can only see the image features x being loaded from the ".npz" file.

It is also mentioned that grid features can be used. A grid-feature file with the ".pth" extension contains only a feature tensor of size [1, 2048, 19, 29] (from a sample feature file), with no bounding boxes, object detections, etc. How can such features be used without that information?

MCAN

mcan_encoder_decoder

linear fusion model

Thank you for sharing. Have you tried changing the linear multimodal fusion model, and does it affect the accuracy? Looking forward to your reply. Thanks a lot!

# of params for the small model

Your small model (6 layers) has 57,812,491 parameters. However, the paper reports 56M for it (Table 1b). What causes this discrepancy?

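import functools
import operator

# Sum the number of elements of every tensor in the checkpoint.
p = 0
for k, v in data['state_dict'].items():
    p += functools.reduce(operator.mul, v.size(), 1)
>>> p
57812491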
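
One way to narrow down where the extra parameters come from is to group the count by top-level module. The sketch below is only an illustration: `count_params_by_module` and the toy shapes are hypothetical names, and it works on plain shape tuples rather than a real checkpoint. The embedding shape [20572, 300] is the one reported in another issue on this page; whether the paper's 56M figure includes the embedding is not stated here.

```python
import math
from collections import defaultdict


def count_params_by_module(shapes):
    """Sum parameter counts per top-level module name.

    `shapes` maps a parameter name such as "embedding.weight"
    to its shape tuple.
    """
    totals = defaultdict(int)
    for name, shape in shapes.items():
        top_level = name.split(".")[0]
        totals[top_level] += math.prod(shape)
    return dict(totals)


# Hypothetical toy shapes. With a real checkpoint one could pass
# {k: tuple(v.size()) for k, v in data['state_dict'].items()} instead.
example_shapes = {
    "embedding.weight": (20572, 300),
    "proj.weight": (4, 8),
    "proj.bias": (4,),
}
print(count_params_by_module(example_shapes))
```

With the toy shapes, the embedding alone accounts for 20572 × 300 = 6,171,600 parameters, so it is worth checking which modules a reported total covers.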

question regarding test split

Hello! Thanks for sharing your code and brilliant work.
I'd like to ask about evaluation on test-std and test-dev. Is there any way to know how many training epochs are needed, given that local evaluation is not available for these splits? I've seen that you used the same number of training epochs (13). But since evaluating on the test set requires training on the 'train+val+vg' sets, the training data is much larger, so I would assume the number of epochs needed for convergence also increases. Or do you submit checkpoints from several epochs to the online server to see which performs best?

Thanks for sharing your suggestion.

Regards

Visualizations of the learned attention maps

Hi MCAN team,
Thanks for sharing your work.

I want to replicate the visualization of the learned attention maps in Figure 7.
Could you post the code you used to extract and create the learned attention maps?
Looking forward to your reply!

Best

Where are the .npy image features?

Imbalanced GPU memory consumption

Hi! I have run into a problem: when I run the model, GPU 0 always uses a large amount of memory while the other GPUs use less. Any suggestions?
Hoping for your response.

VQA CPv2

Hello, I would like to ask whether MCAN has been evaluated on VQA CPv2?

size mismatch for embedding.weight

Thank you for the open source code!

I am trying to run the validation with the dataset you have provided, as well as the pretrained model "small".

I encounter this problem during evaluation of the validation set:

size mismatch for embedding.weight: copying a param with shape torch.Size([20572, 300]) from checkpoint, the shape in current model is torch.Size([14613, 300]).

Do you know why this happens and how to fix the difference in vocabulary size for the embedding?

Best,

Kayo
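
A size mismatch in `embedding.weight` usually means the vocabulary built at load time differs from the one used when the checkpoint was trained. The sketch below illustrates the effect under the assumption that the vocabulary is built by scanning the question files that are present. The `tokenize` function and its punctuation list are illustrative stand-ins, not the repository's exact code.

```python
import re


def tokenize(questions):
    """Build a token-to-index map from a list of question strings."""
    token_to_ix = {"PAD": 0, "UNK": 1}
    for question in questions:
        # Strip punctuation, then split on whitespace (assumed rules).
        words = re.sub(r"[.,'!?\"()*#:;]", "", question.lower())
        words = words.replace("-", " ").replace("/", " ").split()
        for word in words:
            if word not in token_to_ix:
                token_to_ix[word] = len(token_to_ix)
    return token_to_ix


full_set = ["What color is the cat?", "Is the dog running?"]
reduced_set = full_set[:1]

# The embedding table has one row per vocabulary entry, so a smaller
# question set yields a smaller table than the checkpoint expects.
print(len(tokenize(full_set)), len(tokenize(reduced_set)))
```

Under that assumption, the vocabulary has to be built from the same question files that were present during training for the embedding shapes to match.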

help!

What can I do?
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 90789960 (char 90789959)
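
An error at such a large character offset often points to a truncated or corrupted download rather than a code problem, although that is only a guess here. A minimal sketch for inspecting the text around the failure position follows; `locate_json_error` is a hypothetical helper name, and it takes the file contents as a string.

```python
import json


def locate_json_error(text, context=40):
    """Return (position, surrounding text) of the first JSON error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        start = max(err.pos - context, 0)
        return err.pos, text[start:err.pos + context]
    return None


# A small broken document standing in for the large file.
broken = '{"a": 1 "b": 2}'
print(locate_json_error(broken))
print(locate_json_error('{"a": 1}'))
```

For the real file, one would read it with `open(path, encoding="utf-8").read()` and pass the result in. If the surrounding text looks cut off or contains unrelated bytes, downloading the file again is the likely fix.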

question about answer processing

Hi, thanks for your code.
May I ask why, in the case where multiple questions are given for the same image, you assign a label < 1 to each question, as in:

def get_score(occur):
    if occur == 0:
        return .0
    elif occur == 1:
        return .3
    elif occur == 2:
        return .6
    elif occur == 3:
        return .9
    else:
        return 1.

Why not assign 1 to each and treat it individually? Is it to avoid confusion in predicting the answer at test time?
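
One possible reading of this table, assuming `occur` counts how many of the 10 human annotators gave a particular answer to a question, is that it matches the standard VQA accuracy min(matches / 3, 1) averaged over all ways of leaving one annotator out. The sketch below checks that arithmetic. `leave_one_out_accuracy` is a hypothetical name, and the link to the repository's intent is an assumption.

```python
from fractions import Fraction


def leave_one_out_accuracy(occur, num_annotators=10):
    """Average min(matches / 3, 1) over all leave-one-annotator-out subsets."""
    total = Fraction(0)
    for left_out in range(num_annotators):
        # Treat the first `occur` annotators as the ones who gave this answer.
        matches = occur - 1 if left_out < occur else occur
        total += min(Fraction(matches, 3), Fraction(1))
    return total / num_annotators


for occur in range(5):
    print(occur, float(leave_one_out_accuracy(occur)))
```

Under that assumption, the soft scores 0, 0.3, 0.6, 0.9 and 1 are the expected accuracy of each candidate answer, which is why an answer given by only one or two annotators gets partial credit rather than a hard label of 1.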

Why is our reproduced accuracy so much higher than in the paper?

Overall Accuracy is: 81.02

Per Answer Type Accuracy is the following:
other : 73.65
yes/no : 95.62
number : 66.90

How did you create "npz" files from tsv files

Can you please provide the source code you used to create the npz file for each image from the tsv files?

box is xyxy or xywh?

Hello,

Thanks for your code. I was trying to plot rois (boxes) on the images. Can I ask what's the box format? e.g.: (x_min, y_min, x_max, y_max) or (x_min, y_min, w, h)?

Regards.
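
The thread does not state which format the features use. As a way to test either hypothesis when plotting, the sketch below converts between the two conventions and checks whether a box is plausible as corner format. All function names are hypothetical.

```python
def xywh_to_xyxy(box):
    """Convert (x_min, y_min, w, h) to (x_min, y_min, x_max, y_max)."""
    x_min, y_min, width, height = box
    return (x_min, y_min, x_min + width, y_min + height)


def xyxy_to_xywh(box):
    """Convert (x_min, y_min, x_max, y_max) to (x_min, y_min, w, h)."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min, x_max - x_min, y_max - y_min)


def plausible_as_xyxy(box, image_width, image_height):
    """True if the box is a valid corner-format box inside the image."""
    x_min, y_min, x_max, y_max = box
    return (0 <= x_min < x_max <= image_width
            and 0 <= y_min < y_max <= image_height)


print(xywh_to_xyxy((10, 20, 30, 40)))
print(plausible_as_xyxy((50, 60, 30, 20), 100, 100))
```

If every box in a file passes `plausible_as_xyxy` for its image size, corner format is the more likely reading. Boxes whose third value is smaller than the first suggest width/height format instead.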

Direct link to the Bottom-up features

Hi,
Can you please provide a direct downloadable link to the bottomup-features, so that I can download it using "wget".
Thanks

Overfitting on val dataset

Hi, I have one question.
I trained the model with these settings: model='small', train_split='train+val', 13 epochs.
The val accuracy is overall=84.18, yes/no=97.67, num=71.03, other=77.39.
Is this overfitting on the val dataset?
I tried raising the dropout rate to 0.5, and it has an effect (val: overall=72.31),
but the test score is overall=66.95 (with the same implementation).
What should I do about this problem?
Thank you!
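
With train_split='train+val', the validation questions are part of the training data, so a high val accuracy is expected and does not measure generalization. A minimal sketch of a guard for this follows. `eval_split_leaks` is a hypothetical helper, and the '+'-separated split string is taken from the setting quoted above.

```python
def eval_split_leaks(train_split, eval_split):
    """Return True when the evaluation split is also used for training."""
    return eval_split in train_split.split("+")


print(eval_split_leaks("train+val", "val"))
print(eval_split_leaks("train", "val"))
```

When the check returns True, the val score should be read as a training-set score. A held-out estimate needs either train_split='train' or the online test evaluation.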

test-dev or test-std

May I ask the authors: there seems to be no channel for submitting JSON files online now, so how can I evaluate on test-dev?

improved results in val set but decrease results in online evaluation

Thanks for providing the code for this interesting project. I followed your approach and trained the network with the default hyper-parameter settings (python3 run.py --RUN='train'). The model performs well during validation, for which I used

python3 run.py --RUN='val' --CKPT_PATH=str

But in the online evaluation, performance does not improve. I used

python3 run.py --RUN='test' --CKPT_PATH=str

to generate the JSON file for online evaluation. Am I missing something? Am I using the correct split? Or, for online evaluation (i.e. on test-dev and test-std), do we need to train the network on a different split?

log file

Hello, and thanks for your code.
Could you please share your log file for the validation split (results/log/log_run_small.txt)? I am making some minor changes to your code, so it would be helpful to compare results epoch by epoch. My results are strange: some minor changes degraded the accuracy. It started at 48% and reached 60% at the 9th epoch. Since it increases very slowly, I don't expect it to reach 67% in a few more epochs.

Thanks

Why use only image guided-attention rather than both image and question guided-attention?

Thanks for making your great work open-source!
I'm curious why you use only image guided-attention rather than both image and question guided-attention.
Frankly, both directions of attention seem important.
Have you run any experiments on this?
Looking forward to your reply.

Co-Attention?

The MCAN paper suggests that SGA (i.e. a guided attention module) is only used for question-guided attention over image content, but not the other way around (image-guided attention over question content). Could the authors please explain why they call this "CO-"attention even though there's no image-guided attention over question content? Or did I misunderstand the paper?

Greatly appreciate a response!

single question test

I am trying to test a single question, so I reduced test_question.json to one picture and one question, and I encountered this problem:
RuntimeError: Error(s) in loading state_dict for Net: size mismatch for embedding.weight: copying a param with shape torch.Size([20572, 300]) from checkpoint, the shape in current model is torch.Size([18405, 300]).

Do you know how to fix this? Or is test_question.json fixed, so that I cannot change the number of pictures and questions?

Can you give me some suggestions on how to test a single question on a single picture?

Thanks a lot.

BERT encoding

Hi,

Did you also use BERT encodings in your experiments? Do you plan to release the final model config that you used for the challenge?

How long did training take with an SSD?

How long did training take with an SSD?
With the default settings on an HDD, each epoch takes me about 2 hours. I have set GPU=2, BS=256, CPU=10, num_worker=15 to speed up I/O, but it still doesn't seem fast. Could you suggest ways to speed up I/O?
Have you tried loading all *.npz files into memory? I think that might be faster.

Thank you!

About initializing GloVe

Hello, and thank you for sharing your code.
When I run the command wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz, I find that the web page doesn't exist. Please advise what I should do!

net.train() not called again after evaluation finishes

Hi!

From your code, I noticed that after you run the eval function and start the next training epoch, you do not switch back to training mode with net.train(), because net.train() is only called before the epoch loop. Shouldn't that affect training, since you are using dropout?

About the val score

Thank you very much for sharing your work!
I have a question.
I trained the model with train_mode='train' and train_split='train', leaving the other settings unchanged,
but the best score on the val dataset is no more than 65%.
In your work, the score on the val dataset is 66~67%.
What could explain the gap, and what should I do?
Thanks again!

Problem with the result JSON file

Hi, thank you for open-sourcing your code and for the excellent work.
I have a problem: when I upload the JSON file containing the answers for test2015 to EvalAI, the submission fails with the following error:

'Results do not correspond to current VQA set. Either the results do have predictions for all question ids in annotation file or there is one/more questions id that does not belong to the question ids in the annotation file.'

Have you met the problem before?
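
The error message says the submitted question ids do not match the ids in the annotation file. A minimal sketch for checking this locally follows. It assumes the standard VQA layout in which each result entry and each question entry carries a `question_id` field. `compare_question_ids` and the sample data are hypothetical.

```python
def compare_question_ids(results, questions):
    """Return (missing, unexpected) question ids as sorted lists.

    `results` is the submitted list of {"question_id", "answer"} entries;
    `questions` is the "questions" list from the matching question file.
    """
    result_ids = {entry["question_id"] for entry in results}
    expected_ids = {entry["question_id"] for entry in questions}
    missing = sorted(expected_ids - result_ids)
    unexpected = sorted(result_ids - expected_ids)
    return missing, unexpected


# Hypothetical toy data standing in for the real files.
questions = [{"question_id": 1}, {"question_id": 2}, {"question_id": 3}]
results = [
    {"question_id": 1, "answer": "yes"},
    {"question_id": 4, "answer": "2"},
]
print(compare_question_ids(results, questions))
```

Both lists should be empty for the split being submitted. A non-empty result usually means the file was generated from a different question set than the one the server evaluates.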

Code for learned image and question attentions

Hey,

in your paper you showed some examples of learned image and question attentions (Figure 8).
I want to replicate these examples if possible.
Can you post the code you used to extract and create the learned image and question attentions?

Best,
Karol

Are the provided bottom-up features the same as those in the original repo?

Hi, thank you for sharing your code.
I have found that the bottom-up features you provide differ from the default bottom-up features.
Are they different from the ones used by the Up-Down model and BAN?

How much improvement does the VG dataset bring?

How to use the pretrained models

How can I use the pretrained models to generate answers for images and questions?

Why does the answer dictionary filter out answers with fewer than 8 occurrences?

Why does the answer dictionary need to filter out answers with fewer than 8 occurrences?
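
Frequency filtering of this kind keeps the classifier's output layer small and drops answers too rare to learn from. The sketch below shows the general technique with a threshold of 8 taken from the question above. `build_answer_vocab` is a hypothetical name, and whether the repository's boundary is inclusive is an assumption.

```python
from collections import Counter


def build_answer_vocab(answers, min_occurrences=8):
    """Map each sufficiently frequent answer to a class index."""
    counts = Counter(answers)
    # Keep answers seen at least `min_occurrences` times (assumed boundary).
    kept = sorted(a for a, n in counts.items() if n >= min_occurrences)
    return {answer: index for index, answer in enumerate(kept)}


# Hypothetical toy answer list.
answers = ["yes"] * 10 + ["no"] * 8 + ["maybe"] * 2
print(build_answer_vocab(answers))
```

Here "maybe" is dropped because it appears only twice, so the model never has to allocate an output class for it.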

pretrained frcnn and network

Hi, thanks for your project and great work.
I would like to run it on new images, but features from other pretrained Faster R-CNN models (e.g. one trained on COCO) give wrong answers. Can you please provide the pretrained Faster R-CNN model and network so that I can replicate your features?

Thanks

Feature file download failed

When downloading the feature files, the speed is normal at first, but the connection drops partway through and cannot be re-established. Could you please provide a download address suitable for users in China, for example a Baidu Netdisk link? In addition, since I have not used spaCy before, installing it in Anaconda also caused some problems.