milvlg / mcan-vqa
Deep Modular Co-Attention Networks for Visual Question Answering
License: Apache License 2.0
Hello, and thanks for sharing your code.
When I tried to run your work, I couldn't install spacy==2.1.0, so I ran it with spacy==3.2.3 (the latest version). I therefore had to change the pipeline from "en_vectors_web_lg-2.1.0" to "en_core_web_lg-3.2.0". I also changed "yaml_dict = yaml.load(f)" to "yaml_dict = yaml.safe_load(f)".
In the end, I hit this error: "AssertionError: Torch not compiled with CUDA enabled".
I tried setting ...GPU=="CPU", but that doesn't work either.
Please advise how to run on the CPU, if possible (my GPU doesn't meet the CUDA requirements).
Hi,
I am confused about how the "bbox" information is handled in the code. I can only see the image features x being loaded from the ".npz" file.
Also, it is mentioned that grid features can be used as well. A grid feature file (".pth" extension) contains only a feature tensor of size [1, 2048, 19, 29] (from a sample file), with no bounding boxes or object detections. How can such features be used without that information?
Thank you for sharing. Have you tried changing the linear multimodal fusion model, and does it affect the accuracy? Looking forward to your reply. Thanks a lot!
Your small model (6 layers) has 57,812,491 parameters, but the paper reports 56M for it (Table 1b). What causes this discrepancy?
    import functools
    import operator

    # data is the loaded checkpoint dict
    p = 0
    for k, v in data['state_dict'].items():
        p += functools.reduce(operator.mul, v.size(), 1)
    >>> p
    57812491
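To see where a total like this comes from, it can help to break the count down per tensor. The following is a minimal sketch on plain shape tuples, not the repository's code; the entry names and shapes are hypothetical stand-ins. With a real checkpoint the shapes could be taken from `{k: tuple(v.size()) for k, v in data['state_dict'].items()}`. Note that a state_dict can also hold buffers, not only trainable parameters, so its element count is not necessarily the same as a trainable-parameter count.

```python
import math


def count_elements(shapes):
    """Return (total, per-entry counts sorted largest first)."""
    per_entry = {name: math.prod(shape) for name, shape in shapes.items()}
    ranked = sorted(per_entry.items(), key=lambda kv: kv[1], reverse=True)
    return sum(per_entry.values()), ranked


# Hypothetical names and shapes, for illustration only.
example_shapes = {
    "embedding.weight": (20572, 300),
    "proj.weight": (512, 2048),
    "proj.bias": (512,),
}

total, ranked = count_elements(example_shapes)
for name, n in ranked:
    print(f"{name:20s} {n:>10,d}")
print(f"{'total':20s} {total:>10,d}")
```

Listing the largest entries makes it easier to check which tensors a published figure might or might not include.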
Hello! Thanks for sharing your code and brilliant work.
I'd like to ask about the evaluation on test-std and test-dev. Is there any way to know how many epochs to train for, given that local evaluation is not available on those splits? I've seen that in your case you used the same number of training epochs (13). But I assume that since the training data is much larger (evaluation on the test set requires training on the 'train+val+vg' sets), the number of epochs required for convergence will also increase. Or do you evaluate several epochs on the online server to see which performs better?
Thanks for sharing your suggestion.
Regards
Hi, MCAN team!
Thanks for sharing your work.
I want to replicate the visualization of the learned attention maps in Figure 7.
Can you post the code you used to extract and create the learned attention maps?
Looking forward to your reply!
Best
Hi! I have run into a problem: when I run the model, GPU 0 always uses a lot of memory, while the other GPUs use less. Any suggestions?
Hoping for your response.
Hello, I would like to ask whether MCAN has been evaluated on VQA-CP v2?
Thank you for the open source code!
I am trying to run the validation with the dataset you have provided, as well as the pretrained model "small".
I encounter this problem during evaluation of the validation set:
size mismatch for embedding.weight: copying a param with shape torch.Size([20572, 300]) from checkpoint, the shape in current model is torch.Size([14613, 300]).
Do you know why this happens and how to fix the difference in vocabulary size for the embedding?
Best,
Kayo
What can I do about this error?
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 90789960 (char 90789959)
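A first diagnostic step for an error like this is to look at the text around the reported position. The sketch below uses only the standard `json` module; the helper name `describe_json_error` and the sample string are made up for illustration. If the failing position turns out to be at or near the end of the file, an incomplete download or an interrupted write is a plausible cause, though that is an assumption and not something the error message itself confirms.

```python
import json


def describe_json_error(text, context=40):
    """Return None if text parses, else a dict describing where parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        start = max(err.pos - context, 0)
        return {
            "message": err.msg,
            "pos": err.pos,
            "total_chars": len(text),
            # Characters on both sides of the failing position.
            "context": text[start:err.pos + context],
        }
    return None


# Made-up sample: a comma is missing between the two objects.
broken = '{"questions": [{"question_id": 1} {"question_id": 2}]}'
report = describe_json_error(broken)
print(report)
```

For a real file, the text would come from something like `open(path, encoding="utf-8").read()`, where `path` is whichever JSON file the loader was reading when it failed.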
Hi, thanks for your code.
May I ask why, in the case where multiple questions are given for the same image, you assign a label < 1 for each question, as in:
    def get_score(occur):
        if occur == 0:
            return .0
        elif occur == 1:
            return .3
        elif occur == 2:
            return .6
        elif occur == 3:
            return .9
        else:
            return 1.
Why not assign 1 to each and treat it individually? Is it to avoid confusion in predicting the answer at test time?
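One possible reading, stated here as an assumption and not as the authors' explanation: `occur` may count how many of the human annotators gave a particular answer to a single question, not how many questions share an image. Each VQA question comes with 10 human answers, and the standard VQA accuracy for an answer is min(number of humans who gave it / 3, 1), so the values 0.3, 0.6, 0.9, 1 look like a rounded version of that metric used as a soft training target. A minimal sketch under that assumption, with a hypothetical helper `soft_targets`:

```python
from collections import Counter


def get_score(occur):
    """Soft score from the snippet above, keyed on annotator agreement."""
    if occur == 0:
        return 0.0
    elif occur == 1:
        return 0.3
    elif occur == 2:
        return 0.6
    elif occur == 3:
        return 0.9
    return 1.0


def soft_targets(annotator_answers):
    """Map each distinct answer to its soft score (hypothetical helper)."""
    counts = Counter(annotator_answers)
    return {ans: get_score(n) for ans, n in counts.items()}


# Ten made-up annotator answers for one question.
answers = ["red"] * 6 + ["maroon"] * 3 + ["dark red"]
targets = soft_targets(answers)
print(targets)
```

Under this reading, an answer given by only one annotator still earns partial credit at evaluation time, which a hard 0/1 label would not reflect.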
Why is our reproduced accuracy so much higher than in the paper?
Overall Accuracy is: 81.02
Per Answer Type Accuracy is the following:
other : 73.65
yes/no : 95.62
number : 66.90
Can you please provide the source code you used to create the npz files for each image from the tsv files?
Hello,
Thanks for your code. I was trying to plot rois (boxes) on the images. Can I ask what's the box format? e.g.: (x_min, y_min, x_max, y_max) or (x_min, y_min, w, h)?
Regards.
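Which of the two formats the feature files use is not stated in this thread. The sketch below only shows how to convert between them and a simple sanity check; the function names are hypothetical. The check relies on one fact: in (x_min, y_min, x_max, y_max) format, the third and fourth values must be larger than the first and second for every box.

```python
def xywh_to_xyxy(box):
    """(x_min, y_min, w, h) -> (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box):
    """(x_min, y_min, x_max, y_max) -> (x_min, y_min, w, h)."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min, x_max - x_min, y_max - y_min)


def cannot_be_xyxy(boxes):
    """True if any box has a third/fourth value not exceeding its first/second."""
    return any(b[2] <= b[0] or b[3] <= b[1] for b in boxes)


# Made-up boxes for illustration.
print(xywh_to_xyxy((10, 20, 30, 40)))        # corner form of a 30x40 box
print(cannot_be_xyxy([(100, 50, 20, 30)]))   # width/height smaller than the origin
```

If `cannot_be_xyxy` returns False for all boxes, that does not prove the corner format; plotting a few boxes over an image is still the most reliable check.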
Hi,
Can you please provide a direct download link to the bottom-up features, so that I can download them using "wget"?
Thanks
Hi, I have one question.
I trained the model with the following settings: model='small', train_split='train+val', 13 epochs.
The val accuracy is overall=84.18, yes/no=97.67, num=71.03, other=77.39.
Is it overfitting to the val set?
I tried raising the dropout rate to 0.5, which works (val: overall=72.31),
but the test score is overall=66.95 (same implementation).
What should I do about this problem?
Thank you!
May I ask the author: there seems to be no channel for submitting JSON files online now, so how can we evaluate on test-dev?
Thanks for providing the code for this interesting project. I followed your approach and trained the network with the default hyper-parameter settings (python3 run.py --RUN='train'). During validation the model performs well. I used
    python3 run.py --RUN='val' --CKPT_PATH=str
But in the online evaluation the performance does not improve. I used
    python3 run.py --RUN='test' --CKPT_PATH=str
to generate the JSON for online evaluation. Am I missing something? Am I using the correct split? Or, for online evaluation (i.e. on test-dev and test-std), do we need to train the network on a different split?
Hello and thanks for your code.
Could you please share your log file for the validation split (results/log/log_run_small.txt)? I am making some minor changes to your code, so it would be helpful to compare each epoch. The results are strange: some minor changes I made degraded the accuracy. It started at 48% and reached 60% at the 9th epoch. I don't expect it to reach 67% in a few more epochs, as it is increasing very slowly.
Thanks
Thanks for making your great work open-source!
I am curious why only image guided-attention is used, rather than both image and question guided-attention.
Frankly, both directions of attention are important.
Have you run any experiments on that?
Looking forward to your reply.
The MCAN paper suggests that SGA (i.e. a guided attention module) is only used for question-guided attention over image content, but not the other way around (image-guided attention over question content). Could the authors please explain why they call this "CO-"attention even though there's no image-guided attention over question content? Or did I misunderstand the paper?
Greatly appreciate a response!
I am trying to run a single-question test, so I reduced test_question.json to one picture and one question, and I encountered this problem:
RuntimeError: Error(s) in loading state_dict for Net: size mismatch for embedding.weight: copying a param with shape torch.Size([20572, 300]) from checkpoint, the shape in current model is torch.Size([18405, 300]).
Do you know how to fix this? Or is test_question.json fixed, so that I cannot change the number of pictures and questions?
Can you give me some suggestions on how to test a single question on a single picture?
Thanks a lot.
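A mismatch like this is consistent with the vocabulary being rebuilt from the question files each time the data is loaded, so that the number of embedding rows equals the number of distinct tokens found. That is an assumption about the data loader, not something confirmed in this thread. The sketch below uses a hypothetical tokenizer and made-up questions to show how trimming a question file shrinks the vocabulary, and how encoding a single question against the full vocabulary avoids the problem.

```python
import re


def tokenize(question):
    """Lowercase, strip punctuation, split on whitespace (hypothetical rule)."""
    return re.sub(r"[^\w\s]", "", question.lower()).split()


def build_vocab(questions):
    """Assign an index to every distinct token; PAD and UNK are reserved."""
    token_to_ix = {"PAD": 0, "UNK": 1}
    for q in questions:
        for tok in tokenize(q):
            token_to_ix.setdefault(tok, len(token_to_ix))
    return token_to_ix


def encode(question, token_to_ix):
    """Map a question to indices, falling back to UNK for unseen tokens."""
    return [token_to_ix.get(t, token_to_ix["UNK"]) for t in tokenize(question)]


# Made-up questions standing in for the full and the trimmed question files.
full = ["What color is the cat?", "Is the dog sleeping?", "How many birds are there?"]
single = ["What color is the cat?"]

vocab_full = build_vocab(full)
vocab_single = build_vocab(single)
print(len(vocab_full), len(vocab_single))
print(encode(single[0], vocab_full))
```

If the assumption holds, one option is to leave the files that feed the vocabulary untouched and restrict only the set of questions passed to the model at inference time.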
Hi,
Did you also use BERT encodings in your experiments? Do you plan to release the final model config that you used for the challenge?
How long did training take for you on an SSD?
I spent ~2 hours per epoch with the default settings on an HDD. I set GPU=2, BS=256, CPU=10, num_worker=15 to accelerate I/O, but I don't think it's fast. Could you suggest ways to accelerate I/O?
Did you try loading all *.npz files into memory? I think it may be faster.
Thank you!
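The idea of keeping features in memory can be sketched as a small cache around whatever function reads a feature file. This is an illustration under assumptions, not the repository's loader: `FeatureCache` and `fake_loader` are hypothetical names, and with real files the loader might be something like `lambda p: np.load(p)['x']`. It only helps if all features fit in RAM, and with a multi-process data loader each worker process would hold its own copy of the cache.

```python
class FeatureCache:
    """Keep already-loaded feature arrays in memory, keyed by file path."""

    def __init__(self, loader):
        self._loader = loader   # function: path -> features
        self._store = {}
        self.misses = 0         # number of times the loader was actually called

    def get(self, path):
        if path not in self._store:
            self.misses += 1
            self._store[path] = self._loader(path)
        return self._store[path]


def fake_loader(path):
    """Stand-in for reading a feature file from disk."""
    return [0.0] * 4


cache = FeatureCache(fake_loader)
cache.get("a.npz")
cache.get("a.npz")   # served from memory, no second read
cache.get("b.npz")
print(cache.misses)
```

The first epoch still pays the full disk cost; only later epochs benefit.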
Hello, and thank you for sharing your code.
When I execute the command wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz, I find that the web page doesn't exist. Please advise what I should do!
Hi!
From your code, I realized that after you run the eval function and start the next training epoch, you do not switch back to training mode with net.train(), as net.train() is called before the epoch loop. Shouldn't that affect training, since you are using dropout?
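The concern can be illustrated with a toy stand-in that only tracks the train/eval flag; `ToyNet` and `run_epochs` are hypothetical and do not reproduce the repository's training loop. The sketch shows what happens when validation calls eval() on the same object that is being trained. If evaluation instead runs on a separate model instance loaded from the saved weights, the training model's mode is not touched; whether that is the case here is not confirmed in this thread.

```python
class ToyNet:
    """Stand-in for a model object: only tracks the train/eval flag."""

    def __init__(self):
        self.training = True

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self


def run_epochs(net, num_epochs, reset_each_epoch):
    """Record the mode the net is in while each epoch's training would run."""
    modes = []
    net.train()                     # called once, before the epoch loop
    for _ in range(num_epochs):
        if reset_each_epoch:
            net.train()             # switch back after the previous validation
        modes.append(net.training)  # mode seen by this epoch's training steps
        net.eval()                  # validation on the SAME object
    return modes


print(run_epochs(ToyNet(), 3, reset_each_epoch=False))
print(run_epochs(ToyNet(), 3, reset_each_epoch=True))
```

Calling train() at the top of every epoch is cheap and removes the dependency on how evaluation is implemented.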
Thank you very much for sharing your work!
I have a question.
I trained the model with train_mode='train' and train_split='train'; the other settings are unchanged.
However, the best score on the val set is no more than 65%.
In your work, the score on the val set is 66~67%.
What should I do to close this gap?
Thanks again!
Hi, thanks for open-sourcing your code and for the excellent work.
I have a problem: when I upload the JSON file with the answers for test2015 to EvalAI, the submission fails with the following error:
'Results do not correspond to current VQA set. Either the results do have predictions for all question ids in annotation file or there is one/more questions id that does not belong to the question ids in the annotation file.'
Have you encountered this problem before?
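Before re-uploading, the result file can be compared locally against the question file it is supposed to answer. The sketch below assumes the usual VQA layouts: a result file that is a list of {"question_id", "answer"} entries and a question file with a top-level "questions" list. The helper name `compare_question_ids` and the sample data are made up.

```python
from collections import Counter


def compare_question_ids(results, questions):
    """Report ids that are missing, unexpected, or duplicated in the results."""
    counts = Counter(r["question_id"] for r in results)
    expected = {q["question_id"] for q in questions["questions"]}
    seen = set(counts)
    return {
        "missing": sorted(expected - seen),
        "unexpected": sorted(seen - expected),
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
    }


# Made-up data; real inputs would come from json.load on the two files.
questions = {"questions": [{"question_id": 1}, {"question_id": 2}, {"question_id": 3}]}
results = [
    {"question_id": 1, "answer": "yes"},
    {"question_id": 1, "answer": "no"},
    {"question_id": 4, "answer": "2"},
]

report = compare_question_ids(results, questions)
print(report)
```

An empty report for all three keys means the ids line up, in which case the problem more likely lies with the chosen challenge phase or the file format.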
Hey,
in your paper you showed some examples of learned image and question attentions (Figure 8).
I want to replicate these examples if possible.
Can you post the code you used to extract and create the learned image and question attentions?
Best,
Karol
Hi, thank you for sharing your code.
However, I have found that the bottom-up features you provide are different from the default bottom-up features.
Are they different from the ones used by the Up-Down model and BAN?
How can I use the pretrained models to generate answers for images and questions?
Why are answers with fewer than 8 occurrences filtered out of the answer dictionary?
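For reference, a frequency filter of this kind can be sketched as follows. The function name `build_answer_dict` is hypothetical, the threshold of 8 is taken from the question above, and whether the boundary is inclusive in the actual code is an assumption. A common motivation for such a filter is that it keeps the classifier's output layer small and drops answers with too few training examples to learn from, but that is a general observation, not the authors' stated reason.

```python
from collections import Counter


def build_answer_dict(answers, min_count=8):
    """Keep answers seen at least min_count times and index them alphabetically."""
    freq = Counter(answers)
    kept = sorted(a for a, n in freq.items() if n >= min_count)
    return {a: i for i, a in enumerate(kept)}


# Made-up answer list for illustration.
all_answers = ["yes"] * 10 + ["no"] * 8 + ["purple giraffe"] * 2
ans_to_ix = build_answer_dict(all_answers)
print(ans_to_ix)
```

Questions whose only answers are filtered out end up with no positive target, which is one cost of a higher threshold.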
Hi, thanks for your project and great work.
I am looking to run it on new images, but using features from other pretrained Faster R-CNN models (such as one trained on COCO) gives wrong answers. Can you please provide the pretrained Faster R-CNN model and network so that I can replicate the features?
Thanks
While downloading the feature files, the speed is normal at first, but the connection drops partway through and cannot be re-established. Could you please provide a download address suitable for users in China, for example a Baidu Netdisk link? In addition, since I have not used it before, installing spacy in Anaconda has also caused some problems.