
mcan-vqa's People

Contributors

cuiyuhao1996, mil-vlg, paradoxzw, yuzcccc

mcan-vqa's Issues

single question test

I am trying to run a single-question test, so I reduced test_question.json to one picture and one question, and I ran into this problem:
RuntimeError: Error(s) in loading state_dict for Net: size mismatch for embedding.weight: copying a param with shape torch.Size([20572, 300]) from checkpoint, the shape in current model is torch.Size([18405, 300]).

Do you know how to fix this? Or is test_question.json fixed, so that I cannot change the number of pictures and questions?

Can you give me some suggestions on how to test a single question on a single picture?

Thanks a lot.
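
A possible explanation, offered as a sketch rather than a confirmed diagnosis: if the vocabulary is built from whichever question files are loaded at run time, trimming test_question.json removes tokens, so the embedding table is created with fewer rows than the 20572 stored in the checkpoint. The snippet below uses a hypothetical `build_vocab` helper (not the repository's function), a simplified tokenizer, and made-up questions to show the effect.

```python
import re

def build_vocab(questions):
    """Map each distinct token to an index; PAD and UNK are reserved (an assumption)."""
    token_to_ix = {"PAD": 0, "UNK": 1}
    for q in questions:
        # Simplified tokenizer: lowercase, strip punctuation, split on whitespace.
        for tok in re.sub(r"[^\w\s]", "", q.lower()).split():
            token_to_ix.setdefault(tok, len(token_to_ix))
    return token_to_ix

# Hypothetical question sets: the full set and a trimmed, single-question set.
full_set = ["What color is the cat?", "How many dogs are there?"]
single = ["What color is the cat?"]

full_vocab = build_vocab(full_set)
small_vocab = build_vocab(single)

# The embedding table has one row per vocabulary entry, so its first
# dimension changes whenever the question set changes.
print(len(full_vocab), len(small_vocab))
```

Under that assumption, building the vocabulary from the original question files and filtering down to a single question only afterwards would keep the embedding shape equal to the checkpoint's.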

Imbalance of GPU memory consumption

Hi! I have run into a problem: when I run the model, GPU 0 always uses a lot of memory, while the other GPUs use much less. Any suggestions?
I hope to hear from you.

Visualizations of the learned attention maps

Hi, MCAN team!
Thanks for sharing your work.

I want to replicate the visualization of the learned attention maps in Figure 7.
Can you post the code you used to extract and create the learned attention maps?
Looking forward to your reply!

Best

question regarding test split

Hello! Thanks for sharing your code and brilliant work.
I'd like to ask about the evaluation on test-std and test-dev. Is there any way to know the number of epochs needed for training, since evaluation on those splits is not available locally? I've seen that in your case you used the same number of training epochs (13). But I assume that since the training data is much larger (evaluation on the test set requires training on the 'train+val+vg' sets), the number of epochs required for convergence will also increase. Or do you evaluate checkpoints from several epochs on the online server to see which performs better?

Thanks for sharing your suggestion.

Regards

help!

What can I do about this error?
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 90789960 (char 90789959)
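
An error at such a large character offset may indicate that the JSON file is malformed at that position, for example because a download was cut short; that is a guess, not a confirmed cause. A minimal sketch for inspecting the text around the failing position follows. The helper `show_json_error` and the sample string are hypothetical; only the standard `json` module is used.

```python
import json

def show_json_error(text, width=40):
    """Try to parse text; on failure return the characters around the error position."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        # err.pos is the character index at which parsing failed.
        start = max(err.pos - width, 0)
        return text[start:err.pos + width]

# A deliberately truncated document stands in for an incomplete file.
broken = '{"questions": [{"question_id": 1, "question": "What is this?"}, {"question_id": 2'
context = show_json_error(broken)
print(context)
```

For a real file, `text` would be the file contents read as a string. If the printed context ends abruptly, downloading or regenerating the file is a reasonable next step.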

log file

Hello and thanks for your code.
Could you please share your log file for the validation split (results/log/log_run_small.txt)? I am making some minor changes to your code, so it would be helpful to compare results epoch by epoch. My results are strange: some minor changes I made degraded the accuracy. It started at 48% and reached 60% at the 9th epoch. I don't expect it to reach 67% in a few more epochs, as it is increasing very slowly.

Thanks

How long did training take with an SSD?

  • How long did training take with an SSD?
    I spend ~2 hours per epoch with the default settings on an HDD. I have set GPU=2, BS=256, CPU=10, num_worker=15 to accelerate I/O, but it is still not fast. Could you provide some suggestions for accelerating I/O?

  • Did you try loading all *.npz files into memory? I think it may be faster.

Thank you!

BERT encoding

Hi,

Did you also use BERT encodings in your experiments? Do you plan to release the final model config that you used for the challenge?

Problem with the result JSON file

Hi, thanks for open-sourcing your code and for the excellent work.
I have a problem: when I upload the JSON file containing the answers for test2015 to EvalAI, the submission fails with the following error message:

'Results do not correspond to current VQA set. Either the results do have predictions for all question ids in annotation file or there is one/more questions id that does not belong to the question ids in the annotation file.'

Have you encountered this problem before?
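
The error message points at a mismatch between the question ids in the result file and those the server expects. A minimal sketch of checking this locally is shown below. It assumes the result file is a JSON list of entries with "question_id" and "answer" keys and that the question file holds a "questions" list whose entries carry "question_id"; `compare_question_ids` and the sample data are hypothetical.

```python
def compare_question_ids(results, questions):
    """Return (missing, extra) question ids between a result list and a question file."""
    expected = {q["question_id"] for q in questions["questions"]}
    predicted = {r["question_id"] for r in results}
    missing = sorted(expected - predicted)  # questions with no prediction
    extra = sorted(predicted - expected)    # predictions for unknown questions
    return missing, extra

# Tiny in-memory stand-ins for the question file and the result file.
questions = {"questions": [{"question_id": 1}, {"question_id": 2}, {"question_id": 3}]}
results = [{"question_id": 1, "answer": "yes"}, {"question_id": 4, "answer": "2"}]

missing, extra = compare_question_ids(results, questions)
print(missing, extra)
```

Both lists should be empty for a result file that covers exactly the questions of the split it is submitted to.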

about val score

Thank you very much for sharing your work!
I have a question.
I trained the model with train_mode='train' and train_split='train'; the other settings are unchanged.
However, the best score on the val set is no more than 65%.
In your work, the score on the val set is 66~67%.
What should I do to resolve this difference?
Thanks again!

# of params for the small model

Your small model (6-layer) has 57,812,491 parameters. However, the paper reports 56M for it (Table 1b). What causes this discrepancy?

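import functools
import operator

# data is the loaded checkpoint dict; multiply each tensor's dimensions to count its elements
p = 0
for k, v in data['state_dict'].items():
    p += functools.reduce(operator.mul, v.size(), 1)
>>> p
57812491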
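
One way to investigate, sketched under assumptions: break the total down by top-level module to see which parts dominate, for instance how much of the count comes from the word-embedding table. The helper `count_by_module` is hypothetical. With a real checkpoint, the shapes would come from `tuple(v.size())` for each entry of `data['state_dict']`. Below, only the embedding shape is taken from the size-mismatch reports on this page; the other entries are made up.

```python
import math
from collections import defaultdict

def count_by_module(shapes):
    """Sum parameter counts per top-level module name (the part before the first dot)."""
    totals = defaultdict(int)
    for name, shape in shapes.items():
        totals[name.split(".")[0]] += math.prod(shape)
    return dict(totals)

# Illustrative shapes only: embedding.weight uses the size quoted in the
# size-mismatch issues; the "proj" entries are invented for the example.
shapes = {
    "embedding.weight": (20572, 300),
    "proj.weight": (512, 300),
    "proj.bias": (512,),
}
totals = count_by_module(shapes)
print(totals)
```

Whether the paper's figure includes or excludes any particular module is not stated here; the breakdown only shows where the parameters are.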

linear fusion model

Thank you for sharing. I would like to ask whether you have tried changing the linear multimodal fusion model, and whether it affects the accuracy. Looking forward to your reply. Thanks a lot!

Co-Attention?

The MCAN paper suggests that SGA (i.e. a guided attention module) is only used for question-guided attention over image content, but not the other way around (image-guided attention over question content). Could the authors please explain why they call this "CO-"attention even though there's no image-guided attention over question content? Or did I misunderstand the paper?

Greatly appreciate a response!

Feature file download failed

While downloading the feature files, the speed is normal at the beginning, but the connection drops partway through and cannot be re-established. Could you please provide a download address suitable for users in China, for example a Baidu Netdisk link? In addition, since I have not used it before, installing spacy in anaconda has also caused some problems.

Code for learned image and question attentions

Hey,

in your paper you showed some examples of learned image and question attentions (Figure 8).
I want to replicate these examples if possible.
Can you post the code you used to extract and create the learned image and question attentions?

Best,
Karol

Overfitting on val dataset

Hi, I have one question.
I trained the model with the following settings: model='small', train_split='train+val', 13 epochs.
The val accuracy is overall=84.18, yes/no=97.67, num=71.03, other=77.39.
Is it overfitting on the val dataset?
I tried raising the dropout rate to 0.5, and it helps (val: overall=72.31),
but the test score is overall=66.95 (with the same implementation).
What should I do about this problem?
Thank you!

VQA CPv2

Hello, I would like to ask whether MCAN's performance has been tested on VQA CPv2?

improved results on val set but decreased results in online evaluation

Thanks for providing the code for this interesting project. I followed your approach and trained the network with the default hyper-parameter settings (python3 run.py --RUN='train'). During validation the model performs well; I used

python3 run.py --RUN='val' --CKPT_PATH=str

But when I ran the online evaluation, the performance did not improve. I used

python3 run.py --RUN='test' --CKPT_PATH=str

to generate the JSON for online evaluation. Am I missing something? Am I using the correct split? Or, for online evaluation (i.e. on test-dev and test-std), do we need to train the network with a different split?

net.train() not called again after evaluation finishes

Hi!

From your code, I noticed that after you run the eval function and start the next training epoch, you do not switch back to training mode with net.train(), because net.train() is called only once, before the epoch loop. Shouldn't that affect training, since you are using dropout?

question about answer processing

Hi, thanks for your code.
May I ask why, in the case where multiple questions for the same image are given, you assign a label <1 for each question, as in:

def get_score(occur):
    if occur == 0:
        return .0
    elif occur == 1:
        return .3
    elif occur == 2:
        return .6
    elif occur == 3:
        return .9
    else:
        return 1.

Why not assign 1 to each and treat it individually? Is it to avoid confusion in predicting the answer at test time?
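
To make the role of `get_score` concrete, here is a minimal sketch. It assumes `occur` counts how many annotators of a single question gave a particular answer, rather than counting questions per image; the reading is an assumption, not a statement from the authors. `soft_targets` and the sample answers are hypothetical, and `get_score` is rewritten compactly with the same mapping as the function quoted above.

```python
from collections import Counter

def get_score(occur):
    # Same mapping as the function quoted in the question: 0, 1, 2, 3, 4+ occurrences.
    return {0: .0, 1: .3, 2: .6, 3: .9}.get(occur, 1.)

def soft_targets(annotator_answers):
    """Map each distinct answer to a soft score based on how many annotators gave it."""
    counts = Counter(annotator_answers)
    return {ans: get_score(n) for ans, n in counts.items()}

# Hypothetical answers from ten annotators for ONE question.
answers = ["red"] * 6 + ["dark red"] * 3 + ["maroon"]
targets = soft_targets(answers)
print(targets)
```

Under this reading, a score below 1 expresses partial agreement among annotators, so several answers to the same question can each receive some credit.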

size mismatch for embedding.weight

Thank you for the open source code!

I am trying to run the validation with the dataset you have provided, as well as the pretrained model "small".

I encounter this problem during evaluation of the validation set:

size mismatch for embedding.weight: copying a param with shape torch.Size([20572, 300]) from checkpoint, the shape in current model is torch.Size([14613, 300]).

Do you know why this happens and how to fix the difference in vocabulary size for the embedding?

Best,

Kayo

Features' file loading in the code

Hi,
I am confused about how you handled the "bbox" information in the code. I can only see the image features x being loaded from the ".npz" file.

Also, it is mentioned that we can work with grid features as well. A grid feature file with the ".pth" extension only contains features/weights with tensor size [1, 2048, 19, 29] (a sample feature file) and no bounding box information, object detections, etc. How can we use those features without any such information?

Can I run the code on CPU?

Hello, and thanks for sharing your code.
When I tried to run your work, I couldn't install spaCy 2.1.0, so I ran it with spaCy 3.2.3 (the latest version) and therefore had to change the pipeline from "en_vectors_web_lg-2.1.0" to "en_core_web_lg-3.2.0". After that I changed "yaml_dict = yaml.load(f)" to "yaml_dict = yaml.safe_load(f)".
In the end, I get this error: "AssertionError: Torch not compiled with CUDA enabled".
I tried with: ...GPU=="CPU" but it doesn't work either.
Please advise how to run on CPU, if possible (my GPU does not meet the CUDA requirements).

box is xyxy or xywh?

Hello,

Thanks for your code. I was trying to plot the RoIs (boxes) on the images. Can I ask what the box format is, e.g. (x_min, y_min, x_max, y_max) or (x_min, y_min, w, h)?

Regards.
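
Until the format is confirmed, a quick sanity check can help. The sketch below converts between the two layouts named in the question and applies a simple heuristic. All function names and sample boxes are hypothetical, and nothing here states which format the feature files actually use. A False result rules out xyxy for the given image size; a True result is only suggestive.

```python
def xywh_to_xyxy(box):
    """(x_min, y_min, w, h) -> (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)

def xyxy_to_xywh(box):
    """(x_min, y_min, x_max, y_max) -> (x_min, y_min, w, h)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)

def looks_like_xyxy(boxes, img_w, img_h):
    """Heuristic: xyxy boxes need x_max >= x_min, y_max >= y_min and must fit the image."""
    return all(
        x1 <= x2 <= img_w and y1 <= y2 <= img_h
        for x1, y1, x2, y2 in boxes
    )

# Made-up boxes for a 640x480 image.
boxes = [(10, 20, 110, 220), (300, 50, 620, 400)]
print(looks_like_xyxy(boxes, img_w=640, img_h=480))
print(xyxy_to_xywh(boxes[0]))
```

Drawing a handful of boxes on their image under each interpretation remains the most reliable check.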

test-dev or test-std

May I ask the authors: there seems to be no channel for submitting JSON files online now, so how can I evaluate on test-dev?

pretrained frcnn and network

Hi, Thanks for your project and great work.
I am looking to run it on new images, but using features from other pretrained Faster R-CNN models (such as one trained on COCO) gives wrong answers. Could you please provide the pretrained Faster R-CNN model and network so that I can replicate the results?

Thanks
