thaolmk54 / hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
License: Apache License 2.0
This problem occurred after I modified the video path and tried to preprocess the data. The path was updated, so why does reading the videos still fail?
Also, I want to confirm: is the annotations file under this link? The link is https://mega.nz/file/UnRnyb7A#es4XmqsLxl-B7MP0KAat9VibkH7J_qpKj9NcxLh8aHg
Hi,
Thanks for your great work. I have no problem using the code for MSVD-QA / MSRVTT-QA / the 3 other TGIF-QA tasks, but when I train on the TGIF-QA FrameQA subtask, the loss quickly becomes NaN (after about 80% of the first epoch) and the accuracy is 0. Do you have an idea why this happens?
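For anyone hitting the same symptom: a common cause of a loss suddenly going to NaN is numerical overflow in a softmax/cross-entropy once some logits grow large. This is a generic minimal sketch of the failure mode and the standard log-sum-exp fix, not the repo's actual loss code:

```python
import numpy as np

def naive_ce(logits, target):
    # Overflows: exp(large) -> inf, and inf/inf -> nan
    with np.errstate(over="ignore", invalid="ignore"):
        p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[target])

def stable_ce(logits, target):
    # Log-sum-exp trick: shift by the max logit before exponentiating
    shifted = logits - logits.max()
    return -(shifted[target] - np.log(np.exp(shifted).sum()))

logits = np.array([1000.0, 10.0, -5.0])
print(naive_ce(logits, 0))   # nan
print(stable_ce(logits, 0))  # finite, ≈ 0.0
```

In practice, gradient clipping (e.g. torch.nn.utils.clip_grad_norm_) and double-checking that all FrameQA answer labels fall inside the answer vocabulary range are also worth trying.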
Hi,
I was trying to download the pre-extracted features through the link https://bit.ly/2TX9rlZ, but accessing the link gives me the error "We're sorry, but [email protected] can't be found in the deakin365-my.sharepoint.com directory. Please try again later, while we try to automatically fix this for you." ([email protected] is my email for my Microsoft account). Is there anything I should do to fix this error? Or could you upload the features to Google Drive?
Thanks!
May I ask how the dataset is loaded in your code?
Sorry for disturbing you. I ran your code on MSRVTT-QA but hit a bad error that drives the validation accuracy to 0.
When I run train.py, train_acc is fine, but val_acc is 0 in every epoch.
I then found that with model.train() the accuracy is fine, but with model.eval() the model always outputs the same tensor; the output never changes regardless of the input.
How can I fix this problem?
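This symptom (fine under model.train(), near-constant output under model.eval()) often points at BatchNorm running statistics that never tracked the real activations, so eval-mode normalization maps very different inputs to similar values. A toy numpy sketch of the mechanism, with made-up numbers and not the repo's code:

```python
import numpy as np

class ToyBatchNorm:
    """Minimal 1-D batch norm to illustrate train vs. eval behaviour."""
    def __init__(self, momentum=0.1):
        self.running_mean, self.running_var = 0.0, 1.0
        self.momentum = momentum
        self.training = True

    def __call__(self, x):
        if self.training:
            m, v = x.mean(), x.var()
            # running stats are updated only in training mode
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * m
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * v
        else:
            m, v = self.running_mean, self.running_var  # eval mode uses stored stats
        return (x - m) / np.sqrt(v + 1e-5)

bn = ToyBatchNorm()
batch = np.array([100.0, 101.0, 102.0])  # activations far from the stored stats
out_train = bn(batch)   # normalized with batch stats: looks reasonable
bn.training = False
out_eval = bn(batch)    # normalized with stale running stats: wildly off
```

If the model contains BatchNorm-style layers, stale or exploded running statistics (e.g. from saturated activations that only look reasonable under per-batch re-normalization) could produce exactly this train/eval divergence.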
I'm sorry to bother you again. After I downloaded your frame_qa data and merged it into an h5 file, I got an unexpected error; the files for the other tasks are fine. I am not sure whether the file is damaged because of a download error on my side or whether the original file you uploaded is broken. If possible, could you test whether your original file can be read with h5py? Re-downloading your data would take me a whole day. Thank you.
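For what it's worth, a quick way to check whether a merged .h5 file is readable before re-downloading anything is to walk it with h5py and touch each dataset; corruption typically raises during the read. A small sketch (the file path and dataset names are whatever your merge produced):

```python
import h5py

def check_h5(path):
    """Return (name, shape) for every dataset, forcing a real read of each one."""
    found = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape))
            obj[tuple(0 for _ in obj.shape)]  # touch one element: raises if corrupt

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

If this raises an OSError partway through, the file (or that download chunk) is damaged; if it completes, the merge itself is likely fine and the error is elsewhere.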
Hello,
Thank you for your excellent work!
When I downloaded the TGIF-QA dataset, which includes approximately 124 GB of GIF files plus some CSV files with question-answer pairs, I found that some gif_name entries in the CSV files cannot be found among the GIFs, such as tumblr_nk172bbdPI1u1lr18o1_250 in Test_action_question.csv.
Meanwhile, some GIF files do not appear in the CSV files, such as tumblr_l5zke1pg6r1qzzqaxo1_500.gif.
Have you had the same experience? Is there a solution? Did I download the wrong dataset?
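Comparing the two name sets directly makes it easy to see how many entries are affected in each direction. A rough sketch; the column name gif_name and the tab delimiter are assumptions about the TGIF-QA annotation layout, so adjust them to match your files:

```python
import csv
from pathlib import Path

def compare_names(gif_dir, csv_path):
    """Return (names in the CSV with no GIF on disk, GIFs never referenced by the CSV)."""
    gifs = {p.stem for p in Path(gif_dir).glob("*.gif")}
    with open(csv_path, newline="") as f:
        # some TGIF-QA releases use tab-separated .csv files; change delimiter if needed
        csv_names = {row["gif_name"] for row in csv.DictReader(f, delimiter="\t")}
    return csv_names - gifs, gifs - csv_names
```

A handful of mismatches is normal for a crawled dataset (GIFs get deleted from Tumblr over time); a large difference would suggest an incomplete download.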
Hello, thanks a lot for sharing your impressive work.
I noticed that you use the candidate-answer information on the MC tasks, in HCRNNetwork.forward:

out = self.output_unit(question_embedding[batch_agg], q_visual_embedding[batch_agg],
                       ans_candidates_embedding, a_visual_embedding)

So I tried to use the candidate-answer information to guide the visual_embedding as well. I changed the code in HCRNNetwork.forward as follows:

ans_candidates_agg = ans_candidates.view(-1, ans_candidates.size(2))
ans_candidates_len_agg = ans_candidates_len.view(-1)
batch_agg = np.reshape(
    np.tile(np.expand_dims(np.arange(batch_size), axis=1), [1, 5]), [-1])
ans_candidates_embedding = self.linguistic_input_unit(ans_candidates_agg, ans_candidates_len_agg)
ans_candidates_emb_mul = ans_candidates_embedding.view(batch_size, 5, -1).sum(1)
question_embedding = self.linguistic_input_unit(question, question_len)
visual_embedding = self.visual_input_unit(video_appearance_feat, video_motion_feat,
                                          question_embedding + ans_candidates_emb_mul)
q_visual_embedding = self.feature_aggregation(question_embedding, visual_embedding)
a_visual_embedding = self.feature_aggregation(ans_candidates_embedding, visual_embedding[batch_agg])
out = self.output_unit(question_embedding[batch_agg], q_visual_embedding[batch_agg],
                       ans_candidates_embedding, a_visual_embedding)

I get an accuracy of 0.9380 on the action task and 0.9759 on the transition task. I checked the loss function and the accuracy evaluation function and did not find any bug. Can you explain it?
Hi,
I was trying to train the model on another dataset. I found that the hinge loss for the multiple-choice problems finally converged to about 1.0. I wonder what value the hinge loss converges to in your training process.
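One detail that may help interpret the ~1.0 value: with the usual multiple-choice hinge loss and margin 1, a model that scores all candidates identically gets a loss of exactly 1.0, so convergence to ~1.0 can mean the scores have collapsed rather than that training succeeded. A small sketch of that loss in a generic formulation, which may differ from the exact one in this repo:

```python
import numpy as np

def mc_hinge_loss(scores, correct_idx, margin=1.0):
    """Mean over wrong candidates of max(0, margin + s_wrong - s_correct)."""
    s_correct = scores[correct_idx]
    wrong = np.delete(scores, correct_idx)
    return np.maximum(0.0, margin + wrong - s_correct).mean()

print(mc_hinge_loss(np.array([2.0, 2.0, 2.0, 2.0, 2.0]), 0))  # 1.0: collapsed scores
print(mc_hinge_loss(np.array([5.0, 1.0, 0.5, 0.2, 0.1]), 0))  # 0.0: well separated
```

So if the loss plateaus near the margin, checking whether the per-candidate scores actually differ is a good first diagnostic.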
Thanks for your reply!
Hi, thanks for sharing the repo and your work. I need the source videos of MSRVTT-QA, but I found that some of the provided URLs are now invalid. Could you share the source videos of MSRVTT-QA? Thanks a lot.
Hi,
In your question pre-processing file, the file read in is glove.6B.300d.txt, but the download link you gave is for glove.840B.300d.txt.
Last time I trained, I used glove.6B.300d.txt and did not use the link you provided.
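In case it helps others: the two files share the same plain-text format (a token followed by 300 floats per line), so a loader like the sketch below should handle either; what differs is coverage (6B is roughly 400k lowercased tokens, 840B roughly 2.2M cased tokens). One known gotcha is that some 840B tokens contain spaces, which a naive split mishandles:

```python
import numpy as np

def load_glove(path, dim=300):
    """Parse a GloVe text file into {token: vector}; same format for 6B and 840B."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # the last `dim` fields are floats; anything before is the token,
            # which in glove.840B.300d can itself contain spaces
            token = " ".join(parts[:-dim])
            vectors[token] = np.asarray(parts[-dim:], dtype=np.float32)
    return vectors
```

Which file to use should match whatever the pre-processing script expects; mixing the two only changes vocabulary coverage and casing, not the file format.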
Hi,
Thank you for your work! I want to run the code, but when I download the dataset tgif by the links from the tsv file, it always fails. Do you have another way to download it? Like Google Drive or other cloud disks?
Hi,
I downloaded the code, features, and pre-trained models, but I got a Count result of about 4.05/4.04/4.05 on test. When I train the model myself, I get 4.0639/4.0802/4.0599 on the Count test and 0.7476/0.7454/0.7449 on the Action test. I wonder whether the parameters in configs/tgif_qa_xx.yml need to be adjusted, or whether I need other settings.
Why does your code take epoch 25 as the final result? At epoch 25 the validation accuracy clearly drops, while the loss keeps decreasing.
Hi,
Firstly, let me say I appreciate your work. Your code is elegant: unlike other repositories, you provide the code to extract the visual and text features, which makes it easier for me to apply your method to my own datasets and tasks.
Now I'm changing the feature extraction method to improve performance on my tasks, but I don't know where the motion model ResNeXt-101 comes from, which dataset it was pre-trained on, or what its accuracy is, so I cannot compare this model with other models directly. Trying them one by one would be very time-consuming. Could you please share some information?
Thanks a lot!
Hello, thank you for your excellent work.
I would like to know whether you are willing to provide the preprocessed features of the MSRVTT-QA and MSVD-QA datasets. I want to test your model on them.