thaolmk54 / hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
License: Apache License 2.0
This problem occurred after I modified the video path and tried to preprocess the data. The path was updated, so why does reading the videos still fail?
Also, I want to confirm: is the annotations file under this link? The link is https://mega.nz/file/UnRnyb7A#es4XmqsLxl-B7MP0KAat9VibkH7J_qpKj9NcxLh8aHg
Hi,
Thanks for your great work. I have no problem using the code for MSVD-QA / MSRVTT-QA / the 3 other TGIF-QA tasks, but when I train on the TGIF-QA FrameQA subtask, the loss quickly becomes NaN (after about 80% of the first epoch) and the accuracy is 0. Do you have an idea why this happens?
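For anyone hitting the same symptom: a common cause of a loss suddenly going to NaN is numerical overflow in a softmax/cross-entropy once some logits grow large. This is a generic minimal sketch of the failure mode and the standard log-sum-exp fix, not the repo's actual loss code:

```python
import numpy as np

def naive_ce(logits, target):
    # Overflows: exp(large) -> inf, and inf/inf -> nan
    with np.errstate(over="ignore", invalid="ignore"):
        p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[target])

def stable_ce(logits, target):
    # Log-sum-exp trick: shift by the max logit before exponentiating
    shifted = logits - logits.max()
    return -(shifted[target] - np.log(np.exp(shifted).sum()))

logits = np.array([1000.0, 10.0, -5.0])
print(naive_ce(logits, 0))   # nan
print(stable_ce(logits, 0))  # finite, ≈ 0.0
```

In practice, gradient clipping (e.g. torch.nn.utils.clip_grad_norm_) and double-checking that all FrameQA answer labels fall inside the answer vocabulary range are also worth trying.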
Hi,
I was trying to download the pre-extracted features through the link https://bit.ly/2TX9rlZ, but accessing the link gives me the error "We're sorry, but [email protected] can't be found in the deakin365-my.sharepoint.com directory. Please try again later, while we try to automatically fix this for you." ([email protected] is my email for my Microsoft account). Is there anything I should do to fix this error? Or could you upload the features to Google Drive?
Thanks!
May I ask how the dataset is loaded in your code?
Sorry for disturbing you. I ran your code on MSRVTT-QA but hit a bad error that drives the validation accuracy to 0.
When I run train.py, train_acc is fine, but val_acc is 0 in every epoch.
I then found that with model.train() the accuracy is fine, but with model.eval() the model always outputs the same tensor; the output never changes regardless of the input.
How can I fix this problem?
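This symptom (fine under model.train(), near-constant output under model.eval()) often points at BatchNorm running statistics that never tracked the real activations, so eval-mode normalization maps very different inputs to similar values. A toy numpy sketch of the mechanism, with made-up numbers and not the repo's code:

```python
import numpy as np

class ToyBatchNorm:
    """Minimal 1-D batch norm to illustrate train vs. eval behaviour."""
    def __init__(self, momentum=0.1):
        self.running_mean, self.running_var = 0.0, 1.0
        self.momentum = momentum
        self.training = True

    def __call__(self, x):
        if self.training:
            m, v = x.mean(), x.var()
            # running stats are updated only in training mode
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * m
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * v
        else:
            m, v = self.running_mean, self.running_var  # eval mode uses stored stats
        return (x - m) / np.sqrt(v + 1e-5)

bn = ToyBatchNorm()
batch = np.array([100.0, 101.0, 102.0])  # activations far from the stored stats
out_train = bn(batch)   # normalized with batch stats: looks reasonable
bn.training = False
out_eval = bn(batch)    # normalized with stale running stats: wildly off
```

If the model contains BatchNorm-style layers, stale or exploded running statistics (e.g. from saturated activations that only look reasonable under per-batch re-normalization) could produce exactly this train/eval divergence.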
I'm sorry to bother you again. After I downloaded your frame_qa data and merged it into an h5 file, I got an unexpected error; the files for the other tasks are fine. I am not sure whether the file is damaged because of a download error on my side or whether the original file you uploaded is broken. If possible, could you test whether your original file can be read with h5py? Re-downloading your data would take me a whole day. Thank you.
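For what it's worth, a quick way to check whether a merged .h5 file is readable before re-downloading anything is to walk it with h5py and touch each dataset; corruption typically raises during the read. A small sketch (the file path and dataset names are whatever your merge produced):

```python
import h5py

def check_h5(path):
    """Return (name, shape) for every dataset, forcing a real read of each one."""
    found = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape))
            obj[tuple(0 for _ in obj.shape)]  # touch one element: raises if corrupt

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

If this raises an OSError partway through, the file (or that download chunk) is damaged; if it completes, the merge itself is likely fine and the error is elsewhere.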
Hello,
Thank you for your excellent work!
When I downloaded the TGIF-QA dataset, which includes approximately 124 GB of GIF files plus some CSV files with question-answer pairs, I found that some gif_name entries in the CSV files cannot be found among the GIFs, such as tumblr_nk172bbdPI1u1lr18o1_250 in Test_action_question.csv.
Meanwhile, some GIF files do not appear in the CSV files, such as tumblr_l5zke1pg6r1qzzqaxo1_500.gif.
Have you had the same experience? Is there a solution? Did I download the wrong dataset?
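Comparing the two name sets directly makes it easy to see how many entries are affected in each direction. A rough sketch; the column name gif_name and the tab delimiter are assumptions about the TGIF-QA annotation layout, so adjust them to match your files:

```python
import csv
from pathlib import Path

def compare_names(gif_dir, csv_path):
    """Return (names in the CSV with no GIF on disk, GIFs never referenced by the CSV)."""
    gifs = {p.stem for p in Path(gif_dir).glob("*.gif")}
    with open(csv_path, newline="") as f:
        # some TGIF-QA releases use tab-separated .csv files; change delimiter if needed
        csv_names = {row["gif_name"] for row in csv.DictReader(f, delimiter="\t")}
    return csv_names - gifs, gifs - csv_names
```

A handful of mismatches is normal for a crawled dataset (GIFs get deleted from Tumblr over time); a large difference would suggest an incomplete download.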
Hello, thanks a lot for sharing your impressive work.
I noticed that you use the candidate-answer information on the MC tasks, in HCRNNetwork.forward:

out = self.output_unit(question_embedding[batch_agg], q_visual_embedding[batch_agg],
                       ans_candidates_embedding, a_visual_embedding)

So I tried to use the candidate-answer information to guide the visual_embedding as well. I changed the code in HCRNNetwork.forward as follows:

ans_candidates_agg = ans_candidates.view(-1, ans_candidates.size(2))
ans_candidates_len_agg = ans_candidates_len.view(-1)
batch_agg = np.reshape(
    np.tile(np.expand_dims(np.arange(batch_size), axis=1), [1, 5]), [-1])
ans_candidates_embedding = self.linguistic_input_unit(ans_candidates_agg, ans_candidates_len_agg)
ans_candidates_emb_mul = ans_candidates_embedding.view(batch_size, 5, -1).sum(1)
question_embedding = self.linguistic_input_unit(question, question_len)
visual_embedding = self.visual_input_unit(video_appearance_feat, video_motion_feat,
                                          question_embedding + ans_candidates_emb_mul)
q_visual_embedding = self.feature_aggregation(question_embedding, visual_embedding)
a_visual_embedding = self.feature_aggregation(ans_candidates_embedding, visual_embedding[batch_agg])
out = self.output_unit(question_embedding[batch_agg], q_visual_embedding[batch_agg],
                       ans_candidates_embedding, a_visual_embedding)

I get an accuracy of 0.9380 on the action task and 0.9759 on the transition task. I checked the loss function and the accuracy evaluation function and did not find any bug. Can you explain it?
Hi,
I was trying to train the model on another dataset. I found that the hinge loss for the multiple-choice problems finally converged to about 1.0. I wonder what value the hinge loss converges to in your training process.
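One detail that may help interpret the ~1.0 value: with the usual multiple-choice hinge loss and margin 1, a model that scores all candidates identically gets a loss of exactly 1.0, so convergence to ~1.0 can mean the scores have collapsed rather than that training succeeded. A small sketch of that loss in a generic formulation, which may differ from the exact one in this repo:

```python
import numpy as np

def mc_hinge_loss(scores, correct_idx, margin=1.0):
    """Mean over wrong candidates of max(0, margin + s_wrong - s_correct)."""
    s_correct = scores[correct_idx]
    wrong = np.delete(scores, correct_idx)
    return np.maximum(0.0, margin + wrong - s_correct).mean()

print(mc_hinge_loss(np.array([2.0, 2.0, 2.0, 2.0, 2.0]), 0))  # 1.0: collapsed scores
print(mc_hinge_loss(np.array([5.0, 1.0, 0.5, 0.2, 0.1]), 0))  # 0.0: well separated
```

So if the loss plateaus near the margin, checking whether the per-candidate scores actually differ is a good first diagnostic.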
Thanks for your reply!
Hi, thanks for sharing the repo and your work. I need the source videos of MSRVTT-QA, but I found that some of the provided URLs are now invalid. Could you share the source videos of MSRVTT-QA? Thanks a lot.
Hi,
In your question pre-processing file, the file read in is glove.6B.300d.txt, but the download link you gave is for glove.840B.300d.txt.
Last time I trained, I used glove.6B.300d.txt and did not use the link you provided.
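In case it helps others: the two files share the same plain-text format (a token followed by 300 floats per line), so a loader like the sketch below should handle either; what differs is coverage (6B is roughly 400k lowercased tokens, 840B roughly 2.2M cased tokens). One known gotcha is that some 840B tokens contain spaces, which a naive split mishandles:

```python
import numpy as np

def load_glove(path, dim=300):
    """Parse a GloVe text file into {token: vector}; same format for 6B and 840B."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # the last `dim` fields are floats; anything before is the token,
            # which in glove.840B.300d can itself contain spaces
            token = " ".join(parts[:-dim])
            vectors[token] = np.asarray(parts[-dim:], dtype=np.float32)
    return vectors
```

Which file to use should match whatever the pre-processing script expects; mixing the two only changes vocabulary coverage and casing, not the file format.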
Hi,
Thank you for your work! I want to run the code, but when I download the dataset tgif by the links from the tsv file, it always fails. Do you have another way to download it? Like Google Drive or other cloud disks?
Hi,
I downloaded the code, features, and pre-trained models, but I got a Count result of about 4.05/4.04/4.05 on test. When I train the model myself, I get 4.0639/4.0802/4.0599 on the Count test and 0.7476/0.7454/0.7449 on the Action test. I wonder whether the parameters in configs/tgif_qa_xx.yml need to be adjusted, or whether I need other settings.
Why does your code take epoch 25 as the final result? At epoch 25 the validation accuracy clearly drops, while the loss keeps decreasing.
Hi,
Firstly, let me say I appreciate your work. Your code is elegant: unlike other repositories, you provide the code to extract the visual and text features, which makes it easier for me to apply your method to my own datasets and tasks.
Now I'm changing the feature extraction method to improve performance on my tasks, but I don't know where the motion model ResNeXt-101 comes from, which dataset it was pre-trained on, or what its accuracy is, so I cannot compare this model with other models directly. Trying them one by one would be very time-consuming. Could you please share some information?
Thanks a lot!
Hello, thank you for your excellent work.
I would like to know whether you are willing to provide the preprocessed features of the MSRVTT-QA and MSVD-QA datasets. I want to test your model on them.