gewu-lab / music-avqa
MUSIC-AVQA, CVPR2022 (ORAL)
License: MIT License
In the following section of the code in dataloader_qa_grd_baseline.py,
    if len(question) < self.max_len:
        n = self.max_len - len(question)
        for i in range(n):
            question.append('<pad>')
are you padding the question with extra tokens? If so, did you never find a question longer than 14 tokens? More to the point, what happens if the question length is more than 14? Also, is this limit related to the 14*14 visual features?
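The quoted snippet only covers questions shorter than max_len. A minimal sketch of a helper that handles both cases is given here; truncating long questions is an assumption made for illustration, not something the repository is confirmed to do, and the names pad_or_truncate and PAD_TOKEN are hypothetical.

```python
from typing import List

PAD_TOKEN = "<pad>"  # same padding token as in the quoted snippet


def pad_or_truncate(tokens: List[str], max_len: int = 14) -> List[str]:
    """Return a new list with exactly max_len tokens.

    Shorter questions are right-padded with PAD_TOKEN. Longer questions
    are cut off at max_len (an assumption for this sketch).
    """
    if len(tokens) >= max_len:
        return tokens[:max_len]
    return tokens + [PAD_TOKEN] * (max_len - len(tokens))


short = pad_or_truncate("What is the instrument".split(), max_len=6)
long = pad_or_truncate("a b c d e f g h".split(), max_len=6)
print(short)
print(long)
```

Without the truncation branch, a question longer than max_len would pass through unchanged and could later produce tensors of unequal length within a batch.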
Thanks for the awesome dataset as well as open sourcing everything!
I found that for many videos, when I extract the audio using your script, I get the error: cannot load MUSIC-AVQA-videos-Real/00001835.mp4.
Did this also happen on your end? Also, could you please release the audio files separately?
Thanks again!
Thanks for your great work! I'm reading the code and have some questions about it.
In net_avst.py, line 125, you assign audio_feat to audio_feat_pure, but audio_feat is not modified before line 207. It seems that the "pure" variable has no effect in practice. Is it there only to indicate that the audio_feat used in the Temporal Grounding Module is "pure"?
In short, will the rename operation influence the gradient flow? I noticed that the shape of audio_feat_pure is [B, T, C], and the shape of audio_feat is [B*T, C], but both point to the same data (in line 206 of net_avst.py). Maybe I can use audio_feat in line 206 directly.
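As a general PyTorch illustration of the question above (not a reproduction of net_avst.py), the following sketch shows that a plain assignment only creates a second name for the same tensor, that view() shares the underlying storage, and that gradients from both paths accumulate on the original tensor. The shapes and the toy loss are assumptions chosen for the example.

```python
import torch

B, T, C = 2, 3, 4
audio_feat = torch.randn(B, T, C, requires_grad=True)

# Plain assignment: a second name bound to the very same tensor object.
audio_feat_pure = audio_feat
assert audio_feat_pure is audio_feat

# view() creates a new tensor object with a different shape,
# but it shares storage with the original and stays in the autograd graph.
audio_flat = audio_feat.view(B * T, C)
assert audio_flat.data_ptr() == audio_feat.data_ptr()

# Toy loss using both names: gradients from the two paths add up.
loss = audio_feat_pure.sum() + 2.0 * audio_flat.sum()
loss.backward()

print(audio_feat.grad.shape)
```

Under these assumptions, using either name in a later expression yields the same gradient with respect to the original tensor, so the extra name mainly serves readability.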
Hello, thank you for your great work, but I met some problems when I ran the code.
I used your "extract_14x14_feat.py" to extract the 14x14 visual features, but the shape of each extracted "0000xxxx.npy" is [4, 512, 14, 14]. As a result, when I ran "main_avst.py", I got the error "IndexError: index 5 is out of bounds for dimension 0 with size 4".
I found that the shape of 'selected_image' in 'extract_14x14_feat.py' is [4, 3, 224, 224]. How can I solve this problem and run the code successfully?
Thank you so much for your fantastic work!
I found a few videos shorter than 60s in your dataset. When I use your frame extraction script to extract frames at 1 fps, I cannot get 60 frames from these videos; however, the shape of the corresponding audio feature in the vggish folder is [60, 128].
I would be grateful if you could let me know how to align the audio and the frames from the same video.
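One possible way to pair a 60-step audio feature with fewer than 60 frames is to repeat the last available frame. The sketch below only computes the frame index to use for each audio step; it is an assumption for illustration, not the maintainers' stated method (truncating the audio feature to the number of frames would be another option), and frame_index_per_step is a hypothetical name.

```python
from typing import List


def frame_index_per_step(num_frames: int, num_audio_steps: int = 60) -> List[int]:
    """Map each 1-second audio step to a frame index.

    Step t uses frame t when it exists; once the video runs out of
    frames, the last frame is repeated (assumption for this sketch).
    """
    if num_frames <= 0:
        raise ValueError("need at least one frame")
    last = num_frames - 1
    return [min(t, last) for t in range(num_audio_steps)]


# A 57-second video sampled at 1 fps against a [60, 128] audio feature.
idx = frame_index_per_step(num_frames=57, num_audio_steps=60)
print(idx[-5:])
```

Indexing the frame features with idx gives a sequence of the same length as the audio feature, so both can be stacked into fixed-size batches.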
Hello, thank you for your excellent work. Could you tell me what causes the following error during training? Thanks!
Traceback (most recent call last):
File "net_grd_avst/main_avst.py", line 275, in <module>
main()
File "net_grd_avst/main_avst.py", line 258, in main
train(args, model, train_loader, optimizer, criterion, epoch=epoch)
File "net_grd_avst/main_avst.py", line 46, in train
for batch_idx, sample in enumerate(train_loader):
File "/uestcers/uestc1/.conda/envs/music/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
File "/uestcers/uestc1/.conda/envs/music/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
return self._process_data(data)
File "/uestcers/uestc1/.conda/envs/music/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
data.reraise()
File "/uestcers/uestc1/.conda/envs/music/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/uestcers/uestc1/.conda/envs/music/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/uestcers/uestc1/.conda/envs/music/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/uestcers/uestc1/.conda/envs/music/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/uestcers/uestc1/.conda/envs/music/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/uestcers/uestc1/.conda/envs/music/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 64, in default_collate
return default_collate([torch.as_tensor(b) for b in batch])
File "/uestcers/uestc1/.conda/envs/music/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [11, 512, 14, 14] at entry 0 and [10, 512, 14, 14] at entry 1
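This issue is resolved! Previously I used the provided extracted frames (1fps) directly, but the frame counts there vary: some videos have 62 frames and some have 60. After re-extracting the frames from the original videos, I got 60 frames.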
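The "stack expects each tensor to be equal size" error arises because the default collate function cannot batch samples with different numbers of frames. A minimal sketch of a pre-flight check that lists videos whose frame count differs from the expected 60 is given below. The directory layout (one sub-folder of .jpg frames per video) and the names count_frames and find_bad_clips are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, List


def count_frames(frames_root: str, suffix: str = ".jpg") -> Dict[str, int]:
    """Count frame files in each per-video sub-folder of frames_root."""
    counts: Dict[str, int] = {}
    for video_dir in sorted(Path(frames_root).iterdir()):
        if video_dir.is_dir():
            counts[video_dir.name] = sum(
                1 for f in video_dir.iterdir() if f.suffix == suffix
            )
    return counts


def find_bad_clips(frame_counts: Dict[str, int], expected: int = 60) -> List[str]:
    """Return the video ids whose frame count is not the expected one."""
    return sorted(name for name, n in frame_counts.items() if n != expected)


# Hypothetical video ids and counts, for demonstration only.
example_counts = {"00000001": 60, "00000002": 62, "00000003": 60}
print(find_bad_clips(example_counts))
```

Running such a check before feature extraction makes it easier to spot the clips that need to be re-extracted or trimmed.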
I have tried using a pretrained BERT to encode the text, but no matter how I tune the hyperparameters, the accuracy stays around 50%. Does this mean that this text shouldn't be encoded with a complex encoder? Could you please share your opinion on this phenomenon?