mhn-ijcai22's Introduction

MHN

This is the PyTorch implementation of our paper "Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering" (accepted at IJCAI'22).

Platform and dependencies

Ubuntu 14.04
Python 3.7
CUDA 10.1
cuDNN 7.5+
PyTorch >= 1.7.0
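
As a quick sanity check of the PyTorch requirement above, the following is a minimal sketch that compares an installed version string against the 1.7.0 minimum. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of this repository.

```python
import re


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Local build suffixes such as "+cu101" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_minimum(installed, minimum="1.7.0"):
    """True if `installed` is at least `minimum` (hypothetical helper)."""
    return parse_version(installed) >= parse_version(minimum)


try:
    import torch  # only needed for the live check below
    print("PyTorch", torch.__version__, "ok:", meets_minimum(torch.__version__))
except ImportError:
    print("PyTorch is not installed")
```

Comparing tuples of integers rather than raw strings avoids ranking "1.10" below "1.7".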

Data Preparation

Download the dataset
MSVD-QA: link
MSRVTT-QA: link
TGIF-QA: link
Preprocessing
1. To extract GloVe embeddings for questions and answers, please refer here.
  Taking the action task in the TGIF-QA dataset as an example, we have features at the path /QAfeatures:
  TGIF/word/action/TGIF_action_train_questions.pt
  TGIF/word/action/TGIF_action_val_questions.pt
  TGIF/word/action/TGIF_action_test_questions.pt
  TGIF/word/action/TGIF_action_vocab.json
2. To extract appearance and motion features, use the pretrained models here.
  For the action task, we have features at the path /Vfeatures:
  TGIF/SpatialFeatures/tumblr_nd24xaX8d11qkb1azo1_250/Features.pkl (shape is 2^level-1,16,2048)
  TGIF/SpatialFeatures/tumblr_no00ddSlG31t34v14o1_250/Features.pkl
  ...
  TGIF/TemporalFeatures/tumblr_nd24xaX8d11qkb1azo1_250/Features.pkl (shape is 2^level-1,2048)
  TGIF/TemporalFeatures/tumblr_no00ddSlG31t34v14o1_250/Features.pkl
  ...
  In our paper, the number of levels is set to 3 by default.
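
The feature layout above can be checked with a short script. This is only a sketch under stated assumptions: it reads `2^level-1` as (2^level) - 1 clips (7 clips for 3 levels), treats 16 as the number of frames per clip, and assumes each `Features.pkl` unpickles to an array-like object with a `.shape` attribute (for example a NumPy array or a PyTorch tensor). The function names are hypothetical and not part of this repository.

```python
import pickle
from pathlib import Path


def expected_shapes(levels=3, frames_per_clip=16, feature_dim=2048):
    """Expected feature shapes, assuming 2**levels - 1 clips per video."""
    num_clips = 2 ** levels - 1
    return {
        "SpatialFeatures": (num_clips, frames_per_clip, feature_dim),
        "TemporalFeatures": (num_clips, feature_dim),
    }


def feature_path(root, kind, video_id):
    """Build <root>/TGIF/<kind>/<video_id>/Features.pkl."""
    return Path(root) / "TGIF" / kind / video_id / "Features.pkl"


def load_and_check(root, kind, video_id, levels=3):
    """Load one feature file and verify its shape (assumes a .shape attribute)."""
    path = feature_path(root, kind, video_id)
    with open(path, "rb") as handle:
        features = pickle.load(handle)
    shape = tuple(features.shape)
    expected = expected_shapes(levels)[kind]
    if shape != expected:
        raise ValueError(f"{path}: expected shape {expected}, got {shape}")
    return features
```

For example, `load_and_check("/Vfeatures", "SpatialFeatures", "tumblr_nd24xaX8d11qkb1azo1_250")` would raise if the stored array does not match the shape implied by the chosen number of levels.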

Train and test

The trained models for the action task can be downloaded from here.

Reference

@inproceedings{peng2022MHN,
     title={Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering},
     author={Peng, Min and Wang, Chongyang and Gao, Yuan and Shi, Yu and Zhou, Xiang-Dong},
     booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
     year={2022}}

mhn-ijcai22's Issues

How many clips do you set?

Ask questions about the "MHN" project

The link to the trained model file in the "MHN" project is broken, and the file in "model. Eembedding" is not defined in "PositionalEncoding". Could you share your work on this?