
Ego4D-ASL

Technical report | 1st in the MQ challenge and 2nd in the NLQ challenge at the Ego4D workshop, CVPR 2023.

This report presents the ReLER submission to two tracks of the Ego4D Episodic Memory Benchmark at CVPR 2023: Natural Language Queries and Moment Queries. The solution builds on our proposed Action Sensitivity Learning (ASL) framework to better capture the discrepant information across frames. Further, we incorporate a series of stronger video features and fusion strategies. Our method achieves an average mAP of 29.34, ranking 1st in the Moment Queries challenge, and a mean R@1 of 19.79, ranking 2nd in the Natural Language Queries challenge. Our code will be released.

Changelog

  • The released code may still have some bugs or problems; we are working to improve it.
  • release the code for MQ
  • release the code for NLQ
  • tidy the code

Installation

  • GCC, PyTorch==1.12.0, CUDA==cu113 dependencies
  • pip dependencies
conda create -n py38 python=3.8
conda activate py38
pip install tensorboard numpy pyyaml pandas h5py joblib
  • NMS compilation
cd ./libs/utils
python setup.py install --user
cd ../..
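The compiled extension provides 1D non-maximum suppression over temporal segments. As a rough illustration of what the operation does (a pure-Python sketch for intuition only, not the repository's optimized implementation):

```python
def nms_1d(segments, scores, iou_thresh=0.5):
    """Greedy 1D NMS over temporal segments.

    segments: list of (start, end) pairs; scores: matching confidences.
    Returns indices of kept segments, highest-scoring first.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        s1, e1 = segments[i]
        survivors = []
        for j in order:
            s2, e2 = segments[j]
            # temporal IoU between segment i and segment j
            inter = max(0.0, min(e1, e2) - max(s1, s2))
            union = (e1 - s1) + (e2 - s2) - inter
            iou = inter / union if union > 0 else 0.0
            if iou <= iou_thresh:
                survivors.append(j)
        order = survivors
    return keep
```

The compiled version exists because this O(n²) loop is a bottleneck when post-processing dense proposals.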

Data Preparation

  • Ego4D MQ Annotation, Video Data / Features Preparation

    • Please refer to Ego4D website to download features.
    • In our submission, we use InternVideo, EgoVLP, SlowFast and Omnivore features; the combination of InternVideo and EgoVLP alone can already achieve good results.
  • Ego4D Annotation and Config Preparation

    • Run python convert_annotation.py to convert the official annotations to the processed format, then put the result into data/ego4d.
    • Create a config file such as baseline.yaml corresponding to your training setup and put it into configs/.
    • In baseline.yaml, you can specify the annotation json_file, video features, training split, validation split, etc.
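A minimal configs/baseline.yaml might look like the sketch below. The field names and paths are assumptions for illustration; check the released configs for the actual schema:

```yaml
# Illustrative sketch only; field names are assumptions, not the actual schema.
dataset:
  json_file: ./data/ego4d/converted_annotation.json  # output of convert_annotation.py
  feat_folder: ./data/ego4d/features                 # e.g., InternVideo + EgoVLP
  train_split: ['train']
  val_split: ['val']
```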

Train on MQ (train-set)

  • Set train_split to ['train'] and val_split to ['val'] in the config.
  • Run bash train_val.sh baseline 0, where baseline is the corresponding config yaml and 0 is the GPU ordinal.

Validate on MQ (val-set)

  • When running bash train_val.sh baseline 0, once epoch > max_epoch // 3 it automatically validates performance on the val set (e.g., average mAP, Recall@1x).
  • You can also run bash val.sh checkpoint.ckpt baseline to validate a checkpoint manually.
  • Expect an average mAP between 27% and 28%.
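The validation schedule described above can be expressed as a small gate (a sketch; the function name is an assumption, not code from the repository):

```python
def should_validate(epoch, max_epoch):
    """Return True once training is past the first third of its epochs.

    Mirrors the README's rule: validation runs only when
    epoch > max_epoch // 3 (integer division).
    """
    return epoch > max_epoch // 3
```

Skipping validation early saves time, since val-set mAP is rarely informative before the model has partially converged.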

Train on MQ (Train + val)

  • Set train_split to ['train', 'val'] and val_split to ['test'].
  • Run bash train_combine.sh baseline 0, where baseline is the corresponding config yaml and 0 is the GPU ordinal.
  • In this mode, performance is not validated during training; instead, checkpoints of the last 5 epochs are saved.

Submission (to Leaderboard test-set server)

  • Run python infer.py --config configs/baseline.yaml --ckpt your_checkpoint to generate submission.json with the detection results.
  • Then run python merge_submission.py to generate submission_final.json, which contains both the detection and retrieval results.
  • Upload submission_final.json to the Ego4D MQ test server.
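The merge step combines two prediction files into one submission. A hedged sketch of such a merge is below; the top-level key names are assumptions for illustration, so consult merge_submission.py for the schema the test server actually expects:

```python
import json

def merge_submissions(detection_path, retrieval_path, out_path):
    """Combine detection and retrieval predictions into one JSON file.

    The "detection"/"retrieval" key names are illustrative assumptions,
    not necessarily the schema used by merge_submission.py.
    """
    with open(detection_path) as f:
        detection = json.load(f)
    with open(retrieval_path) as f:
        retrieval = json.load(f)
    merged = {"detection": detection, "retrieval": retrieval}
    with open(out_path, "w") as f:
        json.dump(merged, f)
    return merged
```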

Acknowledgement

Our model is based on ActionFormer. Thanks for their contributions.

Cite

@article{shao2023action,
  title={Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023},
  author={Shao, Jiayi and Wang, Xiaohan and Quan, Ruijie and Yang, Yi},
  journal={arXiv preprint arXiv:2306.09172},
  year={2023}
}

@InProceedings{Shao_2023_ICCV,
    author    = {Shao, Jiayi and Wang, Xiaohan and Quan, Ruijie and Zheng, Junjun and Yang, Jiang and Yang, Yi},
    title     = {Action Sensitivity Learning for Temporal Action Localization},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {13457-13469}
}


ego4d_asl's Issues

Data/Feature download command

Hi,

Thank you for your contribution. Could you please provide the bash command for downloading data and features for the MQ task? I was planning to use the following:

ego4d --output_directory="ego4d_data" --datasets clips annotations omnivore_video_swinl slowfast8x8_r101_k400 --benchmarks mq --version v2

Best regards,
Arda

Reproducing the MQ results

Hello,

In your paper, you mentioned that your final submission for the MQ Task on eval.ai is produced by ensembling predictions from three models. Could you please provide the config files for these three models? I am also confused about the combination of the features I should use. Is it EgoVLP+InternVideo or EgoVLP+InternVideo+SlowFast+Omnivore?

Edit: When I followed the instructions under "Train on MQ (Train + val)" and "Submission (to Leaderboard test-set server)" in the README file, I got the following scores on eval.ai for the test split:

{"Recall@1x,tIoU=0.5": 43.15983417779825, "average_mAP": 26.092527253173575}
