paxion's Introduction

Figure: Overview of the Action Dynamics Benchmark (ActionBench)

Figure: Overview of the Paxion framework

📝 A note regarding the naming convention in this repo: we use "PatchAndFuse" or "patch_and_fuse" as an alternative name for the "Paxion" model.

Setup

Environment Setup

  • After cloning this repo, set up the submodules:

    git submodule update --init --recursive
    
  • Set up the conda environment:

    conda env create -f environment.yml
    conda activate paxion
    
  • Install the LAVIS library from source:

    cd src/LAVIS
    pip install -e .
    
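As a quick sanity check after these steps, the sketch below reports whether the key packages can be found in the active environment. It assumes the packages are importable under the top-level names `torch` and `lavis`; the helper `check_packages` is hypothetical and not part of this repo.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it can be found."""
    status = {}
    for name in names:
        # find_spec returns None when a top-level package is not installed
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    # Assumed import names; adjust them if your environment differs
    for name, ok in check_packages(["torch", "lavis"]).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it inside the activated `paxion` environment; a `MISSING` entry suggests the corresponding install step did not complete.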

Dataset Setup

Download Annotations

  • Download the annotations for ActionBench here, and put them under ActionBench/ego4d and ActionBench/ssv2.
  • Download the annotations for downstream tasks here, and put the downloaded folder under the root directory of this repo as datasets/.
  • Annotation details for each dataset can be found in the .md files in dataset_cards.

Download Videos & Preprocessing

Please refer to the .md files in dataset_cards for instructions on downloading the raw videos and preprocessing.

Download Pretrained Backbone Checkpoints

  • InternVideo: Download the InternVideo-MM-L-14 checkpoint following the instructions here; put the downloaded InternVideo-MM-L-14.ckpt at src/pretrained_ckpt/InternVideo/InternVideo-MM-L-14.ckpt
  • ClipViP: Download the CLIP-ViP-B/32 checkpoint following the instructions here; put the downloaded pretrain_clipvip_base_32.pt at src/pretrained_ckpt/ClipViP/pretrain_clipvip_base_32.pt
  • Singularity: Download the pre-trained checkpoints following the instructions here; put the singularity_temporal_17m checkpoint at src/pretrained_ckpt/Singularity/singularity_temporal_17m.pth

Download Trained Knowledge Patcher and Knowledge Fuser

  • Download the Knowledge Patcher checkpoints trained on ActionBench here, and put them under src/pretrained_ckpt/PatchAndFuse/ActionBench
  • Download the Patch & Fuse checkpoints for downstream tasks here, and put them under src/pretrained_ckpt/PatchAndFuse/downstream_tasks
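The checkpoint locations listed in this section are easy to mistype, so a small script can confirm that everything is in place before training or evaluation. The following is a minimal sketch, not part of this repo: the helper `find_missing` is hypothetical, and it assumes all paths are relative to the repo root.

```python
from pathlib import Path

# Paths taken from the setup instructions, relative to the repo root
EXPECTED_PATHS = [
    "src/pretrained_ckpt/InternVideo/InternVideo-MM-L-14.ckpt",
    "src/pretrained_ckpt/ClipViP/pretrain_clipvip_base_32.pt",
    "src/pretrained_ckpt/Singularity/singularity_temporal_17m.pth",
    "src/pretrained_ckpt/PatchAndFuse/ActionBench",
    "src/pretrained_ckpt/PatchAndFuse/downstream_tasks",
]


def find_missing(repo_root, expected=EXPECTED_PATHS):
    """Return the expected paths (files or folders) that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the root of this repo
    missing = find_missing(".")
    if missing:
        print("Missing:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected checkpoint paths are present.")
```

Only the backbones you actually plan to use need to be present, so trim `EXPECTED_PATHS` accordingly.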

Quick Start

demo.py shows a usage example of loading a trained PatchAndFuse model (with the InternVideo backbone, trained on SSv2-label) and performing inference on video-text matching.
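For readers new to the task, video-text matching can be thought of as ranking candidate texts by their similarity to a video. The sketch below illustrates only that general idea, using cosine similarity over placeholder vectors. It is not the code in demo.py and does not use the PatchAndFuse model; the function names and the embedding values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_matching_text(video_embedding, text_embeddings):
    """Return the index of the candidate text most similar to the video, plus all scores."""
    scores = [cosine_similarity(video_embedding, t) for t in text_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Placeholder vectors for illustration, NOT real model outputs
    video = [1.0, 0.0, 1.0]
    texts = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9]]
    index, scores = best_matching_text(video, texts)
    print("best candidate:", index)
```

In practice the embeddings would come from the model's video and text encoders; see demo.py for the actual loading and inference code.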

Code Description

We build our codebase on top of the LAVIS framework. Please refer to the LAVIS documentation for the overall structure (Tasks, Models, Runners, etc.). Config files for running any task can be found in src/configs/projects; the .yaml files include detailed comments for finer-grained customization of the experimental configs.

Training

To train the Knowledge Patcher on ActionBench or further train the Knowledge Fuser on downstream tasks, we provide configs under src/configs/projects/train. Please review the configs and make the necessary modifications (e.g., specify the trained checkpoints, which are marked as #TODO).

Here is an example of using the training configs (run_scripts/train.sh):

    cd src/
    bash run_scripts/train.sh
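Because the configs contain fields marked as #TODO that must be filled in before running, it can help to list the ones that remain. The following is a minimal sketch under two assumptions: the marker appears literally as `#TODO` in the files, and the configs use the `.yaml` extension. The helper `find_todo_lines` is hypothetical and not part of this repo.

```python
from pathlib import Path


def find_todo_lines(config_dir, marker="#TODO"):
    """Return (file path, line number, line text) for every line containing the marker."""
    hits = []
    for path in sorted(Path(config_dir).rglob("*.yaml")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if marker in line:
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Run from the repo root; the directory is the training config folder named in this section
    for path, number, line in find_todo_lines("src/configs/projects/train"):
        print(f"{path}:{number}: {line}")
```

The same function can be pointed at src/configs/projects/eval before running an evaluation.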

Evaluation

To evaluate the Knowledge Patcher on ActionBench or evaluate a trained Knowledge Fuser on downstream tasks, we provide configs under src/configs/projects/eval. Please review the configs and make the necessary modifications (e.g., specify the trained checkpoints, which are marked as #TODO).

Here are two examples of using the evaluation configs (run_scripts/eval_actionbench.sh and run_scripts/eval_downstream_task.sh):

  • Evaluate on ActionBench:
        cd src/
        bash run_scripts/eval_actionbench.sh

  • Evaluate on a downstream task:
        cd src/
        bash run_scripts/eval_downstream_task.sh
    

Inference

For customized inference, i.e., evaluating on specific samples and visualizing the results, we provide inference.py, where one can set specific instance ids in the __main__ function.

Examples for running inference on different tasks can be found in run_scripts/inference.sh

Citation

@article{wang2023paxion,
  title={Paxion: Patching Action Knowledge in Video-Language Foundation Models},
  author={Wang, Zhenhailong and Blume, Ansel and Li, Sha and Liu, Genglin and Cho, Jaemin and Tang, Zineng and Bansal, Mohit and Ji, Heng},
  journal={arXiv preprint arXiv:2305.10683},
  year={2023}
}

Acknowledgement

This code uses resources from LAVIS, InternVideo, ClipViP, Singularity, and flamingo-pytorch. The code is implemented using PyTorch. We thank the authors for their great work and for open-sourcing their code.
