Test-Time Training on Video Streams

Renhao Wang*, Yu Sun*, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang

[arXiv] [Project] [BibTeX]


Installation

See installation instructions.

Datasets

We release COCO-Videos, a new dataset for instance and panoptic segmentation which follows the COCO labeling format. We also rely on semantic-level labels in the KITTI-STEP dataset for evaluation on semantic segmentation.
All datasets can be downloaded here and should then be unzipped to the path given by the $DETECTRON2_DATASETS environment variable (see installation instructions).
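
As a minimal sketch of how the dataset root gets resolved: Detectron2's built-in dataset registration reads DETECTRON2_DATASETS and falls back to a local datasets directory when the variable is unset. The helper names below (resolve_dataset_root, missing_entries) and the example folder names are hypothetical and not part of this repository; check the unzipped archives for the actual layout.

```python
import os
from pathlib import Path


def resolve_dataset_root(env=None):
    """Return the dataset root, mirroring Detectron2's default lookup."""
    env = os.environ if env is None else env
    # Detectron2 falls back to "datasets" when the variable is unset.
    return Path(env.get("DETECTRON2_DATASETS", "datasets")).expanduser()


def missing_entries(root, expected):
    """List the expected sub-directories that are not present under root."""
    return [name for name in expected if not (Path(root) / name).is_dir()]


if __name__ == "__main__":
    root = resolve_dataset_root()
    # Hypothetical folder names; adjust them to match the unzipped archives.
    print(root, missing_entries(root, ["coco_videos", "kitti_step"]))
```

Running this before launching an experiment may help catch a wrong or unset dataset path early.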

Checkpoints

Relevant pretrained checkpoints can be obtained here. Download them and store them in a directory of your choice, referred to below as /path/to/checkpoints.

Reproducing Results

Baselines

To evaluate a pretrained Mask2Former-S on COCO-Videos for panoptic segmentation:

python runner_coco_videos_baseline.py --gpu 0 \
  --videos bangkok bar berkeley havana house irvine paris restaurant school tokyo \
  --batch_size 8 \
  --weights /path/to/checkpoints/ttt_coco_panoptic_baseline.pkl \
  --output_dir coco_vid_panoptic_baseline \
  --eval_type pano \
  --num_imgs 4000

To obtain the baseline instance segmentation numbers, pass --eval_type inst together with the corresponding pretrained instance segmentation checkpoint. Results are logged under the directory given by the --output_dir flag.
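
A small sketch of how the two baseline invocations differ, assuming all other flags stay the same as in the panoptic command above. The flags and checkpoint file names are taken from this README; the helper baseline_command and the instance output directory name are hypothetical.

```python
import shlex

VIDEOS = ["bangkok", "bar", "berkeley", "havana", "house",
          "irvine", "paris", "restaurant", "school", "tokyo"]

# Checkpoint file names as they appear in this README.
CHECKPOINTS = {
    "pano": "ttt_coco_panoptic_baseline.pkl",
    "inst": "ttt_coco_instance_baseline.pkl",
}

# The panoptic name comes from the README; the instance name is an assumption.
OUTPUT_DIRS = {
    "pano": "coco_vid_panoptic_baseline",
    "inst": "coco_vid_instance_baseline",
}


def baseline_command(eval_type, ckpt_dir="/path/to/checkpoints", gpu=0):
    """Build the argv list for runner_coco_videos_baseline.py."""
    if eval_type not in CHECKPOINTS:
        raise ValueError(f"unknown eval_type: {eval_type!r}")
    return [
        "python", "runner_coco_videos_baseline.py", "--gpu", str(gpu),
        "--videos", *VIDEOS,
        "--batch_size", "8",
        "--weights", f"{ckpt_dir}/{CHECKPOINTS[eval_type]}",
        "--output_dir", OUTPUT_DIRS[eval_type],
        "--eval_type", eval_type,
        "--num_imgs", "4000",
    ]


if __name__ == "__main__":
    # Print both commands so they can be copied into a shell.
    for eval_type in CHECKPOINTS:
        print(shlex.join(baseline_command(eval_type)))
```

The returned list can also be passed directly to subprocess.run if you prefer to launch both evaluations from a single script.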

COCO-Videos Instance and Panoptic Segmentation

Runner script for instance segmentation:

python runner_ttt_mae_inst.py --gpu 0 \
    --videos bangkok bar berkeley havana house irvine paris restaurant school tokyo \
    --batch_size 32 \
    --accum_iter 8 \
    --base_lr 0.0001 \
    --weights /path/to/checkpoints/ttt_coco_instance_baseline.pkl \
    --restart_optimizer

Runner script for panoptic segmentation:

python runner_ttt_mae_panoptic.py --gpu 0 \
    --videos bangkok bar berkeley havana house irvine paris restaurant school tokyo \
    --batch_size 32 \
    --accum_iter 8 \
    --base_lr 0.0001 \
    --weights /path/to/checkpoints/ttt_coco_panoptic_baseline.pkl \
    --restart_optimizer

For easy collation of numbers, we provide a utility script which can, for example, be called as python mask2former/utils/tabulate_results_cv.py --root_dir exp_dir/mae_coco_inst_32_0.0001.

KITTI-STEP Semantic Segmentation

Runner script:

python runner_ttt_mae.py --gpu 0 \
    --videos 0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 0020 \
    --batch_size 32 \
    --accum_iter 4 \
    --base_lrs 0.0001 \
    --weights /path/to/checkpoints/ttt_ks_semantic_baseline.pkl \
    --restart_optimizer

For easy collation of numbers, we provide a utility script which can, for example, be called as python mask2former/utils/tabulate_results.py --root_dir exp_dir/mae_ks_sema_32_0.0001.
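
The --videos list in the runner command above consists of the 21 consecutive, zero-padded IDs 0000 through 0020. If you would rather generate that list than type it, a sketch along these lines should work; the helper name kitti_step_video_ids is hypothetical and not part of this repository.

```python
def kitti_step_video_ids(count=21):
    """Return zero-padded IDs "0000", "0001", ... for `count` videos."""
    return [f"{i:04d}" for i in range(count)]


if __name__ == "__main__":
    # Paste the printed IDs after --videos in the runner command.
    print(" ".join(kitti_step_video_ids()))
```

Passing a smaller count produces a prefix of the list, which can be convenient for a quick smoke test on a few videos.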

License

This codebase inherits all licenses from the public release of Mask2Former.

Citing Video-TTT

@article{wang2023test,
  title={Test-time training on video streams},
  author={Wang, Renhao and Sun, Yu and Gandelsman, Yossi and Chen, Xinlei and Efros, Alexei A and Wang, Xiaolong},
  journal={arXiv preprint arXiv:2307.05014},
  year={2023}
}

Acknowledgements

Code is based on Mask2Former.

video-ttt-release's Issues

Need Help with TENT Code

Hey there,

I'm really interested in this project! I noticed the comparison with other methods, e.g. TENT, but I can't find their code in the project. Could you point me in the right direction or share the code with me? Excited to learn more and get involved!

Question about the evaluation metric

Hi,

Thank you for your great work and contribution to the video recognition community.

I am wondering if you have evaluated your method and baselines using any video metrics (e.g., VPQ or STQ)?

Thank you.

Regarding the panoptic dataset in COCO-Videos

Hi,

Thank you for open-sourcing this work.
While looking through the provided panoptic dataset, I found something that I cannot understand.
Below are two cases:

  1. In the "berkeley" folder, the ID of the same person seems to be inconsistent across frames. I also checked the JSON file, which likewise indicates that the person has different IDs.
    resized-berkeley_000001
    resized-berkeley_000011

  2. In the "irvine" folder, all the people in the same frame seem to share the same ID, which I think means it is not a "panoptic image".

irvine_resized_000001

I might be missing something. Could you help me figure out how to train and run inference with previous VPS models on COCO-Videos?

Thanks

Baseline for KITTI

Hi, could you please provide the baseline code for KITTI, including MAE joint training, TTT-MAE No mem, and Offline MAE all frames?
