I'm able to start training successfully on a single-node, single-GPU setup, but it fails when I increase the number of GPUs.
For example, on an A100 with 2 GPUs, if I run the following with deepspeed enabled:
CUDA_VISIBLE_DEVICES='0,1' python -m torch.distributed.launch --nproc_per_node=2 --master_port=5566 main_pretrain_yaml.py --config _args/args_pretrain.json
I can see that both GPUs (ranks 0 and 1) seemingly initialize distributed training, but while rank 0 continues to run as expected, rank 1 becomes unresponsive. Furthermore, it appears that only one process actually starts on the CPU and is pinned to one of the GPUs.
Here's a snippet from the logs:
INFO - __main__ - Init distributed training on local rank 0
INFO - __main__ - Init distributed training on local rank 1
INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 1
INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 0
INFO - torch.distributed.distributed_c10d - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
...
[INFO] [comm.py:594:init_distributed] cdb=None
INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 0
INFO - torch.distributed.distributed_c10d - Waiting in store based barrier to initialize process group for rank: 0, key: store_based_barrier_key:2 (world_size=2, worker_count=1, timeout=0:30:00)
INFO - torch.distributed.distributed_c10d - Waiting in store based barrier to initialize process group for rank: 0, key: store_based_barrier_key:2 (world_size=2, worker_count=1, timeout=0:30:00)
INFO - torch.distributed.distributed_c10d - Waiting in store based barrier to initialize process group for rank: 0, key: store_based_barrier_key:2 (world_size=2, worker_count=1, timeout=0:30:00)
...
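To check whether the hang is in process-group creation itself rather than in my training code, I believe a minimal standalone script along these lines (the file name check_pg.py and everything in it are illustrative, not part of my code base) should complete on both ranks when launched the same way:

# check_pg.py - minimal process-group sanity check, separate from the training code
import argparse
import os
import torch
import torch.distributed as dist

def main():
    # torch.distributed.launch passes --local_rank; torchrun sets the LOCAL_RANK env var instead
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=int(os.environ.get("LOCAL_RANK", 0)))
    args = parser.parse_args()

    torch.cuda.set_device(args.local_rank)      # pin this process to its own GPU
    dist.init_process_group(backend="nccl")     # MASTER_ADDR/PORT, RANK, WORLD_SIZE come from the launcher
    dist.barrier()                              # both ranks must reach this point, otherwise it hangs
    print(f"rank {dist.get_rank()} of {dist.get_world_size()} reached the barrier")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()

launched with:
CUDA_VISIBLE_DEVICES='0,1' python -m torch.distributed.launch --nproc_per_node=2 --master_port=5566 check_pg.py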
This issue only arises when I attempt to distribute the computational workload across multiple data files (cc3m/webvid2.5m_train_0.caption.tsv through cc3m/webvid2.5m_train_9.caption.tsv); it does not occur with a single file (cc3m/webvid2.5m_train_0.caption.tsv). So it seems the problem may be in the CPU-side data loading/handling of the files. I have tried increasing the number of workers without success.
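To make the multi-file case concrete, the split amounts to each rank working from a subset of these caption shards, roughly along these lines (illustrative only; the helper function is made up and the actual dataset code may distribute the shards differently):

import torch.distributed as dist

# The ten caption shards involved in the failing configuration.
shards = [f"cc3m/webvid2.5m_train_{i}.caption.tsv" for i in range(10)]

def shards_for_rank():
    # Round-robin split of the shards across ranks (hypothetical helper).
    rank, world_size = dist.get_rank(), dist.get_world_size()
    return shards[rank::world_size]   # e.g. rank 0 -> shards 0, 2, 4, ..., rank 1 -> 1, 3, 5, ...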
Note that the hang occurs in the code at the call to
self.model, self.optzr, _, _ = deepspeed.initialize(config_params=config, model=self.model, optimizer=self.optzr, lr_scheduler=self.lr_scheduler)
And similarly, when DeepSpeed is not enabled, at
self.model = T.nn.parallel.DistributedDataParallel(self.model, device_ids=[get_local_rank()], output_device=get_local_rank(), find_unused_parameters=True)
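For reference, my understanding of the initialization order around these two calls is roughly the following (a simplified sketch with a toy model, an illustrative config, and an illustrative use_deepspeed flag, not my actual trainer code):

import os
import torch as T
import torch.distributed as dist
import deepspeed

local_rank = int(os.environ.get("LOCAL_RANK", 0))
T.cuda.set_device(local_rank)                 # one GPU per process, set before any collective call
dist.init_process_group(backend="nccl")       # deepspeed.init_distributed() would also set this up

model = T.nn.Linear(8, 8).cuda()              # toy model standing in for self.model
optimizer = T.optim.AdamW(model.parameters())

use_deepspeed = True
if use_deepspeed:
    ds_config = {"train_micro_batch_size_per_gpu": 1}   # minimal illustrative config
    model, optimizer, _, _ = deepspeed.initialize(
        config_params=ds_config, model=model, optimizer=optimizer)
else:
    model = T.nn.parallel.DistributedDataParallel(
        model, device_ids=[local_rank], output_device=local_rank,
        find_unused_parameters=True)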
Please help, thanks!