
pytorch_empirical-mvm's Introduction

Tsu-Jui (Ray) Fu

I am a Ph.D. candidate at UCSB CS, advised by William Wang. My research lies in vision+language and text-guided visual editing. I am also interested in language grounding and information extraction. My goal is to bridge the gap between vision and language via AI systems.

pytorch_empirical-mvm's People

Contributors

tsujuifu


pytorch_empirical-mvm's Issues

Test data discrepancy

Hi,

Congrats on the amazing work! For DiDeMo video retrieval, the test set used has 913 samples, whereas CLIP4Clip uses 1003. Is the value reported in the paper based on 913 or 1003 samples?

Thank you.

Distributed Initialization Fails When Pretraining with Multiple GPUs

I'm able to successfully start training on a single-node, single-GPU setup, but it fails when I increase the number of GPUs.

For example, on an A100 with 2 GPUs, if I run the following with deepspeed enabled:

CUDA_VISIBLE_DEVICES='0,1' python -m torch.distributed.launch --nproc_per_node=2 --master_port=5566 main_pretrain_yaml.py --config _args/args_pretrain.json

I can see that both GPUs (ranks 0 and 1) seemingly initialize distributed training, but while rank 0 continues to run as expected, rank 1 becomes unresponsive. Furthermore, it appears that only one CPU process starts and is pinned to one of the GPUs.

Here's a snippet from the logs:

INFO - __main__ -   Init distributed training on local rank 0
INFO - __main__ -   Init distributed training on local rank 1
INFO - torch.distributed.distributed_c10d -   Added key: store_based_barrier_key:1 to store for rank: 1
INFO - torch.distributed.distributed_c10d -   Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
INFO - torch.distributed.distributed_c10d -   Added key: store_based_barrier_key:1 to store for rank: 0
INFO - torch.distributed.distributed_c10d -   Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
...

[INFO] [comm.py:594:init_distributed] cdb=None
INFO - torch.distributed.distributed_c10d -   Added key: store_based_barrier_key:2 to store for rank: 0
INFO - torch.distributed.distributed_c10d -   Waiting in store based barrier to initialize process group for rank: 0, key: store_based_barrier_key:2 (world_size=2, worker_count=1, timeout=0:30:00)
INFO - torch.distributed.distributed_c10d -   Waiting in store based barrier to initialize process group for rank: 0, key: store_based_barrier_key:2 (world_size=2, worker_count=1, timeout=0:30:00)
INFO - torch.distributed.distributed_c10d -   Waiting in store based barrier to initialize process group for rank: 0, key: store_based_barrier_key:2 (world_size=2, worker_count=1, timeout=0:30:00)
...
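To illustrate what the repeating "worker_count=1" lines mean, here is a toy model (plain Python, not torch internals; all names are hypothetical) of a store-based barrier: each rank increments a shared counter and then polls until it equals world_size. If one rank never arrives, every poll times out, which matches the hang above.

```python
import threading
import time

class StoreBarrier:
    """Toy model of torch.distributed's store-based barrier: each rank
    increments a shared counter, then polls until it reaches world_size."""
    def __init__(self, world_size):
        self.world_size = world_size
        self.count = 0
        self.lock = threading.Lock()

    def arrive_and_wait(self, timeout=1.0):
        # Register this rank, then poll the shared counter until all
        # ranks have arrived or the timeout expires.
        with self.lock:
            self.count += 1
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self.lock:
                if self.count >= self.world_size:
                    return True
            time.sleep(0.01)
        return False  # timed out: some rank never arrived

# Rank 1 never reaches the barrier, so rank 0 spins until timeout,
# just like the repeating "worker_count=1" lines in the log.
hang = StoreBarrier(world_size=2)
print(hang.arrive_and_wait(timeout=0.2))   # False

# When both ranks arrive, the barrier releases.
ok = StoreBarrier(world_size=2)
t = threading.Thread(target=ok.arrive_and_wait)
t.start()
print(ok.arrive_and_wait(timeout=0.2))     # True
t.join()
```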

This issue arises when attempting to distribute the workload across multiple data files (cc3m/webvid2.5m_train_0.caption.tsv to cc3m/webvid2.5m_train_9.caption.tsv) versus using a single file (cc3m/webvid2.5m_train_0.caption.tsv), so the problem may be in the CPU-side data loading/handling of the files. I have tried increasing the number of workers without success.
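One thing worth checking (a sketch of an assumption about the loader, not the repo's actual code): if the ten caption TSVs are assigned to ranks round-robin, every rank should receive a non-empty shard, because a rank that ends up with no files can exit or block before the second process-group init and produce exactly this kind of hang.

```python
# Hypothetical round-robin sharding of the caption TSVs across ranks;
# the repo's real loader may distribute files differently.
def shard_files(files, rank, world_size):
    return [f for i, f in enumerate(files) if i % world_size == rank]

files = [f"cc3m/webvid2.5m_train_{i}.caption.tsv" for i in range(10)]
for rank in range(2):
    # Every rank must get at least one file, or it may never reach
    # the collective call that the other rank is waiting in.
    assert shard_files(files, rank, 2), f"rank {rank} got an empty shard"
print(shard_files(files, 1, 2)[0])  # cc3m/webvid2.5m_train_1.caption.tsv
```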

Note that this occurs in the code when making a call to
self.model, self.optzr, _, _ = deepspeed.initialize(config_params=config, model=self.model, optimizer=self.optzr, lr_scheduler=self.lr_scheduler)

And similarly, in the case that deepspeed is not enabled, at
self.model = T.nn.parallel.DistributedDataParallel(self.model, device_ids=[get_local_rank()], output_device=get_local_rank(), find_unused_parameters=True)

Please help, thanks!

Are models in Table 1 fine-tuned on the downstream dataset (TGIF-frame or DiDeMo)?

Hi,

Thank you for your great work!
I would like to know: are the models in Table 1 fine-tuned on the downstream datasets (TGIF-frame or DiDeMo)?
What are the detailed settings for Table 1? The paper only states "All variants are pre-trained on WebVid [3] for 5 epochs". When I trained your model with spatial-focused image feature targets (5 epochs on WebVid-2.5M, 10 epochs on DiDeMo), the R1 on the DiDeMo dataset was 25.23 (vs. 35.4 in your paper).
Am I missing something here?
Any reply would be helpful!

Best wishes!

My dataset

Hi,
I want to run my own dataset on your model. Can you give more information about your dataset? I am not able to figure out what the video_id and id fields are, and any code related to preprocessing the dataset would be helpful.
Thank you

Performance discrepancy on DiDeMo retrieval

Hi,

Thank you for your interesting work. I have downloaded your best pre-trained checkpoint and fine-tuned the model on the DiDeMo retrieval dataset.

However, the test scores I have obtained are 39.84/69.77/79.57, which shows a large discrepancy compared with the numbers in the README file and the paper (Table 9).

I have used the original configuration in the repo. Can I ask whether the aforementioned checkpoint is the checkpoint used in Table 9 of the paper, or whether there is any other issue?

Thank you so much.
Regards,
Thong

Test accuracy discrepancy on MSVD-QA

Hi @tsujuifu,
thanks again for your great work.

When I ran the script for the MSVD-QA downstream task, I got the following results (best test 51.49), which is lower than the reported 54.6.

Q1. Do you have any idea what I missed?
I didn't change any arguments in _args/args_msvd-qa.json,
and used the command CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node=4 --master_port=5566 main_qaoe_tsv_mlm_head.py --config _args/args_msvd-qa.json.

Q2. Increasing the frame size
Also, when I increased the size_frame argument to 10, the downstream performance was lower than with just 5 frames.
Is that the expected result? As I'm a beginner in this field, I would appreciate your insight.
