mil-nce_howto100m's People

Contributors

antoine77340, bryant1410, roudimit

mil-nce_howto100m's Issues

Parameters for replicating Zero-Shot evaluation retrieval results

Hello,

Thank you so much for sharing your code and pretrained models. I was trying to replicate your text-video retrieval results on the MSR-VTT dataset. I obtained the pretrained model from here - https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/s3d_howto100m.pth. I ran the command mentioned in the README to perform the evaluation, but with a smaller batch size; I didn't change any of the other parameters:

python3 eval_msrvtt.py --batch_size=2  --num_thread_reader=20 --num_windows_test=10 --eval_video_root=path_to_videos --pretrain_cnn_path=path_to_pretrained_model

I get the following results:

R@1: 0.1 - R@5: 0.25 - R@10: 0.35 - Median R: 31.0

These numbers are much lower than the ones mentioned in the table. I am guessing that the evaluation parameters are different since changing the batch size should not affect the results. Could you please tell me what parameter values were used to obtain the results mentioned in the table?
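
For reference, here is a rough sketch of how these zero-shot retrieval metrics are typically computed from a text-video similarity matrix; the function and variable names below are illustrative, not taken from eval_msrvtt.py:

import numpy as np

def retrieval_metrics(sim):
    # sim[i, j] = similarity of caption i to video j; the ground-truth video for caption i is video i
    ranks = np.empty(sim.shape[0])
    for i in range(sim.shape[0]):
        order = np.argsort(-sim[i])            # videos sorted by decreasing similarity
        ranks[i] = np.where(order == i)[0][0]  # 0-based rank of the correct video
    return {"R@1": np.mean(ranks < 1), "R@5": np.mean(ranks < 5),
            "R@10": np.mean(ranks < 10), "Median R": np.median(ranks) + 1}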

Thank you!

Training with smaller batch sizes

Hi @antoine77340 ,

Thank you very much for publishing this great paper and code.

I'm wondering if you can share some insights regarding the effect of batch size on the results. Did you ever attempt to go below batch size 512? (e.g., 32/64/128?). While trying to reproduce the results, I wasn't able to get very far with batch size 64 (unfortunately, larger batch sizes are out of reach for me w.r.t compute).

Thanks!!
Amir

Log

Hello, thank you for this great work. I am customizing this repo for my own dataset, and I am wondering if you could provide the log file from training. Thank you in advance!

video-caption pair

I noticed that the caption csv file seems to contain many overlapping segments for each clip. I would like long-range video-caption pairs, but I think simply concatenating the segments would be a problem. Is there a way to obtain clean video-caption pairs, or do I have to run ASR on my own?
By the way, thanks for sharing this nice work.
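
In case it helps, here is a rough, hedged sketch of how overlapping ASR segments could be merged into longer spans. The column names ('start', 'end', 'text') are assumptions about the caption csv schema, and overlapping ASR text may still repeat words, which would need separate deduplication:

import pandas as pd

def merge_segments(df, max_gap=1.0):
    # Greedily merge consecutive segments whose time ranges overlap or nearly touch.
    merged, cur = [], None
    for _, row in df.sort_values("start").iterrows():
        if cur is not None and row["start"] <= cur[1] + max_gap:
            cur = (cur[0], max(cur[1], row["end"]), cur[2] + " " + row["text"])
        else:
            if cur is not None:
                merged.append(cur)
            cur = (row["start"], row["end"], row["text"])
    if cur is not None:
        merged.append(cur)
    return pd.DataFrame(merged, columns=["start", "end", "text"])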

about sentence embedding

Hi! Just a quick question about the sentence embedding. In this work the sentence embedding is just two fully connected layers with a max-pooling. I am wondering whether you experimented with more complex sentence encoders, such as BERT? Is the current design meant to save computation, or does it actually work better than more complex models? Thanks!
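
For context, a sentence encoder of that kind can be sketched roughly as follows; the dimensions and names here are illustrative, not the repo's exact implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSentenceEmbedding(nn.Module):
    def __init__(self, vocab_size=20000, word_dim=300, hidden_dim=2048, output_dim=512):
        super().__init__()
        self.word_embd = nn.Embedding(vocab_size, word_dim)
        self.fc1 = nn.Linear(word_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, token_ids):               # token_ids: (batch, n_words)
        x = self.word_embd(token_ids)            # (batch, n_words, word_dim)
        x = F.relu(self.fc1(x))                  # (batch, n_words, hidden_dim)
        x = torch.max(x, dim=1)[0]               # max-pool over the word dimension
        return self.fc2(x)                       # (batch, output_dim)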

Training speed

Thanks for this nice work. Could you provide a rough estimate of the running time for this implementation?

Currently it takes around 2.5 hours to train one epoch, which seems much slower than expected (total batch size 2048, 4 x 8 V100 32GB).
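
As a rough sanity check (assuming an epoch is the 1,238,911 video-text samples mentioned in the README): 1,238,911 / 2048 ≈ 605 optimizer steps per epoch, so 2.5 hours per epoch is roughly 15 seconds per step across the 32 V100s, which may point to video decoding / data loading rather than the forward-backward pass as the bottleneck.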

Thank you!

Question about Sentence_Embedding

Hi,
first of all thank you very much for sharing this!

I have two questions regarding the sentence embedding class.

  1. Why is th.no_grad() used with the embedding even when no pretrained embeddings are used?
  2. Does the model account for the zero padding in the text, e.g. by replacing the embeddings of padded words with zeros before pooling? (A sketch of this idea follows below.)
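
Regarding the second point, here is a minimal illustrative sketch of how padded positions could be masked out before max-pooling (this is not taken from the repo; padding index 0 is an assumption):

import torch

batch, n_words, dim = 2, 6, 4
token_ids = torch.tensor([[5, 8, 2, 0, 0, 0],
                          [7, 3, 9, 4, 1, 0]])   # 0 = padding index (assumed)
x = torch.randn(batch, n_words, dim)              # per-token features before pooling
mask = (token_ids == 0).unsqueeze(-1)             # True at padded positions
pooled = torch.max(x.masked_fill(mask, float("-inf")), dim=1)[0]  # padded tokens never win the max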

Thank you!

Other hyperparams in pre-trained checkpoint (eg. learning rate)

Hi, thanks for this very useful code.

When I try to resume training from the pre-trained S3D_HowTo100M weights, the model quickly outputs all-NaN video and text embeddings after 132 steps with batch size 1024, which is very strange (the same thing happens with different learning rates). I found that the provided checkpoint only contains the network weights, with no other hyperparameters such as the learning rate.

Could you please share the hyperparameters used for pretraining (maybe this is the issue)? It would also be much appreciated if you could shed some light on the bug I am seeing.

PS: I use the provided S3D video features and only keep the very last linear layer of the video encoder trainable.
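
Not from the repo, but one common mitigation worth trying for exploding/NaN losses is gradient clipping; a toy, self-contained illustration (the model, data and learning rate are placeholders):

import torch

model = torch.nn.Linear(512, 512)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss = model(torch.randn(8, 512)).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip before stepping
optimizer.step()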

Pre-processed video download

Hello,

Can you please explain what you mean by preprocessed videos in "Finally the preprocessed HowTo100M videos (12Tb in total)..."?

How are they preprocessed?

About the MILNCELoss

Hi,

  1. What is the input for the MILNCELoss function?
    Is it like this:
    video_embd: batch x D
    text_embd: batch x D
    where D=512?

  2. Why do you cat the x and x transpose here?
    denominator = th.cat((x, x.permute(1,0,2)), dim=1).view(x.shape[0], -1)
    Doesn't th.logsumexp(x, dim=1) already compute the log-sum in the denominator?

Thanks
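
For reference, here is a minimal self-contained sketch consistent with the denominator line quoted in the second question. Here video_embd is (B, D) and text_embd is (B * n_pair, D), with the n_pair candidate captions of each clip grouped together (an assumption based on the reshape); it is an illustrative reimplementation, not necessarily the repository's exact code:

import torch

def mil_nce_loss(video_embd, text_embd):
    # video_embd: (B, D); text_embd: (B * n_pair, D), rows i*n_pair..(i+1)*n_pair-1 are the candidates for video i
    x = torch.matmul(video_embd, text_embd.t())                  # (B, B * n_pair) similarity scores
    x = x.view(x.shape[0], x.shape[0], -1)                       # (B, B, n_pair)
    # Numerator: log-sum-exp over the n_pair positive candidates of each video (the MIL part).
    pos = (x * torch.eye(x.shape[0], device=x.device)[:, :, None]).sum(dim=1)  # (B, n_pair)
    nominator = torch.logsumexp(pos, dim=1)                      # (B,)
    # Denominator: log-sum-exp over both video-to-text and text-to-video scores for each sample.
    denominator = torch.cat((x, x.permute(1, 0, 2)), dim=1).view(x.shape[0], -1)
    denominator = torch.logsumexp(denominator, dim=1)            # (B,)
    return torch.mean(denominator - nominator)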

Does the model see all the training clips at least once?

Thanks for the good work! As per the README, "An epoch here is equivalent of processing 1238911 video-text training samples, which is the number of different videos in HowTo100M. It is not the same as the number of different training video clips as there are more than 100M clips." Further, the clips are chosen randomly from a long video (here). Is it possible that the model does not look at some clips in the dataset? I know this will not have a significant impact on the performance, but I am just checking my understanding. Thanks!
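
A rough back-of-the-envelope check (assuming on the order of 100 candidate clips per video and one clip drawn uniformly per video per epoch): the probability that a specific clip is never sampled after E epochs is about (1 - 1/100)^E, i.e. roughly 0.90 after 10 epochs and 0.37 after 100 epochs, so many clips are indeed likely never seen in a typical run.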

Minor mistake in MILNCELoss

Hi there,

I'm interested in your paper and the proposed MIL-NCE loss, so I read the code for this loss.
However, I found a mistake in your loss function.

The original design in your paper should be loss = -log(pos / (pos + neg)).
But in the code implementation, you might have mistakenly implemented it as loss = -log(pos / (pos + pos + neg)), which makes the minimum of this loss log 2 ≈ 0.6931. Luckily, this has little impact when the batch size is large.
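
To make the claimed minimum concrete: because the concatenation counts each positive score once in x and once in x.permute(1, 0, 2), in the limit where the positive scores dominate all negatives the loss tends to log(2 * sum_pos e^p) - log(sum_pos e^p) = log 2 ≈ 0.6931.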

You could check the mistake by this code snippet:

import torch
from loss import MILNCELoss  # adjust the import to wherever MILNCELoss is defined in the repo

loss_fn = MILNCELoss()
video = torch.Tensor([[100, 0, 0, 0, 0, 0]]).cuda()
# tensor([[100.,   0.,   0.,   0.,   0.,   0.]], device='cuda:0')
text = torch.Tensor([[100, 0, 0, 0, 0, 0],
                     [100, 0, 0, 0, 0, 0],
                     [100, 0, 0, 0, 0, 0]]).cuda()
# tensor([[100.,   0.,   0.,   0.,   0.,   0.],
#         [100.,   0.,   0.,   0.,   0.,   0.],
#         [100.,   0.,   0.,   0.,   0.,   0.]], device='cuda:0')
loss_fn(video, text)
# tensor(0.6934, device='cuda:0')

hi, about your loss

Hi, dear author,

Thank you for open-sourcing your code! Could you write a small example showing how to use the MIL-NCE loss? I am afraid I might be using it incorrectly.

Best,
jun
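
For what it's worth, a minimal hedged usage sketch, assuming 512-dimensional embeddings and n_pair candidate captions per clip; mil_nce_loss refers to the illustrative function sketched under the "About the MILNCELoss" issue above, and the repo's MILNCELoss class should accept the same shapes:

import torch

B, n_pair, D = 8, 4, 512
video_embd = torch.randn(B, D, requires_grad=True)          # one embedding per video clip
text_embd = torch.randn(B * n_pair, D, requires_grad=True)  # n_pair candidate captions per clip, grouped per video
loss = mil_nce_loss(video_embd, text_embd)                  # illustrative function sketched earlier
loss.backward()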

YouCook and MSR-VTT Dataloaders

Hello,

(Edited)

Thank you for releasing the code. It's massively helpful. I had a few queries regarding the dataloaders:

  1. Where is num_frames used? In args.py, it is flagged as a random seed.
  2. What is the difference between num_frames and num_clips? Why is num_clips set to 10 for eval_msrvtt?
  3. Consider the following code block:
    np.linspace(start, max(start, end-self.num_sec - 0.4), num_clip)
    What is the role of num_sec in this context? (See the sketch after this issue.)

Thanks in advance.
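
For the third point, a small illustrative example of what that line does (the numbers are made up; num_sec is the length in seconds of each sampled window):

import numpy as np

start, end = 0.0, 30.0        # clip span in seconds (made-up values)
num_sec, num_clip = 3.2, 10   # window length and number of evaluation windows
starts = np.linspace(start, max(start, end - num_sec - 0.4), num_clip)
# Each start time yields one window of num_sec seconds; predictions over the num_clip
# windows are then typically aggregated (e.g. averaged) at evaluation time.
print(starts)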

Error with dataloader

I got this error message while running the dataloader with 40 workers to load the HowTo100M dataset. Just wondering if you have ever encountered this; if not, don't worry, it might be a problem with my setup.

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/workspace/pytorch_code/mmssl_graph/train_MILNCE.py", line 207, in main_worker
train(train_loader, model, criterion, optimizer, scheduler, epoch, train_dataset, writer, args)
File "/workspace/pytorch_code/mmssl_graph/train_MILNCE.py", line 225, in train
for i_batch, sample_batch in enumerate(train_loader):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in next
data = self._next_data()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1065, in _next_data
return self._process_data(data)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
TypeError: __init__() missing 2 required positional arguments: 'stdout' and 'stderr'

Environment

  • PyTorch Version (e.g., 1.0): 1.7.0
  • OS (e.g., Linux): Ubuntu 18.04
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.8
  • CUDA/cuDNN version: 11.1
  • GPU models and configuration: Tesla V100
  • Any other relevant information: Running with Docker container

normalized vector dot product

Hi! I noticed that in this paper you directly multiply the embedding vectors without normalizing them, unlike many recent self-supervised learning papers. Is there a specific reason for not doing the normalization? Thanks!
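
For concreteness, an illustrative comparison of the two options discussed here (raw dot product vs. L2-normalized, i.e. cosine, similarity); this is not code from the repo:

import torch
import torch.nn.functional as F

v = torch.randn(4, 512)   # video embeddings
t = torch.randn(4, 512)   # text embeddings
raw_sim = v @ t.t()                                            # unnormalized dot product, as described in the question
cos_sim = F.normalize(v, dim=1) @ F.normalize(t, dim=1).t()    # what many contrastive methods use instead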
