antoine77340 / mil-nce_howto100m
PyTorch GPU distributed training code for MIL-NCE HowTo100M
License: Apache License 2.0
Is this dataset affiliated with Google? Is it licensed by Google?
How can I train it with Horovod or another distributed PyTorch framework?
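In plain PyTorch (without Horovod), a common pattern is one process per GPU: each process wraps the model in `torch.nn.parallel.DistributedDataParallel` and reads a disjoint shard of the data through `torch.utils.data.DistributedSampler`. The sketch below only illustrates the data-sharding half, simulating all ranks inside one process. `shard_indices` is a hypothetical helper, not part of this repository.

```python
import torch
from torch.utils.data import DistributedSampler, TensorDataset


def shard_indices(num_samples, world_size, epoch=0):
    """Return the dataset indices each rank would read in a given epoch (hypothetical helper)."""
    dataset = TensorDataset(torch.arange(num_samples))
    shards = []
    for rank in range(world_size):
        # Passing num_replicas and rank explicitly means no process group is needed here;
        # inside a real DDP worker they default to the initialized process group.
        sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank,
                                     shuffle=True, seed=0)
        sampler.set_epoch(epoch)  # same seed + epoch on every rank gives a consistent shuffle
        shards.append(list(sampler))
    return shards


if __name__ == "__main__":
    for rank, idx in enumerate(shard_indices(10, world_size=2)):
        print(f"rank {rank}: {idx}")
```

Calling `set_epoch` before each epoch matters: without it every epoch reuses the same permutation.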
Hello,
Thank you so much for sharing your code and pretrained models. I was trying to replicate your text-video retrieval results on the MSR-VTT dataset. I obtained the pretrained model from https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/s3d_howto100m.pth and ran the command from the README to perform the evaluation, but with a smaller batch size; I didn't change any of the other parameters:
python3 eval_msrvtt.py --batch_size=2 --num_thread_reader=20 --num_windows_test=10 --eval_video_root=path_to_videos --pretrain_cnn_path=path_to_pretrained_model
I get the following results:
R@1: 0.1 - R@5: 0.25 - R@10: 0.35 - Median R: 31.0
These numbers are much lower than the ones mentioned in the table. I am guessing that the evaluation parameters are different since changing the batch size should not affect the results. Could you please tell me what parameter values were used to obtain the results mentioned in the table?
Thank you!
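For readers comparing these numbers: R@K is the fraction of text queries whose ground-truth video is ranked in the top K, and Median R is the median rank of the ground-truth video. A minimal NumPy sketch, assuming a square similarity matrix in which query i matches video i; it is not the repository's `eval_msrvtt.py`. If the metrics are computed over the full test-set similarity matrix, the batch size used to extract embeddings would not enter this computation at all.

```python
import numpy as np


def retrieval_metrics(sim):
    """sim[i, j]: similarity of text query i and video j; the match for query i is video i."""
    sim = np.asarray(sim, dtype=float)
    diag = np.diag(sim)[:, None]
    # 1-based rank of the correct video: 1 + number of videos scored strictly higher
    ranks = (sim > diag).sum(axis=1) + 1
    return {
        "R@1": float(np.mean(ranks <= 1)),
        "R@5": float(np.mean(ranks <= 5)),
        "R@10": float(np.mean(ranks <= 10)),
        "MedianR": float(np.median(ranks)),
    }
```

Ties are resolved in favour of the correct video here; other implementations may break ties differently.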
Hi,
I couldn't install Slurm.
Is there an easy way to get it?
The download page is quite confusing:
https://slurm.schedmd.com/download.html
Any suggestions or help?
Hi @antoine77340 ,
Thank you very much for publishing this great paper and code.
I'm wondering if you can share some insights regarding the effect of batch size on the results. Did you ever attempt to go below batch size 512? (e.g., 32/64/128?). While trying to reproduce the results, I wasn't able to get very far with batch size 64 (unfortunately, larger batch sizes are out of reach for me w.r.t compute).
Thanks!!
Amir
Hello, thank you for this great work. I am adapting this repo to my own dataset and I am wondering if you could provide the training log file. Thanks in advance!
I noticed that the caption CSV files seem to contain many overlapping segments for each clip. I want to look at long-range pairs, but I think simply concatenating the captions would be a problem. Is there any way to get such video-caption pairs, or do I have to run ASR on my own?
By the way, thanks for sharing this nice work.
Hi! I just want to ask a quick question about the sentence embedding. In this work the sentence embedding is just two FC layers with max-pooling. Have you experimented with more complex sentence embeddings, such as BERT? Is the current design meant to save computation, or does it work better than more complex models? Thanks!
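The "two FC layers with max-pooling" design described in the question can be sketched as follows. This is a hypothetical reconstruction, not the repository's sentence-embedding class: the 300-d input matches word2vec vectors, while the 2048-d hidden size and 512-d output are illustrative assumptions, and padding positions are not masked.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaxPoolSentenceEncoder(nn.Module):
    """Hypothetical sketch: word vectors -> FC + ReLU -> max-pool over words -> FC."""

    def __init__(self, word_dim=300, hidden_dim=2048, embd_dim=512):
        super().__init__()
        self.fc1 = nn.Linear(word_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, embd_dim)

    def forward(self, word_vectors):
        # word_vectors: (batch, num_words, word_dim), e.g. pre-computed word2vec vectors
        h = F.relu(self.fc1(word_vectors))  # per-word projection
        h = h.max(dim=1).values             # max-pool over the word axis
        return self.fc2(h)                  # (batch, embd_dim)
```

Because of the max-pool, the output ignores word order (a bag-of-words encoder), which keeps it cheap compared with a transformer such as BERT.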
Thanks for this nice work. Could you provide a rough estimate of the running time of this implementation?
Currently, it takes around 2.5 hours to train one epoch, which seems much slower than expected (total batch size 2048, 4 × 8 V100 32 GB).
Thank you!
Thank you for this great work.
Are you planning on releasing the pre-trained weights for the s3dg model?
Thanks!
Hi,
First of all, thank you very much for sharing this!
I have two questions regarding the sentence embedding class.
Thank you!
Hi, thanks for this very useful code.
When I tried to reproduce the training starting from the pre-trained S3D_Howto100m weights, the model quickly output all-NaN video and text embeddings after 132 steps with batch size 1024, which is very strange (the same happens with different learning rates). I found that the provided checkpoint contains only the network weights, with no other hyperparameters such as the learning rate.
Could you please share the hyperparameters at the end of pretraining (maybe this is the issue)? I would also much appreciate any insight into this bug.
P.S. I use the provided S3D video features and only keep the very last linear layer for training the video encoder.
Hello,
can you please explain what you mean by preprocessed videos in "Finally the preprocessed HowTo100M videos (12Tb in total)..."?
How are they preprocessed?
Hi,
What is the input for the MILNCELoss function?
Is it like this:
video_embd: batch x D
text_embd: batch x D
where D=512?
Why do you concatenate x and its transpose here?
denominator = th.cat((x, x.permute(1,0,2)), dim=1).view(x.shape[0], -1)
Doesn't th.logsumexp(x, dim=1) already compute the log-sum for the denominator?
Thanks
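A device-agnostic sketch of a MIL-NCE loss built around the `denominator` line quoted above. It assumes `video_embd` is `(B, D)` and `text_embd` is `(B * K, D)`, with the K candidate captions of clip j stored contiguously; that layout is an assumption here, not something this thread confirms. The concatenation places row i (video i scored against every caption) and column i (every video scored against the captions of clip i) in one denominator, so negatives from both retrieval directions are counted, which a plain `logsumexp(x, dim=1)` would not do. Note that the positive scores `x[i, i, :]` land in both halves.

```python
import torch


def mil_nce_loss(video_embd, text_embd):
    """video_embd: (B, D); text_embd: (B * K, D), K candidate captions per clip (assumed layout)."""
    B = video_embd.shape[0]
    x = video_embd @ text_embd.t()               # (B, B*K) dot-product scores
    x = x.view(B, B, -1)                         # x[i, j, k]: video i vs caption k of clip j
    eye = torch.eye(B, device=x.device, dtype=x.dtype)[:, :, None]
    pos = (x * eye).sum(dim=1)                   # (B, K): video i vs its own K captions
    log_num = torch.logsumexp(pos, dim=1)        # log-sum over the positive bag
    # Row i and column i of the score grid, flattened into one denominator per video
    den = torch.cat((x, x.permute(1, 0, 2)), dim=1).view(B, -1)
    log_den = torch.logsumexp(den, dim=1)
    return (log_den - log_num).mean()
```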
Thanks for the good work! As per the README, "An epoch here is equivalent of processing 1238911 video-text training samples, which is the number of different videos in HowTo100M. It is not the same as the number of different training video clips as there are more than 100M clips." Further, the clips are chosen randomly from a long video (here). Is it possible that the model never looks at some clips in the dataset? I know that would not have a significant impact on performance; I am just checking my understanding. Thanks!
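Under a simplifying assumption, the question has a closed-form answer: if each epoch draws one clip uniformly at random from a video with n clips, independently across epochs, a given clip is never drawn in E epochs with probability (1 - 1/n)^E, roughly e^(-E/n). The repository samples random start times rather than discrete clips, so this is only an idealization; both functions below are hypothetical.

```python
import random


def expected_unseen_fraction(clips_per_video, epochs):
    """Probability that a given clip is never drawn, assuming one uniformly random
    clip per video per epoch, drawn independently across epochs."""
    return (1.0 - 1.0 / clips_per_video) ** epochs


def simulate_unseen_fraction(clips_per_video, epochs, trials=20000, seed=0):
    """Monte Carlo check of the formula above: track whether clip 0 is ever drawn."""
    rng = random.Random(seed)
    unseen = 0
    for _ in range(trials):
        if all(rng.randrange(clips_per_video) != 0 for _ in range(epochs)):
            unseen += 1
    return unseen / trials
```

So yes, under this model some clips are likely never seen unless the number of epochs is large relative to the number of clips per video.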
Hi there,
I'm interested in your paper and the proposed MIL-NCE loss, so I read the code for this loss.
However, I found a mistake in your loss function.
The original design in your paper should be $\mathcal{L} = -\log\frac{\text{pos}}{\text{pos} + \text{neg}}$.
But the code implementation may mistakenly compute $\mathcal{L} = -\log\frac{\text{pos}}{\text{pos} + \text{pos} + \text{neg}}$, so the minimum of this loss is $\log 2 \approx 0.6931$. Luckily, this has little impact when the batch size is large.
You can check the mistake with this code snippet:
>>> import torch
>>> from loss import MILNCELoss  # assuming the class lives in the repo's loss.py
>>> loss_fn = MILNCELoss()
>>> video = torch.Tensor([[100, 0, 0, 0, 0, 0]]).cuda()
>>> text = torch.Tensor([[100, 0, 0, 0, 0, 0],
...                      [100, 0, 0, 0, 0, 0],
...                      [100, 0, 0, 0, 0, 0]]).cuda()
>>> loss_fn(video, text)
tensor(0.6934, device='cuda:0')
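The fix proposed in this report can be written out as a sketch: the positive bag enters the denominator once. This is the reporter's suggestion expressed under the same assumed `(B, D)` / `(B * K, D)` layout, not the author's code. The transposed half keeps the other videos' scores against the captions of clip i but masks video i itself with `-inf`, which contributes `exp(-inf) = 0`. With a single clip and identical scores, the loss becomes 0 instead of log 2.

```python
import torch


def mil_nce_loss_single_pos(video_embd, text_embd):
    """MIL-NCE variant where the positive bag is counted once in the denominator."""
    B = video_embd.shape[0]
    x = (video_embd @ text_embd.t()).view(B, B, -1)   # x[i, j, k]: video i vs caption k of clip j
    pos = torch.diagonal(x, dim1=0, dim2=1).t()       # (B, K): x[i, i, k]
    # Column half: every video vs the captions of clip i, with video i masked out
    cols = x.permute(1, 0, 2).clone()
    idx = torch.arange(B, device=x.device)
    cols[idx, idx] = float("-inf")
    # Row half (x) still contains the positives, exactly once
    den = torch.cat((x, cols), dim=1).reshape(B, -1)
    return (torch.logsumexp(den, dim=1) - torch.logsumexp(pos, dim=1)).mean()
```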
Hi, I cannot find the checkpoint link. Could you release the checkpoint?
Hi, dear author:
Thank you for open-sourcing your code~~ Could you write a simple code example showing how to use the MIL-NCE loss? ╥﹏╥ I'm afraid I'm using it wrongly...
Best,
jun (●'◡'●)
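A minimal usage sketch of one training loop with a MIL-NCE-style loss. It assumes the loss takes one embedding per clip and K caption embeddings per clip in clip-major order; the tiny linear encoders, random features and all sizes are hypothetical placeholders for the real video and sentence encoders.

```python
import torch
import torch.nn as nn


def mil_nce_loss(video_embd, text_embd):
    # video_embd: (B, D); text_embd: (B * K, D), the K captions of clip j stored contiguously
    B = video_embd.shape[0]
    x = (video_embd @ text_embd.t()).view(B, B, -1)
    pos = torch.diagonal(x, dim1=0, dim2=1).t()                     # (B, K) positive scores
    den = torch.cat((x, x.permute(1, 0, 2)), dim=1).reshape(B, -1)  # both retrieval directions
    return (torch.logsumexp(den, dim=1) - torch.logsumexp(pos, dim=1)).mean()


torch.manual_seed(0)
B, K, D_video, D_text, D = 8, 3, 32, 16, 8
video_net = nn.Linear(D_video, D)  # hypothetical stand-in for the video encoder
text_net = nn.Linear(D_text, D)    # hypothetical stand-in for the sentence encoder
params = list(video_net.parameters()) + list(text_net.parameters())
optimizer = torch.optim.Adam(params, lr=1e-2)

video_feats = torch.randn(B, D_video)     # one clip per batch element
text_feats = torch.randn(B * K, D_text)   # K candidate captions per clip, clip-major order

losses = []
for step in range(50):
    loss = mil_nce_loss(video_net(video_feats), text_net(text_feats))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

The most common silent mistake is the caption layout: if captions are stored caption-major instead of clip-major, the `view(B, B, -1)` groups the wrong captions as positives without raising any error.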
Thank you for making your code public! Would you also be releasing the evaluation pipeline for CrossTask and COIN?
Hello,
(Edited)
Thank you for releasing the code. It's massively helpful. I had a few queries regarding the dataloaders:
np.linspace(start, max(start, end-self.num_sec - 0.4), num_clip)
Thanks in advance.
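For context, the quoted line spreads `num_clip` clip start times evenly from `start` to `end - num_sec - 0.4`, so every `num_sec`-second window ends at least 0.4 s before `end`. When the segment is shorter than `num_sec + 0.4`, the `max(...)` collapses every start to `start`. A sketch with a hypothetical `clip_windows` helper; the thread does not say why 0.4 s is used.

```python
import numpy as np


def clip_windows(start, end, num_sec, num_clip):
    """Evenly spaced (clip_start, clip_end) windows of num_sec seconds inside [start, end]."""
    # Latest allowed clip start; the 0.4 s margin is copied from the quoted line
    last_start = max(start, end - num_sec - 0.4)
    starts = np.linspace(start, last_start, num_clip)
    return [(float(s), float(s) + num_sec) for s in starts]
```

For example, `clip_windows(0, 20, 4, 3)` gives starts at 0, 7.8 and 15.6 seconds, while for a 2-second segment all windows start at `start` and therefore run past `end`.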
I got this error message while running the dataloader with 40 workers to load the HowTo100M dataset. I'm just wondering if you have ever encountered this. If not, don't worry, it might be a problem with my setup.
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/workspace/pytorch_code/mmssl_graph/train_MILNCE.py", line 207, in main_worker
train(train_loader, model, criterion, optimizer, scheduler, epoch, train_dataset, writer, args)
File "/workspace/pytorch_code/mmssl_graph/train_MILNCE.py", line 225, in train
for i_batch, sample_batch in enumerate(train_loader):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1065, in _next_data
return self._process_data(data)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
TypeError: __init__() missing 2 required positional arguments: 'stdout' and 'stderr'
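The last frames of this traceback show the mechanism: in the PyTorch version shown, the DataLoader rebuilds the worker's exception in the main process with `raise self.exc_type(msg)`, i.e. with a single message argument. An exception class whose `__init__` requires more arguments cannot be rebuilt that way, so the original error is hidden behind this `TypeError`. Given the missing `stdout` and `stderr` arguments, one plausible candidate is `ffmpeg.Error` from ffmpeg-python (its constructor takes `cmd, stdout, stderr`), but the thread does not confirm it. The sketch uses a hypothetical stand-in exception and a hypothetical wrapper that converts worker errors into a `RuntimeError` so the real message survives.

```python
class MultiArgError(Exception):
    """Hypothetical stand-in for an exception whose __init__ needs several arguments."""

    def __init__(self, cmd, stdout, stderr):
        super().__init__(f"{cmd} error")
        self.stdout, self.stderr = stdout, stderr


def reraise_like_dataloader(exc_type, msg):
    # Mirrors `raise self.exc_type(msg)` from the traceback above
    raise exc_type(msg)


def load_sample(decode):
    """Run a decoding call inside a worker so its errors can be rebuilt from a message."""
    try:
        return decode()
    except Exception as e:
        # RuntimeError accepts a single message, so the main process can re-create it
        raise RuntimeError(f"{type(e).__name__}: {e}") from e
```

Wrapping the video-decoding call inside the dataset's `__getitem__` this way would surface the underlying decoder message instead of the `TypeError`.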
Hi, I'm unable to access the word2vec matrix or other files on the https://www.rocq.inria.fr/cluster-willow/ cluster, for example:
wget https://www.rocq.inria.fr/cluster-willow/amiech/word2vec.zip
Hi! I noticed that in this paper you multiply the embedding vectors directly without normalizing them, whereas many recent self-supervised learning papers normalize them first. Is there a specific reason for not doing the normalization? Thanks!
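The difference the question points at can be shown in a few lines. A raw dot product grows with the embedding norms, so a model can sharpen the softmax inside the loss simply by scaling its embeddings; L2-normalizing bounds every score to [-1, 1], and is usually paired with a temperature. The `similarity` function and its `temperature` parameter are illustrative assumptions, not part of this repository.

```python
import torch
import torch.nn.functional as F


def similarity(video_embd, text_embd, normalize=False, temperature=1.0):
    """Pairwise video-text scores: raw dot products, or cosine similarity / temperature."""
    if normalize:
        video_embd = F.normalize(video_embd, dim=-1)  # unit L2 norm per row
        text_embd = F.normalize(text_embd, dim=-1)
    return video_embd @ text_embd.t() / temperature
```

Scaling `video_embd` by 10 scales every unnormalized score by 10 but leaves the normalized scores unchanged.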