
Comments (18)

lucasb-eyer commented on August 28, 2024

Yes, the code indeed makes training them easy. However, it looks to be ~75 h on 4 GPUs :) Anyway, thanks for your answer!

lucasb-eyer commented on August 28, 2024

I'm indeed getting the same timings with the PyTorch ImageNet example, so the speed is not an issue with this repo; apologies for the distraction!

lucasb-eyer commented on August 28, 2024

Final update, just to confirm that it was indeed a problem with my setup, related to data I/O as Kaiming suspected.

lucasb-eyer commented on August 28, 2024

Sure. I was using the cloud and had the data on an SSD disk in the same datacenter as my VM, but apparently that is still slower than using the VM's boot disk. After copying the data onto the boot disk's SSD, everything was as fast as it should be. As Kaiming pointed out, the timings reported in the log did not reflect that.
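
For anyone hitting the same symptom, a quick sanity check of raw read throughput from the image folder can confirm a storage bottleneck before touching the training code. The path and sample count below are hypothetical; this is a rough sketch, not part of the repo:

```python
import os
import random
import time

# Hypothetical path to the ImageNet train split; adjust to your setup.
DATA_DIR = "/data/imagenet/train"
N_SAMPLES = 2000  # read a random subset of images to estimate throughput

paths = []
for root, _, files in os.walk(DATA_DIR):
    for name in files:
        if name.lower().endswith((".jpg", ".jpeg")):
            paths.append(os.path.join(root, name))
random.shuffle(paths)
paths = paths[:N_SAMPLES]

start = time.time()
total_bytes = 0
for p in paths:
    with open(p, "rb") as f:
        total_bytes += len(f.read())
elapsed = time.time() - start

print(f"{len(paths)} files, {total_bytes / 1e6:.0f} MB in {elapsed:.1f}s "
      f"-> {total_bytes / 1e6 / elapsed:.0f} MB/s, {len(paths) / elapsed:.0f} images/s")
```

A networked disk will typically show far fewer images per second than the VM's local SSD, which is exactly the symptom described above.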

KaimingHe commented on August 28, 2024

Hi, we do not plan to upload the weights of the linear classifiers or fine-tuned detectors, to keep the repo manageable. Those can be trained in a few hours given our provided pre-training weights.

KaimingHe commented on August 28, 2024

> Yes, the code indeed makes training them easy. However, it looks to be ~75 h on 4 GPUs :) Anyway, thanks for your answer!

Training the linear classifier using the pre-trained weights only takes ~8 to 12 hours on 8 GPUs. I guess on 4 GPUs it won't be much slower, as the bottleneck is just data loading.

lucasb-eyer commented on August 28, 2024

I switched to an 8x V100 machine, have the data on an SSD, and increased the workers from the default 4 to 16, but given the logs below I don't think data is the bottleneck:
[screenshot: training log]
The estimate is still at 33 h. Are you sure you get that speed with batch size 256? I didn't want to increase the batch size compared to the README, as I don't know whether it would hurt results. But the GPUs do look pretty under-utilized:
[screenshot: GPU utilization]

KaimingHe commented on August 28, 2024

Could you please try --workers 32? We set that as the default in main_moco.py but missed it in main_lincls.py.
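
For context, --workers in these scripts is the number of DataLoader worker processes (num_workers). Below is a minimal sketch of how one could compare loader throughput at different worker counts; the dataset path and transform are placeholders, not the repo's exact pipeline:

```python
import time

import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms


def measure(data_dir="/data/imagenet/train", batch_size=256, n_batches=20):
    # Placeholder ImageNet folder and a simple augmentation, just for the comparison.
    dataset = datasets.ImageFolder(
        data_dir,
        transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ToTensor(),
        ]),
    )
    for workers in (4, 16, 32):
        loader = torch.utils.data.DataLoader(
            dataset, batch_size=batch_size, shuffle=True,
            num_workers=workers, pin_memory=True)
        it = iter(loader)
        next(it)  # warm-up: let worker processes start and fill their prefetch queues
        start = time.time()
        for _ in range(n_batches):
            next(it)
        rate = n_batches * batch_size / (time.time() - start)
        print(f"num_workers={workers}: ~{rate:.0f} images/s")


if __name__ == "__main__":
    measure()
```

If throughput stops improving as the worker count increases, the disk behind the loader, rather than the worker count, is the limit.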

lucasb-eyer commented on August 28, 2024

Sure, will try that in a second! However, given my screenshot above, I don't think data input is the bottleneck?

KaimingHe commented on August 28, 2024

> Sure, will try that in a second! However, given my screenshot above, I don't think data input is the bottleneck?

The interface was inherited from the official PyTorch code, and I think the "Data" time monitor is not precise. Using --workers 32 should help.

lucasb-eyer commented on August 28, 2024

While GPU utilization seems higher in nvidia-smi (hitting 100% a larger fraction of the time), the logs as well as my timing estimates remain the same as they were with --workers 16: it fluctuates between 20 and 30 min per epoch, which would be 33 h in the optimistic case. I read through the code but didn't see an obvious reason for this. I also tried both the DataParallel and the distributed variants, but they land in about the same ballpark too. I would need ~5 min/epoch to hit 8 h total.

KaimingHe commented on August 28, 2024

> While GPU utilization seems higher in nvidia-smi (hitting 100% a larger fraction of the time), the logs as well as my timing estimates remain the same as they were with --workers 16: it fluctuates between 20 and 30 min per epoch, which would be 33 h in the optimistic case. I read through the code but didn't see an obvious reason for this. I also tried both the DataParallel and the distributed variants, but they land in about the same ballpark too. I would need ~5 min/epoch to hit 8 h total.

The timer is an average over all iterations in one epoch, and the overhead of the first few iterations can dominate that average. Perhaps you could estimate the time after running one full epoch.

This code is just the official PyTorch ImageNet code. Full end-to-end training for 100 epochs on ImageNet should take <20 hours on 8 GPUs, and I believe there are plenty of benchmarks of this kind available on the Internet. Training the linear classifier should be faster, as it does not backprop through the weights (except for the fc).
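
To illustrate that last point, here is a rough sketch of linear classification on a frozen backbone: only the new fc layer has trainable parameters, so backward never touches the backbone weights. It mirrors the idea rather than the exact main_lincls.py code; loading the released MoCo checkpoint (and any key remapping it needs) is left as a placeholder, and the hyper-parameters are illustrative.

```python
import torch
import torch.nn as nn
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

model = models.resnet50()
# Placeholder: load the MoCo pre-trained backbone weights into `model` here
# (the released checkpoints may need their keys remapped first).

# Freeze everything, then replace fc with a fresh, trainable linear classifier.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 1000)
model = model.to(device)

# Only fc's parameters go to the optimizer; lr/momentum here are placeholders.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=30.0, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One dummy step to show the mechanics: gradients exist only for fc.
images = torch.randn(8, 3, 224, 224, device=device)
target = torch.randint(0, 1000, (8,), device=device)

output = model(images)
loss = criterion(output, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print([n for n, p in model.named_parameters() if p.requires_grad])  # ['fc.weight', 'fc.bias']
```

Because the backbone receives no gradients, each step is dominated by the forward pass plus data loading, which is why data I/O shows up so prominently in this discussion.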

ppwwyyxx commented on August 28, 2024

I just ran the exact command in https://github.com/facebookresearch/moco/#linear-classification with -j 32 on 8 V100s, and the first epoch finished in around 6 minutes.

If you're unable to get this speed, it's likely an issue with your environment.

lucasb-eyer commented on August 28, 2024

The timing estimate is from me manually watching the steps against a wall clock, so it is pretty reliable :)

In this case, I must be doing something wrong in the machine setup and will have to figure that out myself. Thanks a lot for all your information and attempts to help!

mbsariyildiz commented on August 28, 2024

Hello, thanks for this discussion.
@lucasb-eyer can you please tell me what the bottleneck was specifically? With a fresh conda environment and data on an SSD, I get similar timings with 4 V100s.

mbsariyildiz commented on August 28, 2024

Thanks! In my case, I have the data on the SSD of my local server. I guess I need to do profiling to see what the bottleneck is. Cheers!
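
A cheap way to do that profiling is to split each iteration's wall time into "waiting on the loader" versus the full step, skipping the first few iterations so startup overhead does not skew the average (the point Kaiming made above). The helper below is just a sketch; loader, model, and criterion are whatever you already built:

```python
import time

import torch


def profile_split(loader, model, criterion, n_iters=50, warmup=5, device="cuda"):
    """Rough per-iteration split: time spent waiting on data vs. the full step."""
    data_time = step_time = 0.0
    counted = 0
    model.eval()
    end = time.time()
    for i, (images, target) in enumerate(loader):
        t_data = time.time() - end                  # waiting on the DataLoader
        images = images.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        with torch.no_grad():
            criterion(model(images), target)
        if device == "cuda":
            torch.cuda.synchronize()                # make GPU work visible to the wall clock
        t_step = time.time() - end
        if i >= warmup:                             # skip startup overhead
            data_time += t_data
            step_time += t_step
            counted += 1
        if i + 1 >= n_iters:
            break
        end = time.time()
    print(f"avg data wait {data_time / counted:.3f}s, "
          f"avg full step {step_time / counted:.3f}s over {counted} iterations")
```

If the data wait is close to the full step time, more workers or faster storage will help; if it is near zero while the step is still slow, the bottleneck is elsewhere.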

aravindsrinivas commented on August 28, 2024

I was using the cloud too, but I put the data on the boot SSD of the VM instance and still see slow training. @lucasb-eyer, could you share your epoch speeds?

lucasb-eyer commented on August 28, 2024

Just like @ppwwyyxx mentioned above in the thread: roughly 6min per epoch.

