
Comments (18)

lucasb-eyer commented on August 28, 2024

Yes, the code indeed makes training them easy. However, it looks to be ~75 h on 4 GPUs :) Anyway, thanks for your answer!

lucasb-eyer commented on August 28, 2024

I'm indeed getting the same timings with the PyTorch ImageNet example, so the speed is not an issue with this repo; apologies for the distraction!

lucasb-eyer commented on August 28, 2024

Final update, just to confirm that it was indeed a problem with my setup, related to data I/O as Kaiming suspected.

lucasb-eyer commented on August 28, 2024

Sure. I was using the cloud and had the data on an SSD disk in the same datacenter as my VM, but apparently that is still slower than using the VM's boot disk. After copying the data onto the boot disk's SSD, everything was as fast as it should be. As Kaiming pointed out, the timings reported in the log did not reflect that.
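
For anyone hitting the same symptom, a quick sanity check of raw read throughput from the image folder can confirm a storage bottleneck before touching the training code. The path and sample count below are hypothetical; this is a rough sketch, not part of the repo:

```python
import os
import random
import time

# Hypothetical path to the ImageNet train split; adjust to your setup.
DATA_DIR = "/data/imagenet/train"
N_SAMPLES = 2000  # read a random subset of images to estimate throughput

paths = []
for root, _, files in os.walk(DATA_DIR):
    for name in files:
        if name.lower().endswith((".jpg", ".jpeg")):
            paths.append(os.path.join(root, name))
random.shuffle(paths)
paths = paths[:N_SAMPLES]

start = time.time()
total_bytes = 0
for p in paths:
    with open(p, "rb") as f:
        total_bytes += len(f.read())
elapsed = time.time() - start

print(f"{len(paths)} files, {total_bytes / 1e6:.0f} MB in {elapsed:.1f}s "
      f"-> {total_bytes / 1e6 / elapsed:.0f} MB/s, {len(paths) / elapsed:.0f} images/s")
```

A networked disk will typically show far fewer images per second than the VM's local SSD, which is exactly the symptom described above.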

KaimingHe commented on August 28, 2024

Hi, we do not plan to upload the weights of the linear classifiers or fine-tuned detectors, to keep the repo manageable. Those can be trained in a few hours given our provided pre-training weights.

KaimingHe commented on August 28, 2024

> Yes, the code indeed makes training them easy. However, it looks to be ~75 h on 4 GPUs :) Anyway, thanks for your answer!

Training the linear classifier using the pre-trained weights only takes ~8 to 12 hours on 8 GPUs. I guess on 4 GPUs it won't be much slower, as the bottleneck is just data loading.

lucasb-eyer commented on August 28, 2024

I switched to an 8x V100 machine, have the data on an SSD, and increased the workers from the default 4 to 16, but given the logs below I don't think data is the bottleneck:
[screenshot: training log]
The estimate is still at 33 h. Are you sure you get that speed with batch size 256? I didn't want to increase the batch size compared to the README, as I don't know whether it would hurt results. But the GPUs do look pretty under-utilized:
[screenshot: GPU utilization]

KaimingHe commented on August 28, 2024

Could you please try --workers 32? We set that as the default in main_moco.py but missed it in main_lincls.py.
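
For context, --workers in these scripts is the number of DataLoader worker processes (num_workers). Below is a minimal sketch of how one could compare loader throughput at different worker counts; the dataset path and transform are placeholders, not the repo's exact pipeline:

```python
import time

import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms


def measure(data_dir="/data/imagenet/train", batch_size=256, n_batches=20):
    # Placeholder ImageNet folder and a simple augmentation, just for the comparison.
    dataset = datasets.ImageFolder(
        data_dir,
        transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.ToTensor(),
        ]),
    )
    for workers in (4, 16, 32):
        loader = torch.utils.data.DataLoader(
            dataset, batch_size=batch_size, shuffle=True,
            num_workers=workers, pin_memory=True)
        it = iter(loader)
        next(it)  # warm-up: let worker processes start and fill their prefetch queues
        start = time.time()
        for _ in range(n_batches):
            next(it)
        rate = n_batches * batch_size / (time.time() - start)
        print(f"num_workers={workers}: ~{rate:.0f} images/s")


if __name__ == "__main__":
    measure()
```

If throughput stops improving as the worker count increases, the disk behind the loader, rather than the worker count, is the limit.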

lucasb-eyer commented on August 28, 2024

Sure, will try that in a second! However, given my screenshot above, I don't think data input is the bottleneck?

KaimingHe commented on August 28, 2024

> Sure, will try that in a second! However, given my screenshot above, I don't think data input is the bottleneck?

The interface was inherited from the official PyTorch code, and I think the "Data" time monitor is not precise. Using --workers 32 should help.

lucasb-eyer commented on August 28, 2024

While GPU utilization seems higher in nvidia-smi (hitting 100% a larger fraction of the time), the logs as well as my timing estimates remain the same as they were with --workers 16: it fluctuates between 20 and 30 min per epoch, which would be 33 h in the optimistic case. I read through the code but didn't see an obvious reason for this. I also tried both the DataParallel and the distributed variants, but they land in about the same ballpark too. I would need ~5 min/epoch to hit 8 h total.

KaimingHe commented on August 28, 2024

> While GPU utilization seems higher in nvidia-smi (hitting 100% a larger fraction of the time), the logs as well as my timing estimates remain the same as they were with --workers 16: it fluctuates between 20 and 30 min per epoch, which would be 33 h in the optimistic case. I read through the code but didn't see an obvious reason for this. I also tried both the DataParallel and the distributed variants, but they land in about the same ballpark too. I would need ~5 min/epoch to hit 8 h total.

The timer is an average over all iterations in one epoch, and the overhead of the first few iterations can dominate that average. Perhaps you could estimate the time after running one full epoch.

This code is just the official PyTorch ImageNet code. Full end-to-end training for 100 epochs on ImageNet should take <20 hours on 8 GPUs, and I believe there are plenty of benchmarks of this kind available on the Internet. Training the linear classifier should be faster, as it does not backprop through the weights (except for the fc).
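
To illustrate that last point, here is a rough sketch of linear classification on a frozen backbone: only the new fc layer has trainable parameters, so backward never touches the backbone weights. It mirrors the idea rather than the exact main_lincls.py code; loading the released MoCo checkpoint (and any key remapping it needs) is left as a placeholder, and the hyper-parameters are illustrative.

```python
import torch
import torch.nn as nn
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

model = models.resnet50()
# Placeholder: load the MoCo pre-trained backbone weights into `model` here
# (the released checkpoints may need their keys remapped first).

# Freeze everything, then replace fc with a fresh, trainable linear classifier.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 1000)
model = model.to(device)

# Only fc's parameters go to the optimizer; lr/momentum here are placeholders.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=30.0, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One dummy step to show the mechanics: gradients exist only for fc.
images = torch.randn(8, 3, 224, 224, device=device)
target = torch.randint(0, 1000, (8,), device=device)

output = model(images)
loss = criterion(output, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print([n for n, p in model.named_parameters() if p.requires_grad])  # ['fc.weight', 'fc.bias']
```

Because the backbone receives no gradients, each step is dominated by the forward pass plus data loading, which is why data I/O shows up so prominently in this discussion.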

ppwwyyxx commented on August 28, 2024

I just ran the exact command in https://github.com/facebookresearch/moco/#linear-classification with -j 32 on 8 V100s, and the first epoch finished in around 6 minutes.

If you're unable to get this speed, it's likely an issue with your environment.

lucasb-eyer commented on August 28, 2024

The timing estimate is from me manually watching the steps against a wall clock, so it is pretty reliable :)

In this case, I must be doing something wrong in the machine setup and will have to figure that out myself. Thanks a lot for all your information and attempts to help!

mbsariyildiz commented on August 28, 2024

Hello, thanks for this discussion.
@lucasb-eyer can you please tell me what the bottleneck was specifically? With a fresh conda environment and data on an SSD, I get similar timings with 4 V100s.

mbsariyildiz commented on August 28, 2024

Thanks! In my case, I have the data on the SSD of my local server. I guess I need to do profiling to see what the bottleneck is. Cheers!
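
A cheap way to do that profiling is to split each iteration's wall time into "waiting on the loader" versus the full step, skipping the first few iterations so startup overhead does not skew the average (the point Kaiming made above). The helper below is just a sketch; loader, model, and criterion are whatever you already built:

```python
import time

import torch


def profile_split(loader, model, criterion, n_iters=50, warmup=5, device="cuda"):
    """Rough per-iteration split: time spent waiting on data vs. the full step."""
    data_time = step_time = 0.0
    counted = 0
    model.eval()
    end = time.time()
    for i, (images, target) in enumerate(loader):
        t_data = time.time() - end                  # waiting on the DataLoader
        images = images.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        with torch.no_grad():
            criterion(model(images), target)
        if device == "cuda":
            torch.cuda.synchronize()                # make GPU work visible to the wall clock
        t_step = time.time() - end
        if i >= warmup:                             # skip startup overhead
            data_time += t_data
            step_time += t_step
            counted += 1
        if i + 1 >= n_iters:
            break
        end = time.time()
    print(f"avg data wait {data_time / counted:.3f}s, "
          f"avg full step {step_time / counted:.3f}s over {counted} iterations")
```

If the data wait is close to the full step time, more workers or faster storage will help; if it is near zero while the step is still slow, the bottleneck is elsewhere.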

aravindsrinivas commented on August 28, 2024

I was using the cloud too, but I put the data on the boot SSD of the VM instance and still see slow training. @lucasb-eyer, could you share your epoch speeds?

lucasb-eyer commented on August 28, 2024

Just like @ppwwyyxx mentioned above in the thread: roughly 6min per epoch.

