Comments (18)
Indeed getting the same timings with the PyTorch ImageNet example, so the speed is not an issue with this repo, and I apologize for the distraction!
from moco.
Final update just to confirm that indeed it was a problem with my setup, related to data i/o as Kaiming suspected.
Sure. I was using cloud, and had the data on a SSD disk in the same datacenter as my VM. But apparently, that's still slower than using the VM's boot-disk. So after copying the data onto the boot-disk's SSD, everything was as fast as it should be. As Kaiming pointed out, the reported timings did not reflect that reality.
Hi, we have not planned to upload the weights of the linear classifiers or fine-tuned detectors, to keep the repo manageable. Those can be trained in a few hours given our provided pre-training weights.
Yes, the code makes training them easy indeed. However, it looks to be ~75h on 4 GPUs :) Anyway, thanks for your answer!
Training the linear classifier using the pre-trained weights only takes ~8 to 12 hours on 8 GPUs. I guess on 4 GPUs it won't be much slower, as the bottleneck is just data loading.
I switched to an 8xV100 machine, have the data on an SSD, and increased the workers from the default 4 to 16, but given my logs I don't think data loading is the bottleneck:
The estimate is still at 33h. Are you sure you get that speed with batch size 256? I didn't want to increase the batch size compared to the README, as I don't know whether it would hurt the results. But the GPUs do look pretty under-utilized.
Could you please try --workers 32? We set that as the default in main_moco.py but missed it in main_lincls.py.
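For reference, the README's linear-classification command with the suggested worker count would look roughly like this (the bracketed paths are placeholders, and the exact flags should be double-checked against the README):

```shell
# Sketch of the linear-classification command with more data-loader workers.
# [checkpoint] and [imagenet-folder] are placeholders, not real paths.
python main_lincls.py \
  -a resnet50 \
  --lr 30.0 \
  --batch-size 256 \
  --workers 32 \
  --pretrained [checkpoint] \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed \
  --world-size 1 --rank 0 \
  [imagenet-folder]
```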
Sure, will try that in a second! However, given my screenshot above, I don't think data input is the bottleneck?
The interface was inherited from the official PyTorch code, and I think the "Data" time monitor is not precise. Using --workers 32 should help.
While GPU utilization seems higher in nvidia-smi (hitting 100% a larger fraction of the time), the logs as well as my timing estimates remain the same as they were with --workers 16: it fluctuates between 20 and 30 minutes per epoch, which would be 33h in the optimistic case. I read through the code but didn't see an obvious reason for this. I also tried both the DataParallel variant and the distributed variant, but they reach about the same ballpark speed. I would need 5 min/epoch to hit 8h total.
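As a sanity check on the numbers in this thread, converting per-epoch time into total hours over the 100-epoch linear-classification schedule is simple arithmetic:

```python
# Back-of-the-envelope conversion of the per-epoch times quoted in this thread.
EPOCHS = 100  # linear-classification schedule in the MoCo README

def total_hours(minutes_per_epoch: float) -> float:
    """Total training time in hours for a given per-epoch time in minutes."""
    return EPOCHS * minutes_per_epoch / 60

print(round(total_hours(20), 1))  # optimistic observed speed -> 33.3 h
print(round(total_hours(6), 1))   # ~6 min/epoch -> 10.0 h
print(round(total_hours(5), 1))   # needed to finish in ~8 h -> 8.3 h
```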
The timer is an average of all iterations in one epoch, and the overhead of the first few iterations can dominate the average time. Perhaps you could estimate the speed after running one full epoch.
This code is just the official PyTorch ImageNet code. Full end-to-end training for 100 epochs on ImageNet should take <20 hours on 8 GPUs, and I believe plenty of benchmarks of this kind are available on the Internet. Training the linear classifier should be faster, as it does not backprop through the weights (except for the fc).
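The warm-up effect on the epoch timer can be illustrated with a minimal running-average meter in the style of the one in the official PyTorch ImageNet example (the iteration times below are made up for illustration):

```python
class AverageMeter:
    """Running average, in the style of the official PyTorch ImageNet example."""
    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        self.sum += val * n
        self.count += n

    @property
    def avg(self):
        return self.sum / self.count

# Hypothetical iteration times (seconds): one slow warm-up step, then steady state.
timer = AverageMeter()
timer.update(30.0)       # first iteration: worker startup, CUDA init, cache warm-up
for _ in range(99):
    timer.update(0.5)    # steady-state iterations
# The average (~0.8 s) sits well above the steady-state 0.5 s, so extrapolating
# total training time from it overestimates the epoch length.
print(timer.avg)
```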
I just ran the exact command in https://github.com/facebookresearch/moco/#linear-classification with -j 32 on 8 V100s, and the first epoch finished in around 6 minutes.
If you're unable to get this speed, it's likely an issue with your environment.
The timing estimate is me manually watching the steps against a wall clock, so it is pretty reliable :)
In this case, I must be doing something wrong in the machine setup and will have to figure that out myself. Thanks a lot for all your information and attempts to help!
Hello, thanks for this discussion.
@lucasb-eyer, can you please tell me what the bottleneck was specifically? Even with a fresh conda environment and data on an SSD, I get similar timings with 4 V100s.
Thanks! In my case, I have the data on the SSD of my local server. I guess I need to do profiling to see what the bottleneck is. Cheers!
I was using cloud too, but I put the data on the boot SSD disk of the VM instance and still see the slow training. @lucasb-eyer, could you share your epoch speeds?
Just like @ppwwyyxx mentioned above in the thread: roughly 6min per epoch.