
Comments (9)


thucz commented on June 12, 2024

Even with 8 A100 cards and the parameters you suggested (chunk size 16*64), the error still occurs.

from neo-360.

zubair-irshad commented on June 12, 2024

I can currently only check with 7 GPUs, and training runs fine there. Can you share your GPU utilization? Mine is shown below; it uses around 40 GB of memory per GPU with chunk size = 16 * 64.

image


zubair-irshad commented on June 12, 2024

Here is my training progression:

image


thucz commented on June 12, 2024

image

I used watch -n1 nvidia-smi to observe the GPU utilization. Memory usage reached 40 GB and then the run crashed.

Even with a chunk size of 512 on 8 A100 GPUs, the OOM error still occurs.

Do you have any advice on reducing GPU memory usage?
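
For anyone who wants to log the numbers instead of watching them, the nvidia-smi check mentioned above can also be scripted. The following is a minimal sketch, not part of neo-360: the helper names parse_gpu_memory and read_gpu_memory are hypothetical, and it assumes nvidia-smi is on the PATH and reports plain integer values in MiB for every GPU.

```python
import subprocess
from typing import List, Tuple

# nvidia-smi query that prints one CSV row per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int, int]]:
    """Parse 'index, used, total' rows into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def read_gpu_memory() -> List[Tuple[int, int, int]]:
    """Run nvidia-smi once and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling read_gpu_memory() in a loop next to the training job (for example once per second) and keeping the maximum used value per GPU would give the peak usage just before a crash, which is easier to compare across CUDA versions or chunk sizes than a screenshot.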


thucz commented on June 12, 2024

I switched to a Docker image with CUDA 11.3, and the code now runs normally. Previously I was using a Docker image with CUDA 11.7. Sorry for bothering you.

I would still like to know how to reduce GPU memory usage, because I want to run it on other cards such as the V100 (32 GB).


thucz commented on June 12, 2024

I just found that it uses about 58 GB per GPU on 80 GB cards, which seems strange.

image


zubair-irshad commented on June 12, 2024

Great to know that you have the code working on your end on A100 GPUs. To further reduce memory usage, you can try the following:

  1. We randomly sample 500 rays from 20 destination views for rendering the target pixels. You could try reducing either of these to reduce memory. Please note that 500 is already a very low number, so I would suggest playing with the other parameter, i.e. num_destination views, first.

  2. Our data loader needs some refactoring. Currently, we load all annotations, i.e. NOCS maps and instance maps. Not loading them might save some memory, but not that much.

  3. One could of course reduce img_size for training and then fine-tune at a higher resolution.

  4. I tried to improve grid sampling, which is probably the part that requires the most memory in a single forward pass, and we have some batchifying code commented out here which was a WIP and never truly tested. Please feel free to give this a try as well, but note that we have not benchmarked our numbers with this batchification.

We have not tried any of the above locally, so we haven't benchmarked the exact memory savings they would give, but please feel free to try them and let us know how it goes. Hope it helps your research!
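
To illustrate the general idea behind the batchification in item 4 (and behind a chunk size in general), here is a minimal, framework-free sketch. It is not the commented-out neo-360 code: process_in_chunks and fake_grid_sample are hypothetical names, and the stand-in function only mimics an expensive per-point operation. The assumption is that the operation treats each point independently, so splitting the input does not change the result.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_in_chunks(
    items: Sequence[T],
    fn: Callable[[Sequence[T]], List[R]],
    chunk_size: int,
) -> List[R]:
    """Apply fn to items in slices of at most chunk_size and concatenate the results.

    Only one slice is handed to fn at a time, so the intermediate memory
    that fn needs scales with chunk_size instead of len(items).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    results: List[R] = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        results.extend(fn(chunk))
    return results


def fake_grid_sample(points: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for an expensive, per-point sampling call."""
    return [2.0 * p for p in points]


points = [float(i) for i in range(10)]
chunked = process_in_chunks(points, fake_grid_sample, chunk_size=4)
# Chunked and unchunked evaluation agree because each point is independent.
assert chunked == fake_grid_sample(points)
```

The trade-off is the usual one: a smaller chunk_size lowers peak memory but adds more sequential calls, so it is worth measuring both memory and step time when changing it.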


thucz commented on June 12, 2024

Thanks a lot! I will try your advice.

