Comments (9)
I switched to a Docker image with CUDA 11.3, and the code now runs normally. Previously I was using a Docker image with CUDA 11.7. Sorry for bothering you.
from neo-360.
I found that even with 8 A100 cards and your given parameters (chunk size 16*64), the error still happened.
from neo-360.
I am currently only able to check with 7 GPUs, and the training runs fine. Can you share your GPU utilization? Mine is shared below; it uses around 40 GB of memory per GPU with chunk size = 16 * 64.
from neo-360.
Here is my training progression:
from neo-360.
I use watch -n1 nvidia-smi to observe the GPU utilization. It reached 40 GB and then crashed.
Even with chunk size 512 on 8 A100 GPUs, OOM still happened.
Do you have any advice to reduce GPU memory?
from neo-360.
I switched to a Docker image with CUDA 11.3, and the code now runs normally. Previously I was using a Docker image with CUDA 11.7. Sorry for bothering you.
But I still wonder how to reduce GPU memory, because I want to run it on other cards such as the V100 (32 GB).
from neo-360.
Just now I found that it used about 58 GB per GPU on 80 GB cards. It is so weird.
from neo-360.
Great to know that you have the code working on your end on A100 GPUs. To further reduce the memory, you can try the following:
- We randomly sample 500 rays from 20 destination views for rendering the target pixels. You could try reducing either of these to reduce memory. Please note that 500 is already a very low number, so I would suggest playing with the other parameter, i.e. num_destination views, first.
- Our data loader needs some refactoring. Currently, we load all annotations, i.e. NOCS maps and instance maps. Loading fewer of them might reduce some memory, but not that much.
- One could of course reduce img_size for training and then fine-tune at a higher resolution.
- I tried to improve grid sampling, which is probably the part that requires the most memory in a single forward pass, and we have some batchifying code commented here which was a WIP and never truly tested. Please feel free to also give this a try, but note that we have not benchmarked our numbers with this batchification.
We have not tried any of the above locally, so we haven't benchmarked the exact memory savings they would generate, but please feel free to give these a try and let us know how it goes. Hope it helps your research!
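For anyone who wants to experiment with the batchifying idea in the last bullet, here is a minimal, framework-free sketch of the general pattern. It is not the commented-out code in the repository: `apply_in_chunks` and its arguments are hypothetical names chosen for illustration, and `fn` stands in for whatever per-point operation (for example a grid-sampling call) you want to split up. Note that chunking only bounds the size of the intermediate buffers inside a single call; during training, autograd still keeps activations for the backward pass, so the savings may be smaller than expected unless this is combined with something like gradient checkpointing.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_in_chunks(
    fn: Callable[[Sequence[T]], Sequence[R]],
    points: Sequence[T],
    chunk_size: int,
) -> List[R]:
    """Run fn on slices of at most chunk_size points and join the results.

    Hypothetical helper: fn must return one output per input point, so that
    concatenating the per-chunk outputs matches a single call on all points.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out: List[R] = []
    for start in range(0, len(points), chunk_size):
        # Only this slice is handed to fn, so any temporary buffers that fn
        # allocates scale with chunk_size rather than with len(points).
        out.extend(fn(points[start:start + chunk_size]))
    return out
```

With the chunk size discussed in this thread, a call would look like `apply_in_chunks(sample_fn, query_points, 16 * 64)`, where `sample_fn` and `query_points` are placeholders for your own function and data. A smaller chunk size trades speed for lower peak memory.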
from neo-360.
Thanks a lot! I will try your advice.
from neo-360.