Comments (12)

EagleW commented on July 21, 2024

Thank you very much for your interest in our research. Perhaps you can try a smaller batch size by running `python train.py --batch_size 50`. If that still doesn't work, you can decrease it further.

GabrielLin commented on July 21, 2024

@EagleW Could you please tell us which GPU you use? Thanks.

EagleW commented on July 21, 2024

I use a P100.

GabrielLin commented on July 21, 2024

> I use a P100.

Thanks.

davidrpugh commented on July 21, 2024

I am trying to replicate your work and am getting OOM errors on a P100 GPU with 16 GB when using the default batch size of 200. How much memory did your P100 GPU have?

EagleW commented on July 21, 2024

@davidrpugh Which part of the code are you running?

EagleW commented on July 21, 2024

I think you can decrease the batch size to 50.

davidrpugh commented on July 21, 2024

Specifically, I ran the `./Existing paper reading/train.py` script with default parameters on one 16 GB P100 GPU and got a GPU OOM error. I can decrease the batch size, but I wanted to start with the default parameters used in your paper. Was 200 the batch size used in the paper, or something smaller?

EagleW commented on July 21, 2024

I think I used a much smaller batch size for it. Actually, the batch size didn't influence the model's performance much.

davidrpugh commented on July 21, 2024

I have tried batch sizes of 128, 64, 32, and 16. With `--batch_size=16` I was able to train for one full epoch before getting a GPU OOM error. I am going to try again with 8, but I feel there must be something else I am missing.

EagleW commented on July 21, 2024

@davidrpugh I think you can add `torch.cuda.empty_cache()` after line 263 in main.py.
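
For illustration, here is a minimal, self-contained sketch of where such a call typically goes in a PyTorch training loop. This is generic placeholder code (tiny model, random data), not the actual PaperRobot main.py:

```python
import torch
import torch.nn as nn

# Generic placeholder training loop -- not the actual PaperRobot code.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 1).to(device)
optimizer = torch.optim.Adam(model.parameters())

for step in range(10):
    x = torch.randn(16, 128, device=device)
    y = torch.randn(16, 1, device=device)

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

    # Return cached-but-unused blocks to the GPU allocator so the next
    # batch's large tensors (e.g. the adjacency matrix) see more free memory.
    if device == "cuda":
        torch.cuda.empty_cache()
```

Note that `empty_cache()` does not free tensors that are still referenced; it only releases memory the caching allocator is holding onto, so it mainly helps with fragmentation between batches.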

EagleW commented on July 21, 2024

The problem is that get_subgraph returns a very large adj_ingraph matrix. If the batch size is too large, the adjacency matrix will be too large to fit in GPU memory.
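
As a rough back-of-the-envelope illustration (assuming a dense batch_size x num_nodes x num_nodes float32 tensor; the actual layout of adj_ingraph may differ), the memory it takes grows linearly with the batch size:

```python
def adjacency_memory_gb(batch_size: int, num_nodes: int, bytes_per_element: int = 4) -> float:
    """Rough estimate for a dense batch_size x num_nodes x num_nodes float32
    tensor; the real adj_ingraph layout in PaperRobot may differ."""
    return batch_size * num_nodes * num_nodes * bytes_per_element / 1024**3

# With a hypothetical 2,000-node subgraph:
print(adjacency_memory_gb(200, 2000))  # ~3.0 GB at the default batch size of 200
print(adjacency_memory_gb(50, 2000))   # ~0.7 GB at batch size 50
```

So reducing the batch size directly shrinks the largest tensor, which is likely why a smaller value fits on a 16 GB card where the default does not.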
