Comments (12)

EagleW commented on July 21, 2024

Thank you very much for your interest in our research. Perhaps you can try a smaller batch size by running `python train.py --batch_size 50`. If that still doesn't work, you can decrease it further.

GabrielLin commented on July 21, 2024

@EagleW Could you please tell us which GPU you use? Thanks.

EagleW commented on July 21, 2024

I use a P100.

GabrielLin commented on July 21, 2024

> I use a P100.

Thanks.

davidrpugh commented on July 21, 2024

I am trying to replicate your work and am getting OOM errors on a P100 GPU with 16 GB when using the default batch size of 200. How much memory did your P100 GPU have?

EagleW commented on July 21, 2024

@davidrpugh Which part of the code are you running?

EagleW commented on July 21, 2024

I think you can decrease the batch size to 50.

davidrpugh commented on July 21, 2024

Specifically, I ran the `./Existing paper reading/train.py` script with default parameters on one 16 GB P100 GPU and got a GPU OOM error. I can decrease the batch size, but I wanted to start with the default parameters used in your paper. Was 200 the batch size used in the paper, or something smaller?

EagleW commented on July 21, 2024

I think I used a much smaller batch size for it. Actually, the batch size didn't influence the model's performance much.

davidrpugh commented on July 21, 2024

I have tried batch sizes of 128, 64, 32, and 16. With `--batch_size=16` I was able to train for one full epoch before getting a GPU OOM error. I am going to try again with 8, but I feel there must be something else I am missing.

EagleW commented on July 21, 2024

@davidrpugh I think you can add `torch.cuda.empty_cache()` after line 263 in main.py.
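
For illustration, here is a minimal, self-contained sketch of where such a call typically goes in a PyTorch training loop. This is generic placeholder code (tiny model, random data), not the actual PaperRobot main.py:

```python
import torch
import torch.nn as nn

# Generic placeholder training loop -- not the actual PaperRobot code.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 1).to(device)
optimizer = torch.optim.Adam(model.parameters())

for step in range(10):
    x = torch.randn(16, 128, device=device)
    y = torch.randn(16, 1, device=device)

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

    # Return cached-but-unused blocks to the GPU allocator so the next
    # batch's large tensors (e.g. the adjacency matrix) see more free memory.
    if device == "cuda":
        torch.cuda.empty_cache()
```

Note that `empty_cache()` does not free tensors that are still referenced; it only releases memory the caching allocator is holding onto, so it mainly helps with fragmentation between batches.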

EagleW commented on July 21, 2024

The problem is that get_subgraph returns a very large adj_ingraph matrix. If the batch size is too large, the adjacency matrix will be too large to fit in GPU memory.
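
As a rough back-of-the-envelope illustration (assuming a dense batch_size x num_nodes x num_nodes float32 tensor; the actual layout of adj_ingraph may differ), the memory it takes grows linearly with the batch size:

```python
def adjacency_memory_gb(batch_size: int, num_nodes: int, bytes_per_element: int = 4) -> float:
    """Rough estimate for a dense batch_size x num_nodes x num_nodes float32
    tensor; the real adj_ingraph layout in PaperRobot may differ."""
    return batch_size * num_nodes * num_nodes * bytes_per_element / 1024**3

# With a hypothetical 2,000-node subgraph:
print(adjacency_memory_gb(200, 2000))  # ~3.0 GB at the default batch size of 200
print(adjacency_memory_gb(50, 2000))   # ~0.7 GB at batch size 50
```

So reducing the batch size directly shrinks the largest tensor, which is likely why a smaller value fits on a 16 GB card where the default does not.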
