Comments (16)

phillipi avatar phillipi commented on May 22, 2024

I would check to make sure the gpu is really not being used at all. You can use nvidia-smi to check how much gpu memory is being used. I haven't tested CPU mode carefully so there's a chance some memory is still allocated on the gpu.
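The nvidia-smi check mentioned above can also be scripted. The following is a minimal sketch, not part of pix2pix: the function names are hypothetical, and it assumes an nvidia-smi build that supports the `--query-gpu` and `--format=csv,noheader,nounits` options.

```python
import subprocess


def parse_memory_csv(text):
    """Parse lines of 'used, total' (MiB), one line per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory on every visible GPU."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_memory_csv(out)


# Usage on a machine with an NVIDIA driver installed:
#   for used, total in query_gpu_memory():
#       print(f"{used} MiB used of {total} MiB")
```

If the used figure rises while training in CPU mode, some memory is still being allocated on the GPU.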

If that's not the issue, then it's a bit surprising you would run out of memory. Can you profile how much memory your system is using for the process?

General strategies for reducing memory usage would be to reduce the batch size (try setting it to 1), and reduce the image size (will require modifying the net architectures, e.g., by removing the first and last layers of netG and the first layer of netD).

from pix2pix.

rdaniel avatar rdaniel commented on May 22, 2024

I turned off the display and the code ran fine. On the CPU it is pretty slow, of course, so I'm going to run it on an AWS GPU instance now.

b4zz4 avatar b4zz4 commented on May 22, 2024

I have 1.4 GB free. How can I limit the memory that the GPU uses?

...
Epoch: [1][     399 /      400]	 Time: 3.719  DataTime: 0.001    Err_G: 3.4267  Err_D: 0.0299  ErrL1: 0.3935	
End of epoch 1 / 200 	 Time Taken: 938.975	
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-4014/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
...

khaulahzia avatar khaulahzia commented on May 22, 2024

How to turn off the display?

phillipi avatar phillipi commented on May 22, 2024

On the command line, you can pass display=0
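As a rough sketch of what passing display=0 can look like when the training script is launched from another program: this assumes that options are given as environment variables in front of the command (as in `display=0 th train.lua`), which is not confirmed in this thread. The helper name `build_train_command` is hypothetical.

```python
import os


def build_train_command(overrides, base_env=None):
    """Return the command and an environment with the option overrides applied.

    Assumption: train.lua reads its options from environment variables.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update({key: str(value) for key, value in overrides.items()})
    return ["th", "train.lua"], env


# Turn the display off; an empty base environment keeps the example small.
cmd, env = build_train_command({"display": 0}, base_env={})

# To actually launch training (requires Torch to be installed):
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```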

khaulahzia avatar khaulahzia commented on May 22, 2024

I am facing the following error when running the training command:
transferring to gpu...
done
THCudaCheck FAIL file=/home/admink/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/admink/torch/install/bin/luajit: /home/admink/torch/install/share/lua/5.1/nn/Module.lua:309: cuda runtime error (2) : out of memory at /home/admink/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
[C]: in function 'Tensor'
/home/admink/torch/install/share/lua/5.1/nn/Module.lua:309: in function 'flatten'
/home/admink/torch/install/share/lua/5.1/nn/Module.lua:326: in function 'getParameters'
train.lua:445: in main chunk
[C]: in function 'dofile'
...mink/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
How could this issue be resolved?

khaulahzia avatar khaulahzia commented on May 22, 2024

I still get the same error again and again, even with the display switched off.
I could train fine with just 3 images of size 600x600, or a maximum of 10 images of size 128x128.
With more than that, training fails with the error I mentioned above. How can I train the model on more images, at least 500?

phillipi avatar phillipi commented on May 22, 2024

Some ideas are here: #107

Other quick fixes:

  1. Run in CPU mode
  2. Run on a cloud service that has more GPU memory, e.g., Amazon EC2
  3. Make models smaller (e.g., set ngf and ndf to be 32 rather than 64)
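To see why item 3 helps, here is a small worked example. It is only an illustration: the 4x4 kernel and the single layer going from ngf to 2*ngf channels are assumptions chosen for the arithmetic, not a description of the actual networks. A convolution's weight count is kernel * kernel * input channels * output channels, so halving both channel counts cuts the weights to a quarter.

```python
def conv_weights(c_in, c_out, kernel=4):
    """Number of weights in one 2-D convolution layer (bias terms ignored)."""
    return kernel * kernel * c_in * c_out


# Hypothetical layer mapping ngf channels to 2 * ngf channels.
wide = conv_weights(64, 128)    # ngf = 64 -> 131072 weights
narrow = conv_weights(32, 64)   # ngf = 32 -> 32768 weights

ratio = wide / narrow           # 4.0
```

Activation memory, which often dominates during training, shrinks roughly in proportion to the channel count rather than its square, so the overall saving is usually smaller than 4x.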

Really it should be possible to avoid running out of memory as you scale the dataset size. The memory should be constant for any sized dataset as images are only loaded on demand. There may be a leak or some inefficiency that causes more memory to be used for larger datasets. I don't have a quick fix, but I would look at the behavior at the end of an epoch, where temporary variables are cleared and reallocated, since this is where you are getting the memory error.
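One way to look for such a leak is to record memory use at the end of every epoch and check whether it keeps climbing. The sketch below is only an illustration: the function name and the threshold are hypothetical, and the readings would have to come from a tool such as nvidia-smi.

```python
def grows_steadily(samples_mib, min_increase_mib=1):
    """Return True if every reading exceeds the previous one by the threshold.

    samples_mib holds one memory reading (in MiB) per epoch, in order.
    """
    if len(samples_mib) < 2:
        return False
    return all(
        later - earlier >= min_increase_mib
        for earlier, later in zip(samples_mib, samples_mib[1:])
    )


# Flat readings are what constant memory use should look like.
flat = grows_steadily([1400, 1400, 1400])

# Readings that climb every epoch suggest a leak or growing temporaries.
leaking = grows_steadily([1000, 1150, 1300])
```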

khaulahzia avatar khaulahzia commented on May 22, 2024

phillipi avatar phillipi commented on May 22, 2024

It shouldn't.

khaulahzia avatar khaulahzia commented on May 22, 2024

phillipi avatar phillipi commented on May 22, 2024

The GPU is much faster :) Functionality should be the same, although I haven't thoroughly tested CPU mode myself (seems like others have had it work).

khaulahzia avatar khaulahzia commented on May 22, 2024

phillipi avatar phillipi commented on May 22, 2024

haha yeah it's an unfortunate tradeoff

khaulahzia avatar khaulahzia commented on May 22, 2024
