
Comments (5)

dyelax commented on September 24, 2024

Hey @orgicus – Thanks for the detailed info. Did you follow the usage instructions (specifically step 3, about processing the data)?

orgicus commented on September 24, 2024

Hi Matt, thank you so much for getting in touch, and sorry to take up your time with this.

It might be a case of RTFM on my side 😊
Thank you for pointing me in the right direction.

I started this yesterday:

python process_data.py -t ../Data/Ms_Pacman/Train/ ../Data/.Clips/

So far it has processed 2799700 clips. I didn't pass --num-clips, so now I'm eagerly awaiting the 5000000 mark :))

orgicus commented on September 24, 2024

Eventually the data processing completed and I started training with the avg_runner.py script, but after a full night of number crunching my 2GB GPU ran out of memory:

I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 56 Chunks of size 256 totalling 14.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 31 Chunks of size 512 totalling 15.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 25 Chunks of size 1024 totalling 25.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 15 Chunks of size 2048 totalling 30.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3072 totalling 3.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 4096 totalling 16.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 6912 totalling 6.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 11776 totalling 11.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 13824 totalling 54.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 7 Chunks of size 38400 totalling 262.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 55296 totalling 162.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 61440 totalling 60.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 75264 totalling 367.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 131072 totalling 256.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 192000 totalling 937.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 245248 totalling 239.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 322560 totalling 315.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 360448 totalling 352.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 376320 totalling 1.08MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 524288 totalling 1.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 589824 totalling 576.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 662272 totalling 646.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 19 Chunks of size 1179648 totalling 21.38MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1409024 totalling 1.34MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2686976 totalling 2.56MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3225600 totalling 3.08MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 9 Chunks of size 3276800 totalling 28.12MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3538944 totalling 3.38MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 4194304 totalling 4.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 8 Chunks of size 4718592 totalling 36.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 5111808 totalling 4.88MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 6553600 totalling 18.75MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 12320768 totalling 11.75MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 8 Chunks of size 13107200 totalling 100.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 19660800 totalling 18.75MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 29360128 totalling 28.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 429004800 totalling 409.13MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 827952128 totalling 789.60MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 1.45GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                  1587499008
InUse:                  1559270656
MaxInUse:               1586930688
NumAllocs:                 9986112
MaxAllocSize:           1260182528

W tensorflow/core/common_runtime/bfc_allocator.cc:274] ***************************************xxxxxxxx************************************xxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 262.50MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:975] Resource exhausted: OOM when allocating tensor with shape[8,256,210,160]
Traceback (most recent call last):
  File "avg_runner.py", line 186, in <module>
    main()
  File "avg_runner.py", line 182, in main
    runner.train()
  File "avg_runner.py", line 90, in train
    self.test()
  File "avg_runner.py", line 98, in test
    batch, self.global_step, num_rec_out=self.num_test_rec)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/Adversarial_Video_Generation/Code/g_model.py", line 389, in test_batch
    feed_dict=feed_dict)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/tf-venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/tf-venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/tf-venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/tf-venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[8,256,210,160]
	 [[Node: generator/scale_3/calculation/convolutions_1/Conv2D_2 = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](generator/scale_3/calculation/convolutions_1/Relu_1, generator/scale_3/setup/Variable_4/read)]]

Caused by op u'generator/scale_3/calculation/convolutions_1/Conv2D_2', defined at:
  File "avg_runner.py", line 186, in <module>
    main()
  File "avg_runner.py", line 178, in main
    runner = AVGRunner(num_steps, load_path, num_test_rec)
  File "avg_runner.py", line 50, in __init__
    c.SCALE_KERNEL_SIZES_G)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/Adversarial_Video_Generation/Code/g_model.py", line 48, in __init__
    self.define_graph()
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/Adversarial_Video_Generation/Code/g_model.py", line 179, in define_graph
    last_scale_pred_test)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/Adversarial_Video_Generation/Code/g_model.py", line 127, in calculate
    preds, ws[i], [1, 1, 1, 1], padding=c.PADDING_G)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/tf-venv/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
    data_format=data_format, name=name)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/tf-venv/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/tf-venv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/George/Downloads/Grouped/Projects/Resonate2017/workshops/ml4a/workshop/tf-venv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[8,256,210,160]
	 [[Node: generator/scale_3/calculation/convolutions_1/Conv2D_2 = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](generator/scale_3/calculation/convolutions_1/Relu_1, generator/scale_3/setup/Variable_4/read)]]
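
For anyone hitting the same error: the 262.50MiB figure is exactly the size of the single float32 tensor named in the error, 8 × 256 × 210 × 160 elements × 4 bytes. The leading 8 is presumably the batch size, 210 × 160 matches a full Ms. Pac-Man frame, and 256 is presumably the number of feature maps in that generator layer. The traceback also shows the failure happened inside `self.test()` / `test_batch`, i.e. while generating full-size test frames. The allocator stats leave only about 27 MiB between InUse and Limit. The sketch below just redoes that arithmetic with numbers copied from the log; the smaller batch sizes are hypothetical, and only this one activation is counted, not the rest of the graph.

```python
from functools import reduce
import operator

BYTES_PER_FLOAT32 = 4  # the Conv2D node in the log has T=DT_FLOAT, i.e. float32
MIB = 1024.0 ** 2      # float so the division below is true division on Python 2 too


def tensor_mib(shape, bytes_per_elem=BYTES_PER_FLOAT32):
    """Size of one dense tensor with the given shape, in MiB."""
    num_elems = reduce(operator.mul, shape, 1)
    return num_elems * bytes_per_elem / MIB


def headroom_mib(limit_bytes, in_use_bytes):
    """Memory left under the allocator limit, in MiB (ignores fragmentation)."""
    return (limit_bytes - in_use_bytes) / MIB


# Numbers copied from the log above.
FAILED_SHAPE = (8, 256, 210, 160)
LIMIT_BYTES = 1587499008
IN_USE_BYTES = 1559270656

print("failed allocation: %.2f MiB" % tensor_mib(FAILED_SHAPE))
print("headroom at OOM:   %.2f MiB" % headroom_mib(LIMIT_BYTES, IN_USE_BYTES))

# Hypothetical smaller batch sizes: this one activation shrinks linearly with batch.
for batch in (8, 4, 2, 1):
    shape = (batch,) + FAILED_SHAPE[1:]
    print("batch %d: %.2f MiB" % (batch, tensor_mib(shape)))
```

The first line printed is 262.50 MiB, matching the allocator message.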

Is there a way to "resume" training from just before it crashed? :D

dyelax commented on September 24, 2024

Hmm yeah, I trained this on 6GB GPUs, so you might need to reduce the batch size or tweak some other hyperparams to get it to work on 2GB. You can load the last-saved version of your model by passing in its .ckpt file with the -l flag.
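
To find that last-saved checkpoint without digging through the save directory by hand, here is a minimal stdlib-only sketch. It assumes the checkpoints were written by TensorFlow's `tf.train.Saver` with a global step, so each one is named like `model.ckpt-<step>`, possibly split into `.index`, `.meta` and `.data-*` files that share that prefix; the function name `latest_checkpoint_prefix` is hypothetical. If you'd rather import TensorFlow, `tf.train.latest_checkpoint(checkpoint_dir)` does the same job by reading the `checkpoint` bookkeeping file in that directory.

```python
import os
import re

# Assumption: files follow TensorFlow Saver naming with a global step, e.g.
# "model.ckpt-12000" (older single-file format) or "model.ckpt-12000.index",
# "model.ckpt-12000.meta", "model.ckpt-12000.data-00000-of-00001" (split format).
CKPT_RE = re.compile(r"^(?P<prefix>.+\.ckpt-(?P<step>\d+))(?:\..*)?$")


def latest_checkpoint_prefix(model_dir):
    """Return the path prefix of the highest-step checkpoint in model_dir, or None."""
    best_step, best_prefix = -1, None
    for name in os.listdir(model_dir):
        match = CKPT_RE.match(name)
        if match is None:
            continue  # e.g. the "checkpoint" bookkeeping file or unrelated files
        step = int(match.group("step"))
        if step > best_step:
            best_step, best_prefix = step, match.group("prefix")
    if best_prefix is None:
        return None
    return os.path.join(model_dir, best_prefix)
```

The returned path is what you would then pass to `avg_runner.py` with `-l`. With the split file format the path is the shared prefix (the form `tf.train.Saver.restore` expects) rather than any one of the individual files; that this is what the script wants is an assumption, not something confirmed in this thread.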

orgicus commented on September 24, 2024

Thank you very much for the explanations, worked like a charm! ❤️
