Comments (6)

tanwimallick commented on July 26, 2024

It is better to define the loss node in the graph once, during DCRNNModel initialization. Then model.loss and model.mae can be used inside run_epoch_generator.

For a quick fix, I initialized the training and testing loss separately during the initialization of DCRNNSupervisor.

preds = self._train_model.outputs
labels = self._train_model.labels[..., :output_dim]

self.preds_test = self._test_model.outputs
self.labels_test = self._test_model.labels[..., :output_dim]

self._train_loss = self._loss_fn(preds=preds, labels=labels)
self._test_loss = self._loss_fn(preds=self.preds_test, labels=self.labels_test)

Inside run_epoch_generator:

if training:
    fetches = {
        'loss': self._train_loss,
        'mae': self._train_loss,
        'global_step': tf.train.get_or_create_global_step()
    }
else:
    fetches = {
        'loss': self._test_loss,
        'mae': self._test_loss,
        'global_step': tf.train.get_or_create_global_step()
    }

In the paper, how did you plot the learned localized filters centered at different nodes (Figure 7 in the paper)? Is that code available?

from dcrnn.

ivechan commented on July 26, 2024

Is there any solution or suggestion? :)

ivechan commented on July 26, 2024

It seems that the following code adds new nodes to the computation graph on every epoch,
so the graph keeps growing as training proceeds.

labels = model.labels[..., :output_dim]
loss = self._loss_fn(preds=preds, labels=labels)

A possible solution is to create the loss node in the graph during DCRNNModel initialization instead of
inside the function run_epoch_generator.
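
To make the difference concrete, here is a minimal sketch in plain Python. It does not use TensorFlow: ToyGraph, build_loss, run_epochs_leaky and run_epochs_fixed are hypothetical names invented for this illustration, and the two-ops-per-call count is an assumption, not a measurement of DCRNN. It only shows why building the loss inside the per-epoch function grows the graph, while building it once does not.

```python
class ToyGraph:
    """Hypothetical stand-in for a static computation graph (not TensorFlow)."""

    def __init__(self):
        self.ops = []
        self.finalized = False

    def add_op(self, name):
        # Refuse new ops once the graph has been marked read-only.
        if self.finalized:
            raise RuntimeError("graph is finalized; cannot add op %r" % name)
        self.ops.append(name)
        return name

    def finalize(self):
        self.finalized = True


def build_loss(graph):
    # Stands in for slicing the labels and calling the loss function.
    # Assumption for illustration: each call adds exactly two ops.
    graph.add_op("slice_labels")
    return graph.add_op("loss")


def run_epochs_leaky(graph, num_epochs):
    # Anti-pattern: the loss is rebuilt on every epoch, so ops accumulate.
    for _ in range(num_epochs):
        build_loss(graph)
    return len(graph.ops)


def run_epochs_fixed(graph, num_epochs):
    # Fix: build the loss once, then only reuse the existing handle.
    loss = build_loss(graph)
    graph.finalize()  # any later add_op call would now raise
    for _ in range(num_epochs):
        _ = loss  # a real loop would run the existing node here
    return len(graph.ops)


leaky_ops = run_epochs_leaky(ToyGraph(), 30)
fixed_ops = run_epochs_fixed(ToyGraph(), 30)
print(leaky_ops, fixed_ops)
```

In TensorFlow 1.x, tf.Graph.finalize() plays a similar role to the toy finalize() here: it makes the graph read-only, so an accidental op created inside the epoch loop raises an error instead of silently consuming memory. Whether that fits the DCRNN training code without other changes is untested here.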

liyaguang commented on July 26, 2024

Thanks for the information. I will investigate this issue. I would also appreciate it if you could provide more details, e.g., the error message, log, parameters, etc.

tanwimallick commented on July 26, 2024

The error message is:
2019-06-06 20:04:31.386792: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 43.75MiB. Current allocation summary follows.
2019-06-06 20:04:31.386936: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256): Total Chunks: 664, Chunks in use: 664. 166.0KiB allocated for chunks. 166.0KiB in use in bin. 8.9KiB client-requested in use in bin.

2019-06-06 20:04:31.396827: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at matmul_op.cc:478 : Resource exhausted: OOM when allocating tensor with shape[44800,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

I plotted the memory consumption after each epoch and got the following plot:
[Figure: OOM]

The hyperparameter configuration was:

batch_size: 256, cl_decay_steps: 2000, filter_type: 'laplacian', horizon: 12, input_dim: 2, l1_decay: 0, max_diffusion_step: 1, num_nodes: 175, num_rnn_layers: 2, output_dim: 1, rnn_units: 64, seq_len: 12,
use_curriculum_learning: True, base_lr: 0.01, epochs: 62, epsilon: 0.001, global_step: 0, lr_decay_ratio: 0.05, max_grad_norm: 9, max_to_keep: 100, min_learning_rate: 2e-06, optimizer: adagrad, patience: 50, steps: [20, 30, 40, 50], test_every_n_epochs: 10

I got the error after 30 epochs.

parkitny commented on July 26, 2024

Are there any updates on when this fix will be added?
