Comments (6)
It is better to define the loss node in the graph when the DCRNNModel class is initialized. Then model.loss and model.mae can be used inside run_epoch_generator.
As a quick fix, I created the training and testing losses separately, once, during the initialization of DCRNNSupervisor:
preds = self._train_model.outputs
labels = self._train_model.labels[..., :output_dim]
self.preds_test = self._test_model.outputs
self.labels_test = self._test_model.labels[..., :output_dim]
self._train_loss = self._loss_fn(preds=preds, labels=labels)
self._test_loss = self._loss_fn(preds=self.preds_test, labels=self.labels_test)
Inside run_epoch_generator:
if training:
    fetches = {
        'loss': self._train_loss,
        'mae': self._train_loss,
        'global_step': tf.train.get_or_create_global_step()
    }
else:
    fetches = {
        'loss': self._test_loss,
        'mae': self._test_loss,
        'global_step': tf.train.get_or_create_global_step()
    }
In the paper, how did you plot the learned localized filters centered at different nodes (Figure 7 in the paper)? Is that code available?
from dcrnn.
Is there any solution or suggestion? :)
It seems that the following code adds new nodes to the computation graph on every epoch, so the graph grows larger and larger as training proceeds:
labels = model.labels[..., :output_dim]
loss = self._loss_fn(preds=preds, labels=labels)
A possible solution is to create the loss node in the graph during DCRNNModel initialization instead of in the function run_epoch_generator.
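The growth described above can be illustrated with a toy model. This is a minimal sketch in plain Python, not TensorFlow and not the DCRNN code: `ToyGraph`, `run_leaky`, and `run_fixed` are hypothetical names invented for illustration, and the `loss_fn` here only mimics the node-adding behavior of a graph-mode loss builder. In TensorFlow 1.x the corresponding checks would be counting `graph.get_operations()` per epoch and calling `graph.finalize()` after the model is built, so that any later attempt to add an op raises an error.

```python
class ToyGraph:
    """Toy stand-in for a static computation graph (assumption: not TensorFlow)."""

    def __init__(self):
        self.nodes = []
        self.finalized = False

    def add_node(self, name):
        # A finalized graph refuses new nodes, which exposes accidental growth.
        if self.finalized:
            raise RuntimeError("graph is finalized; cannot add " + name)
        self.nodes.append(name)
        return name

    def finalize(self):
        self.finalized = True


def loss_fn(graph, preds, labels):
    # Every call appends two new nodes, like building a loss in graph mode.
    diff = graph.add_node("sub(%s,%s)" % (preds, labels))
    return graph.add_node("mean(abs(%s))" % diff)


def run_leaky(graph, epochs):
    # Anti-pattern: the loss node is rebuilt inside the epoch loop.
    sizes = []
    for _ in range(epochs):
        loss_fn(graph, "preds", "labels")
        sizes.append(len(graph.nodes))
    return sizes


def run_fixed(graph, epochs):
    # Fix: build the loss once, freeze the graph, then only reuse the node.
    loss = loss_fn(graph, "preds", "labels")
    graph.finalize()
    sizes = []
    for _ in range(epochs):
        _ = loss  # stand-in for evaluating the already-built node
        sizes.append(len(graph.nodes))
    return sizes


leaky_sizes = run_leaky(ToyGraph(), 3)
fixed_sizes = run_fixed(ToyGraph(), 3)
print(leaky_sizes)  # [2, 4, 6]
print(fixed_sizes)  # [2, 2, 2]
```

The leaky variant grows by two nodes per epoch, while the build-once variant stays constant. Finalizing the graph turns silent growth into an immediate error, which makes this kind of problem easier to locate.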
Thanks for the information. I will investigate this issue. It would also be appreciated if you could provide more details, e.g., the error message, log, parameters, etc.
The error message is:
2019-06-06 20:04:31.386792: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 43.75MiB. Current allocation summary follows.
2019-06-06 20:04:31.386936: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256): Total Chunks: 664, Chunks in use: 664. 166.0KiB allocated for chunks. 166.0KiB in use in bin. 8.9KiB client-requested in use in bin.
2019-06-06 20:04:31.396827: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at matmul_op.cc:478 : Resource exhausted: OOM when allocating tensor with shape[44800,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
I also plotted the memory consumption after each epoch. [plot not included]
The hyperparameter configuration was:
batch_size: 256, cl_decay_steps: 2000, filter_type: 'laplacian', horizon: 12, input_dim: 2, l1_decay: 0, max_diffusion_step: 1, num_nodes: 175, num_rnn_layers: 2, output_dim: 1, rnn_units: 64, seq_len: 12,
use_curriculum_learning: True, base_lr: 0.01, epochs: 62, epsilon: 0.001, global_step: 0, lr_decay_ratio: 0.05, max_grad_norm: 9, max_to_keep: 100, min_learning_rate: 2e-06, optimizer: adagrad, patience: 50, steps: [20, 30, 40, 50], test_every_n_epochs: 10
I got the error after 30 epochs.
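As a sanity check on the log above, the size of the tensor that failed to allocate can be worked out from its reported shape. The sketch below uses only numbers from the log and the posted configuration, and assumes 4 bytes per element for `type float` (float32). The first dimension, 44800, equals batch_size × num_nodes; the origin of the second dimension (256) is not stated in the thread and is not assumed here.

```python
# Values copied from the OOM message and the posted configuration.
rows, cols = 44800, 256          # shape[44800,256] from the log
bytes_per_float32 = 4            # assumption: "type float" means float32
batch_size, num_nodes = 256, 175

tensor_bytes = rows * cols * bytes_per_float32
tensor_mib = tensor_bytes / 2**20

print(tensor_mib)                         # 43.75
print(rows == batch_size * num_nodes)     # True
```

The result matches the 43.75MiB in the allocator warning. A single request of that size is modest, which is at least consistent with memory being used up gradually over 30 epochs rather than by one oversized tensor.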
Any further updates on when this fix will be added?