liyaguang / dcrnn

Implementation of Diffusion Convolutional Recurrent Neural Network in Tensorflow

License: MIT License

Python 100.00%

traffic-data deep-learning-graphs time-series iclr2018 spatiotemporal-forecasting

dcrnn's Introduction

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

This is a TensorFlow implementation of Diffusion Convolutional Recurrent Neural Network in the following paper:
Yaguang Li, Rose Yu, Cyrus Shahabi, Yan Liu, Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting, ICLR 2018.

Requirements

scipy>=0.19.0
numpy>=1.12.1
pandas>=0.19.2
pyaml
statsmodels
tensorflow>=1.3.0

Dependencies can be installed using the following command:

pip install -r requirements.txt

Data Preparation

The traffic data files for Los Angeles (METR-LA) and the Bay Area (PEMS-BAY), i.e., metr-la.h5 and pems-bay.h5, are available on Google Drive or Baidu Yun and should be placed in the data/ folder. The *.h5 files store the data as a pandas.DataFrame in the HDF5 file format. Here is an example:

	sensor_0	sensor_1	sensor_2	sensor_n
2018/01/01 00:00:00	60.0	65.0	70.0	...
2018/01/01 00:05:00	61.0	64.0	65.0	...
2018/01/01 00:10:00	63.0	65.0	60.0	...
...	...	...	...	...

Here is an article about Using HDF5 with Python.

Run the following commands to generate the train/val/test datasets at data/{METR-LA,PEMS-BAY}/{train,val,test}.npz.

# Create data directories
mkdir -p data/{METR-LA,PEMS-BAY}

# METR-LA
python -m scripts.generate_training_data --output_dir=data/METR-LA --traffic_df_filename=data/metr-la.h5

# PEMS-BAY
python -m scripts.generate_training_data --output_dir=data/PEMS-BAY --traffic_df_filename=data/pems-bay.h5
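
As a rough sketch of what this generation step produces (not the script itself), the snippet below slices a (time, sensors) array into overlapping input/target windows. The 12-step window length and the second "time of day" channel are assumptions inferred from array shapes such as (23974, 12, 207, 2) reported in training logs; make_windows is a hypothetical helper, not a function from this repository.

```python
import numpy as np

def make_windows(speeds, time_of_day, seq_len=12, horizon=12):
    """Slice a (T, N) speed array into overlapping input/target windows.

    Returns x with shape (S, seq_len, N, 2) and y with shape (S, horizon, N, 2),
    where the last axis holds [speed, time_of_day] (an assumed layout).
    """
    num_steps, num_nodes = speeds.shape
    # Broadcast the per-timestamp time-of-day value to every sensor.
    tod = np.tile(time_of_day[:, None], (1, num_nodes))
    data = np.stack([speeds, tod], axis=-1)          # (T, N, 2)
    xs, ys = [], []
    for t in range(seq_len, num_steps - horizon + 1):
        xs.append(data[t - seq_len:t])               # past seq_len steps
        ys.append(data[t:t + horizon])               # next horizon steps
    return np.stack(xs), np.stack(ys)

# Toy data: 40 five-minute steps, 3 sensors.
speeds = np.random.default_rng(0).uniform(50, 70, size=(40, 3))
time_of_day = (np.arange(40) * 5 % 1440) / 1440.0    # fraction of the day
x, y = make_windows(speeds, time_of_day)
print(x.shape, y.shape)
```

Each target window starts exactly where its input window ends, so consecutive samples overlap by all but one time step.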

Graph Construction

As the current implementation is based on pre-calculated road network distances between sensors, it only supports sensor IDs in Los Angeles (see data/sensor_graph/sensor_info_201206.csv).

python -m scripts.gen_adj_mx  --sensor_ids_filename=data/sensor_graph/graph_sensor_ids.txt --normalized_k=0.1\
    --output_pkl_filename=data/sensor_graph/adj_mx.pkl
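
To illustrate what a normalized_k threshold can mean, here is a minimal sketch of a thresholded Gaussian kernel, the weighting scheme described in the paper. It is an assumption that gen_adj_mx follows exactly this recipe; gaussian_kernel_adjacency and the toy distances are hypothetical.

```python
import numpy as np

def gaussian_kernel_adjacency(dist_mx, normalized_k=0.1):
    """Turn a pairwise distance matrix into a sparse weighted adjacency matrix.

    Unknown distances should be np.inf; they map to a weight of 0.
    """
    finite = dist_mx[np.isfinite(dist_mx)]
    std = finite.std()                        # kernel bandwidth
    adj = np.exp(-np.square(dist_mx / std))   # exp(-inf) == 0 for missing pairs
    adj[adj < normalized_k] = 0.0             # drop weak edges for sparsity
    return adj

# Toy directed distances in metres; note dist[0, 1] != dist[1, 0].
dist = np.array([[0.0, 500.0, np.inf],
                 [800.0, 0.0, 300.0],
                 [np.inf, 2500.0, 0.0]])
adj = gaussian_kernel_adjacency(dist)
print(adj.round(3))
```

Because the input distances are directed, the resulting adjacency matrix is generally asymmetric, and long distances fall below the threshold and are zeroed out.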

Besides, the locations of sensors in Los Angeles, i.e., METR-LA, are available at data/sensor_graph/graph_sensor_locations.csv, and the locations of sensors in PEMS-BAY are available at data/sensor_graph/graph_sensor_locations_bay.csv.

Run the Pre-trained Model on METR-LA

# METR-LA
python run_demo.py --config_filename=data/model/pretrained/METR-LA/config.yaml

# PEMS-BAY
python run_demo.py --config_filename=data/model/pretrained/PEMS-BAY/config.yaml

The generated predictions of DCRNN are saved in data/results/dcrnn_predictions.

Model Training

Here are commands for training the model on METR-LA and PEMS-BAY respectively.

# METR-LA
python dcrnn_train.py --config_filename=data/model/dcrnn_la.yaml

# PEMS-BAY
python dcrnn_train.py --config_filename=data/model/dcrnn_bay.yaml

Training details and tensorboard links

With a single GTX 1080 Ti, each epoch takes around 5 min for METR-LA and 13 min for PEMS-BAY. Here are example TensorBoard links for DCRNN on METR-LA and DCRNN on PEMS-BAY, including training details and metrics over time.

Note that there is a chance of training loss explosion. One temporary workaround is to restart from the last saved model before the explosion, or to decrease the learning rate earlier in the learning rate schedule.
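
As an illustration of "decreasing the learning rate earlier", the sketch below implements a plain step-decay schedule. The default values (0.01, 0.1, [20, 30, 40, 50], 2e-06) are taken from a config dump quoted in one of the issues; step_decay_lr is a hypothetical helper, and it is an assumption that the trainer applies its schedule in exactly this way.

```python
def step_decay_lr(epoch, base_lr=0.01, decay_ratio=0.1,
                  steps=(20, 30, 40, 50), min_lr=2e-6):
    """Return the learning rate for a given epoch under step decay."""
    # Count how many decay milestones have been reached so far.
    num_decays = sum(epoch >= s for s in steps)
    return max(min_lr, base_lr * decay_ratio ** num_decays)

default = [step_decay_lr(e) for e in (0, 20, 30, 50)]
# Decaying earlier, e.g. first drop at epoch 10 instead of 20.
earlier = [step_decay_lr(e, steps=(10, 20, 30, 40)) for e in (0, 10, 20, 50)]
print(default)
print(earlier)
```

Moving the milestones forward lowers the rate sooner, which is one way to reduce the chance of the loss diverging in the early epochs.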

Metric for different horizons and datasets

The following table summarizes the performance of DCRNN on the two datasets with regard to different metrics and horizons (numbers are better than those reported in the paper due to a bug fix in commit 2e4b8c8 on Oct 1, 2018).

Dataset	Metric	5min	15min	30min	60min
METR-LA	MAE	2.18	2.67	3.08	3.56
	MAPE	5.17%	6.84%	8.38%	10.30%
	RMSE	3.77	5.17	6.3	7.52
PEMS-BAY	MAE	0.85	1.31	1.66	1.98
	MAPE	1.63%	2.74%	3.76%	4.74%
	RMSE	1.54	2.76	3.78	4.62
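
For readers who want to compute comparable numbers on their own predictions, here is a minimal sketch of masked MAE, MAPE and RMSE. It assumes that a reading equal to null_val (0 here, matching the 'null_val': 0 entry in a config dump quoted in one of the issues) marks missing data and is excluded; masked_metrics is a hypothetical helper, not the repository's evaluation code.

```python
import numpy as np

def masked_metrics(preds, labels, null_val=0.0):
    """MAE, MAPE and RMSE computed only where labels differ from null_val."""
    mask = labels != null_val
    err = preds[mask] - labels[mask]
    mae = np.abs(err).mean()
    mape = np.abs(err / labels[mask]).mean()
    rmse = np.sqrt(np.square(err).mean())
    return mae, mape, rmse

labels = np.array([60.0, 0.0, 50.0, 40.0])   # 0.0 marks a missing reading
preds = np.array([57.0, 30.0, 55.0, 40.0])
mae, mape, rmse = masked_metrics(preds, labels)
print(f"MAE={mae:.3f} MAPE={mape:.2%} RMSE={rmse:.3f}")
```

Masking keeps missing readings from dominating the error and avoids a division by zero in MAPE.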

Eval baseline methods

# METR-LA
python -m scripts.eval_baseline_methods --traffic_reading_filename=data/metr-la.h5

More details are being added ...

Deploying DCRNN on Large Graphs with graph partitioning

With graph partitioning, DCRNN has been successfully deployed to forecast the traffic of the entire California highway network with 11,160 traffic sensor locations simultaneously. The general idea is to partition the large highway network into a number of small networks and train them simultaneously with a shared-weight DCRNN. The training process takes around 3 hours on a moderately sized GPU cluster, and real-time inference can be run on traditional hardware such as CPUs.

See the paper, slides, and video by Tanwi Mallick et al. from Argonne National Laboratory for more information.

DCRNN Applications

In addition to vehicle traffic forecasting, DCRNN and its variants have been applied in many important domains, including:

Neuroimaging: causal inference in brain networks. S. Wein et al. A graph neural network framework for causal inference in brain networks. Scientific Reports, 2021, GitHub Repo.
Air quality forecasting: Y Lin et al. Exploiting spatiotemporal patterns for accurate air quality forecasting using deep learning. ACM SIGSPATIAL 2018.
Internet traffic forecasting: D. Andreoletti et al. Network traffic prediction based on diffusion convolutional recurrent neural networks, INFOCOM 2019.

Third-party re-implementations

The PyTorch implementation by chnsh@ is available at DCRNN-Pytorch.

Citation

If you find this repository, e.g., the code and the datasets, useful in your research, please cite the following paper:

@inproceedings{li2018dcrnn_traffic,
  title={Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting},
  author={Li, Yaguang and Yu, Rose and Shahabi, Cyrus and Liu, Yan},
  booktitle={International Conference on Learning Representations (ICLR '18)},
  year={2018}
}


dcrnn's Issues

small Confusion with number of layers of DCRNN/ Cells

https://github.com/liyaguang/DCRNN/blob/master/model/dcrnn_model.py#L48

Multiplying a 'cell' object will lead to num_rnn_layer cells, but they have the same weights.
Is this the expected behavior?
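
The Python side of this question can be checked in isolation: multiplying a list repeats references to one object rather than copying it. The sketch below uses a hypothetical ToyCell class as a stand-in for the real cell. Whether the TensorFlow weights end up shared is a separate matter that depends on how variables are scoped when each layer is called; the variable listing in the "Training hangs" log further down shows distinct cell_0 and cell_1 weights with different shapes, which suggests they are not shared there.

```python
class ToyCell:
    """Stand-in for an RNN cell; only object identity matters here."""
    def __init__(self, num_units):
        self.num_units = num_units

num_rnn_layers = 2
cell = ToyCell(64)

shared = [cell] * num_rnn_layers                          # same object twice
separate = [ToyCell(64) for _ in range(num_rnn_layers)]   # distinct objects

print(shared[0] is shared[1])      # True
print(separate[0] is separate[1])  # False
```

A list comprehension is the usual way to obtain independent cell objects when that is what is wanted.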

with open(args.config_filename) as f: TypeError: invalid file: None

Hi Liyaguang,
I encountered this error when I ran the dcrnn_train.py file after generating train, val, test.npz.

 File "D: /DCRNN-master/dcrnn_train.py", line 14, in main
    with open(args.config_filename) as f:

 TypeError: invalid file: None

Could you please help me fix the error.
Thank you very much for your help and looking forward to your reply.
Have a nice day!

I'm so sorry... I forgot to add the default filename to main.

Regards,
Wendong
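
One way to make this failure easier to diagnose is to let argparse reject a missing --config_filename up front instead of passing None to open(). This is a minimal sketch and not the repository's actual argument handling; build_parser is a hypothetical helper, and the example path is the METR-LA config named in the README.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser()
    # required=True turns a missing flag into a clear usage error.
    parser.add_argument('--config_filename', type=str, required=True,
                        help='Path to the YAML configuration file.')
    return parser

args = build_parser().parse_args(
    ['--config_filename=data/model/dcrnn_la.yaml'])
print(args.config_filename)
```

With this setup, running the script without the flag exits with a usage message rather than a TypeError from open().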

How could I get PEMS-BAY dataset?

how to deal with the missing data?

There are many missing and nil values in METR.h5, so I'm wondering how you deal with them when producing your results?

I will be deeply grateful to you!

Reproduce FC-LSTM results

Hi, liyaguang, I am working on a paper which needs some details about FC-LSTM on METR-LA dataset, including the computing time, prediction results and so on. It would be very helpful for me if you can provide the codes running FC-LSTM experiments. Thanks.

By the way, I had cited your paper. :)

PS: It would be more helpful if you can provide code of ARIMA, VAR, SVR, FNN etc.

Got an error install on MacOS

Just wanted to let you know I got this error due to tables not being installed, while generating training data.

Generating training data
Traceback (most recent call last):
  File "/Users/rex/anaconda3/envs/traffic/lib/python3.7/site-packages/pandas/io/pytables.py", line 466, in __init__
    import tables  # noqa
ModuleNotFoundError: No module named 'tables'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rex/anaconda3/envs/traffic/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/rex/anaconda3/envs/traffic/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/rex/Work/DCRNN/scripts/generate_training_data.py", line 123, in <module>
    main(args)
  File "/Users/rex/Work/DCRNN/scripts/generate_training_data.py", line 108, in main
    generate_train_val_test(args)
  File "/Users/rex/Work/DCRNN/scripts/generate_training_data.py", line 57, in generate_train_val_test
    df = pd.read_hdf(args.traffic_df_filename)
  File "/Users/rex/anaconda3/envs/traffic/lib/python3.7/site-packages/pandas/io/pytables.py", line 368, in read_hdf
    store = HDFStore(path_or_buf, mode=mode, **kwargs)
  File "/Users/rex/anaconda3/envs/traffic/lib/python3.7/site-packages/pandas/io/pytables.py", line 469, in __init__
    'importing'.format(ex=ex))
ImportError: HDFStore requires PyTables, "No module named 'tables'" problem importing

It should be added to requirements.txt, but it doesn't install correctly right off the bat. I needed an HDF5 install as well, which is brew installable. Inside a venv, I ended up doing the following:

brew install hdf5
source activate my_venv
pip install tables

But that gave me terrible errors as well, that I can't get around.

$ pip install tables
Collecting tables
    /var/folders/wp/n612s6296jzf_yspjjs1ys1m0000gn/T/H5close6qg7st18.c:1:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
    main (int argc, char **argv) {
    ^
    /var/folders/wp/n612s6296jzf_yspjjs1ys1m0000gn/T/H5close6qg7st18.c:2:5: warning: implicit declaration of function 'H5close' is invalid in C99 [-Wimplicit-function-declaration]
        H5close();
        ^
    2 warnings generated.
    * Using Python 3.7.1 | packaged by conda-forge | (default, Nov 13 2018, 10:30:07)
    * USE_PKGCONFIG: False
    * Found conda env: ``/Users/rex/anaconda3/envs/traffic``
    .. ERROR:: Could not find a local HDF5 installation.
       You may need to explicitly state where your local HDF5 headers and
       library can be found by setting the ``HDF5_DIR`` environment
       variable or by using the ``--hdf5`` command-line option.
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/wp/n612s6296jzf_yspjjs1ys1m0000gn/T/pip-install-273jy0so/tables/

I have hdf5 installed, and I use it locally all the time. I tried to pass it as described there, but to no avail.

It does work outside of my venv for now, so I'll do that. But if anyone gets this running inside a conda venv on macOS, please let me know.

a question on dcrnn_cell

Hello, liyaguang! I just learned TensorFlow recently, so my knowledge is limited. I want to ask how the 'def __call__(self, inputs, state, ...)' function in dcrnn_cell.py is called; in other words, what are the values of inputs and state?
Hoping for your answer. Thanks.

Training: maximizing the likelihood

Hi,

Thanks for your paper! I've got a question concerning the training process. You wrote that 'the entire network is trained by maximizing the likelihood of generating the target future time series using backpropagation through time'. Where can I find the likelihood part in the code? I can only see that gradients are calculated using tf.gradients and they're being applied using the Adam optimizer.

Thank you very much!

Original dataset

Hello, could you please tell me where you got the two datasets? I would like to download the raw data (each reading) instead of a 5-minute average. Thank you in advance!

Memory leak

def run_epoch_generator(self, sess, model, data_generator, return_output=False, training=False, writer=None):
    output_dim = self._model_kwargs.get('output_dim')
    preds = model.outputs
    labels = model.labels[..., :output_dim]
    loss = self._loss_fn(preds=preds, labels=labels)

This part of the code has a memory leak. Getting OOM error after several epochs.

Fail to reproduce best performance

Hello, Li.
Although I ran dcrnn_train.py with the parameter setup mentioned in the paper,
I failed to reproduce the best performance.
Could you please point out my mistakes or the detailed options?

2019-02-22 16:58:40,796 - INFO - Log directory: data/model
2019-02-22 16:58:40,797 - INFO - {'data': {'val_batch_size': 64, 'test_batch_size': 64, 'batch_size': 64, 'graph_pkl_filename': 'data/sensor_graph/dcrnn/adj_mx.pkl', 'dataset_dir': 'data/METR-LA'}, 'model': {'cl_decay_steps': 2000, 'input_dim': 2, 'l1_decay': 0, 'num_rnn_layers': 2, 'num_nodes': 207, 'filter_type': 'dual_random_walk', 'horizon': 12, 'use_curriculum_learning': True, 'seq_len': 12, 'rnn_units': 64, 'output_dim': 1, 'max_diffusion_step': 3}, 'train': {'optimizer': 'adam', 'epsilon': 0.001, 'dropout': 0, 'model_filename': None, 'epochs': 100, 'patience': 50, 'base_lr': 0.01, 'max_grad_norm': 5, 'min_learning_rate': 2e-06, 'global_step': 0, 'max_to_keep': 100, 'lr_decay_ratio': 0.1, 'epoch': 0, 'test_every_n_epochs': 10, 'steps': [20, 30, 40, 50], 'log_dir': 'data/model'}, 'log_level': 'INFO', 'base_dir': 'data/model'}
2019-02-22 16:58:49,720 - INFO - ('x_val', (3425, 12, 207, 2))
2019-02-22 16:58:49,720 - INFO - ('x_train', (23974, 12, 207, 2))
2019-02-22 16:58:49,720 - INFO - ('x_test', (6850, 12, 207, 2))
2019-02-22 16:58:49,720 - INFO - ('y_val', (3425, 12, 207, 2))
2019-02-22 16:58:49,720 - INFO - ('y_train', (23974, 12, 207, 2))
2019-02-22 16:58:49,720 - INFO - ('y_test', (6850, 12, 207, 2))
2019-02-22 16:59:06,917 - INFO - Total number of trainable parameters: 520960
2019-02-22 16:59:09,019 - INFO - Start training ...
...
2019-02-23 04:12:31,358 - INFO - Epoch [89/100] (0) train_mae: 9.8364, val_mae: 12.8458 lr:0.000002 431.2s
2019-02-23 04:13:29,147 - INFO - Horizon 01, MAE: 13.55, MAPE: 0.3397, RMSE: 16.15
2019-02-23 04:13:29,213 - INFO - Horizon 02, MAE: 12.81, MAPE: 0.3336, RMSE: 15.54
2019-02-23 04:13:29,277 - INFO - Horizon 03, MAE: 12.34, MAPE: 0.3307, RMSE: 15.23
2019-02-23 04:13:29,340 - INFO - Horizon 04, MAE: 12.15, MAPE: 0.3311, RMSE: 15.21
2019-02-23 04:13:29,405 - INFO - Horizon 05, MAE: 12.20, MAPE: 0.3341, RMSE: 15.41
2019-02-23 04:13:29,467 - INFO - Horizon 06, MAE: 12.41, MAPE: 0.3385, RMSE: 15.74
2019-02-23 04:13:29,529 - INFO - Horizon 07, MAE: 12.70, MAPE: 0.3432, RMSE: 16.11
2019-02-23 04:13:29,591 - INFO - Horizon 08, MAE: 13.00, MAPE: 0.3476, RMSE: 16.47
2019-02-23 04:13:29,652 - INFO - Horizon 09, MAE: 13.28, MAPE: 0.3512, RMSE: 16.78
2019-02-23 04:13:29,714 - INFO - Horizon 10, MAE: 13.53, MAPE: 0.3540, RMSE: 17.06
2019-02-23 04:13:29,775 - INFO - Horizon 11, MAE: 13.75, MAPE: 0.3562, RMSE: 17.30
2019-02-23 04:13:29,837 - INFO - Horizon 12, MAE: 13.95, MAPE: 0.3582, RMSE: 17.54
2019-02-23 04:20:40,879 - INFO - Epoch [90/100] (0) train_mae: 9.9064, val_mae: 10.6131 lr:0.000002 431.0s
2019-02-23 04:20:40,879 - WARNING - Early stopping at epoch: 90

Code for Visualization

Thanks for your paper and code. How do you make a 3D map like Figure 9? Can you share your code for visualization?

How to calculate the road network distance?

Hi, I have read your paper and really appreciate this algorithm. However, I still don't understand how you calculated the road network distance between two vertices.

I am aware that the road network distance is not spatial distance, but I could not find any further detail in your paper.

Could you please explain how you calculated the road network distance?

What is the meaning of the last dimension of the input (Batch_size, Time granularity, Node number,2) of the network?

Hi, where can I get the introduction of the METR-LA and the PEMS-Bay dataset? What is the meaning of the last dimension of the input (Batch_size, Time granularity, Node number,2) of the network?

Small graphs?

Hi -- It seems like you process the entire graph at each minibatch, which will limit the size of the graphs you can apply these methods to. Have you ever worked on methods for operating on subsets of the graph rather than the whole thing?

EDIT: Since you do a graph convolution at each step of the GRU, the receptive field for each node is all of its 12-hop neighbors. The diameter of the graph is only 13, so making a prediction for a single node requires touching the entire graph. Does that sound right?

run_demo.py not working

Got the following error

sensor_ids, sensor_id_to_ind, adj_mx = pickle.load(f)
TypeError: a bytes-like object is required, not 'str'
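
Under Python 3, this TypeError is what pickle.load raises when the file was opened in text mode ('r') instead of binary mode ('rb'). The sketch below shows a loader that opens the file in binary mode and falls back to latin1 decoding for pickles written by Python 2. It is a hedged illustration rather than the repository's own utility; the payload and the file name adj_mx_demo.pkl are made up for the round trip.

```python
import os
import pickle
import tempfile

def load_pickle(path):
    """Load a pickle in binary mode, tolerating Python 2 era files."""
    with open(path, 'rb') as f:          # 'rb', not 'r': pickle needs bytes
        try:
            return pickle.load(f)
        except UnicodeDecodeError:
            f.seek(0)
            # Python 2 str objects may not decode as ASCII under Python 3.
            return pickle.load(f, encoding='latin1')

# Round trip with a toy stand-in for (sensor_ids, sensor_id_to_ind, adj_mx).
payload = (['773906', '767471'],
           {'773906': 0, '767471': 1},
           [[1.0, 0.4], [0.0, 1.0]])
path = os.path.join(tempfile.mkdtemp(), 'adj_mx_demo.pkl')
with open(path, 'wb') as f:
    pickle.dump(payload, f, protocol=2)  # protocol 2 is readable by Python 2 and 3

sensor_ids, sensor_id_to_ind, adj_mx = load_pickle(path)
print(sensor_ids, adj_mx)
```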

A new question about data download

Hi Yaguang, I read through your answer in Issue#10, in which you said the url is linked to the raw data. I looked into the data and they are of the 207 stations you selected.

If I want to select the detectors and download the data that I want, do you know where the original raw data of METR-LA is? Does the dataset require any form of authorization, or is it an open one?

Thx.

A solution for pickle.load()

I use anaconda3 as my base python environment. However, the error that _pickle.UnpicklingError: the STRING opcode argument must be quoted always came out whenever I tried with python3 + open('adj_mx.pkl', 'rb') or python3 + open('adj_mx.pkl', 'r') in the interactive shell.
Then I tried with python2 + open('adj_mx.pkl', 'rb'), and it came out with the error raise ValueError, "insecure string pickle".
Finally, it worked with python2 + open('adj_mx.pkl', 'r'). However, it is inconsistent with code gen_adj_mx.py, and utils.load_pickle. It confused me for quite a few days.

Solution

Python 2.7.15
>>> import numpy as np
>>> import pickle
>>> f = open('adj_mx.pkl', 'r')
>>> data = pickle.load(f)
>>> f.close()
>>> with open('adj_mx_new.pkl', 'wb') as f:
...     pickle.dump(data, f, protocol=2)

You can then replace adj_mx.pkl with the newly generated adj_mx_new.pkl. After that, python run_demo.py --config_filename=data/model/pretrained/METR-LA/config.yaml works under both Python 2 and Python 3.

Question on Baseline Models

Were the baseline models that you implemented (specifically ARIMA, SVR and HA) trained solely on the temporal data or on the entire spatio-temporal data?

Run the Pre-trained Model (Error)

Hi, I got this error when trying to run the code:

did I miss something?

Missing distances file PEMS-BAY

Hello,

Is there any way I can get the equivalent of "distances_la_2012.csv" but for PEMS-BAY ?

Best,

hello~

How to draw truth and prediction value curves?

Hi liyaguang,
I looked at the evaluate function in dcrnn_supervisor.py, where the real and predicted values are regularized. Can I print the image before line 271?

y_truth = self._data['y_test'][:, :, :, 0]
y_pred = y_preds[:y_truth.shape[0], :, :, 0]
pyplot.title('test')
pyplot.xlabel('Time range (min)')
pyplot.ylabel(' Speed (km/h)')
pyplot.plot(y_truth[:],label='true')
pyplot.plot(y_pred[:],'r--',label='Prediction-DCRNN')
pyplot.legend()
pyplot.savefig('Figure-test.png', dpi=300)
pyplot.show()

Best wishes!

trouble about the adj_mx

Why is the distance from vertex 1 to vertex 2 not equal to the distance from vertex 2 to vertex 1?

resource

How can I view the map in the browser using Baidu Maps or Google Maps?

error Run the Pre-trained Model on my data

After generating the train/test data for my own dataset, which simply contains two columns (Date, price), I got this information about the training and testing data:

But after running Run the Pre-trained Model, I got the following exception:

ValueError: Cannot feed value of shape (64, 12, 1, 2) for Tensor 'Test/DCRNN/inputs:0', which has shape '(64, 12, 207, 2)'

Is there anything to do to solve this?
Thanks

sensor index file for PEMS-bay is missing.

Why are the distances between two sensors different in opposite directions? For example, the distance from 1 to 2 differs from the distance from 2 to 1.

How could I get METR-LA dataset?

difference between diffusion convolution and MCL algorithm.

hello~
As far as I know, MCL is a clustering algorithm for directed graphs. Its form is very similar to diffusion convolution, except that MCL has a parameter r to control the scale while diffusion convolution uses theta to smooth. So what is the difference between them? MCL refers to this article, which is easy to understand.
looking forward to your reply~

maybe bug found in `DCRNN_cell.py`

In function _gconv of model/DCRNN_cell.py, from line 151-168

x = inputs_and_state
x0 = tf.transpose(x, perm=[1, 2, 0]) 
x0 = tf.reshape(x0, shape=[self._num_nodes, input_size * batch_size])
x = tf.expand_dims(x0, axis=0)

scope = tf.get_variable_scope()
with tf.variable_scope(scope):
    if self._max_diffusion_step == 0:
        pass
    else:
        for support in self._supports:
            x1 = tf.sparse_tensor_dense_matmul(support, x0) # for self._supports[1], x0 is no more the original x0 
            x = self._concat(x, x1)

            for k in range(2, self._max_diffusion_step + 1):
                x2 = 2 * tf.sparse_tensor_dense_matmul(support, x1) - x0
                x = self._concat(x, x2)
                x1, x0 = x2, x1 # x0 overwritten here

Here, x0 is the input feature, but it is overwritten in the last line. So when dealing with the second support matrix, x0 is no longer the original input feature.
I think this may be a bug. A possible fix:

x = inputs_and_state
x_in = tf.transpose(x, perm=[1, 2, 0]) 
x_in = tf.reshape(x_in, shape=[self._num_nodes, input_size * batch_size])
x = tf.expand_dims(x_in, axis=0)

scope = tf.get_variable_scope()
with tf.variable_scope(scope):
    if self._max_diffusion_step == 0:
        pass
    else:
        for support in self._supports:
            x0 = x_in
            x1 = tf.sparse_tensor_dense_matmul(support, x0)
            x = self._concat(x, x1)

            for k in range(2, self._max_diffusion_step + 1):
                x2 = 2 * tf.sparse_tensor_dense_matmul(support, x1) - x0
                x = self._concat(x, x2)
                x1, x0 = x2, x1

Am I right?
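
The difference between the two versions can be reproduced outside TensorFlow. The sketch below is a dense NumPy stand-in for the sparse ops above; diffuse, its reset_x0 flag and the random supports are hypothetical, and the snippet only illustrates the control flow, not whether the original behaviour is intended.

```python
import numpy as np

def diffuse(x_in, supports, max_step, reset_x0):
    """Collect x and the recurrence terms for every support matrix."""
    terms = [x_in]
    x0 = x_in
    for support in supports:
        if reset_x0:
            x0 = x_in                      # proposed fix: restart from the input
        x1 = support @ x0
        terms.append(x1)
        for _ in range(2, max_step + 1):
            x2 = 2 * support @ x1 - x0
            terms.append(x2)
            x1, x0 = x2, x1                # x0 no longer holds the input
    return np.stack(terms)

rng = np.random.default_rng(0)
x_in = rng.normal(size=(4, 3))             # (num_nodes, features)
supports = [rng.random((4, 4)), rng.random((4, 4))]

original = diffuse(x_in, supports, max_step=2, reset_x0=False)
proposed = diffuse(x_in, supports, max_step=2, reset_x0=True)

print(np.allclose(original[:3], proposed[:3]))   # terms from the first support
print(np.allclose(original[3:], proposed[3:]))   # terms from the second support
```

The terms produced for the first support agree in both versions; the terms for the second support differ because, without the reset, the recurrence for the second support starts from the first support's output instead of the input.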

How to calculate the distances between each node?

Hello, Li.
I wonder how to calculate the distances between each node.
(According to README.md (Graph Construction), this part is being added, but it is not provided yet.)

I calculated some distances using Google map,
however, my results were different from yours.
For example, for the case of 773906 > 767471,
my result was 13.0km, and the value from your data was 10419.9m.

Could you please explain how to calculate node-to-node distance,
or provide the data for this process?

graph_sensor_locations.csv for PEMS-BAY

Can you provide the graph_sensor_locations.csv file for the PEMS-BAY dataset?

run_demo.py not working on PEMS-BAY data

Hi, Can you help me with this error? I have run

python run_demo.py --config_filename=data/model/pretrained/PEMS-BAY/config.yaml --output_filename=data/results/dcrnn_bay_predictions.npz

and the error log is listed as follows:

Unable to load data data/sensor_graph/adj_mx_bay.pkl : the STRING opcode argument must be quoted
Traceback (most recent call last):
  File "run_demo.py", line 37, in <module>
    run_dcrnn(args)
  File "run_demo.py", line 20, in run_dcrnn
    _, _, adj_mx = load_graph_data(graph_pkl_filename)
  File "C:\Users\mlvis\Documents\Emma\DCRNN\lib\utils.py", line 198, in load_graph_data
    sensor_ids, sensor_id_to_ind, adj_mx = load_pickle(pkl_filename)
  File "C:\Users\mlvis\Documents\Emma\DCRNN\lib\utils.py", line 205, in load_pickle
    pickle_data = pickle.load(f)
_pickle.UnpicklingError: the STRING opcode argument must be quoted

Thanks a lot for your help!

mask loss

Hello~, I have some questions about the mask operation. Why do you mask the loss when training? Is it fair?

building model for different dataset

Hello, I'd like to thank you first for the great work.
I'm trying to use the model on a different dataset, which has the average occupancy rate per hour. I have nearly 2000 nodes, a location (lat, long) for each node, and hourly records per node. If you could specify exactly what should be modified in order to use this dataset, it would be very helpful. Thank you so much.

Training hangs

Yaguang,

Training starts
python dcrnn_train.py --config_filename=data/model/dcrnn_config.yaml

but it hangs after that. We also tried it on GPU but found the same issue.

2018-07-26 12:08:16,158 - INFO - Log directory: data/model
2018-07-26 12:08:16,158 - INFO - Loading graph from: data/sensor_graph/adj_mx.pkl
2018-07-26 12:08:16,160 - INFO - Loading traffic data from: data/df_highway_2012_4mon_sample.h5
2018-07-26 12:08:16.407358: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-07-26 12:08:16.407392: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-07-26 12:08:16.407399: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-07-26 12:08:16.407405: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-07-26 12:08:16,409 - INFO - Log directory: data/model/dcrnn_DR_2_h_12_64-64_lr_0.01_bs_64_d_0.00_sl_12_MAE_0726120816/
2018-07-26 12:08:16,410 - INFO - {'base_dir': 'data/model', 'batch_size': 64, 'cl_decay_steps': 2000, 'data_type': 'ALL', 'dropout': 0, 'epoch': 0, 'epochs': 100, 'filter_type': 'dual_random_walk', 'global_step': 0, 'graph_pkl_filename': 'data/sensor_graph/adj_mx.pkl', 'horizon': 12, 'l1_decay': 0, 'learning_rate': 0.01, 'loss_func': 'MAE', 'lr_decay': 0.1, 'lr_decay_epoch': 20, 'lr_decay_interval': 10, 'max_diffusion_step': 2, 'max_grad_norm': 5, 'min_learning_rate': 2e-06, 'null_val': 0, 'num_rnn_layers': 2, 'output_dim': 1, 'patience': 50, 'rnn_units': 64, 'seq_len': 12, 'test_every_n_epochs': 10, 'test_ratio': 0.2, 'use_cpu_only': False, 'use_curriculum_learning': True, 'validation_ratio': 0.1, 'verbose': 0, 'write_db': False}
2018-07-26 12:08:37,392 - INFO - Total number of trainable parameters: 373312
2018-07-26 12:08:37,392 - DEBUG - DCRNN/learning_rate:0, ()
2018-07-26 12:08:37,392 - DEBUG - DCRNN/global_step:0, ()
2018-07-26 12:08:37,392 - DEBUG - DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/gates/weights:0, (330, 128)
2018-07-26 12:08:37,392 - DEBUG - DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/gates/biases:0, (128,)
2018-07-26 12:08:37,392 - DEBUG - DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/candidate/weights:0, (330, 64)
2018-07-26 12:08:37,393 - DEBUG - DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/candidate/biases:0, (64,)
2018-07-26 12:08:37,393 - DEBUG - DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/gates/weights:0, (640, 128)
2018-07-26 12:08:37,393 - DEBUG - DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/gates/biases:0, (128,)
2018-07-26 12:08:37,393 - DEBUG - DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/candidate/weights:0, (640, 64)
2018-07-26 12:08:37,393 - DEBUG - DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/candidate/biases:0, (64,)
2018-07-26 12:08:37,393 - DEBUG - DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/gates/weights:0, (330, 128)
2018-07-26 12:08:37,394 - DEBUG - DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/gates/biases:0, (128,)
2018-07-26 12:08:37,394 - DEBUG - DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/candidate/weights:0, (330, 64)
2018-07-26 12:08:37,394 - DEBUG - DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/candidate/biases:0, (64,)
2018-07-26 12:08:37,394 - DEBUG - DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/gates/weights:0, (640, 128)
2018-07-26 12:08:37,394 - DEBUG - DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/gates/biases:0, (128,)
2018-07-26 12:08:37,394 - DEBUG - DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/candidate/weights:0, (640, 64)
2018-07-26 12:08:37,395 - DEBUG - DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/candidate/biases:0, (64,)
2018-07-26 12:08:37,395 - DEBUG - DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/projection/w:0, (64, 1)
2018-07-26 12:08:37,395 - DEBUG - Train/DCRNN/beta1_power:0, ()
2018-07-26 12:08:37,395 - DEBUG - Train/DCRNN/beta2_power:0, ()
2018-07-26 12:08:37,395 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/gates/weights/Adam:0, (330, 128)
2018-07-26 12:08:37,395 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/gates/weights/Adam_1:0, (330, 128)
2018-07-26 12:08:37,395 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/gates/biases/Adam:0, (128,)
2018-07-26 12:08:37,396 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/gates/biases/Adam_1:0, (128,)
2018-07-26 12:08:37,396 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/candidate/weights/Adam:0, (330, 64)
2018-07-26 12:08:37,396 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/candidate/weights/Adam_1:0, (330, 64)
2018-07-26 12:08:37,396 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/candidate/biases/Adam:0, (64,)
2018-07-26 12:08:37,396 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_0/dcgru_cell/candidate/biases/Adam_1:0, (64,)
2018-07-26 12:08:37,396 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/gates/weights/Adam:0, (640, 128)
2018-07-26 12:08:37,396 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/gates/weights/Adam_1:0, (640, 128)
2018-07-26 12:08:37,397 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/gates/biases/Adam:0, (128,)
2018-07-26 12:08:37,397 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/gates/biases/Adam_1:0, (128,)
2018-07-26 12:08:37,397 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/candidate/weights/Adam:0, (640, 64)
2018-07-26 12:08:37,397 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/candidate/weights/Adam_1:0, (640, 64)
2018-07-26 12:08:37,397 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/candidate/biases/Adam:0, (64,)
2018-07-26 12:08:37,397 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn/multi_rnn_cell/cell_1/dcgru_cell/candidate/biases/Adam_1:0, (64,)
2018-07-26 12:08:37,398 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/gates/weights/Adam:0, (330, 128)
2018-07-26 12:08:37,398 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/gates/weights/Adam_1:0, (330, 128)
2018-07-26 12:08:37,398 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/gates/biases/Adam:0, (128,)
2018-07-26 12:08:37,398 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/gates/biases/Adam_1:0, (128,)
2018-07-26 12:08:37,398 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/candidate/weights/Adam:0, (330, 64)
2018-07-26 12:08:37,398 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/candidate/weights/Adam_1:0, (330, 64)
2018-07-26 12:08:37,399 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/candidate/biases/Adam:0, (64,)
2018-07-26 12:08:37,399 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_0/dcgru_cell/candidate/biases/Adam_1:0, (64,)
2018-07-26 12:08:37,399 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/gates/weights/Adam:0, (640, 128)
2018-07-26 12:08:37,399 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/gates/weights/Adam_1:0, (640, 128)
2018-07-26 12:08:37,399 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/gates/biases/Adam:0, (128,)
2018-07-26 12:08:37,399 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/gates/biases/Adam_1:0, (128,)
2018-07-26 12:08:37,400 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/candidate/weights/Adam:0, (640, 64)
2018-07-26 12:08:37,400 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/candidate/weights/Adam_1:0, (640, 64)
2018-07-26 12:08:37,400 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/candidate/biases/Adam:0, (64,)
2018-07-26 12:08:37,400 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/candidate/biases/Adam_1:0, (64,)
2018-07-26 12:08:37,400 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/projection/w/Adam:0, (64, 1)
2018-07-26 12:08:37,400 - DEBUG - DCRNN/DCRNN/DCRNN_SEQ/rnn_decoder/multi_rnn_cell/cell_1/dcgru_cell/projection/w/Adam_1:0, (64, 1)

The training hangs here.

Missing file

Hi,

python dcrnn_train.py --config_filename=data/model/dcrnn_config.yaml
FileNotFoundError: File data/df_highway_2012_4mon_sample.h5 does not exist
Could you please upload the missing file?

What does the dataset's inputdim represent? Flow, occupancy, or something else?

Error when generating data

Could you please help me fix this error? When I run

python -m scripts.generate_training_data --output_dir=data/METR-LA

I get the following traceback:

Traceback (most recent call last):
  File "/home/zhwu/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/zhwu/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/zhwu/workspace/DCRNN/scripts/generate_training_data.py", line 123, in <module>
    main(args)
  File "/home/zhwu/workspace/DCRNN/scripts/generate_training_data.py", line 108, in main
    generate_train_val_test(args)
  File "/home/zhwu/workspace/DCRNN/scripts/generate_training_data.py", line 85, in generate_train_val_test
    x_train, y_train = x[:num_train], y[:num_train]
TypeError: slice indices must be integers or None or have an __index__ method
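
A likely cause, assuming num_train is computed with round() or a float multiplication (the script itself is not shown here): under Python 2.7, round() returns a float, and slicing with a float raises exactly this TypeError. A minimal sketch of a workaround follows; the function name and the split fractions are hypothetical, not taken from generate_training_data.py.

```python
def split_sizes(num_samples, train_frac=0.7, test_frac=0.2):
    """Return integer (num_train, num_val, num_test) sizes.

    The fractions are illustrative assumptions, not values from the script.
    """
    # int(round(...)) yields an int on both Python 2 and Python 3.
    num_test = int(round(num_samples * test_frac))
    num_train = int(round(num_samples * train_frac))
    num_val = num_samples - num_train - num_test
    return num_train, num_val, num_test


samples = list(range(10))
num_train, num_val, num_test = split_sizes(len(samples))
x_train = samples[:num_train]  # works because num_train is an int

try:
    samples[:float(num_train)]  # a float index reproduces the TypeError
except TypeError as err:
    float_slice_error = str(err)
```

Running the script under Python 3 may also avoid the problem, since round() with a single argument returns an int there.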

Can't find the .h5 file for data

There is no metr-la.h5 file in the 'data' directory. Where can I find that .h5 file? I want to reproduce the results of your experiment. Thank you very much.

Question about the METR-LA data

Excuse me, why are there so many zeros in the speed records?

Missing file?

Hi --

It seems like the file data/sensor_graph/graph_sensor_ids.txt is missing -- are you able to add it to the repo?

Alternatively -- maybe it's safe to ignore it by taking union(distance_df.from, distance_df.to) as the complete set of sensor IDs?

Thanks
Ben
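
The fallback suggested above can be sketched as follows. It assumes the distance table has `from` and `to` columns, as the names distance_df.from and distance_df.to imply; the rows and the `cost` column are made-up placeholders, not real sensor data. A sensor that never appears in any distance row would be missed by this union, and the resulting order may differ from the one the missing file defines.

```python
import csv
import io

# Made-up rows standing in for the distance CSV; the real file may differ.
distances_csv = """from,to,cost
A,B,120.5
B,C,80.0
C,A,200.1
"""


def sensor_ids_from_distances(csv_file):
    """Collect every ID that appears as either endpoint of a distance row."""
    ids = set()
    for row in csv.DictReader(csv_file):
        ids.add(row["from"])
        ids.add(row["to"])
    # Sorted for a deterministic order; the original ID file may use another.
    return sorted(ids)


sensor_ids = sensor_ids_from_distances(io.StringIO(distances_csv))
print(",".join(sensor_ids))
```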

Abnormal Train Loss

Hi, thanks for your code.
I got an abnormal training loss with the default config. The validation error converges, but the training loss goes down at first and then keeps growing after reaching its minimum.

[Figure panels: METR-LA, PEMS-BAY]

Question about the graph convolution kernel

Hi Liyaguang, I have a question about your graph convolution kernel. You compute the diffusion convolution kernel iteratively as x2 = 2*tf.sparse_tensor_dense_matmul(support, x1) - x0. But if I am correct, shouldn't it be x2 = tf.sparse_tensor_dense_matmul(support, x1) according to your paper?

Hello! I am confused about the diffusion step

I've read the code.
In the code, I think you just use the Chebyshev polynomial recurrence with ~L replaced by D^-1 W:
for support in self._supports:
    x1 = tf.sparse_tensor_dense_matmul(support, x0)
    x = self._concat(x, x1)
    for k in range(2, self._max_diffusion_step + 1):
        x2 = 2 * tf.sparse_tensor_dense_matmul(support, x1) - x0
        x = self._concat(x, x2)
        x1, x0 = x2, x1
This is different from the diffusion convolution described in the paper.
Does the Chebyshev polynomial form include the diffusion convolution?
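
One way to look at this question: with T_0 = I, T_1 = S and T_k = 2 S T_{k-1} - T_{k-2}, the features built by the loop are Chebyshev polynomials of the support matrix S, and the Chebyshev polynomials up to degree K span the same space as the powers I, S, ..., S^K. Since each feature gets its own learned weight, a filter written in one basis can be rewritten in the other. The sketch below checks this for K = 2 for a single support. Plain Python lists stand in for the TensorFlow tensors, and the matrix, signal, and weights are made up for illustration; it does not claim this is the author's reason for the recurrence.

```python
def matvec(matrix, vec):
    """Dense matrix-vector product on plain lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def chebyshev_features(support, x0, max_diffusion_step):
    """Mirror the recurrence in the snippet: x_k = 2 * S x_{k-1} - x_{k-2}."""
    feats = [x0]
    if max_diffusion_step >= 1:
        x1 = matvec(support, x0)
        feats.append(x1)
        for _ in range(2, max_diffusion_step + 1):
            x2 = [2 * a - b for a, b in zip(matvec(support, x1), x0)]
            feats.append(x2)
            x1, x0 = x2, x1
    return feats


def power_features(support, x0, max_diffusion_step):
    """Plain diffusion steps: x_k = S x_{k-1}."""
    feats = [x0]
    for _ in range(max_diffusion_step):
        feats.append(matvec(support, feats[-1]))
    return feats


# Made-up row-normalised transition matrix (D^-1 W) and input signal.
S = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
x = [1.0, 2.0, 4.0]

t0, t1, t2 = chebyshev_features(S, x, 2)
p0, p1, p2 = power_features(S, x, 2)

# T_2(S) x = 2 S^2 x - x, so weights (a, b, c) on the power basis correspond
# to weights (a + c/2, b, c/2) on the Chebyshev basis.
a, b, c = 0.3, -1.2, 0.8
power_filter = [a * u + b * v + c * w for u, v, w in zip(p0, p1, p2)]
cheb_filter = [(a + c / 2) * u + b * v + (c / 2) * w
               for u, v, w in zip(t0, t1, t2)]
```

Here power_filter and cheb_filter agree up to floating-point rounding, so for a fixed K the two forms can represent the same filters with different weight values.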

Biases in GRU

Hi,

thanks for your great paper!
In Chapter 2.3 you add a bias vector to the convolution for the update/reset/candidate gates, i.e. the biases seem to have shape (num_nodes) for every unit, so num_nodes * rnn_units in total. In your code in dcrnn_cell.py, however, the added bias only has output_size entries, which equals rnn_units for the candidate gate and 2*rnn_units for the reset/update gates. Hence only one bias term is added per unit.

Can you please help me with this issue?

Thank you very much.
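
The two readings in this question can be contrasted in a small sketch. The sizes and values are tiny made-up examples, and plain lists stand in for tensors; the sketch only illustrates the difference in parameter count and does not say which reading the paper intends.

```python
def add_shared_bias(values, bias):
    """values: [num_nodes][output_size]; bias: [output_size], reused per node."""
    return [[v + b for v, b in zip(row, bias)] for row in values]


def add_per_node_bias(values, bias):
    """values and bias: [num_nodes][output_size], one entry per node and unit."""
    return [[v + b for v, b in zip(row, bias_row)]
            for row, bias_row in zip(values, bias)]


num_nodes, rnn_units = 4, 3  # tiny illustrative sizes
values = [[0.0] * rnn_units for _ in range(num_nodes)]

# Shared bias: every node receives the same offset for a given unit.
shared = add_shared_bias(values, [0.1, 0.2, 0.3])

# Per-node bias: each (node, unit) pair has its own offset.
per_node_bias = [[n + 0.1 * u for u in range(rnn_units)]
                 for n in range(num_nodes)]
per_node = add_per_node_bias(values, per_node_bias)

shared_params = rnn_units                # grows with the number of units only
per_node_params = num_nodes * rnn_units  # grows with the graph size as well
```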

input of seq2seq

Hi Yaguang,

Thanks for your great work. It has helped us a lot. I have one question about the model implementation. In dcrnn_model.py, line 79:
outputs, final_state = legacy_seq2seq.rnn_decoder(labels, enc_state, decoding_cells,
                                                  loop_function=loop_function)
From this statement, the implementation of rnn_decoder, and the definition of the loop function, I think the output is produced with the help of the labels. I know TensorFlow defines seq2seq this way. However, according to the problem definition, the labels are not known at prediction time. Is this a problem?
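
For context, the documented behaviour of legacy_seq2seq.rnn_decoder is that when loop_function is not None, it is applied to the previous output to produce the next input, and decoder_inputs are ignored except for the first element (the "GO" symbol). The sketch below is a simplified plain-Python stand-in for that control flow, not the TensorFlow implementation; the toy cell and label values are made up. What the repository's own loop_function does during training (for example, whether it sometimes feeds labels) is not visible in the snippet above.

```python
def rnn_decoder(decoder_inputs, initial_state, cell, loop_function=None):
    """Simplified stand-in for the control flow of legacy_seq2seq.rnn_decoder."""
    state = initial_state
    outputs = []
    prev = None
    for i, inp in enumerate(decoder_inputs):
        if loop_function is not None and prev is not None:
            # From step 1 on, the input is derived from the previous output.
            inp = loop_function(prev, i)
        output, state = cell(inp, state)
        outputs.append(output)
        if loop_function is not None:
            prev = output
    return outputs, state


def toy_cell(inp, state):
    """Made-up cell: the new state (and output) is the running sum."""
    new_state = state + inp
    return new_state, new_state


def feed_previous(prev, i):
    """Loop function that feeds the previous prediction back in."""
    return prev


go_symbol = 1.0
labels_a = [go_symbol, 10.0, 20.0]
labels_b = [go_symbol, -5.0, 99.0]

# With a loop function, only the first element of the inputs is used.
out_a, _ = rnn_decoder(labels_a, 0.0, toy_cell, loop_function=feed_previous)
out_b, _ = rnn_decoder(labels_b, 0.0, toy_cell, loop_function=feed_previous)

# Without one, every input (label) is consumed: teacher forcing.
teacher_forced, _ = rnn_decoder(labels_a, 0.0, toy_cell)
```

In this toy run out_a and out_b are identical even though the later label values differ, which is why passing labels as decoder_inputs does not by itself leak them into the predictions.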

Question on dcrnn_cell

Hi, Yaguang

I have read through the DCRNN code and there is one thing I cannot understand. I hope you can help.
When the filter type is random walk or dual random walk, why is a transpose applied after random_walk_matrix is calculated?
In your paper, the diffusion convolution is defined much like a graph convolution, and the code applies no such transpose after scaled_laplacian_matrix is calculated.

dcrnn_train.py : Found Inf or NaN global norm: Tensor had NaN values

Hello, I get an InvalidArgumentError when running dcrnn_train.py. It shows:

When I googled it, someone suggested using a smaller batch size, but that doesn't work.
What do you think causes the error?