xinychen / transdim

Machine learning for transportation data imputation and prediction.

License: MIT License

Jupyter Notebook 100.00%

transdim's Introduction

👋 I'm Xinyu Chen, now a Postdoctoral Associate at MIT (see MIT sites). Before joining MIT, I received my PhD degree from the University of Montreal in Canada.

🌱 A strong advocate of open-source & reproducible research.
🤔 Besides coding, I enjoy reading & traveling.
💬 Create new ideas about spatiotemporal data modeling.
📫 How to reach me: [email protected]

Latest Publications

Xinyu Chen, Xi-Le Zhao, Chun Cheng (2024). Forecasting urban traffic states with sparse data using Hankel temporal matrix factorization. INFORMS Journal on Computing. Early Access. [PDF] [Data & Python code]
Xinyu Chen, Zhanhong Cheng, HanQin Cai, Nicolas Saunier, Lijun Sun (2024). Laplacian convolutional representation for traffic time series imputation. IEEE Transactions on Knowledge and Data Engineering. Early Access. [PDF] [Slides] [Blog post] [Data & Python code]
Xinyu Chen, Chengyuan Zhang, Xiaoxu Chen, Nicolas Saunier, Lijun Sun (2024). Discovering dynamic patterns from spatiotemporal data with time-varying low-rank autoregression. IEEE Transactions on Knowledge and Data Engineering. 36 (2): 504-517. [PDF] [Blog post] [Data & Python code]
Xinyu Chen, Lijun Sun (2022). Bayesian temporal factorization for multidimensional time series prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 (9): 4659-4673. [Slides] [Data & Python code]

transdim's People

Forkers

zhuangdingyi vadermit parety amirunpri2018 kamalravi ajaafer liheng giegloop aijinz wangjianlongnba csjunxu kapokyan lyzl2010 awesome-archive earthat zhiyongc pepsalehi shuxingcheng simmonssong lijunsun czzlegend mosiur97 hsuisme qianliu1990 youngflyasd wangpeng89 jiaodaxiaozi harold1997-trans skanderhn mindis afcarl frank19-lab ozz-d jjongjjong magica-chen yangzhou12 haleylgc dawnj wcc961129 lidanyang12 nancygaooo gaosq0604 chaomneg wendy914 dotwoo nsaunier mousewu lxw4939 tomfisher acoffeeyin rotcx geekyneuro mongoooo fmx789 gumplus ilyi1116 kiminh jayejndx kaiwang9370 salgorithom shengguanwsu kaimoxuan123 dpcscience sharonwangs bloodphx sanmuseniors w5802021 dreaminesslww altovate ap0stal renaldydwi yingfuu hitpisces sprinterzzj sqsxwj520 bob0123456 sandy4321 jayantchen96 zhuwentao2020 awoziji lbdhr opaquezxd stel-nik lcrypto vigneshr97 lzx-buaa wpcaia hao-zi reborn521 wsgan001 relevation-143 akarito dingyinghui bxfour2018 yunxileo yizhanyang 7stitch7 kaimaoge bonaldli fanhongweifd

transdim's Issues

Transportation

Is there a function for detecting when people gather in one place, based on GPS data?

How to input incomplete data and obtain the completed data after training

Hello, thank you very much for your open source project, which has been very helpful to me!
I have a question. After training the BATF algorithm, I want to input data with missing values and obtain the completed data. What should I do?
I would greatly appreciate it if you could reply to my question.

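How should the rank parameter be set in the BTMF imputation code?

Hello! I think your work is excellent and meaningful, but I have run into two problems when applying it. First, as the title says: what does the rank parameter in BTMF mean, and how does it affect the results? Second, I found that after applying BTMF to a speed series, the result becomes very smooth, to the point that it looks distorted. Is there a way to solve this, for example by tuning the parameters?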

What is the difference between multivariate datasets and multidimensional datasets?

Recently I have read some articles about data imputation and found that some methods are for multivariate data while others are for multidimensional data. I wonder what the difference between these is. Could you please give me a simple example? Thanks a lot!

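A question about forecasting with LATC

Thank you very much for such a beautiful paper and code; the paper has been very helpful to me. However, I ran into a problem while reproducing your code. When using the LATC model for prediction, can it forecast forward like a traditional time series model, that is, predict values that do not yet exist in the real world? If so, how should the code be written? While reproducing your LATC-predict code, I found that the predictor function you defined treats the values to be predicted as missing values, completes them with LATC, and then compares them with the ground truth. But how should I proceed when I want to predict data for which no ground truth exists?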

About Your RMSE

In CP_ALS and Tucker_ALS, why do you calculate the RMSE on the training set (sparse_tensor) rather than on the test set, as you do in BGCP?

Btw, in final_mape = np.sum(np.abs(dense_tensor[pos] - tensor_hat[pos]) / dense_tensor[pos]) / dense_tensor[pos].shape[0], the np.abs() should cover (dense_tensor[pos] - tensor_hat[pos]) / dense_tensor[pos] instead of only dense_tensor[pos] - tensor_hat[pos].
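
The point about np.abs() can be illustrated with a small sketch. The helper names compute_mape and compute_rmse and the toy arrays are hypothetical, not taken from the notebooks; the mask pos assumes that zeros in sparse_tensor mark held-out entries. With strictly positive ground truth the two placements of np.abs() agree, so the difference only shows up when the data contain negative values.

```python
import numpy as np

def compute_mape(actual, predicted):
    """Mean absolute percentage error (as a fraction) over the given entries."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Take the absolute value of the whole ratio, so that negative
    # ground-truth values cannot cancel out other errors.
    return np.mean(np.abs((actual - predicted) / actual))

def compute_rmse(actual, predicted):
    """Root mean squared error over the given entries."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Hypothetical toy data: zeros in `sparse` mark held-out (test) entries.
dense = np.array([[10.0, 20.0], [-5.0, 40.0]])
sparse = np.array([[10.0, 0.0], [0.0, 40.0]])
estimate = np.array([[10.0, 22.0], [-4.0, 40.0]])

# Evaluate only on entries that are known in `dense` but hidden in `sparse`.
pos = np.where((dense != 0) & (sparse == 0))
mape = compute_mape(dense[pos], estimate[pos])
rmse = compute_rmse(dense[pos], estimate[pos])
```

Evaluating on pos rather than on the observed entries of sparse gives a test-set error, which is the comparison the first question asks about.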

A question about filling null values in time series

Hello, I would like to ask a question. I am studying a dataset on changes in the number of physical stores, mainly to detect outliers in those changes. A blank value is, of course, itself an outlier. Which algorithm should I use to fill in the blank values so that I can then find other types of outliers? Thank you.

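Not-positive-definite error when applying BATF to my own dataset

LinAlgError: 8-th leading minor of the array is not positive definite
This is the final error I got; the parameters are the same as those provided in the code. From searching online, the cause seems to be that my data contains NaN. I would like to use the method you provide to handle missing values. How can BATF be applied to data containing NaN? Thank you very much!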
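
One possible preprocessing step is sketched below, assuming that the notebooks mark missing entries with zeros rather than NaN; that convention is an assumption here and should be checked against the notebook in use. The helper nan_to_missing is hypothetical, and it is not confirmed that this step alone resolves the LinAlgError.

```python
import numpy as np

def nan_to_missing(data, missing_value=0.0):
    """Replace NaN entries with a placeholder and return an observation mask.

    Assumption: the downstream imputation code treats `missing_value`
    (zero by default) as "not observed".
    """
    data = np.asarray(data, dtype=float)
    observed = ~np.isnan(data)                      # True where a value was recorded
    cleaned = np.where(observed, data, missing_value)
    return cleaned, observed

# Hypothetical toy input containing NaN.
raw = np.array([[1.0, np.nan, 3.0],
                [np.nan, 5.0, 6.0]])
sparse_tensor, observed = nan_to_missing(raw)
```

If genuine zeros occur in the data, they would be indistinguishable from missing entries under this convention, so the observed mask is returned separately.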

What is the difference between BPTF and BGCP?

The difference in data

About the dataset: each data file contains three named arrays: tensor, random_tensor, and random_matrix.
What do these three stand for, and is there any difference between them?

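Problem when applying the code to my own dataset

Hello, I have three correlated time series, with missing values set to 0. I reshaped them into shape (6720, 3) and used LRTC for imputation, but ran into a problem at runtime.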
ValueError Traceback (most recent call last)
in
11 epsilon = 1e-4
12 maxiter = 200
---> 13 tensor_hat = LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)
14 end = time.time()
15 print('Running time: %d seconds'%(end - start))

in LRTC(failed resolving arguments)
55 Z[pos_missing] = np.mean(X + T / rho, axis = 0)[pos_missing]
56 T = T + rho * (X - np.broadcast_to(Z, np.insert(dim, 0, len(dim))))
---> 57 tensor_hat = np.einsum('k, kmnt -> mnt', alpha, X)
58 tol = np.sqrt(np.sum((tensor_hat - last_tensor) ** 2)) / snorm
59 last_tensor = tensor_hat.copy()

<array_function internals> in einsum(*args, **kwargs)

~\AppData\Roaming\Python\Python36\site-packages\numpy\core\einsumfunc.py in einsum(out, optimize, *operands, **kwargs)
1348 if specified_out:
1349 kwargs['out'] = out
-> 1350 return c_einsum(*operands, **kwargs)
1351
1352 # Check the kwargs to avoid a more cryptic error later, without having to

ValueError: einstein sum subscripts string contains too many subscripts for operand 1

Thank you very much for your open-source project!
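
The traceback above is consistent with a shape mismatch: the subscripts 'k, kmnt -> mnt' expect one stacked copy per mode of a third-order tensor (four axes in total), whereas data of shape (6720, 3) stacks to only three axes. A minimal sketch of an order-agnostic weighted sum follows, using the ellipsis form of np.einsum; the helper name weighted_mode_sum is hypothetical, and other parts of LRTC may make further third-order assumptions that this does not address.

```python
import numpy as np

def weighted_mode_sum(alpha, X):
    """Combine per-mode estimates X[k] with weights alpha[k].

    The ellipsis lets the same call work for matrices, third-order
    tensors, or higher orders.
    """
    return np.einsum('k,k...->...', alpha, X)

rng = np.random.default_rng(0)

# Third-order case: matches the original 'k, kmnt -> mnt' contraction.
dim3 = (4, 5, 6)
alpha3 = np.ones(3) / 3
X3 = rng.normal(size=(3, *dim3))
out3 = weighted_mode_sum(alpha3, X3)

# Matrix case with the shape from the question: two modes, so two weights.
dim2 = (6720, 3)
alpha2 = np.ones(2) / 2
X2 = rng.normal(size=(2, *dim2))
out2 = weighted_mode_sum(alpha2, X2)
```

An alternative is to fold the series into a genuine third-order tensor before calling LRTC, in which case alpha needs three entries.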

A bug in LRTC-TNN.ipynb.

In the svt_tnn code:

def svt_tnn(mat, alpha, rho, theta):
    tau = alpha / rho
    [m, n] = mat.shape
    if 2 * m < n:
        u, s, v = np.linalg.svd(mat @ mat.T, full_matrices = 0)
        s = np.sqrt(s)
        idx = np.sum(s > tau)
        mid = np.zeros(idx)
        mid[:theta] = 1
        mid[theta:idx] = (s[theta:idx] - tau) / s[theta:idx]
        return (u[:, :idx] @ np.diag(mid)) @ (u[:, :idx].T @ mat)
    elif m > 2 * n:
        return svt_tnn(mat.T, tau, theta).T # this svt_tnn call lacks an argument :( It only has 3 arguments.
    u, s, v = np.linalg.svd(mat, full_matrices = 0)
    idx = np.sum(s > tau)
    vec = s[:idx].copy()
    vec[theta:idx] = s[theta:idx] - tau
    return u[:, :idx] @ np.diag(vec) @ v[:idx, :]

The error shows:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 7
      5 epsilon = 1e-4
      6 maxiter = 200
----> 7 x = LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)
      8 # end = time.time()
      9 # print('Running time: %d seconds'%(end - start))

Cell In[8], line 17, in LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)
     15 rho = min(rho * 1.05, 1e5)
     16 for k in range(len(dim)):
---> 17     X[k] = mat2ten(svt_tnn(ten2mat(Z - T[k] / rho, k), alpha[k], rho, int(np.ceil(theta * dim[k]))), dim, k)
     18 Z[pos_missing] = np.mean(X + T / rho, axis = 0)[pos_missing]
     19 T = T + rho * (X - np.broadcast_to(Z, np.insert(dim, 0, len(dim))))

Cell In[6], line 13, in svt_tnn(mat, alpha, rho, theta)
     11     return (u[:, :idx] @ np.diag(mid)) @ (u[:, :idx].T @ mat)
     12 elif m > 2 * n:
---> 13     return svt_tnn(mat.T, tau, theta).T
     14 u, s, v = np.linalg.svd(mat, full_matrices = 0)
     15 idx = np.sum(s > tau)

TypeError: svt_tnn() missing 1 required positional argument: 'theta'
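
One way to make the recursive call match the four-argument signature is sketched below: it passes alpha and rho through again instead of the precomputed tau. This is a suggested fix based only on the code quoted in this issue, not a confirmed patch from the repository. The toy matrix at the end is hypothetical and chosen to exercise the m > 2 * n branch.

```python
import numpy as np

def svt_tnn(mat, alpha, rho, theta):
    """Singular value thresholding that leaves the first `theta` singular values unshrunk."""
    tau = alpha / rho
    m, n = mat.shape
    if 2 * m < n:
        # Wide matrix: work with the smaller m-by-m Gram matrix.
        u, s, _ = np.linalg.svd(mat @ mat.T, full_matrices=False)
        s = np.sqrt(s)
        idx = np.sum(s > tau)
        mid = np.zeros(idx)
        mid[:theta] = 1
        mid[theta:idx] = (s[theta:idx] - tau) / s[theta:idx]
        return (u[:, :idx] @ np.diag(mid)) @ (u[:, :idx].T @ mat)
    elif m > 2 * n:
        # Tall matrix: transpose and recurse, forwarding all four arguments.
        return svt_tnn(mat.T, alpha, rho, theta).T
    u, s, v = np.linalg.svd(mat, full_matrices=False)
    idx = np.sum(s > tau)
    vec = s[:idx].copy()
    vec[theta:idx] = s[theta:idx] - tau
    return u[:, :idx] @ np.diag(vec) @ v[:idx, :]

# Hypothetical tall input that previously triggered the TypeError.
rng = np.random.default_rng(0)
tall = rng.normal(size=(30, 4))
out_tall = svt_tnn(tall, alpha=1.0, rho=1.0, theta=1)
```

Passing tau in place of alpha would not be equivalent, because the callee divides by rho again.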

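On splitting into training and test sets

Dear author, when using machine learning algorithms for data imputation, is it unnecessary to split the data into training and test sets as deep learning methods do?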

Comparing GAIN with BGCP for imputation

Hi, I'm learning about GAIN now, and it seems that BGCP performs better than GAIN in terms of imputation. So I would like to know: what are GAIN's advantages and disadvantages compared with BGCP? And do BGCP and GAIN differ in their areas of application? Thanks!

Some questions about your paper

From the paper "Low-Rank Autoregressive Tensor Completion for Spatiotemporal Traffic Data Imputation":
Could you tell me how you derived this formula?

Looking forward to your reply.

Can constraints be introduced during the data imputation?

If I want to add some constraints or expert knowledge as a prior during the data imputation process, can transdim do this?

Will it work for categorical data?

After the forecast of the full data

In the BGCP algorithm as defined, what do the two returned values represent?
If I need the imputed data to be returned, which one should I use?
Thank you!

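Is there any traffic flow data?

Hello,
Do your datasets include traffic flow data? If not, are there other public datasets that provide flow data? All the public datasets I can find so far contain speed data, and I cannot find any flow data. Thank you!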

A question about the datasets

Are random_matrix.mat and random_tensor.mat under dataset\Guangzhou-data-set randomly generated, or do they follow some distribution? I cannot tell from plotting them.

PeMS graph

Hi, I would like to know which graph to use for the PeMS dataset: there is one called graph_pems_new and another called graph_pems.

Thanks

LinAlgError: SVD did not converge using LRTC-TNN

About 50% of the original values in my data, which has 5 features, are missing, and not at random. I tried to use LRTC-TNN to restore the missing values; however, it raises LinAlgError: SVD did not converge. What can I do? Or is there another method that can be used to impute my data? Thanks.
The original data are shown below (ignore the last panel at the bottom right, which is empty):

dataset

I think this work is very interesting. Unfortunately, when I try to apply your algorithm to my own data, I run into some problems. I am stuck on generating the third-order tensor. Could you please send me the source code of your data processing?

Appreciate your help.
Thanks and regards.
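
A common way to build a third-order tensor from a time series matrix is to fold the time axis into days and intervals per day. The sketch below assumes a (locations, time steps) matrix and a (locations, days, intervals per day) layout; both the layout and the helper names are assumptions for illustration, not the repository's actual preprocessing code.

```python
import numpy as np

def matrix_to_tensor(mat, intervals_per_day):
    """Fold a (locations, time) matrix into (locations, days, intervals_per_day)."""
    n_locations, n_steps = mat.shape
    if n_steps % intervals_per_day != 0:
        raise ValueError("number of time steps must be a multiple of intervals_per_day")
    n_days = n_steps // intervals_per_day
    return mat.reshape(n_locations, n_days, intervals_per_day)

def tensor_to_matrix(tensor):
    """Unfold the tensor back into a (locations, time) matrix."""
    return tensor.reshape(tensor.shape[0], -1)

# Hypothetical example: 3 locations, 2 days, 144 ten-minute intervals per day.
mat = np.arange(3 * 2 * 144, dtype=float).reshape(3, 2 * 144)
tensor = matrix_to_tensor(mat, intervals_per_day=144)
```

Here tensor[i, d, t] holds the value at location i, day d, and interval t, so tensor[1, 1, 0] equals mat[1, 144].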

How can I set the parameter "low_rank" in different application scenarios?

First of all, your open source work is very beautiful! It gave me a good understanding of the main content of your paper.

Here I would like to discuss an initial parameter setting mentioned in the code and the paper. In the code, the value of the initial parameter low_rank needs to be specified when executing the BGCP notebook. How should this value be chosen for different time series? Or is it chosen arbitrarily? I'm troubled by that.

If convenient, please reply. I will be very grateful! Thank you again for your open source work.

How to apply LATC algorithm to my own dataset

I would like to ask you a question. My dataset is incomplete, so I want to use LATC for completion. This means I don't have a dense_tensor, so what should I do? My dataset is a matrix with dimension 21x4081, but I added a new dimension to it and converted it into a tensor with dimension 21x1x4081.
If I use sparse_tensor.copy() in place of dense_tensor, RMSE and MAPE are NaN.

Looking forward to your reply
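
A likely reason for the NaN values is that the error metrics are averaged over entries that are known in dense_tensor but hidden in sparse_tensor; when both arrays are identical, that set is presumably empty. One workaround, sketched here under the assumption that zeros mark missing entries, is to hide a fraction of the observed entries and evaluate on those. The helper hold_out_observed is hypothetical, and the mean fill only stands in for the output of an imputation model such as LATC.

```python
import numpy as np

def hold_out_observed(sparse, ratio=0.2, seed=0):
    """Hide a random fraction of the observed (nonzero) entries for validation."""
    rng = np.random.default_rng(seed)
    observed = np.argwhere(sparse != 0)             # coordinates of known values
    n_hold = int(round(ratio * len(observed)))
    chosen = observed[rng.choice(len(observed), size=n_hold, replace=False)]
    held = tuple(chosen.T)                          # index tuple usable on the array
    train = sparse.copy()
    train[held] = 0                                 # mark held-out entries as missing
    return train, held

# Hypothetical incomplete data: zeros are the genuinely missing entries.
data = np.arange(1.0, 21.0).reshape(4, 5)
data[0, 0] = 0
data[2, 3] = 0

train, held = hold_out_observed(data, ratio=0.25)

# Placeholder for the model output: fill every entry with the mean of the
# remaining observations. Replace this with the imputed tensor.
estimate = np.full_like(train, train[train != 0].mean())
rmse = np.sqrt(np.mean((data[held] - estimate[held]) ** 2))
```

In this setup, data plays the role of dense_tensor and train plays the role of sparse_tensor, so the metrics are computed on a non-empty set of entries.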