PaddlePaddle / PaddleSpatial

PaddleSpatial is an open-source spatial-temporal computing tool based on PaddlePaddle.

License: Apache License 2.0

Python 17.13% Jupyter Notebook 7.52% GLSL 74.90% Shell 0.03% Makefile 0.07% Batchfile 0.01% Edge 0.33%

Introduction

PaddleSpatial is a spatial-temporal computing tool, taking advantage of the advanced spatial-temporal data mining capacities including spatial transfer learning, time series prediction and region profiling, for facilitating the development of urban computing applications.

Resources

Installation Guide

PaddleSpatial is an open source spatial-temporal computing tool based on PaddlePaddle. The installation prerequisites and guide can be found here.

Tutorials

We provide tutorials to help you navigate the repository and start quickly.

Guide for Developers

To develop new functions based on the source code of PaddleSpatial, please refer to the guide for developers.
For more details of the APIs, please refer to the documents.

Feedback and Community Support

Questions, reports, and suggestions are welcome through GitHub Issues!

License

PaddleSpatial is released under the Apache 2.0 license.

paddlespatial's Issues

lightgbm may not be correctly installed in base environment

At first, I got the error 'NoneType' object has no attribute 'forecast' when using lightgbm in the base environment. After commenting out import lightgbm as lgb, the error went away. Could you please check whether lightgbm is installed correctly in the base environment? Thank you!
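
One quick way to narrow this down is to check, inside the environment in question, whether the package can be located at all. The snippet below is a minimal sketch using only the Python standard library. The helper name is_importable is made up for illustration, and the check only tells you that the module can be found, not that it loads without errors.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    # find_spec returns None when the module is not on the import path.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "json" is a standard-library module, included as a known-good reference.
    for name in ("lightgbm", "json"):
        status = "found" if is_importable(name) else "NOT found"
        print(f"{name}: {status}")
```

If lightgbm is reported as not found, the import failure is the likely root cause and the 'NoneType' message is probably just a symptom of the submission failing to load.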

Commands in README to reproduce results don't match paper information

Hi, I am trying to reproduce the results as well, and while reading through the paper I found a mismatch between the paper and the commands described in https://github.com/PaddlePaddle/PaddleSpatial/blob/main/research/D3VAE/README.md
in the "Reproduce Experimental Results" section.

The paper states in 3.1 Implementation Details: "The trade-off hyperparameters are set as ψ = 0.05, λ = 0.1, γ = 0.001 for ETTs, and ψ = 0.5, λ = 1.0, γ = 0.01 for others"

But in the README, the parameters psi, gamma and lambda1 are not changed for any of the datasets? (They default to the non-ETT values.)
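
To make the mismatch concrete, here is a small sketch of selecting the trade-off values per dataset as the paper describes. The numbers are the ones quoted above. The helper names, the rule of recognising ETT data by file-name prefix, and the "--name value" flag spelling are assumptions for illustration and are not taken from the repository.

```python
# Trade-off values quoted from Section 3.1 of the paper.
ETT_PARAMS = {"psi": 0.05, "lambda1": 0.1, "gamma": 0.001}
OTHER_PARAMS = {"psi": 0.5, "lambda1": 1.0, "gamma": 0.01}


def tradeoff_params(data_file: str) -> dict:
    """Pick the trade-off hyperparameters for a dataset file name."""
    # Assumption: ETT datasets are recognised by an "ETT" file-name prefix.
    return dict(ETT_PARAMS if data_file.startswith("ETT") else OTHER_PARAMS)


def as_cli_flags(params: dict) -> list:
    """Render parameters as "--name value" pairs (flag names are assumed)."""
    flags = []
    for name, value in params.items():
        flags += [f"--{name}", str(value)]
    return flags


print(as_cli_flags(tradeoff_params("ETTh1.csv")))
print(as_cli_flags(tradeoff_params("electricity.csv")))
```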

result is nan

I ran the Conformer model with python -u train.py --data ECL. The outputs are all nan, and the loss is also nan. What is the reason?

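kddcup22-sdwpf-evaluation: error when submitting results

In this competition, our team's submissions always fail with: Error occurred: Err: Accuracy (-0.004) is lower than Zero, which means that the RMSE (in latest 24 hours) of the 120...

Because the Err message is cut off, we cannot tell where the error comes from, so we cannot get a score for the test results.

Running evaluation.py locally works fine, with no errors at all. Could you provide the complete error message from the remote server?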

quit

Excessive usage of memory

Problem

I am using the full dataset and the baseline code, modifying only the file path in prepare.py. I can successfully train my 134 models, but when I try to evaluate, the program always uses too much memory (over 32 GB). I have inspected the code and suspect that the responses list [256:common.py] is taking up too much memory, but I don't know what to do about it. Can somebody help me?

My project structure

./wpf_baseline_gru
| --- /data
| --- wtbdata_245days.csv
| ---- ... rest of files

NotImplementedError: You must implement the backward function for custom autograd.Function.

Thank you for your wonderful work. When I reproduced the pytorch version of d3vae, the following error occurred.

Traceback (most recent call last):
  File "main.py", line 112, in <module>
    exp.train(setting)
  File "/root/Project/Seq_diff/d3vae_pytorch/exp/exp_model.py", line 131, in train
    loss.backward()
  File "/root/miniconda3/envs/seq_DM/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/miniconda3/envs/seq_DM/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/root/miniconda3/envs/seq_DM/lib/python3.6/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "/root/miniconda3/envs/seq_DM/lib/python3.6/site-packages/torch/autograd/function.py", line 201, in backward
    raise NotImplementedError("You must implement the backward function for custom"
NotImplementedError: You must implement the backward function for custom autograd.Function.

My PyTorch version is 1.7.1.
Solution
- Add @staticmethod at line 68 of neural_operations.py to fix the bug.

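Does the leaderboard now require submitting code? ... confusing ...

Is the currently submitted result produced by taking the last 15 days of the given 245 days and then predicting the following two days?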

D3VAE number of disentanglement factors?

Hi, the paper states in 3.1 implementation details:
"The number of disentanglement factors is chosen from {4, 8}"

However, I don't see an option to configure that in the provided python code (both the paddlepaddle and the pytorch based ones). Is that option named differently or is that hard-coded somewhere?

Are there any ways to inverse the pred.npy in D3VAE?

Hi, I'm planning to utilize the D3VAE in this repo.

The result of prediction (pred.npy) seems normalized and there is no option to inverse the result. Are there any ways to inverse the pred.npy and get the real prediction results?
If not, I can also make a PR.

Thanks!
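
For reference, if the data were standardized per feature with a mean and standard deviation (a common choice in forecasting data loaders, but an assumption here and not confirmed for D3VAE), the inverse is simply x = z * std + mean. A minimal sketch with made-up statistics and a hypothetical helper name:

```python
def inverse_standardize(rows, mean, std):
    """Undo z-score scaling: x = z * std + mean, feature by feature."""
    return [
        [z * s + m for z, m, s in zip(row, mean, std)]
        for row in rows
    ]


# Hypothetical training-split statistics for two features.
mean = [10.0, 200.0]
std = [2.0, 50.0]

# Hypothetical normalized predictions (two time steps, two features).
pred_normalized = [[0.5, -1.0], [0.0, 2.0]]

print(inverse_standardize(pred_normalized, mean, std))
```

The statistics must be the same ones that were used to normalize the training data. Recomputing them from the predictions themselves would give a different scale.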

Dependency confusion supply-chain vulnerability detected

Hi,

I'm a cybersecurity researcher developing PackjGuard [1]. Our tool has detected a dependency confusion vulnerability in this repository. In order for me to disclose it, kindly enable GitHub private vulnerability reporting, which allows security researchers to responsibly disclose a security vulnerability.

Thanks!

[1] PackjGuard is a GitHub app that monitors repos for malicious, vulnerable, abandoned, and other "risky" dependencies and mitigates attacks by creating pull requests for automatic remediation https://github.com/marketplace/packjguard

'NoneType' object has no attribute 'forecast'

Hi, when I use the base environment, I get this error: 'NoneType' object has no attribute 'forecast'.
But I did define the predict.py and prepare.py files correctly, and I can run "python evaluation.py" locally on my server and get results.

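[PaddlePaddle Hackathon 2] 87. Add a road network representation learning module to PaddleSpatial

(This issue is a task issue for the second PaddlePaddle Hackathon. For more details, see the [PaddlePaddle Hackathon 2] task overview.)

[Task Description]

Task title: Add a road network representation learning module to PaddleSpatial
Technical tags: Python, PaddlePaddle
Difficulty: Medium
Detailed description: Road network representation learning is one of the key tasks in spatial-temporal big data analysis. The task is to implement (in code) representation learning methods such as Geom-GCN and ChebConv and deliver a functional module for road network representation learning. As a medium-difficulty task, four road network representation learning modules must be implemented (Geom-GCN, ChebConv, DeepWalk, LINE).

[Submission Process]

Open a PR directly against https://github.com/PaddlePaddle/PaddleSpatial/tree/main/paddlespatial to start the acceptance review.

[Deliverables]

Source code of the relevant modules, development documentation, and benchmarks.

[Merge Criteria]

Development is complete, and the contributor takes part in subsequent maintenance until the official release.

[Technical Requirements]

Familiarity with the basic algorithm modules of PaddlePaddle and PGL.

[References]

https://github.com/LibCity/Bigscity-LibCity/tree/master/libcity/model/road_representation contains implementations of road network representation learning.

[Q&A]

If you have any questions about this task during development, feel free to leave a comment under this issue.
For common development questions, Q&A sessions will be organized regularly during the event. Please watch for notices on the official website and in the QQ group, and join in time.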

AGAIN! lightgbm may not be correctly installed in base environment

We encountered this problem again in the base environment: 'NoneType' object has no attribute 'forecast'.

Can you confirm that LightGBM is correctly installed in the base environment?

Thx!

Issue with the scoring net part of D3VAE's PyTorch code

First of all, thank you for your interesting work and for releasing the code on github!

I ran into an issue when I tried to reproduce the paper's results on the ETTh1 dataset. The issue is related to the line below:

PaddleSpatial/research/D3VAE/d3vae_pytorch/model/resnet.py

Line 152 in 13bd2d9

output = output.view(-1, int(self.hw/8)*int(self.hw/8)*8*self.dim)

or one of the previous steps, since it only seems to work with even batch sizes.

To reproduce, with a fresh clone of the repository, run https://github.com/PaddlePaddle/PaddleSpatial/blob/main/research/D3VAE/d3vae_pytorch/main.py
with all arguments set to default except for the path to the dataset and the batch_size, and optionally specify which GPU to use. If run with an even integer as batch_size, it works:
python main.py --root_path /path/to/ETDataset/ETT-small/ --data_path ETTh1.csv --batch_size 2 --gpu 0
but if the batch_size is odd, e.g. 1, the code crashes with the following error:
PaddleSpatial/research/D3VAE/d3vae_pytorch/model/resnet.py", line 152, in forward
    output = output.view(-1, int(self.hw/8)*int(self.hw/8)*8*self.dim)
RuntimeError: shape '[-1, 8192]' is invalid for input of size
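
The error fits how view(-1, width) works: the total number of elements must be an exact multiple of width. The sketch below reproduces that arithmetic in plain Python. The per-sample element count of 4096 is purely hypothetical, since the input size in the error message above is cut off. It is chosen only because it would explain why even batch sizes work and odd ones fail. The helper name inferred_rows is also made up for illustration.

```python
def inferred_rows(total_elements: int, width: int) -> int:
    """Mimic the -1 inference of view(-1, width); raise if it cannot work."""
    if total_elements % width != 0:
        raise ValueError(
            f"shape [-1, {width}] is invalid for input of size {total_elements}"
        )
    return total_elements // width


WIDTH = 8192        # from the error message: shape '[-1, 8192]'
PER_SAMPLE = 4096   # hypothetical elements per sample, see the note above

for batch_size in (1, 2, 3, 4):
    total = batch_size * PER_SAMPLE
    try:
        print(batch_size, "->", inferred_rows(total, WIDTH), "rows")
    except ValueError as err:
        print(batch_size, "->", err)
```

Under that assumption, a batch of 2 would collapse into a single row, which may indicate that the reshape width, rather than the batch size, is what needs a closer look.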

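sagnn: directional sector partitioning

Thanks for open-sourcing the code!
I have the following question about the sector partitioning in the sagnn model:
In def get_subgraphs(self, g) of PaddleSpatial/paddlespatial/networks/sagnn.py, what does sec_ind = int(angle / (np.pi / self.num_sectors)) (line 115) mean?
My understanding is that self.num_sectors divides a full circle evenly into self.num_sectors sectors, but (np.pi / self.num_sectors) only divides a half circle into self.num_sectors sectors.
Looking forward to your reply!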
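
The arithmetic of the quoted line can be checked in isolation. The sketch below copies only that one expression. The function name sector_index and the test angles are illustrative. The range that angle actually takes inside get_subgraphs is not shown in this issue, so both halves of the circle are printed.

```python
import math


def sector_index(angle: float, num_sectors: int) -> int:
    """Same arithmetic as the quoted line: int(angle / (pi / num_sectors))."""
    return int(angle / (math.pi / num_sectors))


num_sectors = 4
for degrees in (10, 100, 170, 190, 280, 350):
    angle = math.radians(degrees)
    print(f"{degrees:>3} deg -> sector {sector_index(angle, num_sectors)}")
```

With num_sectors = 4, angles below 180 degrees map to indices 0 to 3, while angles between 180 and 360 degrees map to indices 4 to 7. So the expression yields exactly num_sectors distinct indices only if angle is confined to [0, pi); over a full circle it yields twice as many.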

Questioning the validity of the diffusion module

I ran the PyTorch version of the code and found an unreasonable result. I set the parameter "diff_steps" to 0, 10, 50, 100, 200, 500 and 1000, then ran the code on the ETTh1 and wind datasets. What I found is that, as the number of diffusion steps increases, the results get worse (both MSE and MAE). Below are the results I recorded on ETTh1. I also plotted the MSE results (at the end of this issue). So I doubt the validity of the diffusion module.

dataset name: ETTh1
prediction_length: 8

diff_steps   mse mean     mse std      mae mean     mae std
0            0.24760339   0.020427     0.3878688    0.017404405
10           0.26035553   0.012089639  0.39704275   0.017935857
50           0.26268333   0.0169106    0.40157977   0.015663896
100          0.281945     0.06063651   0.41526827   0.051474944
200          0.26382175   0.03267344   0.39231223   0.028107507
500          0.363977     0.06955655   0.45770007   0.048627686
1000         0.35823375   0.05329487   0.4454666    0.032828197

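A few suggestions that I hope the maintainers will adopt

1. The tutorials are currently in English. Could you provide a Chinese version? That should be simple. The links in the tutorial directory are also all in English, and I hope Chinese versions can be provided.
2. Before the PaddleSpatial repository was opened, I kept suggesting to one of the PaddlePaddle maintainers that a spatial-temporal sequence forecasting repository be created, because I had used EasyDL's time series forecasting and the results were poor. To put it bluntly, the performance was bad and fell far short of expectations. I hope for a more specialized repository that, for example, offers multiple algorithms. EasyDL only needs me to supply the data as an Excel file; I have no idea what it does internally, and the results are not great. Time series forecasting should be a hot research topic in academia. If there were a truly great time series forecasting algorithm, everyone could get rich trading stocks, hahaha. Just kidding. I don't expect time series forecasting to predict stocks, but I do hope that algorithms from top papers can be reproduced. Many time series forecasting algorithms look very impressive and seem to have good prospects, so I hope they get reproduced.
3. As the developers of the repository, how would you like the input data to be organized? In Excel?
4. For the house price prediction problem, a small 3-layer neural network is enough to get very good results. The input is a 13-dimensional array, the model output is the house price, and for training the dataset has to be split into a training set and a validation set. My point is that in the real world, running an experiment often yields a lot of data: N-dimensional inputs and some output. For example, to study the relationship between gravity and mass, experiments produce a large dataset. Can the relationship be derived from that dataset? Can a neural network fit the relationship between the two well? This case is simple, a proportional relationship, but when the relationship is complex and currently unknown, can a neural network still fit it well? Turbulence, for example, is very hard. That too can be seen as N-dimensional input producing some result. Do you currently provide a model of this kind, taking N-dimensional input and producing some output, like the house price problem with 13-dimensional input and the price as output?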

MIG

How do we calculate the MIG, given that the sequences do not have ground-truth factors?

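[PaddlePaddle Hackathon 2] 114. Add a time series forecasting module to PaddleSpatial

(This issue is a task issue for the second PaddlePaddle Hackathon. For more details, see https://github.com/PaddlePaddle/Paddle/issues/40234)

[Task Description]
Task title: Add a time series forecasting module to PaddleSpatial
Technical tags: Python, PaddlePaddle
Difficulty: Easy
Detailed description: Implement the latest time series forecasting algorithms to improve PaddleSpatial's time series forecasting capability,
including: 1) RNN-based time series forecasting (very simple)
2) the paper Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting (AAAI'21 Best Paper)
[Submission Process]
Open a PR directly against https://github.com/PaddlePaddle/PaddleSpatial/tree/main/paddlespatial/modelzoo/timeseries to start the acceptance review.
[Deliverables]
Source code of the relevant modules, development documentation, and benchmarks.
[Merge Criteria]
Development is complete, and the contributor takes part in subsequent maintenance until the official release.
[Technical Requirements]
Familiarity with the basic algorithm modules of PaddlePaddle.
[References]
Informer: https://github.com/zhouhaoyi/Informer2020 contains the implementation code of Informer;
Seq2Seq: https://github.com/LibCity/Bigscity-LibCity/blob/master/libcity/model/traffic_speed_prediction/Seq2Seq.py
contains an example of time series forecasting with an RNN.
[Q&A]
If you have any questions about this task during development, feel free to leave a comment under this issue.
For common development questions, Q&A sessions will be organized regularly during the event. Please watch for notices on the official website and in the QQ group, and join in time.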

How to adapt framework to custom data?

Hi team,
great work first of all. I would like to use your framework in the sports domain. I have time series data for players across different parameters. How would I use this data with the current model? What input format does the data need to be in? Is a dataframe grouped by player and parameter sufficient? Can the model be trained on the data of an entire team and then predict a parameter for a given player?
Thanks a lot in advance!

Will there be a preprocessing script? Also getting "You must implement the backward function for custom autograd.Function"

Hello, could the author provide the preprocessing code? Without it, it is difficult to reproduce the corresponding results. In addition, I put wind.csv into the data folder and got the error: You must implement the backward function for custom autograd.Function

About dataset file

Hi, do you have the .csv files of these three datasets: Electricity, Traffic, and Weather?

Reproducibility of D3VAE

Hi, I have been trying to reproduce the results published in the paper D3VAE, with the following command:

python main.py       --root_path ~/data/ETDataset-main/ETT-small --data_path ETTh1.csv \
                     --input_dim 7 --percentage 0.05 --diff_steps 1000 \
                     --beta_end 0.1

The last lines I got were:

mse:0.3795701265335083, mae:0.4926460385322571
0.4403777 0.0682598 0.5076214 0.040027253

If I understand correctly, 0.4403777 is the average MSE over several runs of the experiment. But I cannot find this value in the paper; the value reported there is 0.292.

Is there something I am missing?
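
For readers checking that interpretation, the sketch below shows how a four-number summary line of that shape (MSE mean, MSE std, MAE mean, MAE std) could be computed from per-run metrics. The per-run values are invented, since the issue does not list them, and the use of the population standard deviation is an assumption about how the script aggregates.

```python
import statistics


def summarize(values):
    """Return (mean, population standard deviation) of per-run metrics."""
    return statistics.fmean(values), statistics.pstdev(values)


# Hypothetical per-run results; the real per-run values are not in the issue.
mse_runs = [0.38, 0.45, 0.52]
mae_runs = [0.49, 0.50, 0.54]

mse_mean, mse_std = summarize(mse_runs)
mae_mean, mae_std = summarize(mae_runs)
print(mse_mean, mse_std, mae_mean, mae_std)
```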

A question about KDD Cup

In the image, what does the "framework" argument mean?

Data of wpf

Where can I find the wpf data?

Package Requirements for WPF

Hi, we want to try ensembling machine learning models with deep learning models. Could you install lightgbm==3.3.2 in the pytorch environment? Thanks!

D3VAE: overwrites timestamps?

Hi,

I came across something weird in the D3VAE code. In data_loader.py, line 93, you just manually overwrite the timestamps of the data:

df_stamp = pd.date_range(start='4/1/2018',periods=border2-border1, freq='H')
data_stamp = time_features(df_stamp, timeenc=self.timeenc, freq=self.freq)

Can you tell me why that is?
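
To illustrate what the quoted lines appear to do, here is a standard-library sketch of replacing a dataset's own timestamps with an hourly range from a fixed start date. The "original" timestamps and the helper name are hypothetical. Only the fixed start of 1 April 2018 and the hourly frequency come from the quoted code.

```python
from datetime import datetime, timedelta


def synthetic_hourly_stamps(start: datetime, periods: int):
    """Hourly timestamps from a fixed start, like date_range(freq='H')."""
    return [start + timedelta(hours=i) for i in range(periods)]


# Hypothetical original timestamps from a dataset that starts in 2016.
original = [datetime(2016, 7, 1, h) for h in range(3)]

# Stamps generated from a fixed start with the same length as the data.
replaced = synthetic_hourly_stamps(datetime(2018, 4, 1), len(original))

for old, new in zip(original, replaced):
    print(old.isoformat(), "->", new.isoformat())
```

With stamps generated this way, calendar features such as month or weekday are derived from the synthetic dates and not from the dataset's own dates, and for data that is not sampled hourly the hour-of-day feature would drift as well.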

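Suggestions

1. Although I cannot read English, I searched Baidu for Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecas and found many write-ups of the paper. The gist is that the authors see great potential in using Transformers for long time series forecasting, but identify three current problems, propose a fix for each, and validate them with decent results. However, the paper leaves many things uncovered.
For example, the paper validates the model on standard public datasets, so which independent variables x determine the dependent variable y is already fixed by the dataset. Real life is not like that. To predict y from multi-dimensional x, we first have to determine which dimensions of x affect y. Suppose that, in order to predict a variable y, I obtain through experiments or some other method data for 100 independent variables plus the dependent variable y. How can I use an algorithm or model to decide which dimensions of x are effective predictors of y and which are not? For instance, to predict gravitational force I have data on two dimensions, object mass and volume, plus the gravity data. How can an algorithm or model lead me to the conclusion that the force depends only on mass and not on volume? And how can a model tell me that the force depends not only on mass but also on one or more unknown variables? (Note: gravitational force = [gravitational constant] times the [product] of the two objects' masses divided by the square of the distance between them.)
I hope Baidu PaddleSpatial can provide such a model or algorithm, thank you. This is very important to me.
2. As I see it, time series forecasting falls into two categories. One is short-term forecasting, for example predicting the next day's value from past values. I have heard that most papers study short-term forecasting. Short-term forecasting is the top priority, and I hope many models can be provided; clas image classification, for example, offers no fewer than 100 models. The paper mentioned above seems to be about long-term forecasting. I often hear that long-term forecasting is unreliable: predicting the temperature 30 days from now from a long history of temperature, humidity, and other variables? Isn't that nonsense?
I don't know where the technology stands now. Is long-term sequence forecasting really achievable?
I digress. What I want to ask is: for time series forecasting, how long a history should I use for training and prediction? For weather, should I train and predict with the past year, the past month, or the past week of data? It seems nobody currently provides a model that tells me how much history to use.
I hope Baidu PaddleSpatial can provide such a model or algorithm, thank you. This is very important to me.
3. I think you only half answered my previous question. Communicating through issues is inconvenient. I also know that Rome was not built in a day, and PaddleSpatial will certainly not achieve great success overnight. I hope you can reproduce more SOTA algorithms, for both short-term and long-term forecasting. In my view, short-term and long-term forecasting should use different models and algorithms; I wonder what you think.
4. Could we chat over QQ? Issues are too inconvenient. WeChat works too, please. QQ: 1226194560 WeChat: 18820785964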