
tspdb's People

Contributors

abdullaho, adrienbrault, dependabot[bot], zarif98sjs


tspdb's Issues

ERROR: syntax error at or near "normalize"

Hi,
I have this problem when I try to create the extension.

postgres=# CREATE EXTENSION tspdb;
ERROR: syntax error at or near "normalize"
postgres=#

Can someone help me?
Thanks a lot for your time!

tspdb

Hi,

I installed and ran the script as suggested, but I believe all simulations are ARIMA-type. Can you please let me know how to check whether the program performs as described in the research paper, or whether it just outputs ARIMA results? Thanks :)

Can I install tspdb with conda?

The official installation instructions show an approach independent of conda. Have the developers tested installation via conda? Thanks.

Poor forecast results

Hi

I've been experimenting with tspdb with PostgreSQL 13 & Python 3.7 (both from EDB) on macOS. I have a sample dataset that I've previously experimented with quite successfully using WaveNet models in TensorFlow. It represents a simulated number of connections to a Postgres database with multiple seasonalities, such as you might find when users log on and off throughout the day, with quiet and peak times, and fewer users at weekends - all with some noise mixed in to better emulate a real-world scenario.

My aim is to be able to predict workload into the near future based on past monitoring data. Unfortunately I'm not getting useful predictions from tspdb.
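For anyone who wants to try something similar without the original CSV, a series with the shape described above (daily peaks, quieter weekends, noise) can be simulated roughly like this. This is a sketch, not the actual data.csv; all column names and magnitudes here are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 4 weeks of 5-minute samples (288 per day): daily peaks, quieter weekends, noise.
n_days = 28
t = np.arange(n_days * 288)
hour_of_day = (t % 288) / 288 * 24
day_of_week = (t // 288) % 7

daily = 40 * np.sin((hour_of_day - 6) / 24 * 2 * np.pi).clip(0)  # daytime peak
weekend = np.where(day_of_week >= 5, 0.4, 1.0)                   # fewer users on weekends
noise = rng.normal(0, 3, size=t.size)

processes = (20 + daily * weekend + noise).clip(0).round()
df = pd.DataFrame({
    "ts": pd.date_range("2020-10-06 13:25:01", periods=t.size, freq="5min"),
    "processes": processes,
})
```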

Here's my data set: data.csv, and a rough graph of it:

raw_data

I've experimented with various values for the k parameter:

tspdb=# SELECT * FROM list_pindices();
       index_name       | value_columns | relation | time_column |  initial_timestamp  |   last_timestamp    | agg_interval | uncertainty_quantification 
------------------------+---------------+----------+-------------+---------------------+---------------------+--------------+----------------------------
 backends_tsp_k1        | {processes}   | backends | ts          | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00     | t
 backends_tsp_k10       | {processes}   | backends | ts          | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00     | t
 backends_tsp_k2        | {processes}   | backends | ts          | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00     | t
 backends_tsp_k20       | {processes}   | backends | ts          | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00     | t
 backends_tsp_k3        | {processes}   | backends | ts          | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00     | t
 backends_tsp_k4        | {processes}   | backends | ts          | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00     | t
 backends_tsp_k5        | {processes}   | backends | ts          | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00     | t
 backends_tsp_k_default | {processes}   | backends | ts          | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00     | t
(8 rows)

tspdb=# SELECT * FROM pindices_stat();
       index_name       | column_name | number_of_observations | number_of_trained_models |  imputation_score  |   forecast_score   | test_forecast_score 
------------------------+-------------+------------------------+--------------------------+--------------------+--------------------+---------------------
 backends_tsp_k1        | processes   |                   7779 |                        1 | 0.9358105618035297 | 0.8986683760778478 |                    
 backends_tsp_k2        | processes   |                   7779 |                        1 | 0.9543878543235768 | 0.9493582039740182 |                    
 backends_tsp_k3        | processes   |                   7779 |                        1 | 0.9587100784133898 | 0.9573324136174277 |                    
 backends_tsp_k5        | processes   |                   7779 |                        1 | 0.9658566939198344 | 0.9779691559430637 |                    
 backends_tsp_k10       | processes   |                   7779 |                        1 |  0.978661078459557 | 0.9851500580756463 |                    
 backends_tsp_k4        | processes   |                   7779 |                        1 | 0.9624307902466809 | 0.9603959478873095 |                    
 backends_tsp_k20       | processes   |                   7779 |                        1 | 0.9942970665081167 | 0.9934041036452561 |                    
 backends_tsp_k_default | processes   |                   7779 |                        1 | 0.9543878543235768 | 0.9493582039740182 |                    
(8 rows)

The results I get look as follows. Each graph shows the actual data and the predicted data, but note that on some of them the scale changes radically so the actual data may look like a straight line:

k_default
k_1
k_3
k_4
k_5
k_10
k_20

Do you have any suggestions as to why this may be happening, and how I might get a reasonable prediction as I do with Tensorflow?

Thanks!

Unsatisfactory forecast results

Hi, I was testing this tool to forecast 10 time series; however, none of my predictions returned results I would consider satisfactory (whereas in the examples in notebook_examples/tspDB Example-Multiple Time Series (Real-world Data).ipynb almost all time series achieve good results). Below you can see some screenshots of the results I have obtained for each time series. The screenshots have been obtained from the following datasets:

The following images have been obtained from tmpseries.csv, all of them predicting between 8 and 14 days into the future:

  1. Time Series A:
    6daysaheadA
  2. Time Series B:
    6daysaheadB
  3. Time Series C:
    8daysaheadC
  4. Time Series D:
    D
  5. Time Series E:
    8daysaheadE
  6. Time Series F:
    daysaheadF
  7. Time Series G:
    G
  8. Time Series H:
    H
  9. Time Series I:
    I

These results were obtained as shown in the following .ipynb:
codemultipleTS.zip

The following images have been obtained from tmpseries2.csv. The first image corresponds to a one-month forecast and the second image to a 10-day forecast.

  1. Time Series one month ahead:
    1monthaheadTMPSeries2
  2. Time Series 10 days ahead:
    10daysaheadTMPSeries2

These results have been obtained using the same code as the previous predictions (evidently changing the dataset path, the prediction dates, and the split of the data into training (80% of the dataset) and test (20% of the dataset) sets).
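For reference, the 80/20 split mentioned above has to be chronological for time series, never shuffled. A minimal sketch (the placeholder series stands in for the values loaded from the CSV):

```python
import pandas as pd

# Hypothetical single-column series; in practice, load the values from the CSV.
series = pd.Series(range(100))

# The test set is the most recent 20% of observations.
split = int(len(series) * 0.8)
train, test = series.iloc[:split], series.iloc[split:]
```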

Does anyone else get the same results/or have the same problem with other time series? Or am I using this tool wrong?

I obtained these results with the following software versions:

  • Ubuntu 20.04.1
  • PostgreSQL 12.10
  • Python 3.9

name 'plpy' is not defined

Hi,

I'm practicing the "tspDB example-Single time Series (Synthetic data).ipynb" in a Jupyter notebook. When creating the prediction index, I get the error: name 'plpy' is not defined.

error is below: NameError: name 'plpy' is not defined
CONTEXT: Traceback (most recent call last):
PL/Python function "create_pindex", line 2, in
from tspdb.src.pindex.predict import get_prediction_range, get_prediction
PL/Python function "create_pindex", line 2, in
PL/Python function "create_pindex"
error

But if I execute "select create_pindex('synth_data', 'time','{"ts_7"}','pindex', k =>4);" in PostgreSQL via dbeaver, it can be executed successfully.

How to solve it? Thank you very much.

Tspdb

Hello, I'm working with tspdb time series predict. I do it through TablePlus by linking the series table. In this table, I have 5 rows and 37 columns. My question is, what is the minimum number of rows for a good prediction? With the limited data I have, I adjust hyperparameters and achieve a good prediction. However, when I provide the 38th column, I no longer get a good prediction.

SELECT test_tspdb();

postgres=# SELECT test_tspdb();
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I get the above error. Please can you help.

Setting boundaries for predicted values

Hello,

For my case study I would expect much better prediction accuracy if I were able to set boundaries for the predicted values. Is there an option to do so? Say, a minimum possible prediction value of 0?
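As far as I can tell tspdb does not expose such an option, but the returned predictions (and interval bounds) can be clamped client-side after the `predict` call. A minimal sketch; the function name is mine, not part of tspdb:

```python
import numpy as np

def clamp_predictions(preds, lower=0.0, upper=None):
    """Clip predicted values to a known valid range, e.g. counts can't go below 0."""
    upper = np.inf if upper is None else upper
    return np.clip(np.asarray(preds, dtype=float), lower, upper)

# Negative predictions are pulled up to the lower bound.
clamped = clamp_predictions([-1.2, 0.5, 3.7], lower=0.0)
```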

SELECT test_tspdb(); gives error

Hi,
I followed the instruction and everything is working well up until I get to the "SELECT test_tspdb();" command in postgres. I then get the error below.

postgres=# CREATE EXTENSION tspdb;
CREATE EXTENSION
postgres=# SELECT test_tspdb();
ERROR: ModuleNotFoundError: No module named 'tspdb'
CONTEXT: Traceback (most recent call last):
PL/Python function "test_tspdb", line 2, in
from tspdb.src.database_module.plpy_imp import plpyimp
PL/Python function "test_tspdb"

Any suggestions on how else to troubleshoot would be very much appreciated.

Also, I had to change the permissions on both tspdb files copied to...
/usr/share/postgresql/12/extension/

Thanks

Best system requirements for tspdb / PostgreSQL / PL/Python

Like many others, even though I managed to get through the installation process, I am not able to make plpython3u work on Windows 10.

I get a fatal error when I run SELECT test_tspdb(); or when I try to create and use a simple max(int a, int b) function with plpython3u.

If you managed to make this library work, can you please describe your system setup (OS / PostgreSQL version / EDB version / Python version...) so I don't lose too much time trying to fix an incompatibility between Windows 10 and PL/Python?

image

Getting error on the module?

I removed all prior Python installations and am still getting this error. Any suggestions?

ERROR: ModuleNotFoundError: No module named 'tspdb'
CONTEXT: Traceback (most recent call last):
PL/Python function "test_tspdb", line 2, in
from tspdb.src.database_module.plpy_imp import plpyimp
PL/Python function "test_tspdb"

No module named sklearn

I faced this issue while trying to test the installation by running 'select test_tspdb();'.

I could resolve it temporarily by adding 'scikit-learn' to the installation packages in the 'setup.py' file.

But I would like to know a better way to handle this.
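One way to diagnose this class of problem is to check, from inside the same interpreter PL/Python uses, which dependencies are actually importable. A sketch; the helper name is mine, and the module list is illustrative:

```python
import importlib.util

def missing_deps(mods=("numpy", "pandas", "sklearn")):
    """Return the modules that are NOT importable in the current interpreter.
    Run this via plpython3u to see what the server's Python is missing."""
    return [m for m in mods if importlib.util.find_spec(m) is None]

# Example: stdlib modules are always present, made-up ones are reported missing.
missing_deps(("os", "sys"))
```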

Thanks!

Demo Use case III: utilizing more than 1000 stocks to create pindex

Hello,

I am currently experimenting with tspDB on a scenario comparable to use case 3.
I have 50 years of data at a monthly timestep, and the columns are correlated (they actually represent geographical areas).
These areas (or "stocks", in the terminology of use case 3 of your demo) exceed PostgreSQL's dimensional limits, as there are over 65,000 of them.
I found that the maximum number of columns I can use is about 1,000 (the PostgreSQL hard limit is 1,600).
Is there some way (maybe via a multi-index or similar) to make use of all of the data to train the model?
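One possible workaround for the column limit (assuming tspdb can index a long/stacked table, which I have not verified) is reshaping the wide table into long format before loading, so the 65,000+ areas become rows rather than columns. A sketch with pandas; the frame and column names are illustrative:

```python
import pandas as pd

# Hypothetical wide frame: one timestamp column plus one column per area.
wide = pd.DataFrame({
    "ts": pd.date_range("2020-01-01", periods=3, freq="MS"),
    "area_1": [1.0, 2.0, 3.0],
    "area_2": [4.0, 5.0, 6.0],
})

# Stack the area columns into two columns: (area, value).
long = wide.melt(id_vars="ts", var_name="area", value_name="value")
```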

Thanks a lot in advance!

Predict Function "could not broadcast input array from (2,) to (28,)"

I'm writing an application using tspDB that receives live input and makes a prediction based on it. While the first two feeds of live data go smoothly and produce a prediction, the third causes the following error:

Any help would be greatly appreciated!

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/pandas/io/sql.py", line 2018, in execute
cur.execute(*args, **kwargs)
psycopg2.errors.ExternalRoutineException: ValueError: could not broadcast input array from shape (2,) into shape (28,)
CONTEXT: Traceback (most recent call last):
PL/Python function "predict", line 13, in
prediction,interval = get_prediction( index_name_, table_name, value_column, plpyimp(plpy), t, uq, projected = projected, uq_method = uq_method, c = c)
PL/Python function "predict", line 225, in get_prediction
PL/Python function "predict", line 493, in _get_forecast_range
PL/Python function "predict"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/brodymadd/Autotrader/main.py", line 67, in
c.run(handle_msg=handle_msg)
File "/home/brodymadd/.local/lib/python3.10/site-packages/polygon/websocket/init.py", line 180, in run
asyncio.run(self.connect(handle_msg_wrapper, close_timeout, **kwargs))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/home/brodymadd/.local/lib/python3.10/site-packages/polygon/websocket/init.py", line 146, in connect
await processor(cmsg) # type: ignore
File "/home/brodymadd/.local/lib/python3.10/site-packages/polygon/websocket/init.py", line 178, in handle_msg_wrapper
handle_msg(msgs)
File "/home/brodymadd/Autotrader/main.py", line 64, in handle_msg
predictor.predict_single(ta_row)
File "/home/brodymadd/Autotrader/tspDB.py", line 89, in predict_single
pdf = pd.read_sql_query(q%int(data[0]), conn)
File "/usr/local/lib/python3.10/dist-packages/pandas/io/sql.py", line 397, in read_sql_query
return pandas_sql.read_query(
File "/usr/local/lib/python3.10/dist-packages/pandas/io/sql.py", line 2078, in read_query
cursor = self.execute(*args)
File "/usr/local/lib/python3.10/dist-packages/pandas/io/sql.py", line 2030, in execute
raise ex from exc
pandas.errors.DatabaseError: Execution failed on sql 'select * from predict('aapl1m', 'close', 2184940, 'pindex_aapl1m', c => 98);': ValueError: could not broadcast input array from shape (2,) into shape (28,)
CONTEXT: Traceback (most recent call last):
PL/Python function "predict", line 13, in
prediction,interval = get_prediction( index_name_, table_name, value_column, plpyimp(plpy), t, uq, projected = projected, uq_method = uq_method, c = c)
PL/Python function "predict", line 225, in get_prediction
PL/Python function "predict", line 493, in _get_forecast_range
PL/Python function "predict"
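For what it's worth, the ValueError is NumPy complaining that a length-2 array was assigned into a length-28 slot; my guess is that the third live feed supplied fewer observations than the index's internal forecast window expects. A minimal reproduction of the error itself (the sizes mirror the traceback, nothing else is taken from tspdb):

```python
import numpy as np

window = np.zeros(28)              # e.g. an internal window of 28 slots
new_values = np.array([1.0, 2.0])  # but only 2 fresh observations arrived

try:
    window[:] = new_values         # shapes (2,) and (28,) do not broadcast
except ValueError as err:
    print(err)                     # same error as in the traceback above

# Writing into a slice of matching length works:
window[:len(new_values)] = new_values
```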

Is there a method within tspDB to discover the correlation between columns of a matrix?

I am trying to understand the atomic behavior of the library by playing with it within PostgreSQL.

I have the following data in my test_data table:

 t | v1 | v2 
---+----+----
 1 |  1 |  0
 2 |  2 |  2
 3 |  3 |  4
 4 |  3 |  6
 5 |  3 |  6
 6 |  4 |  6
 7 |  5 |  8
(7 rows)

Please note the correlation between the columns is $v2(t) = 2 \cdot v1(t-1)$.

For example, let's take $t=6$:

$v2(6) = 2 \cdot v1(5) = 2 \cdot 3 = 6$

So I would like the library to discover (or approximate) this correlation within the matrix.

To experiment I run the following SQL commands:

drop table if exists test_data;
create table test_data
(
	t int,
	v1 numeric,
	v2 numeric
);

insert into test_data (t, v1, v2) values
	(1, 1, 0),
	(2, 2, 2),
	(3, 3, 4),
	(4, 3, 6),
	(5, 3, 6),
	(6, 4, 6),
	(7, 5, 8);

select delete_pindex('test_data_index1');
select delete_pindex('test_data_index2');
select delete_pindex('test_data_index3');
select create_pindex('test_data', 't', '{"v1"}', 'test_data_index1');
select create_pindex('test_data', 't', '{"v2"}', 'test_data_index2');
select create_pindex('test_data', 't', '{"v1", "v2"}', 'test_data_index3');

Unfortunately, the predictions seem to be just the column averages. When I change the values of one column, the prediction for the other column never changes. (Possibly I am not supplying the correct parameters to the predict function.)

The outputs are:

select 'Predicting:v1, Using:test_data_index1' as Prediction_Type, a.* from predict('test_data', 'v1', 8, 'test_data_index1') as a;

            prediction_type            | prediction | lb | ub 
---------------------------------------+------------+----+----
 Predicting:v1, Using:test_data_index1 |          3 |  3 |  3
(1 row)

select 'Predicting:v1, Using:test_data_index3' as Prediction_Type, a.* from predict('test_data', 'v1', 8, 'test_data_index3') as a;

            prediction_type            | prediction | lb | ub 
---------------------------------------+------------+----+----
 Predicting:v1, Using:test_data_index3 |          3 |  3 |  3
(1 row)

select 'Predicting:v2, Using:test_data_index2' as Prediction_Type, a.* from predict('test_data', 'v2', 8, 'test_data_index2') as a;

            prediction_type            |          prediction           |              lb               |              ub               
---------------------------------------+-------------------------------+-------------------------------+-------------------------------
 Predicting:v2, Using:test_data_index2 | 4.571428571428571428571428571 | 4.571428571428571428571428571 | 4.571428571428571428571428571
(1 row)

select 'Predicting:v2, Using:test_data_index3' as Prediction_Type, a.* from predict('test_data', 'v2', 8, 'test_data_index3') as a;

            prediction_type            |          prediction           |              lb               |              ub               
---------------------------------------+-------------------------------+-------------------------------+-------------------------------
 Predicting:v2, Using:test_data_index3 | 4.571428571428571428571428571 | 4.571428571428571428571428571 | 4.571428571428571428571428571
(1 row)

Hence, long question in short: is there a method within tspDB to discover the correlation between columns of a matrix?
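Whether or not tspdb exposes such a diagnostic, the lagged relationship in the table above can be verified directly with NumPy by aligning the two columns with a one-step shift. A sketch using the exact values from the issue:

```python
import numpy as np

v1 = np.array([1, 2, 3, 3, 3, 4, 5], dtype=float)
v2 = np.array([0, 2, 4, 6, 6, 6, 8], dtype=float)

# Correlate v2(t) with v1(t-1): drop the last v1 and the first v2 to align.
r = np.corrcoef(v1[:-1], v2[1:])[0, 1]
print(round(r, 6))  # 1.0, since v2(t) = 2 * v1(t-1) holds exactly
```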

tspdb crashing

Hi,
I have been testing tspdb on a few datasets and noticed it crashes sometimes. It originally happens because calling create_pindex on a dataset throws:
PL/Python function "create_pindex", line 268, in create_index
PL/Python function "create_pindex", line 309, in update_model
PL/Python function "create_pindex", line 140, in update_model
PL/Python function "create_pindex", line 238, in fitModels
PL/Python function "create_pindex", line 222, in fit
PL/Python function "create_pindex", line 119, in _computeWeights
PL/Python function "create_pindex", line 77, in matrixFromSVD

Looking into it, I realized it happens because in line 55 of algorithms/svdWrapper.py you are returning a list instead of a DataFrame, so I changed it to:
return (np.array([]), np.array([]), np.array([]))

Then the error changes:
ERROR: IndexError: invalid index to scalar variable.
CONTEXT: Traceback (most recent call last):
PL/Python function "create_pindex", line 18, in
TSPD.create_index()
PL/Python function "create_pindex", line 268, in create_index
PL/Python function "create_pindex", line 309, in update_model
PL/Python function "create_pindex", line 140, in update_model
PL/Python function "create_pindex", line 238, in fitModels
PL/Python function "create_pindex", line 222, in fit
PL/Python function "create_pindex", line 123, in _computeWeights
PL/Python function "create_pindex"

And this happens because the previous part returned an empty DataFrame. Looking into why that line is executed: it is because k == 0 (kSingularValues == 0).

Then I traced back further: it happens because in the fit function of prediction_models/ts_svd_model.py, this code:
(self.sk, self.Uk, self.Vk) = svdMod.reconstructMatrix(self.kSingularValues, returnMatrix=False)
if self.kSingularValues is None:
    self.kSingularValues = len(self.sk)
will set kSingularValues to 0 (somehow self.sk has length 0).
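One common guard against this failure mode (a sketch of the general technique, not tspdb's actual code) is to choose the retained rank by cumulative singular-value energy and floor it at 1, so reconstruction never sees an empty rank:

```python
import numpy as np

def choose_rank(matrix, energy=0.95):
    """Smallest k whose singular values capture `energy` of the total energy,
    floored at 1 so downstream reconstruction never receives k == 0."""
    s = np.linalg.svd(matrix, compute_uv=False)
    total = (s ** 2).sum()
    if total == 0:                       # all-zero input: nothing to capture
        return 1
    cum = np.cumsum(s ** 2) / total
    k = int(np.searchsorted(cum, energy)) + 1
    return min(len(s), max(1, k))
```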

That's where I stopped and could not check further.
If you want, I can send you my dataset to reproduce the problem.

Thanks!

python script select create_pindex call error

I wrote a simple Python 3.5 script to test the create_pindex call, taken from the 2020 NeurIPS tspDB Demo.ipynb example. I ran the example first in Colab, then converted it into a .py file and used the cursor.execute call in a simpler test file as shown below.

import psycopg2
import tspdb

conn = psycopg2.connect(dbname="postgres",
                        user="demo",
                        host="127.0.0.1",
                        password="00",
                        port="5432")
cursor = conn.cursor()
# cursor.execute('SELECT * FROM electricity')
cursor.execute("""select create_pindex('electricity', 'time','{"h1","h2","h3","h4","h5","h6","h7","h8","h9","h10"}','pindex1');""")
# rows = cursor.fetchall()
# for table in rows:
#     print(table)
conn.close()

When run I get the following error:

Traceback (most recent call last):
File "testconnection.py", line 13, in
cursor.execute("""select create_pindex('electricity', 'time','{"h1","h2","h3","h4","h5","h6","h7","h8","h9","h10"}','pindex1');""")
psycopg2.errors.ExternalRoutineException: TypeError: unsupported operand type(s) for %: 'Int64Index' and 'int'
CONTEXT: Traceback (most recent call last):
PL/Python function "create_pindex", line 18, in
TSPD.create_index()
PL/Python function "create_pindex", line 269, in create_index
PL/Python function "create_pindex", line 354, in write_model
PL/Python function "create_pindex", line 488, in write_tsmm_model
PL/Python function "create_pindex"
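The traceback points at a `%` applied to a pandas `Int64Index`, an operation that some pandas versions do not support on index objects. A version-proof workaround (assuming the index holds integers, which I'm inferring from the error) is to go through NumPy first:

```python
import numpy as np
import pandas as pd

idx = pd.RangeIndex(6)  # stand-in for the DataFrame's integer index

# Converting to a plain array sidesteps pandas-version differences
# in index arithmetic support.
remainders = np.asarray(idx) % 4
print(remainders.tolist())  # [0, 1, 2, 3, 0, 1]
```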

Tried Python 2.7 and Python 3.5.
postgresql version - PostgreSQL 12.7 (Ubuntu 12.7-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit

Ubuntu 16.04

Thanks

Wilbert Jackson

Cannot reproduce the data loading process as in the demo Jupyter notebook

Hi, I wanted to reproduce the financial part of the demo results. I performed the following steps, and it does not seem to work properly:

  1. Copied tspDB-Demo-checkpoint.ipynb into notebook_examples and renamed it "tspDB-Demo-checkpoint-financial.ipynb"
  2. Extracted demo_data_v3.zip and copied the data folder, .DS_Store, and load_data.sql to the same base directory as follows:
    image

But when I try to execute the following line (I entered the password at the Anaconda prompt), it only shows lines of DROP & CREATE TABLE statements, without any data being copied:
image

Is that expected? I thought there should be a data-copying step, as shown in the demo notebook:
image
