abdullaho / tspdb
tspdb: Time Series Predict DB
License: Apache License 2.0
Hi,
I have this problem when I try to create the extension.
postgres=# CREATE EXTENSION tspdb;
ERROR: syntax error at or near "normalize"
postgres=#
Can someone help me?
Many thanks for your time!
Hi,
I installed and ran the script as suggested, but I believe all simulations are ARIMA-type. Can you please let me know if there is a way to check whether the program performs as indicated in the research paper, or whether it just outputs ARIMA results? Thanks :)
The official installation instructions show an installation approach independent of conda. I am curious whether the developers have tested installing via conda? Thanks.
Hi
I've been experimenting with tspdb with PostgreSQL 13 & Python 3.7 (both from EDB) on macOS. I have a sample dataset that I've previously experimented with using WaveNet models in Tensorflow quite successfully. It represents a simulated number of connections to a Postgres database with multiple seasonalities, such as you might find when users log on and off throughout the day, with quiet and peak times, and fewer users at weekends - all with some noise mixed in to better emulate a real world scenario.
My aim is to be able to predict workload into the near future based on past monitoring data. Unfortunately, I'm not getting useful predictions from tspdb.
Here's my data set: data.csv, and a rough graph of it:
I've experimented with various values for the k parameter:
tspdb=# SELECT * FROM list_pindices();
index_name | value_columns | relation | time_column | initial_timestamp | last_timestamp | agg_interval | uncertainty_quantification
------------------------+---------------+----------+-------------+---------------------+---------------------+--------------+----------------------------
backends_tsp_k1 | {processes} | backends | ts | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00 | t
backends_tsp_k10 | {processes} | backends | ts | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00 | t
backends_tsp_k2 | {processes} | backends | ts | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00 | t
backends_tsp_k20 | {processes} | backends | ts | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00 | t
backends_tsp_k3 | {processes} | backends | ts | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00 | t
backends_tsp_k4 | {processes} | backends | ts | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00 | t
backends_tsp_k5 | {processes} | backends | ts | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00 | t
backends_tsp_k_default | {processes} | backends | ts | 2020-10-06 13:25:01 | 2020-11-02 13:35:01 | 00:05:00 | t
(8 rows)
tspdb=# SELECT * FROM pindices_stat();
index_name | column_name | number_of_observations | number_of_trained_models | imputation_score | forecast_score | test_forecast_score
------------------------+-------------+------------------------+--------------------------+--------------------+--------------------+---------------------
backends_tsp_k1 | processes | 7779 | 1 | 0.9358105618035297 | 0.8986683760778478 |
backends_tsp_k2 | processes | 7779 | 1 | 0.9543878543235768 | 0.9493582039740182 |
backends_tsp_k3 | processes | 7779 | 1 | 0.9587100784133898 | 0.9573324136174277 |
backends_tsp_k5 | processes | 7779 | 1 | 0.9658566939198344 | 0.9779691559430637 |
backends_tsp_k10 | processes | 7779 | 1 | 0.978661078459557 | 0.9851500580756463 |
backends_tsp_k4 | processes | 7779 | 1 | 0.9624307902466809 | 0.9603959478873095 |
backends_tsp_k20 | processes | 7779 | 1 | 0.9942970665081167 | 0.9934041036452561 |
backends_tsp_k_default | processes | 7779 | 1 | 0.9543878543235768 | 0.9493582039740182 |
(8 rows)
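The scores above can also be compared programmatically. A minimal sketch, assuming rows shaped as (index_name, forecast_score) pairs pulled from the pindices_stat() output (the row layout is inferred from the table above, not a tspdb API):

```python
# Sketch: pick the prediction index with the highest forecast score from
# rows shaped like the pindices_stat() output above. The (name, score)
# pair layout is an assumption based on the printed table.
def best_index(rows):
    """rows: (index_name, forecast_score) pairs; return the best index name."""
    return max(rows, key=lambda r: r[1])[0]

scores = [("backends_tsp_k1", 0.8987), ("backends_tsp_k2", 0.9494),
          ("backends_tsp_k10", 0.9852), ("backends_tsp_k20", 0.9934)]
print(best_index(scores))  # backends_tsp_k20
```

Consistent with the table, k=20 gives the highest forecast_score in this run, even though the resulting predictions still look wrong.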
The results I get look as follows. Each graph shows the actual data and the predicted data, but note that on some of them the scale changes radically so the actual data may look like a straight line:
Do you have any suggestions as to why this may be happening, and how I might get a reasonable prediction as I do with Tensorflow?
Thanks!
While trying to install tspdb and testing it, I am getting the below error -
I am following the installation manual - https://tspdb.mit.edu/installation/ - but I don't know where I am going wrong. Can you please help? Thanks.
Hi, I was testing this tool to forecast 10 time series; however, I can't consider the results satisfactory for any of the predictions I made (whereas in the examples (notebook_examples/tspDB Example-Multiple Time Series (Real-world Data).ipynb) almost all time series achieve good results). Below you can see some screenshots of the results I have obtained for each time series. The screenshots have been obtained from the following datasets:
The following images have been obtained from tmpseries.csv and all of them have been obtained predicting between 8 and 14 days in the future:
These results were obtained as shown in the following .ipynb:
codemultipleTS.zip
The following images have been obtained from tmpseries2.csv. The first image corresponds to a one-month forecast and the second image to a 10-day forecast.
These results have been obtained using the same code as the previous predictions (evidently changing the dataset path, the prediction dates, and the split of the data into training data (80% of the dataset) and test data (20% of the dataset)).
Does anyone else get the same results/or have the same problem with other time series? Or am I using this tool wrong?
I obtained these results with the following software versions:
Hi,
I'm practicing with "tspDB example-Single time Series (Synthetic data).ipynb" in Jupyter notebook. When creating the prediction index, I get the error: name 'plpy' is not defined.
error is below: NameError: name 'plpy' is not defined
CONTEXT: Traceback (most recent call last):
PL/Python function "create_pindex", line 2, in
from tspdb.src.pindex.predict import get_prediction_range, get_prediction
PL/Python function "create_pindex", line 2, in
PL/Python function "create_pindex"
But if I execute "select create_pindex('synth_data', 'time','{"ts_7"}','pindex', k =>4);" in PostgreSQL via dbeaver, it can be executed successfully.
How to solve it? Thank you very much.
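For context, plpy only exists inside the PL/Python interpreter embedded in PostgreSQL, which is why the same call works through DBeaver but not in a plain Jupyter kernel. A minimal sketch of invoking create_pindex from outside the database through a regular client connection (the connection parameters and the helper function are illustrative, not part of tspdb):

```python
# Sketch: plpy is only defined inside PostgreSQL's PL/Python runtime, so
# from Jupyter the call must go through a normal client connection.
def build_create_pindex_sql(table, time_col, value_cols, index_name, k=None):
    """Build the SQL text for a create_pindex call (helper, not tspdb API)."""
    cols = ",".join('"%s"' % c for c in value_cols)
    sql = "select create_pindex('%s', '%s', '{%s}', '%s'" % (
        table, time_col, cols, index_name)
    if k is not None:
        sql += ", k => %d" % k
    return sql + ");"

if __name__ == "__main__":
    import psycopg2  # connection parameters below are placeholders
    conn = psycopg2.connect(dbname="postgres", user="demo", host="127.0.0.1")
    cur = conn.cursor()
    cur.execute(build_create_pindex_sql("synth_data", "time", ["ts_7"],
                                        "pindex", k=4))
    conn.commit()  # persist the index; psycopg2 does not autocommit
    conn.close()
```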
Hello, I'm working with tspdb for time series prediction. I access it through TablePlus, linking the series table. This table has 5 rows and 37 columns. My question is: what is the minimum number of rows for a good prediction? With the limited data I have, I can adjust hyperparameters and achieve a good prediction. However, when I provide a 38th column, I no longer get a good prediction.
postgres=# SELECT test_tspdb();
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
I get the above error. Please can you help.
Hello,
for my case study I would expect to get much better prediction accuracy if I were able to set boundaries for the predicted values. Is there an option to do so? Say, the minimum possible prediction value is 0?
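I'm not aware of a built-in bounds option, but a common workaround is to clip the returned predictions after the fact. A minimal sketch with numpy (the function name is mine, not a tspdb API):

```python
# Post-hoc workaround sketch: if the model has no built-in bounds, clamp
# the returned forecasts (and interval endpoints) to a known valid range.
import numpy as np

def clip_predictions(predictions, lower=0.0, upper=None):
    """Clamp forecast values to a physically meaningful range."""
    return np.clip(np.asarray(predictions, dtype=float), lower, upper)

print(clip_predictions([-1.5, 0.2, 3.7], lower=0.0))  # negatives become 0.0
```

This only fixes the outputs, of course; it does not feed the constraint back into the model.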
Hi,
I followed the instruction and everything is working well up until I get to the "SELECT test_tspdb();" command in postgres. I then get the error below.
postgres=# CREATE EXTENSION tspdb;
CREATE EXTENSION
postgres=# SELECT test_tspdb();
ERROR: ModuleNotFoundError: No module named 'tspdb'
CONTEXT: Traceback (most recent call last):
PL/Python function "test_tspdb", line 2, in
from tspdb.src.database_module.plpy_imp import plpyimp
PL/Python function "test_tspdb"
Any suggestions on how else to troubleshoot would be very much appreciated.
Also, I had to change the permission on both tspdb files copied to...
/usr/share/postgresql/12/extension/
Thanks
Like many others even though I managed to go through the installation process I am not able to make PlPython3u work on Windows 10.
I get a fatal error when I run SELECT test_tspdb(); or when I try to create and use a simple max(int a, int b) function with PlPython3u.
If you managed to make this lib work, can you please describe your system setup (OS / PostgreSQL version / EDB version / Python version...) so I don't lose too much time trying to fix an incompatibility between Win10 and PL/Python.
I removed all prior Python installations and am still getting this error. Any suggestions?
ERROR: ModuleNotFoundError: No module named 'tspdb'
CONTEXT: Traceback (most recent call last):
PL/Python function "test_tspdb", line 2, in
from tspdb.src.database_module.plpy_imp import plpyimp
PL/Python function "test_tspdb"
Faced this issue while trying to test the installation by running the command 'select test_tspdb();'.
I could resolve it temporarily by adding 'scikit-learn' to the installation packages in the 'setup.py' file,
but would like to know a better way to handle this.
Thanks!
Hello,
I am currently experimenting with tspDB and have a scenario that can be compared to use case 3.
I have data for the last 50 years on a monthly timestep and the columns are correlated (they actually represent geographical areas).
These areas (or stocks, in the terminology of use case 3 in your demo) exceed the dimensional limits of psql, as they number over 65000.
I found that the maximum number of columns I can use lies at about 1000 (the psql max is 1600).
Is there some way (maybe via a multi-index or something) to make use of all of the data to train the model?
Thanks a lot in advance!
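One workaround, absent a documented multi-index feature, is to split the columns into chunks under the practical limit and create one prediction index per chunk. A sketch (table, column, and index names are illustrative):

```python
# Workaround sketch (not a documented tspdb feature): chunk the 65000
# columns and generate one create_pindex statement per chunk.
def chunk_columns(columns, chunk_size=1000):
    """Split a column list into chunks of at most chunk_size."""
    return [columns[i:i + chunk_size]
            for i in range(0, len(columns), chunk_size)]

def pindex_statements(table, time_col, columns, chunk_size=1000):
    """One create_pindex statement per chunk of columns."""
    stmts = []
    for n, chunk in enumerate(chunk_columns(columns, chunk_size)):
        cols = ",".join('"%s"' % c for c in chunk)
        stmts.append("select create_pindex('%s', '%s', '{%s}', 'pindex_%d');"
                     % (table, time_col, cols, n))
    return stmts

cols = ["area_%d" % i for i in range(65000)]
print(len(pindex_statements("areas", "ts", cols)))  # 65 chunks of 1000
```

Note the trade-off: correlations across chunks are lost, so grouping known-correlated areas into the same chunk may matter.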
Everything is OK, but when I run "select test_tspdb();" it always shows Server Disconnected. Is there something I didn't do well?
I'm writing an application using tspDB that receives live input and makes a prediction based off of it, and while the first two feeds of live data go smoothly and produce a prediction, the third causes the following error:
Any help would be greatly appreciated!
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/pandas/io/sql.py", line 2018, in execute
cur.execute(*args, **kwargs)
psycopg2.errors.ExternalRoutineException: ValueError: could not broadcast input array from shape (2,) into shape (28,)
CONTEXT: Traceback (most recent call last):
PL/Python function "predict", line 13, in
prediction,interval = get_prediction( index_name_, table_name, value_column, plpyimp(plpy), t, uq, projected = projected, uq_method = uq_method, c = c)
PL/Python function "predict", line 225, in get_prediction
PL/Python function "predict", line 493, in _get_forecast_range
PL/Python function "predict"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/brodymadd/Autotrader/main.py", line 67, in
c.run(handle_msg=handle_msg)
File "/home/brodymadd/.local/lib/python3.10/site-packages/polygon/websocket/init.py", line 180, in run
asyncio.run(self.connect(handle_msg_wrapper, close_timeout, **kwargs))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/home/brodymadd/.local/lib/python3.10/site-packages/polygon/websocket/init.py", line 146, in connect
await processor(cmsg) # type: ignore
File "/home/brodymadd/.local/lib/python3.10/site-packages/polygon/websocket/init.py", line 178, in handle_msg_wrapper
handle_msg(msgs)
File "/home/brodymadd/Autotrader/main.py", line 64, in handle_msg
predictor.predict_single(ta_row)
File "/home/brodymadd/Autotrader/tspDB.py", line 89, in predict_single
pdf = pd.read_sql_query(q%int(data[0]), conn)
File "/usr/local/lib/python3.10/dist-packages/pandas/io/sql.py", line 397, in read_sql_query
return pandas_sql.read_query(
File "/usr/local/lib/python3.10/dist-packages/pandas/io/sql.py", line 2078, in read_query
cursor = self.execute(*args)
File "/usr/local/lib/python3.10/dist-packages/pandas/io/sql.py", line 2030, in execute
raise ex from exc
pandas.errors.DatabaseError: Execution failed on sql 'select * from predict('aapl1m', 'close', 2184940, 'pindex_aapl1m', c => 98);': ValueError: could not broadcast input array from shape (2,) into shape (28,)
CONTEXT: Traceback (most recent call last):
PL/Python function "predict", line 13, in
prediction,interval = get_prediction( index_name_, table_name, value_column, plpyimp(plpy), t, uq, projected = projected, uq_method = uq_method, c = c)
PL/Python function "predict", line 225, in get_prediction
PL/Python function "predict", line 493, in _get_forecast_range
PL/Python function "predict"
I am trying to understand the atomic behavior of the library by playing with it within PostgreSQL.
I have the following data in my test_data
table:
t | v1 | v2
---+----+----
1 | 1 | 0
2 | 2 | 2
3 | 3 | 4
4 | 3 | 6
5 | 3 | 6
6 | 4 | 6
7 | 5 | 8
(7 rows)
Please note that the columns are correlated.
So I am willing the library to discover (or approximate) this correlation within the matrix.
To experiment I run the following SQL commands:
drop table if exists test_data;
create table test_data
(
t int,
v1 numeric,
v2 numeric
);
insert into test_data (t, v1, v2) values
(1, 1, 0),
(2, 2, 2),
(3, 3, 4),
(4, 3, 6),
(5, 3, 6),
(6, 4, 6),
(7, 5, 8);
select delete_pindex('test_data_index1');
select delete_pindex('test_data_index2');
select delete_pindex('test_data_index3');
select create_pindex('test_data', 't', '{"v1"}', 'test_data_index1');
select create_pindex('test_data', 't', '{"v2"}', 'test_data_index2');
select create_pindex('test_data', 't', '{"v1", "v2"}', 'test_data_index3');
Unfortunately, the predictions seem to be just the average of each column. When I change the values of either column, the prediction of the other column never changes. (Possibly I am not supplying the correct parameters to the predict function.)
The outputs are:
select 'Predicting:v1, Using:test_data_index1' as Prediction_Type, a.* from predict('test_data', 'v1', 8, 'test_data_index1') as a;
prediction_type | prediction | lb | ub
---------------------------------------+------------+----+----
Predicting:v1, Using:test_data_index1 | 3 | 3 | 3
(1 row)
select 'Predicting:v1, Using:test_data_index3' as Prediction_Type, a.* from predict('test_data', 'v1', 8, 'test_data_index3') as a;
prediction_type | prediction | lb | ub
---------------------------------------+------------+----+----
Predicting:v1, Using:test_data_index3 | 3 | 3 | 3
(1 row)
select 'Predicting:v2, Using:test_data_index2' as Prediction_Type, a.* from predict('test_data', 'v2', 8, 'test_data_index2') as a;
prediction_type | prediction | lb | ub
---------------------------------------+-------------------------------+-------------------------------+-------------------------------
Predicting:v2, Using:test_data_index2 | 4.571428571428571428571428571 | 4.571428571428571428571428571 | 4.571428571428571428571428571
(1 row)
select 'Predicting:v2, Using:test_data_index3' as Prediction_Type, a.* from predict('test_data', 'v2', 8, 'test_data_index3') as a;
prediction_type | prediction | lb | ub
---------------------------------------+-------------------------------+-------------------------------+-------------------------------
Predicting:v2, Using:test_data_index3 | 4.571428571428571428571428571 | 4.571428571428571428571428571 | 4.571428571428571428571428571
(1 row)
Hence, long question in short, is there a method within the tspDB, to discover the correlation between columns of a matrix?
Hi,
I have been testing tspdb on a few datasets and noticed it crashes sometimes. It originally happens because calling create_pindex on a dataset throws:
PL/Python function "create_pindex", line 268, in create_index
PL/Python function "create_pindex", line 309, in update_model
PL/Python function "create_pindex", line 140, in update_model
PL/Python function "create_pindex", line 238, in fitModels
PL/Python function "create_pindex", line 222, in fit
PL/Python function "create_pindex", line 119, in _computeWeights
PL/Python function "create_pindex", line 77, in matrixFromSVD
Looking into it, I realized it happens because on line 55 of algorithms/svdWrapper.py you are returning a list instead of a dataframe, so I changed it to:
return (np.array([]), np.array([]), np.array([]))
Then the error changes:
ERROR: IndexError: invalid index to scalar variable.
CONTEXT: Traceback (most recent call last):
PL/Python function "create_pindex", line 18, in
TSPD.create_index()
PL/Python function "create_pindex", line 268, in create_index
PL/Python function "create_pindex", line 309, in update_model
PL/Python function "create_pindex", line 140, in update_model
PL/Python function "create_pindex", line 238, in fitModels
PL/Python function "create_pindex", line 222, in fit
PL/Python function "create_pindex", line 123, in _computeWeights
PL/Python function "create_pindex"
And this happens because the previous part returned an empty dataframe. Looking into why that line is executed: it is because k==0 (kSingularValues==0).
Then I traced back and it happens because in prediction_models/ts_svd_model.py fit function, this code:
(self.sk, self.Uk, self.Vk) = svdMod.reconstructMatrix(self.kSingularValues, returnMatrix=False)
if self.kSingularValues is None:
self.kSingularValues= len(self.sk)
will change kSingularValues to 0 (somehow self.sk has length 0).
That's where I stopped and could not check further.
If you want, I can send you my dataset to simulate the problem.
Thanks!
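The failure mode described above (zero singular values retained, then an index into empty factors) can be guarded by flooring k at 1. A sketch of energy-threshold SVD truncation with that guard, mirroring the idea only, not tspdb's actual svdWrapper code:

```python
# Guard sketch: when an energy threshold keeps zero singular values
# (the kSingularValues == 0 case above), floor k at 1 so downstream
# code never receives empty factors.
import numpy as np

def truncated_svd(matrix, energy=0.95):
    """SVD truncated to the smallest k capturing `energy` of the spectrum."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    if s.sum() == 0:                      # all-zero matrix: nothing to keep
        k = 1
    else:
        cum = np.cumsum(s ** 2) / np.sum(s ** 2)
        k = int(np.searchsorted(cum, energy) + 1)
    k = max(k, 1)                         # never return empty factors
    return U[:, :k], s[:k], Vt[:k, :]

U, s, Vt = truncated_svd(np.zeros((4, 3)))
print(s.shape)  # (1,) even for a degenerate input
```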
I wrote a simple Python 3.5 script to test the create_pindex call, taken from the 2020 NeurIPS tspDB Demo.ipynb example. I ran the example first in Colab, then converted it into a .py file, and used the cursor.execute call in a simpler test file as shown below.
import psycopg2
import tspdb

conn = psycopg2.connect(dbname="postgres",
                        user="demo",
                        host="127.0.0.1",
                        password="00",
                        port="5432")
cursor = conn.cursor()
#cursor.execute('SELECT * FROM electricity')
cursor.execute("""select create_pindex('electricity', 'time','{"h1","h2","h3","h4","h5","h6","h7","h8","h9","h10"}','pindex1');""")
conn.commit()  # persist the index; psycopg2 does not autocommit
#rows = cursor.fetchall()
#for table in rows:
#    print(table)
conn.close()
When run I get the following error:
Traceback (most recent call last):
File "testconnection.py", line 13, in
cursor.execute("""select create_pindex('electricity', 'time','{"h1","h2","h3","h4","h5","h6","h7","h8","h9","h10"}','pindex1');""")
psycopg2.errors.ExternalRoutineException: TypeError: unsupported operand type(s) for %: 'Int64Index' and 'int'
CONTEXT: Traceback (most recent call last):
PL/Python function "create_pindex", line 18, in
TSPD.create_index()
PL/Python function "create_pindex", line 269, in create_index
PL/Python function "create_pindex", line 354, in write_model
PL/Python function "create_pindex", line 488, in write_tsmm_model
PL/Python function "create_pindex"
Tried python2.7, python3.5
postgresql version - PostgreSQL 12.7 (Ubuntu 12.7-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit
Ubuntu 16.04
Thanks
Wilbert Jackson
Hi, I wanted to reproduce the financial part of the demo results. I performed the following steps and it seems not to work properly:
But when I try to execute the following line (I entered the password in the Anaconda prompt), it only shows lines of drop & create tables, without any data-copying process:
Is that expected? I thought there would be some data-copy process, as shown in the demo notebook: