Fundamentals of a Time series forecasting problem using Tensorflow.
In this repository we will be building a Bitcoin price predictor (BitPredict ๐ฐ๐) using it's historical price over the years.
Time series problems deal with data over time.
Such as, the number of staff members in a company over 10-years, sales of computers for the past 5-years, electricity usage for the past 50-years.
The timeline can be short (seconds/minutes) or long (years/decades). And the problems you might investigate using can usually be broken down into two categories.
Problem Type | Examples | Output |
---|---|---|
Classification | Anomaly detection, time series identification (where did this time series come from?) | Discrete (a label) |
Forecasting | Predicting stock market prices, forecasting future demand for a product, stocking inventory requirements | Continuous (a number) |
In both above cases, a supervised learning approach is often used. Meaning, you'd have some example data and a label associated with that data.
For example, in forecasting the price of Bitcoin, your data could be the historical price of Bitcoin for the past month and the label could be today's price (the label can't be tomorrow's price because that's what we'd want to predict).
- Get time series data (the historical price of Bitcoin)
- Load in time series data using pandas/Python's CSV module
- Format data for a time series problem
- Creating training and test sets (the wrong way)
- Creating training and test sets (the right way)
- Visualizing time series data
- Turning time series data into a supervised learning problem (windowing)
- Preparing univariate and multivariate (more than one variable) data
- Evaluating a time series forecasting model
- Setting up a series of deep learning modelling experiments
- Dense (fully-connected) networks
- Sequence models (LSTM and 1D CNN)
- Ensembling (combining multiple models together)
- Multivariate models
- Replicating the N-BEATS algorithm using TensorFlow layer subclassing
- Creating a modelling checkpoint to save the best performing model during training
- Making predictions (forecasts) with a time series model
- Creating prediction intervals for time series model forecasts
- Discussing two different types of uncertainty in machine learning (data uncertainty and model uncertainty)
- Demonstrating why forecasting in an open system is BS (the turkey problem)
- Does scaling the data help for univariate/multivariate data? (e.g. getting all of the values between 0 & 1)
- Try doing this for a univariate model (e.g.
model_1
) and a multivariate model (e.g.model_6
) and see if it effects model training or evaluation results.
- Try doing this for a univariate model (e.g.
- Get the most up to date data on Bitcoin, train a model & see how it goes (our data goes up to May 18 2021).
- You can download the Bitcoin historical data for free from coindesk.com/price/bitcoin and clicking "Export Data" -> "CSV".
- For most of our models we used
WINDOW_SIZE=7
, but is there a better window size?- Setup a series of experiments to find whether or not there's a better window size.
- For example, you might train 10 different models with
HORIZON=1
but with window sizes ranging from 2-12.
- Create a windowed dataset just like the ones we used for
model_1
using tf.keras.preprocessing.timeseries_dataset_from_array() and retrainmodel_1
using the recreated dataset. - For our multivariate modelling experiment, we added the Bitcoin block reward size as an extra feature to make our time series multivariate.
- Are there any other features you think you could add?
- If so, try it out, how do these affect the model?
- Make prediction intervals for future forecasts. To do so, one way would be to train an ensemble model on all of the data, make future forecasts with it and calculate the prediction intervals of the ensemble just like we did for
model_8
. - For future predictions, try to make a prediction, retrain a model on the predictions, make a prediction, retrain a model, make a prediction, retrain a model, make a prediction (retrain a model each time a new prediction is made). Plot the results, how do they look compared to the future predictions where a model wasn't retrained for every forecast (
model_9
)? - Throughout this notebook, we've only tried algorithms we've handcrafted ourselves. But it's worth seeing how a purpose built forecasting algorithm goes.
- Try out one of the extra algorithms listed in the modelling experiments part such as:
- Facebook's Kats library - there are many models in here, remember the machine learning practioner's motto: experiment, experiment, experiment.
- LinkedIn's Greykite library
- Try out one of the extra algorithms listed in the modelling experiments part such as: