sky-uk / anticipy

A Python library for time series forecasting
License: BSD 3-Clause "New" or "Revised" License
Before we start to optimise performance, we need to set up some benchmarks. Pandas uses the asv library (https://github.com/airspeed-velocity/asv); we can try that out.
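As a starting point, an asv benchmark file follows a simple convention: `setup()` runs before each benchmark, and every method named `time_*` is timed. A minimal sketch (the workload here is a stand-in; a real benchmark would call anticipy's forecast functions instead):

```python
import numpy as np
import pandas as pd


class TimeSuite:
    # asv convention: setup() runs before each benchmark,
    # and every method whose name starts with time_ is timed.
    def setup(self):
        self.df = pd.DataFrame({
            'date': pd.date_range('2018-01-01', periods=365, freq='D'),
            'y': np.random.normal(10.0, 1.0, 365)})

    def time_rolling_mean(self):
        # Stand-in workload; a real benchmark would call something like
        # anticipy.forecast.run_forecast(self.df, ...)
        self.df['y'].rolling(7).mean()
```

Benchmarks are then run with `asv run`, against an `asv.conf.json` in the repo root.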
setup.py states that 'pandas>=0.20.3' is required. However, this should be 'pandas>=0.23', due to changes in the pandas API.
We are getting an error when our seasonality models are applied to time series with gaps, if the gaps are aligned with the seasonality period. For example, consider a time series with daily samples where all Friday samples are missing. In this case, the weekly seasonality parameter for Fridays will be fitted to a random value.
import numpy as np
import pandas as pd
from anticipy import forecast, forecast_models

def array_zeros_in_indices(n, l_indices):
    return (~np.isin(np.arange(0, n), l_indices)).astype(float)

# Original time series, no gaps
df1 = pd.DataFrame({'y': np.full(14, 10.0) + np.random.normal(0.0, 0.1, 14),
                    'source': 'src1',
                    'date': pd.date_range('2018-01-01', periods=14, freq='D')})

# Copy of df1 with gaps on the same weekday
df2 = df1.copy()
df2['weight'] = array_zeros_in_indices(14, [5, 12])
df2['source'] = 'src2'

dict_forecast1 = forecast.run_forecast(
    df1, extrapolate_years=0.1, simplify_output=False,
    l_model_trend=forecast_models.model_linear + forecast_models.model_season_wday,
    l_model_season=[],
    l_model_naive=[],
    include_all_fits=True)
df_forecast1 = dict_forecast1['data']
print(df_forecast1.tail(3))

dict_forecast2 = forecast.run_forecast(
    df2, extrapolate_years=0.1, simplify_output=False,
    l_model_trend=forecast_models.model_linear + forecast_models.model_season_wday,
    l_model_season=[],
    l_model_naive=[],
    include_all_fits=True)
df_forecast2 = dict_forecast2['data']
print(df_forecast2.tail(3))

df_forecast = pd.concat([df_forecast1, df_forecast2], ignore_index=True)
We should run a check before model fitting that identifies these scenarios and skips attempting a fit for that model.
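A minimal sketch of such a check, assuming daily data with a `date` column and an optional zero/one `weight` column (the helper name `has_empty_weekday_bucket` is hypothetical, not part of the anticipy API):

```python
import pandas as pd


def has_empty_weekday_bucket(df):
    # Returns True if some weekday has no usable samples, i.e. every
    # sample for that weekday is missing or has zero weight. In that
    # case a weekly seasonality parameter cannot be fitted safely.
    weight = df['weight'] if 'weight' in df.columns else pd.Series(1.0, index=df.index)
    observed_weekdays = df.loc[weight > 0, 'date'].dt.dayofweek.unique()
    return len(observed_weekdays) < 7


# Example: daily series where all Fridays (dayofweek == 4) are zero-weighted
df = pd.DataFrame({'date': pd.date_range('2018-01-01', periods=14, freq='D')})
df['weight'] = (df['date'].dt.dayofweek != 4).astype(float)
print(has_empty_weekday_bucket(df))  # True: Fridays have no usable samples
```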
We have defined dependencies to specific python versions in setup.py, to address some incompatibilities. This has unintended consequences:
This looks like a setuptools issue, and I don't know if there's much we can do about it. But we should definitely add this to the FAQ (when we make one), and possibly add a note somewhere in the docs. We may also want to remove anticipy 0.0.2 and earlier from pypi - we don't want users getting that by default.
Right now, the only verbose output displayed is Running forecast for source: src, shown when there is more than one source. If only one source is provided, no extra output is displayed.
An error has been identified when installing the library while using python 2.7.10. We should set a requirement for python 2.7.11 or greater in setup.py:
setup(name="anticipy",
      python_requires='>=2.7.11')
Functions get_model_outliers and find_steps_and_spikes in forecast_models.py have the same functionality.
A print statement added to test_forecast while debugging is causing crashes on python 3. We should remove it.
https://github.com/sky-uk/anticipy/community lists 3 missing elements:
Library name should be AnticiPy, we are currently inconsistent in capitalisation.
Also, fix a number of documentation typos that have been found.
When using output='jupyter', no path is needed. When output='png' or output='html', we need to verify that path is not empty.
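A minimal sketch of that validation, assuming a plotting entry point that takes `output` and `path` arguments (the helper name `validate_output_path` is hypothetical):

```python
def validate_output_path(output, path):
    # 'jupyter' renders inline, so no path is required.
    # File-based outputs ('png', 'html') need a non-empty path.
    if output in ('png', 'html') and not path:
        raise ValueError(
            "A non-empty 'path' is required when output=%r" % output)


validate_output_path('jupyter', None)    # OK, no path needed
validate_output_path('png', 'plot.png')  # OK
# validate_output_path('html', '')       # would raise ValueError
```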
In https://anticipy.readthedocs.io/en/latest/tutorial.html , there is an image that is not rendered, .static/images/tutorial-forecast1.png. This works fine when building the sphinx docs locally. We need to fix this, and find if there is a way to test for readthedocs-specific bugs without merging to master.
Some code lines include #noqa to ignore pep-8 checks. However, that is messing with the output of some sphinx docs, such as anticipy.forecast_models.ForecastModel:118
Change from 'linear' to 'linear_nondec'
We sometimes want to combine lists of functions or ForecastModels while removing duplicates. The current approach, np.unique(), works on our local environments but causes errors in some CI environments. We should try replacing it with:
list(set(my_list_of_functions))
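Note that `set()` does not preserve order; if the order of models matters, an order-preserving variant may be safer. A minimal sketch (the helper name `unique_in_order` is hypothetical):

```python
def unique_in_order(items):
    # De-duplicate while keeping first-occurrence order.
    # Works for any hashable items, including functions.
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]


def f(): pass
def g(): pass

result = unique_in_order([f, g, f, g, f])  # returns [f, g], order kept
```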
While working on #6 , we found that the code in forecast_models.get_model_outliers() is hard to follow without additional comments. We should expand the function documentation.
plotly.tools.make_subplots() has 2 arguments, horizontal_spacing and vertical_spacing, that adjust the space between subplots. Default values are too high for us, we should adjust them to get nicer plots.
Minor update to fix a documentation issue in 0.1.2
Our forecast logic uses ForecastModel objects that encapsulate model functions and add additional features such as:
This gives us great flexibility in defining our models, but still falls short in certain scenarios. We should implement new components to allow us to transform our input series in specific ways:
The current plan is to compose these components with ForecastModels, using an operator other than + or *; '|' would be a great option, if available. A model using these features could look as follows:
itrans_boxcox | model_linear + season_wday | otrans_positive
Specific instances of these model transformations, such as the Box-Cox transform and the positive output transform, would be defined in separate git issues.
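As an illustration of the composition mechanics only (the class and the transform/model names below are hypothetical toys, not the anticipy API), Python's `__or__` method could support this syntax:

```python
class Transform:
    # Hypothetical sketch: wraps a callable and composes via '|'.
    def __init__(self, func, name):
        self.func, self.name = func, name

    def __or__(self, other):
        # self | other: apply self first, then other
        return Transform(lambda x: other(self(x)),
                         '{} | {}'.format(self.name, other.name))

    def __call__(self, x):
        return self.func(x)


# Toy pipeline: square-root input transform, doubling 'model', clip-at-zero output
itrans_sqrt = Transform(lambda x: x ** 0.5, 'itrans_sqrt')
model_double = Transform(lambda x: 2 * x, 'model_double')
otrans_positive = Transform(lambda x: max(x, 0.0), 'otrans_positive')

pipeline = itrans_sqrt | model_double | otrans_positive
print(pipeline.name)  # itrans_sqrt | model_double | otrans_positive
print(pipeline(9.0))  # 6.0  (sqrt(9) = 3, doubled = 6, already positive)
```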
Project description in pypi is not correctly rendered. This is because it is written in markdown, which pypi doesn't support. We should use restructuredtext instead.
The original project had docs generated with sphinx. We need to move them to Github.
We have been experimenting with using dask (https://dask.org/) to support parallel processing. Unfortunately, the code for some of these experiments has been left in tests.test_forecast.py in the main branch. We should move this unfinished code to a separate branch and complete this feature.
We will need to close #37 first in order to evaluate any performance gains achieved with dask.
Although we already have working documentation set up in GitHub pages, we have decided to use readthedocs instead. Readthedocs offers easy integration with our sphinx docs, eliminating the need to manually update the documentation and push to github.
However, using readthedocs will require some changes in our project, which we discuss below.
There are two ways to build the sphinx docs for a project with readthedocs: just looking at the code and documentation, or installing the project first with pip install. Our build is failing in both cases, for different reasons:
The following line causes an error when readthedocs tries to build the documentation:
File "conf.py", line 10, in <module>
from anticipy import __version__
DistributionNotFound: The 'anticipy' distribution was not found and is required by the application
That line gets the version number from the project setup.py file, avoiding the need to keep track of version numbers in multiple files. But that will only work if we run pip install first.
If we want to be able to build docs without installing the project, we need to keep track of version numbers in the docs.
Readthedocs uses an environment with either Python 2.7 or 3.6 to run pip install. We try to install on Python 2.7, with the following result:
Running scipy-1.1.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-zEdHi1/scipy-1.1.0/egg-dist-tmp-gTL7Fz
/tmp/easy_install-zEdHi1/scipy-1.1.0/setup.py:375: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
warnings.warn("Unrecognized setuptools command, proceeding with "
ImportError: No module named numpy.distutils.core
We have experienced a similar error when installing on python 2.7.10, which went away when we used python>=2.7.11.
app.py currently lacks any unit tests. We should fix this.
When forecast_plot.plot_forecast() has invalid input or missing libraries, raise ValueError or ImportError respectively, instead of logging the error and exiting gracefully.
NameError: name 'reduce' is not defined
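This error is expected under Python 3, where `reduce` is no longer a builtin and must be imported from `functools`:

```python
from functools import reduce  # required on Python 3; a builtin on Python 2

# Example: sum a list by folding with an accumulator
total = reduce(lambda a, b: a + b, [1, 2, 3, 4])
print(total)  # 10
```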
New release:
The previous version of this project was hosted in a private gitlab repo. As of writing this, that repo has 61 open issues. We need to go through them, filter them, and reopen them here when appropriate.
Clean up ggplot and R code remains (i.e. functions, tests etc.)
Implement a framework based on Plotly that supports dynamic visualisations
Rewrite and improve current plotting tests (i.e. use realistic data instead of dummy dfs)
We still have several functions with incomplete or missing docs. Time to fix that!
We plan to deploy continuous integration with Travis for this project, once it becomes public. Unfortunately, it may be some time before this is possible. In the meantime, we could test for installation and deployment issues in the following way:
We may need to update setup.py in anticipy as a result of these tests. Also, we should be able to just copy the travis configuration from the test project to anticipy.
forecast_plot.plot_forecast() uses faceted plots whenever the input data has multiple source IDs. However, there is a minor bug in the logic to determine this: currently, the subplots variable is true if a 'source' column is present in the input. This should be changed so that the variable is only true if the input has a 'source' column and that column has multiple values.
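A minimal sketch of the corrected check, assuming the input is a pandas DataFrame (the helper name `needs_subplots` is hypothetical; the real code sets a `subplots` variable inline):

```python
import pandas as pd


def needs_subplots(df):
    # Facet only when a 'source' column exists AND holds multiple values
    return 'source' in df.columns and df['source'].nunique() > 1


df_single = pd.DataFrame({'source': ['src1'] * 3, 'y': [1, 2, 3]})
df_multi = pd.DataFrame({'source': ['src1', 'src2'], 'y': [1, 2]})
print(needs_subplots(df_single))  # False: only one source value
print(needs_subplots(df_multi))   # True
```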
The travis config file is ready, but we need to change the project configuration to use Travis, and check that the tests work as expected.
When the repository becomes open source we will have access to readthedocs. We should implement the "build docs" feature in the repository and link the github pages to readthedocs for documentation.
Make sure that Github recognizes the project's license as BSD3.
Adding license and setup.py to project skeleton
We have support for calendar-based events, but we need to add Holiday data.
We can use this as a starting point: https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/holiday.py
Since release 0.1.0, we have migrated the project documentation to ReadTheDocs and fixed several typos in the documentation. We should push a new release, 0.1.1, so that pypi points to the latest docs version.
File "...\anticipy\model_utils.py", line 257, in interpolate_df
if df.x.diff().nunique <=1:
TypeError: '<=' not supported between instances of 'method' and 'int'
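The root cause is that `nunique` is referenced without being called, so the comparison is between a bound method and an int. The likely fix, sketched on a toy dataframe:

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0]})

# Buggy: compares the method object itself, raising TypeError on Python 3
# if df.x.diff().nunique <= 1:

# Fixed: call nunique() so an integer count is compared
if df.x.diff().nunique() <= 1:
    print('x is evenly spaced')  # diffs are all 1.0, so this prints
```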
The plotting functions in module forecast_plot generate plots for our forecast outputs. It would be convenient if we were also able to generate plots for our forecast inputs, so that we could have the following workflow:
# We define an input dataframe for our forecast
df_input = (...)
# This is not currently supported, useful for exploration, prototyping
forecast_plot.plot_forecast(df_input, ...)
df_output = forecast.run_forecast(df_input, ...)
# This is currently supported
forecast_plot.plot_forecast(df_output, ...)
The input dataframe format is flexible, and has the following columns:
For implementing this, I'd suggest avoiding new plotting logic. Instead, it's probably easier to transform the input dataframe into a format suitable for our current plotting logic:
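A rough sketch of that transformation, with heavily hypothetical column names (the actual columns expected by the plotting logic would need to be checked against the run_forecast() output schema; 'model' and 'is_actuals' here are assumptions):

```python
import pandas as pd


def input_to_plot_format(df_input):
    # Hypothetical: reshape a forecast input dataframe so it resembles
    # a forecast output dataframe enough for the existing plotting code.
    # The 'model' and 'is_actuals' columns are assumed names, not the
    # documented anticipy schema.
    df = df_input.copy()
    if 'source' not in df.columns:
        df['source'] = 'src'
    df['model'] = 'actuals'
    df['is_actuals'] = True
    return df


df_input = pd.DataFrame({
    'date': pd.date_range('2018-01-01', periods=3, freq='D'),
    'y': [1.0, 2.0, 3.0]})
df_plot = input_to_plot_format(df_input)
print(df_plot.columns.tolist())
```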
In our setup.py, some of our dependencies are specific to certain python versions. This is not supported by older versions of setuptools; we need to figure out the minimum supported version of setuptools and add it as a requirement.
dependencies = [
'matplotlib==2.2.3;python_version<"3.5"', # Last version compatible with python 2.7
'matplotlib>=2.2.3;python_version>="3.5"',
'numpy>=1.15.1',
'pandas>=0.23.0',
'scipy>=1.0.0',
]