sky-uk / anticipy

A Python library for time series forecasting
License: BSD 3-Clause "New" or "Revised" License
Before we start to optimise performance, we need to set up some benchmarks. Pandas uses the asv library (https://github.com/airspeed-velocity/asv); we can try that out.
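As a starting point, an asv benchmark file follows a simple convention: `setup()` runs before each benchmark, and every method named `time_*` is timed. A minimal sketch (the workload here is a stand-in; a real benchmark would call anticipy's forecast functions instead):

```python
import numpy as np
import pandas as pd


class TimeSuite:
    # asv convention: setup() runs before each benchmark,
    # and every method whose name starts with time_ is timed.
    def setup(self):
        self.df = pd.DataFrame({
            'date': pd.date_range('2018-01-01', periods=365, freq='D'),
            'y': np.random.normal(10.0, 1.0, 365)})

    def time_rolling_mean(self):
        # Stand-in workload; a real benchmark would call something like
        # anticipy.forecast.run_forecast(self.df, ...)
        self.df['y'].rolling(7).mean()
```

Benchmarks are then run with `asv run`, against an `asv.conf.json` in the repo root.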
setup.py states that 'pandas>=0.20.3' is required. However, this should be 'pandas>=0.23', due to changes in the pandas API.
We are getting an error when our seasonality models are applied to time series with gaps, if the gaps are aligned with the seasonality period. For example, consider a time series with daily samples where all Friday samples are missing. In this case, the weekly seasonality parameter for Fridays will be fitted to a random value.
import numpy as np
import pandas as pd
from anticipy import forecast, forecast_models

def array_zeros_in_indices(n, l_indices):
    return (~np.isin(np.arange(0, n), l_indices)).astype(float)

# Original time series, no gaps
df1 = pd.DataFrame({'y': np.full(14, 10.0) + np.random.normal(0.0, 0.1, 14),
                    'source': 'src1',
                    'date': pd.date_range('2018-01-01', periods=14, freq='D')})

# Copy of df1 with gaps on the same weekday
df2 = df1.copy()
df2['weight'] = array_zeros_in_indices(14, [5, 12])
df2['source'] = 'src2'

dict_forecast1 = forecast.run_forecast(
    df1, extrapolate_years=0.1, simplify_output=False,
    l_model_trend=forecast_models.model_linear + forecast_models.model_season_wday,
    l_model_season=[],
    l_model_naive=[],
    include_all_fits=True)
df_forecast1 = dict_forecast1['data']
print(df_forecast1.tail(3))

dict_forecast2 = forecast.run_forecast(
    df2, extrapolate_years=0.1, simplify_output=False,
    l_model_trend=forecast_models.model_linear + forecast_models.model_season_wday,
    l_model_season=[],
    l_model_naive=[],
    include_all_fits=True)
df_forecast2 = dict_forecast2['data']
print(df_forecast2.tail(3))

df_forecast = pd.concat([df_forecast1, df_forecast2], ignore_index=True)
We should run a check before model fitting that identifies these scenarios and skips attempting a fit for that model.
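A minimal sketch of such a check, assuming daily data with a `date` column and an optional zero/one `weight` column (the helper name `has_empty_weekday_bucket` is hypothetical, not part of the anticipy API):

```python
import pandas as pd


def has_empty_weekday_bucket(df):
    # Returns True if some weekday has no usable samples, i.e. every
    # sample for that weekday is missing or has zero weight. In that
    # case a weekly seasonality parameter cannot be fitted safely.
    weight = df['weight'] if 'weight' in df.columns else pd.Series(1.0, index=df.index)
    observed_weekdays = df.loc[weight > 0, 'date'].dt.dayofweek.unique()
    return len(observed_weekdays) < 7


# Example: daily series where all Fridays (dayofweek == 4) are zero-weighted
df = pd.DataFrame({'date': pd.date_range('2018-01-01', periods=14, freq='D')})
df['weight'] = (df['date'].dt.dayofweek != 4).astype(float)
print(has_empty_weekday_bucket(df))  # True: Fridays have no usable samples
```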
We have defined dependencies to specific python versions in setup.py, to address some incompatibilities. This has unintended consequences:
This looks like a setuptools issue, and I don't know if there's much we can do about it. But we should definitely add this to the FAQ (when we make one), and possibly add a note somewhere in the docs. We may also want to remove anticipy 0.0.2 and earlier from pypi - we don't want users getting that by default.
Right now, the only verbose output displayed is Running forecast for source: src, shown when there is more than one source. If only one source is provided, no extra output is displayed.
An error has been identified when installing the library while using python 2.7.10. We should set a requirement for python 2.7.11 or greater in setup.py:
setup(name="anticipy",
      python_requires='>=2.7.11')
Functions get_model_outliers and find_steps_and_spikes in forecast_models.py have the same functionality.
A print statement added to test_forecast while debugging is causing crashes on python 3. We should remove it.
https://github.com/sky-uk/anticipy/community lists 3 missing elements:
Library name should be AnticiPy, we are currently inconsistent in capitalisation.
Also, fix a number of documentation typos that have been found.
When using output='jupyter', no path is needed. When output='png' or output='html', we need to verify that path is not empty.
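A minimal sketch of that validation, assuming a plotting entry point that takes `output` and `path` arguments (the helper name `validate_output_path` is hypothetical):

```python
def validate_output_path(output, path):
    # 'jupyter' renders inline, so no path is required.
    # File-based outputs ('png', 'html') need a non-empty path.
    if output in ('png', 'html') and not path:
        raise ValueError(
            "A non-empty 'path' is required when output=%r" % output)


validate_output_path('jupyter', None)    # OK, no path needed
validate_output_path('png', 'plot.png')  # OK
# validate_output_path('html', '')       # would raise ValueError
```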
In https://anticipy.readthedocs.io/en/latest/tutorial.html , there is an image that is not rendered, .static/images/tutorial-forecast1.png. This works fine when building the sphinx docs locally. We need to fix this, and find if there is a way to test for readthedocs-specific bugs without merging to master.
Some code lines include #noqa to ignore pep-8 checks. However, that is messing with the output of some sphinx docs, such as anticipy.forecast_models.ForecastModel:118
Change from 'linear' to 'linear_nondec'
We sometimes want to combine lists of functions or ForecastModels while removing duplicates. The current approach, np.unique(), works on our local environments but causes errors in some CI environments. We should try replacing it with:
list(set(my_list_of_functions))
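Note that `set()` does not preserve order; if the order of models matters, an order-preserving variant may be safer. A minimal sketch (the helper name `unique_in_order` is hypothetical):

```python
def unique_in_order(items):
    # De-duplicate while keeping first-occurrence order.
    # Works for any hashable items, including functions.
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]


def f(): pass
def g(): pass

result = unique_in_order([f, g, f, g, f])  # returns [f, g], order kept
```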
While working on #6 , we found that the code in forecast_models.get_model_outliers() is hard to follow without additional comments. We should expand the function documentation.
plotly.tools.make_subplots() has 2 arguments, horizontal_spacing and vertical_spacing, that adjust the space between subplots. Default values are too high for us, we should adjust them to get nicer plots.
Minor update to fix a documentation issue in 0.1.2
Our forecast logic uses ForecastModel objects that encapsulate model functions and add additional features such as:
This gives us great flexibility in defining our models, but still falls short in certain scenarios. We should implement new components to allow us to transform our input series in specific ways:
The current plan is to compose these components with ForecastModels, using an operator other than + or *; '|' would be a great option, if available. A model using these features could look as follows:
itrans_boxcox | model_linear + season_wday | otrans_positive
Specific instances of these model transformations, such as the Box-Cox transform and the positive output transform, would be defined in separate git issues.
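As an illustration of the composition mechanics only (the class and the transform/model names below are hypothetical toys, not the anticipy API), Python's `__or__` method could support this syntax:

```python
class Transform:
    # Hypothetical sketch: wraps a callable and composes via '|'.
    def __init__(self, func, name):
        self.func, self.name = func, name

    def __or__(self, other):
        # self | other: apply self first, then other
        return Transform(lambda x: other(self(x)),
                         '{} | {}'.format(self.name, other.name))

    def __call__(self, x):
        return self.func(x)


# Toy pipeline: square-root input transform, doubling 'model', clip-at-zero output
itrans_sqrt = Transform(lambda x: x ** 0.5, 'itrans_sqrt')
model_double = Transform(lambda x: 2 * x, 'model_double')
otrans_positive = Transform(lambda x: max(x, 0.0), 'otrans_positive')

pipeline = itrans_sqrt | model_double | otrans_positive
print(pipeline.name)  # itrans_sqrt | model_double | otrans_positive
print(pipeline(9.0))  # 6.0  (sqrt(9) = 3, doubled = 6, already positive)
```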
Project description in pypi is not correctly rendered. This is because it is written in markdown, which pypi doesn't support. We should use restructuredtext instead.
The original project had docs generated with sphinx. We need to move them to Github.
We have been experimenting with using dask (https://dask.org/) to support parallel processing. Unfortunately, the code for some of these experiments has been left in tests.test_forecast.py in the main branch. We should move this unfinished code to a separate branch and complete this feature.
We will need to close #37 first in order to evaluate any performance gains achieved with dask.
Although we already have working documentation set up in GitHub pages, we have decided to use readthedocs instead. Readthedocs offers easy integration with our sphinx docs, eliminating the need to manually update the documentation and push to github.
However, using readthedocs will require some changes in our project, which we discuss below.
There are two ways to build the sphinx docs for a project with readthedocs: just looking at the code and documentation, or installing the project first with pip install. Our build is failing in both cases, for different reasons:
The following line causes an error when readthedocs tries to build the documentation:
File "conf.py", line 10, in <module>
from anticipy import __version__
DistributionNotFound: The 'anticipy' distribution was not found and is required by the application
That line gets the version number from the project setup.py file, avoiding the need to keep track of version numbers in multiple files. But that will only work if we run pip install first.
If we want to be able to build docs without installing the project, we need to keep track of version numbers in the docs.
Readthedocs uses an environment with either Python 2.7 or 3.6 to run pip install. We try to install on Python 2.7, with the following result:
Running scipy-1.1.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-zEdHi1/scipy-1.1.0/egg-dist-tmp-gTL7Fz
/tmp/easy_install-zEdHi1/scipy-1.1.0/setup.py:375: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
warnings.warn("Unrecognized setuptools command, proceeding with "
ImportError: No module named numpy.distutils.core
We have experienced a similar error when installing on python 2.7.10, which went away when we used python>=2.7.11.
app.py currently lacks any unit tests. We should fix this.
When forecast_plot.plot_forecast() has invalid input or missing libraries, raise ValueError or ImportError respectively, instead of logging the error and exiting gracefully.
NameError: name 'reduce' is not defined
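This error is expected under Python 3, where `reduce` is no longer a builtin and must be imported from `functools`:

```python
from functools import reduce  # required on Python 3; a builtin on Python 2

# Example: sum a list by folding with an accumulator
total = reduce(lambda a, b: a + b, [1, 2, 3, 4])
print(total)  # 10
```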
New release:
The previous version of this project was hosted in a private gitlab repo. As of writing this, that repo has 61 open issues. We need to go through them, filter them, and reopen them here when appropriate.
Clean up ggplot and R code remains (i.e. functions, tests etc.)
Implement a framework based on Plotly that supports dynamic visualisations
Rewrite and improve current plotting tests (i.e. use realistic data instead of dummy dfs)
We still have several functions with incomplete or missing docs. Time to fix that!
We plan to deploy continuous integration with Travis for this project, once it becomes public. Unfortunately, it may be some time before this is possible. In the meantime, we could test for installation and deployment issues in the following way:
We may need to update setup.py in anticipy as a result of these tests. Also, we should be able to just copy the travis configuration from the test project to anticipy.
forecast_plot.plot_forecast() uses faceted plots whenever the input data has multiple source IDs. However, there is a minor bug in the logic to determine this: currently, the subplots variable is true if a 'source' column is present in the input. This should be changed so that the variable is only true if the input has a 'source' column and that column has multiple values.
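A minimal sketch of the corrected check, assuming the input is a pandas DataFrame (the helper name `needs_subplots` is hypothetical; the real code sets a `subplots` variable inline):

```python
import pandas as pd


def needs_subplots(df):
    # Facet only when a 'source' column exists AND holds multiple values
    return 'source' in df.columns and df['source'].nunique() > 1


df_single = pd.DataFrame({'source': ['src1'] * 3, 'y': [1, 2, 3]})
df_multi = pd.DataFrame({'source': ['src1', 'src2'], 'y': [1, 2]})
print(needs_subplots(df_single))  # False: only one source value
print(needs_subplots(df_multi))   # True
```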
The travis config file is ready, but we need to change the project configuration to use Travis, and check that the tests work as expected.
When the repository becomes open source we will have access to readthedocs. We should implement the "build docs" feature in the repository and link the github pages to readthedocs for documentation.
Make sure that Github recognizes the project's license as BSD3.
Adding license and setup.py to project skeleton
We have support for calendar-based events, but we need to add Holiday data.
We can use this as a starting point: https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/holiday.py
Since release 0.1.0, we have migrated the project documentation to ReadTheDocs and fixed several typos in the documentation. We should push a new release, 0.1.1, so that pypi points to the latest docs version.
File "...\anticipy\model_utils.py", line 257, in interpolate_df
if df.x.diff().nunique <=1:
TypeError: '<=' not supported between instances of 'method' and 'int'
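The root cause is that `nunique` is referenced without being called, so the comparison is between a bound method and an int. The likely fix, sketched on a toy dataframe:

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0]})

# Buggy: compares the method object itself, raising TypeError on Python 3
# if df.x.diff().nunique <= 1:

# Fixed: call nunique() so an integer count is compared
if df.x.diff().nunique() <= 1:
    print('x is evenly spaced')  # diffs are all 1.0, so this prints
```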
The plotting functions in module forecast_plot generate plots for our forecast outputs. It would be convenient if we were also able to generate plots for our forecast inputs, so that we could have the following workflow:
# We define an input dataframe for our forecast
df_input = (...)
# This is not currently supported, useful for exploration, prototyping
forecast_plot.plot_forecast(df_input, ...)
df_output = forecast.run_forecast(df_input, ...)
# This is currently supported
forecast_plot.plot_forecast(df_output, ...)
The input dataframe format is flexible, and has the following columns:
For implementing this, I'd suggest avoiding new plotting logic. Instead, it's probably easier to transform the input dataframe into a format suitable for our current plotting logic:
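A rough sketch of that transformation, with heavily hypothetical column names (the actual columns expected by the plotting logic would need to be checked against the run_forecast() output schema; 'model' and 'is_actuals' here are assumptions):

```python
import pandas as pd


def input_to_plot_format(df_input):
    # Hypothetical: reshape a forecast input dataframe so it resembles
    # a forecast output dataframe enough for the existing plotting code.
    # The 'model' and 'is_actuals' columns are assumed names, not the
    # documented anticipy schema.
    df = df_input.copy()
    if 'source' not in df.columns:
        df['source'] = 'src'
    df['model'] = 'actuals'
    df['is_actuals'] = True
    return df


df_input = pd.DataFrame({
    'date': pd.date_range('2018-01-01', periods=3, freq='D'),
    'y': [1.0, 2.0, 3.0]})
df_plot = input_to_plot_format(df_input)
print(df_plot.columns.tolist())
```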
In our setup.py, some of our dependencies are specific to certain python versions. This is not supported by older versions of setuptools; we need to figure out the minimum supported version of setuptools and add it as a requirement.
dependencies = [
'matplotlib==2.2.3;python_version<"3.5"', # Last version compatible with python 2.7
'matplotlib>=2.2.3;python_version>="3.5"',
'numpy>=1.15.1',
'pandas>=0.23.0',
'scipy>=1.0.0',
]