
anticipy's People

Contributors

bezes, capelastegui, slenas


anticipy's Issues

Error in setup.py requirements

setup.py states that 'pandas>=0.20.3' is required. However, this should be 'pandas>=0.23', due to changes in the pandas API.

Forecast error: seasonality with missing values

We are getting an error when our seasonality models are applied to time series with gaps, if the gaps are aligned with the seasonality period. For example, consider a time series with daily samples where all Friday samples are missing. In this case, the weekly seasonality parameter for Fridays will be fitted to an arbitrary value.

import numpy as np
import pandas as pd
from anticipy import forecast, forecast_models


def array_zeros_in_indices(n, l_indices):
    # Weight array of length n: zeros at l_indices, ones elsewhere
    return (~np.isin(np.arange(0, n), l_indices)).astype(float)


# Original time series, no gaps
df1 = pd.DataFrame({'y': np.full(14, 10.0) + np.random.normal(0.0, 0.1, 14),
                    'source': 'src1',
                    'date': pd.date_range('2018-01-01', periods=14, freq='D')})

# Copy of df1 with gaps on the same weekday
df2 = df1.copy()
df2['weight'] = array_zeros_in_indices(14, [5, 12])
df2['source'] = 'src2'

dict_forecast1 = forecast.run_forecast(
    df1, extrapolate_years=0.1, simplify_output=False,
    l_model_trend=forecast_models.model_linear + forecast_models.model_season_wday,
    l_model_season=[],
    l_model_naive=[],
    include_all_fits=True)
df_forecast1 = dict_forecast1['data']
print(df_forecast1.tail(3))

dict_forecast2 = forecast.run_forecast(
    df2, extrapolate_years=0.1, simplify_output=False,
    l_model_trend=forecast_models.model_linear + forecast_models.model_season_wday,
    l_model_season=[],
    l_model_naive=[],
    include_all_fits=True)
df_forecast2 = dict_forecast2['data']
print(df_forecast2.tail(3))

df_forecast = pd.concat([df_forecast1, df_forecast2], ignore_index=True)


We should run a check before model fitting that identifies these scenarios and skips attempting a fit for that model.
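
A minimal sketch of such a check, assuming the forecast input format used in this project (a 'date' column and an optional 'weight' column); the function name is illustrative only and is not part of anticipy today:

import pandas as pd

def has_unobserved_weekday(df):
    # True if at least one weekday has no usable (non-zero-weight) samples,
    # in which case fitting a weekday-seasonality model should be skipped.
    weight = df['weight'] if 'weight' in df.columns else pd.Series(1.0, index=df.index)
    observed_wdays = df.loc[weight > 0, 'date'].dt.dayofweek
    return observed_wdays.nunique() < 7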

User not warned when trying to install on unsupported python version

We have defined dependencies specific to certain Python versions in setup.py, to address some incompatibilities. This has unintended consequences:

  • When trying to install anticipy on an unsupported python version (e.g. 2.7.10), anticipy v0.0.2 will be installed. This is the last version before we implemented the python version requirements.
  • If we try to install a specific anticipy version (e.g. 0.1.0) on python 2.7.10, installation fails with a 'no matching distribution found' message. This is very confusing, since the distribution is in pypi, and the actual reason for the error is an unsupported python version.

This looks like a setuptools issue, and I don't know if there's much we can do about it. But we should definitely add this to the FAQ (when we make one), and possibly add a note somewhere in the docs. We may also want to remove anticipy 0.0.2 and earlier from pypi - we don't want users getting that by default.
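
One option worth checking, shown as a hedged sketch rather than a committed change, is declaring python_requires in setup.py; this is only honoured by reasonably recent pip and setuptools, and the lower bound below is an assumption based on the versions discussed above:

from setuptools import setup

setup(
    name='anticipy',
    python_requires='>=2.7.11',  # assumed lower bound; rejects 2.7.10 and earlier with a clear message
    # ... other setup() arguments unchanged ...
)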

Implement new model components, InputTransform, OutputTransform

Our forecast logic uses ForecastModel objects that encapsulate model functions and add additional features such as:

  • model composition: add and multiply models with the '+' and '*' operators
  • parameter initialisation
  • parameter boundaries
  • input validation

This gives us great flexibility in defining our models, but still falls short in certain scenarios. We should implement new components to allow us to transform our input series in specific ways:

  • InputTransform would transform the input time series prior to computing the residuals when fitting, then reverse the transformation when computing the model output. The primary application of this component would be to implement Box-Cox transformations
  • OutputTransform would transform the output of the model (both when computing the residuals and when generating a forecast). This would also apply to confidence intervals. The main application of this component would be to ensure that certain outputs are always positive.

The current plan for using these components is that they would be composed with ForecastModels, using an operator other than + or *. '|' would be a great option, if available. A model using these features could look as follows:

itrans_boxcox | model_linear + season_wday | otrans_positive

Specific instances of these model transformations, such as the Box-Cox transform and the positive output transform, would be defined in separate git issues.
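
A hypothetical sketch of how the '|' composition could be wired up for an output transform; none of these classes exist in anticipy today, and the simplified call signature is an assumption:

import numpy as np

class OutputTransform:
    def __init__(self, f_transform):
        # f_transform is applied to model output (and, eventually, confidence intervals)
        self.f_transform = f_transform

    def __ror__(self, model):
        # Enables 'model | otrans' composition
        return TransformedModel(model, f_output=self.f_transform)

class TransformedModel:
    def __init__(self, model, f_output=None):
        self.model = model
        self.f_output = f_output

    def __call__(self, a_x, a_date, params):
        a_y = self.model(a_x, a_date, params)
        return self.f_output(a_y) if self.f_output else a_y

# Example: an output transform that forces forecasts to be non-negative
otrans_positive = OutputTransform(lambda a_y: np.clip(a_y, 0.0, None))

Since '+' binds more tightly than '|' in Python, the composed model in the example above is built first and the transforms are then applied around it, which matches the intended reading.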

Implement parallel processing with Dask

We have been experimenting with using dask (https://dask.org/) to support parallel processing. Unfortunately, the code for some of these experiments has been left in tests.test_forecast.py in the main branch. We should move this unfinished code to a separate branch and complete this feature.

We will need to close #37 first in order to evaluate any performance gains achieved with dask.
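
As a rough illustration of the intended approach (not the experimental code referenced above), per-source forecasts could be dispatched with dask.delayed:

import dask
from anticipy import forecast

def run_forecast_parallel(df, **kwargs):
    # One delayed run_forecast task per source id, executed in parallel on compute()
    l_tasks = [
        dask.delayed(forecast.run_forecast)(df_src, **kwargs)
        for _, df_src in df.groupby('source')
    ]
    return dask.compute(*l_tasks)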

Set up readthedocs

Although we already have working documentation set up in GitHub pages, we have decided to use readthedocs instead. Readthedocs offers easy integration with our sphinx docs, eliminating the need to manually update the documentation and push to github.

However, using readthedocs will require some changes in our project, which we discuss below.

There are two ways to build the sphinx docs for a project with readthedocs: just looking at the code and documentation, or installing the project first with pip install. Our build is failing in both cases, for different reasons:

1. Build without install

The following line causes an error when readthedocs tries to build the documentation:

File "conf.py", line 10, in <module>
    from anticipy import __version__

DistributionNotFound: The 'anticipy' distribution was not found and is required by the application

That line gets the version number from the project setup.py file, avoiding the need to keep track of version numbers in multiple files. But that will only work if we run pip install first.

If we want to be able to build docs without installing the project, we need to keep track of version numbers in the docs.
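
A hedged sketch of a conf.py fallback that would let the docs build without installing the package; the placeholder string is an assumption and would need to be kept in sync by hand:

try:
    from anticipy import __version__ as version
except Exception:
    version = '0.1.x'  # placeholder used when anticipy is not installed
release = version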

2. Build with install

Readthedocs uses an environment with either Python 2.7 or 3.6 to run pip install. We try to install on Python 2.7, with the following result:

Running scipy-1.1.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-zEdHi1/scipy-1.1.0/egg-dist-tmp-gTL7Fz
/tmp/easy_install-zEdHi1/scipy-1.1.0/setup.py:375: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
  warnings.warn("Unrecognized setuptools command, proceeding with "

ImportError: No module named numpy.distutils.core

We have experienced a similar error when installing on python 2.7.10; it went away when we used python>=2.7.11.

Prepare release: 0.1.2

New release:

  • Added interactive plots with plotly.
  • Moved matplotlib to an optional dependency.

Migrate open issues from gitlab repo

The previous version of this project was hosted in a private gitlab repo. As of writing this, that repo has 61 open issues. We need to go through them, filter them, and reopen them here when appropriate.

Support dynamic visualisations with Plotly

  • Clean up remaining ggplot and R code (i.e. functions, tests, etc.)

  • Implement a framework based on Plotly that supports dynamic visualisations (see the sketch after this list)

  • Rewrite and improve the current plotting tests (i.e. use realistic data instead of dummy dfs)
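
A minimal sketch of the kind of interactive output the Plotly framework could produce, assuming a reasonably recent plotly version; the column names ('date', 'y', 'is_actuals') follow the forecast output format used elsewhere in this document, and the function name is illustrative:

import plotly.graph_objs as go
from plotly.offline import plot

def plot_forecast_plotly(df_fcast, path_html='forecast.html'):
    fig = go.Figure()
    for is_actuals, df_part in df_fcast.groupby('is_actuals'):
        fig.add_trace(go.Scatter(
            x=df_part['date'], y=df_part['y'],
            mode='markers' if is_actuals else 'lines',
            name='actuals' if is_actuals else 'forecast'))
    # Writes a self-contained interactive html file
    plot(fig, filename=path_html, auto_open=False)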

Test CI configuration in temporary project

We plan to deploy continuous integration with Travis for this project, once it becomes public. Unfortunately, it may be some time before this is possible. In the meantime, we could test for installation and deployment issues in the following way:

  • Create a public project in a personal Github repo
  • This project will include no code, but it will have the same setup.py as anticipy. We will set up Travis CI there, and check for any deployment issues.
  • Once we are happy with the CI configuration and deployment, we can add a hello_world.py module and a test_hello_world.py unit test, and configure Travis for tests, coverage, and PEP-8 compliance.

We may need to update setup.py in anticipy as a result of these tests. Also, we should be able to just copy the travis configuration from the test project to anticipy.
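
A hedged sketch of the kind of .travis.yml the test project could start from; the Python versions and tool choices are placeholders to be aligned with anticipy's setup.py:

language: python
python:
  - "2.7"
  - "3.6"
install:
  - pip install .
  - pip install pytest pytest-cov flake8
script:
  - flake8 .
  - pytest --cov=anticipy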

In forecast_plot.plot_forecast, only use facets if multiple source IDs are present

forecast_plot.plot_forecast() uses faceted plots when the input data has multiple source IDs. However, there is a minor bug in the logic that determines this: currently, the subplots variable is True whenever a 'source' column is present in the input. It should only be True if the input has a 'source' column and that column has multiple values.
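
Sketch of the corrected condition (the variable name 'subplots' comes from the description above; the dataframe name df_fcast is illustrative and the surrounding code is not reproduced):

subplots = 'source' in df_fcast.columns and df_fcast['source'].nunique() > 1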

Set up Travis CI

The Travis config file is ready, but we need to change the project configuration to use Travis and check that the tests work as expected.

Add readthedocs support

When the repository becomes open source we will have access to readthedocs. We should set up the "build docs" feature in the repository and link the GitHub Pages documentation to readthedocs.

Prepare release 0.1.1

Since release 0.1.0, we have migrated the project documentation to ReadTheDocs and fixed several typos in the documentation. We should push a new release, 0.1.1, so that pypi points to the latest docs version.

TypeError with nunique in model_utils

File "...\anticipy\model_utils.py",line 257, in interpolate_df
    if df.x.diff().nunique <=1:
TypeError: '<=' not supported between instances of 'method' and 'int'
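
The comparison is applied to the bound method rather than its result; a sketch of the corrected line in model_utils.interpolate_df simply calls nunique():

# nunique is a method, so it must be called before comparing
if df.x.diff().nunique() <= 1:
    ...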

In forecast_plot, add support for plotting forecast inputs

The plotting functions in module forecast_plot generate plots for our forecast outputs. It would be convenient if we were also able to generate plots for our forecast inputs, so that we could have the following workflow:

# We define an input dataframe for our forecast
df_input = (...)
# This is not currently supported - useful for exploration and prototyping
forecast_plot.plot_forecast(df_input, ...)

df_output = forecast.run_forecast(df_input, ...)
# This is currently supported
forecast_plot.plot_forecast(df_output, ...)

The input dataframe format is flexible, and has the following columns:

  • y: value of time series
  • x: time series indices
  • date (optional): time series dates
  • source (optional): source identifier
  • weight (optional): weights for time series samples

For implementing this, I'd suggest avoiding new plotting logic. Instead, it's probably easier to transform the input dataframe into a format suitable for our current plotting logic:

  • run forecast.normalize_df() to ensure that the dataframe meets our requirements for forecast input.
  • transform the dataframe: rename the 'source' column to 'model', add a column is_actuals=True, and assert that a 'date' column exists (or possibly rename column 'x' to 'date', if a time series with numeric indices instead of dates works for plotting) - see the sketch below
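
A minimal sketch of that transformation, assuming forecast.normalize_df() and the column conventions described above; the helper name is illustrative:

from anticipy import forecast

def input_to_plot_format(df_input):
    # Normalize, then mimic the forecast output format expected by plot_forecast
    df = forecast.normalize_df(df_input)
    df = df.rename(columns={'source': 'model'})
    df['is_actuals'] = True
    if 'date' not in df.columns:
        # Assumes numeric x indices can be plotted in place of dates
        df = df.rename(columns={'x': 'date'})
    return df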

Build requirements - add minimum setuptools version

In our setup.py, some of our dependencies are specific to certain Python versions. This is not supported by older versions of setuptools, so we need to figure out the minimum setuptools version that supports these environment markers and add it as a requirement.

dependencies = [
    'matplotlib==2.2.3;python_version<"3.5"', # Last version compatible with python 2.7
    'matplotlib>=2.2.3;python_version>="3.5"',
    'numpy>=1.15.1',
    'pandas>=0.23.0',
    'scipy>=1.0.0',
]
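
As a hedged sketch, setup.py could fail early with a clear message when setuptools is too old; the 20.5 bound below is an assumption that would need to be verified against the environment-marker syntax used above:

import pkg_resources

try:
    # assumed minimum for ';python_version' markers in install_requires - verify before merging
    pkg_resources.require('setuptools>=20.5')
except pkg_resources.VersionConflict:
    raise RuntimeError(
        'anticipy needs a newer setuptools to handle version-specific dependencies; '
        'please upgrade with: pip install -U setuptools')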
