sam's People

Contributors

abontsema, daanvanes, fennovj, mi2354, multits, philiproeleveld, rubenpeters91, rutgerke, ruudkassing, sbuergers, sburgers

sam's Issues

Add a check on quantile values

You can pass quantile values outside the [0, 1] range to the sam models, but this leads to -Inf losses. It would be good to add a check that all quantile values lie within the [0, 1] range and to raise an error otherwise.
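A minimal sketch of such a check, assuming it is called wherever the models receive their quantiles (for example in `__init__` or `fit`). `check_quantiles` is a hypothetical name, not an existing sam function:

```python
from typing import Sequence


def check_quantiles(quantiles: Sequence[float]) -> None:
    """Raise a ValueError if any quantile lies outside the closed interval [0, 1].

    Hypothetical helper; NaN also fails the comparison and is reported as invalid.
    """
    invalid = [q for q in quantiles if not 0 <= q <= 1]
    if invalid:
        raise ValueError(
            f"All quantiles must lie within [0, 1], got invalid values: {invalid}"
        )
```

Failing early with a clear message is friendlier than letting training run and only later producing -Inf losses.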

Implementation of forward fill / interpolation as a transformer

Replacing missing values is most convenient with a scikit-learn imputer (a subclass of the `_BaseImputer` in `sklearn.impute`). For time series data, forward fill, interpolation, etc. are the most common ways to impute missing values, but scikit-learn has no transformers for these techniques.
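A minimal sketch of such a transformer. It assumes a plain scikit-learn `BaseEstimator`/`TransformerMixin` subclass rather than the private `_BaseImputer`. `TimeseriesImputer` and its `method`/`limit` parameters are hypothetical. The filling itself is delegated to pandas `DataFrame.ffill` and `DataFrame.interpolate`:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class TimeseriesImputer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that fills gaps by forward fill or interpolation.

    method : "ffill" or "interpolate"
    limit  : maximum number of consecutive NaNs to fill (passed to pandas)
    """

    def __init__(self, method="ffill", limit=None):
        self.method = method
        self.limit = limit

    def fit(self, X, y=None):
        # Nothing to learn: filling depends only on the data being transformed
        if self.method not in ("ffill", "interpolate"):
            raise ValueError(f"Unknown method: {self.method!r}")
        return self

    def transform(self, X):
        X = pd.DataFrame(X)
        if self.method == "ffill":
            # Carry the last valid observation forward
            return X.ffill(limit=self.limit)
        # Linear interpolation between neighbouring valid values
        return X.interpolate(method="linear", limit=self.limit)
```

Because the transformer is stateless, it can sit in a scikit-learn `Pipeline` like any other imputer. Leading NaNs stay NaN with both methods, because no earlier value exists to fill them from.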

Revise and refactor sam.visualization

The visualization module needs refactoring. Right now it reads like a loose collection of (useful) functions.
This might also be an opportunity to drop the seaborn dependency and switch to matplotlib's object-oriented (Axes-based) interface.

RNNTimeseriesRegressor

Similar to MLPTimeseriesRegressor. Support for recurrent neural networks (LSTM/GRU) would be nice to have.

We can use sam.models.create_keras_quantile_rnn() and sam.preprocessing.RNNReshaper

Add requirement for geom in read_regenradar

The optional geom argument (a string) has a maximum length. If it is too long, the request fails.

We should raise a warning that tells users why the error occurs, because the API does not report this clearly.
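A minimal sketch of such a warning. The issue does not state the actual length limit of the API, so `max_length` must be supplied by the caller. `check_geom_length` is a hypothetical helper, not part of `read_regenradar`:

```python
import warnings


def check_geom_length(geom, max_length):
    """Warn when geom is longer than the (caller-supplied) maximum length.

    Returns True if geom is acceptable, False if a warning was raised.
    """
    if geom is not None and len(geom) > max_length:
        warnings.warn(
            f"geom has {len(geom)} characters, more than the maximum of "
            f"{max_length}; the regenradar request will probably fail.",
            UserWarning,
        )
        return False
    return True
```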

Refactor MLPTimeseriesRegressor `score` function

The score function computes the tilted (pinball) loss itself instead of using this package's joint_tilted_loss function. It would be a lot cleaner to reuse the internal function than to compute the loss in multiple places.
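For reference, a NumPy sketch of the tilted (pinball) loss that both places compute. The function name and the aggregation (mean over samples, summed over quantiles) are assumptions for illustration. The aggregation inside sam's `joint_tilted_loss` may differ.

```python
import numpy as np


def pinball_loss(y_true, y_pred, quantiles):
    """Mean tilted (pinball) loss per quantile, summed over quantiles.

    y_true: shape (n,); y_pred: shape (n, len(quantiles)), one column per quantile.
    """
    y_true = np.asarray(y_true, dtype=float)[:, None]
    y_pred = np.asarray(y_pred, dtype=float)
    q = np.asarray(quantiles, dtype=float)[None, :]
    error = y_true - y_pred
    # Under-prediction (error > 0) costs q * error, over-prediction (q - 1) * error
    loss = np.maximum(q * error, (q - 1) * error)
    return float(loss.mean(axis=0).sum())
```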

Decide on model naming conventions

SamQuantileMLP, TimeseriesMLP, or something different?

Option 1: *TimeseriesRegressor

  • BaseTimeseriesRegressor
  • ConstantTimeseriesRegressor
  • LinearTimeseriesRegressor
  • MLPTimeseriesRegressor

Option 2: *QuantileForecaster

  • BaseQuantileForecaster
  • ConstantQuantileForecaster
  • LinearQuantileForecaster
  • MLPQuantileForecaster

Other suggestions are welcome.

Add temporal alignment functionality of two signals

Given two signals that measure the same quantity (plus some independent noise), we want to align them, e.g. using cross-correlation. The signals may have unequal lengths. This should work for numpy arrays as well as pandas DataFrames. For DataFrames, the whole frames are aligned, not only the signals of interest, based on the specified alignment column in each frame.
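A minimal NumPy sketch of lag estimation by cross-correlation for signals of unequal length. `estimate_lag` and `align` are hypothetical names. The signals are demeaned, and the lag with the largest raw cross-correlation is picked. A normalized cross-correlation may be more robust when the overlap varies a lot between lags.

```python
import numpy as np


def estimate_lag(a, b):
    """Return k such that a[n + k] best matches b[n] (k may be negative)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    corr = np.correlate(a, b, mode="full")
    # Index i of the "full" output corresponds to lag k = i - (len(b) - 1)
    return int(np.argmax(corr)) - (len(b) - 1)


def align(a, b):
    """Trim both signals to their overlapping, aligned part."""
    k = estimate_lag(a, b)
    a, b = np.asarray(a), np.asarray(b)
    if k >= 0:
        a = a[k:]   # a lags b: drop the first k samples of a
    else:
        b = b[-k:]  # b lags a: drop the first |k| samples of b
    n = min(len(a), len(b))
    return a[:n], b[:n]
```

For DataFrames, the same lag could be applied to all rows of each frame (for example with `iloc` slicing). That part is not sketched here.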

Make sure all docstring examples work

All docstring examples should run without doctest errors. Right now that is not the case.

DoD (definition of done) checklist

  • python -m pytest --doctest-modules succeeds
  • the doctest option is added to the unit test workflow

Add synthetic_samdata() function

Right now we can create synthetic time series and date ranges, but not synthetic sam dataframes. Those would be really helpful for testing.
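A minimal sketch of what such a function could look like. It assumes the long "sam format" with TIME, ID, TYPE and VALUE columns, and it generates one random walk per (ID, TYPE) pair. The signature and defaults are hypothetical.

```python
import numpy as np
import pandas as pd


def synthetic_samdata(ids=("sensor_1",), types=("flow", "level"),
                      start="2020-01-01", periods=24, freq="15min", seed=None):
    """Hypothetical generator of long-format sam data (TIME, ID, TYPE, VALUE)."""
    rng = np.random.default_rng(seed)
    times = pd.date_range(start=start, periods=periods, freq=freq)
    frames = []
    for id_ in ids:
        for type_ in types:
            # Random walk per (ID, TYPE) combination
            values = np.cumsum(rng.normal(size=periods))
            frames.append(
                pd.DataFrame({"TIME": times, "ID": id_, "TYPE": type_, "VALUE": values})
            )
    return pd.concat(frames, ignore_index=True)
```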

Using lagged y features with `predict_ahead == 0` should be possible

SAM does not allow using lagged features of the target together with predict_ahead == 0. This was a deliberate choice to prevent data leakage. However, there could be a use case where you only want lagged features of the target, not the target itself. This is hard to check, but it would be a nice addition.
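A sketch of the rule such a check could enforce. The feature y[t - lag] is used to predict y[t + predict_ahead]. It leaks the target only when the lag is 0 and predict_ahead is 0, or when it is negative, which means looking into the future. `lagged_target_features` is a hypothetical helper, not sam's feature engineer.

```python
import pandas as pd


def lagged_target_features(y: pd.Series, lags, predict_ahead: int = 0) -> pd.DataFrame:
    """Hypothetical helper: lagged copies of the target y to use as features.

    Rejects lags that would expose y[t + predict_ahead] (or later values) to the model.
    """
    for lag in lags:
        if lag < 0 or lag + predict_ahead < 1:
            raise ValueError(
                f"lag {lag} with predict_ahead={predict_ahead} would leak the target"
            )
    # shift(lag) moves values down, so row t holds y[t - lag]
    return pd.DataFrame({f"y_lag_{lag}": y.shift(lag) for lag in lags})
```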

Consider adding pre-commit hooks with the pre-commit package

An example .pre-commit-config.yaml could look something like this, including:

  • isort for ordering imports consistently
  • black for formatting in black style
  • trailing-whitespace to remove trailing whitespace
  • flake8 for linting (PEP 8 style, pyflakes and cyclomatic complexity); it should be configured to be compatible with black
  • bandit for checking for security vulnerabilities (e.g. secrets hard-coded in the code). We want to allow assert statements (B101) and pickle (B301).

```yaml
repos:
  - repo: https://github.com/pycqa/isort
    rev: 5.10.1
    hooks:
      - id: isort
        name: isort
        # Pass the flag and its value as separate arguments
        args: ["--profile", "black"]
  - repo: https://github.com/psf/black.git
    rev: 22.3.0
    hooks:
      - id: black
        name: black
        language_version: python3
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.2.0
    hooks:
      - id: trailing-whitespace
        args: [--markdown-linebreak-ext=md]
      - id: mixed-line-ending
      - id: fix-byte-order-marker
      - id: check-executables-have-shebangs
      - id: check-shebang-scripts-are-executable
      - id: check-merge-conflict
      - id: check-symlinks
      - id: check-case-conflict
      - id: check-docstring-first
      - id: check-json
      - id: check-toml
      - id: check-xml
      - id: check-yaml
  - repo: https://gitlab.com/pycqa/flake8
    rev: 4.0.1
    hooks:
      - id: flake8
        additional_dependencies:
          - pyproject-flake8
          - flake8-absolute-import
          - flake8-black
          - flake8-docstrings
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.4
    hooks:
      - id: bandit
        name: bandit
        args: [--skip, "B101,B301"]
```

Empty package from workflow to PyPI

Building the wheels locally and uploading the dist to PyPI manually works fine, so something probably goes wrong in the workflow.

Somehow the package built by the workflow is empty: no modules or functions end up in the upload.

This is not critical, because the current build was uploaded manually.

Simplify ConstantTimeseriesRegressor

ConstantTemplate (the underlying sklearn estimator for ConstantTimeseriesRegressor) only uses the input data X to determine the output shape of the predictions. It shouldn't actually be necessary for X to even contain data, or be array-like at all, as long as it specifies a length (implements the __len__() dunder).

In #75 and #76 I already loosened the validation on X by allowing NaN/Inf values, but the requirements are still needlessly restrictive because of the assumptions in BaseTimeseriesRegressor. I would like the ability at least to pass an empty dataframe X = pd.DataFrame(index=range(100)).

Moreover, scikit-learn already has DummyRegressor, which implements the same logic as ConstantTemplate. Although I haven't tested it, ConstantTemplate is probably equivalent to DummyRegressor. If so, it can be removed completely in favor of the latter.
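A quick way to try this out. scikit-learn's `DummyRegressor` with `strategy="quantile"` predicts a constant quantile of y, and as far as I can tell it only uses X for its number of rows. The snippet fits it on the empty-column frame from the issue. The claim that this matches ConstantTemplate exactly is an assumption that still needs testing against sam's own outputs.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor

rng = np.random.default_rng(42)
y = pd.Series(rng.normal(size=100))
# X carries no data at all: only its length (100 rows) is used
X = pd.DataFrame(index=range(100))

model = DummyRegressor(strategy="quantile", quantile=0.9).fit(X, y)
predictions = model.predict(X)  # one constant value (the 0.9 quantile of y) per row
```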

BUG: SamQuantileMLP predict_ahead doesn't support Sequence

The type hint for predict_ahead in SamQuantileMLP is

```python
predict_ahead: Union[int, Sequence[int]] = 1,
```

But the parent BaseTimeseriesRegressor only supports List:

```python
predict_ahead: Union[int, List[int]] = 1,
```

```pycon
>>> predict_ahead = (0,)
>>> isinstance(predict_ahead, Sequence)
True
>>> model = SamQuantileMLP(predict_ahead=predict_ahead)
>>> model.predict_ahead
[(0,)]
>>> model.validate_predict_ahead()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\921266\source\repos\sam\sam\models\base_model.py", line 141, in validate_predict_ahead
    if not all([p >= 0 for p in self.predict_ahead]):
  File "C:\Users\921266\source\repos\sam\sam\models\base_model.py", line 141, in <listcomp>
    if not all([p >= 0 for p in self.predict_ahead]):
TypeError: '>=' not supported between instances of 'tuple' and 'int'
```

This is caused by the following line:

```python
self.predict_ahead = (
    predict_ahead if isinstance(predict_ahead, List) else [predict_ahead]
)
```
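One possible fix is to check for an integer first and treat any other `Sequence` as a collection of horizons. It is sketched here as a standalone function with the hypothetical name `normalize_predict_ahead`. The actual fix in `BaseTimeseriesRegressor` may look different.

```python
import numbers
from collections.abc import Sequence
from typing import List, Union


def normalize_predict_ahead(predict_ahead: Union[int, Sequence[int]]) -> List[int]:
    """Convert an int or any sequence of ints (list, tuple, range) to a list."""
    if isinstance(predict_ahead, numbers.Integral):
        return [predict_ahead]
    # Strings are Sequences too, but never valid horizons
    if isinstance(predict_ahead, Sequence) and not isinstance(predict_ahead, str):
        return list(predict_ahead)
    raise TypeError(
        f"predict_ahead must be an int or a sequence of ints, got {type(predict_ahead)!r}"
    )
```

Note that numpy arrays are not registered as `Sequence`, so they would need an explicit extra branch if they should be accepted.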

Improve feature engineering in SamQuantileMLP

This should make SAM easier to learn. The current feature engineering in the SamQuantile models is too complicated.

BuildRollingFeatures is also not strictly necessary, because pandas' rolling functionality provides the same features. Being able to pass a custom feature engineering function would make the code required for a simple model much simpler.
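A sketch of what a custom feature engineering function could look like, built only on pandas rolling and shift and wrapped in scikit-learn's `FunctionTransformer`. The function `simple_features` and its window size are hypothetical. How such a transformer would be passed to SamQuantileMLP is not settled in this issue.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def simple_features(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical user-defined features: a 3-step rolling mean and a 1-step lag."""
    features = X.copy()
    for col in X.columns:
        features[f"{col}_mean_3"] = X[col].rolling(window=3).mean()
        features[f"{col}_lag_1"] = X[col].shift(1)
    return features


# validate=False (the default) passes the DataFrame through unchanged
feature_engineer = FunctionTransformer(simple_features)
```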

Consider effect of removing first rows after rolling features

Since version 3.0, when using TimeseriesMLP, the first rows of the data are removed in fit because of the rolling features. This no longer happens in the feature engineer, because it can also contain custom functions. Of course, this behaviour changes when an imputer is used. We should consider whether this is what we want.
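The effect is easy to see in plain pandas. A rolling window of size w leaves w - 1 leading rows with NaN features, and those rows have to be dropped from X and y together. The column names are illustrative only. This is not sam's internal code.

```python
import pandas as pd

X = pd.DataFrame({"flow": [1.0, 2.0, 3.0, 4.0, 5.0]})
y = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# window=3: the first 2 rows have no complete window and become NaN
features = X.assign(flow_mean_3=X["flow"].rolling(window=3).mean())

# Keep only rows where every feature is available, for X and y alike
valid = features.notna().all(axis=1)
X_fit, y_fit = features[valid], y[valid]
```

With an imputer in the pipeline, those NaN rows would be filled instead of dropped, which is exactly the behavioural difference raised above.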

Unittests will fail when KNMI API returns empty results

An empty response does not mean that the SAM code is broken, so instead of raising an error, this should only raise a warning that the KNMI API is down.

Affected:

  • sam.data_sources.weather.knmi.read_knmi
  • TestWeather.test_read_knmi_hourly
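A minimal sketch of the warning. It assumes the check runs on the DataFrame that `read_knmi` would return. `warn_if_empty` is a hypothetical helper, because the internals of `read_knmi` are not shown here.

```python
import warnings

import pandas as pd


def warn_if_empty(data: pd.DataFrame, source: str = "KNMI API") -> pd.DataFrame:
    """Return data unchanged, warning (instead of raising) when it is empty."""
    if data.empty:
        warnings.warn(f"{source} returned no data; the service may be down.", UserWarning)
    return data
```

In `TestWeather.test_read_knmi_hourly`, an empty result could similarly lead to `pytest.skip(...)` instead of a failure.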

Use `use_diff_of_y` and `predict_ahead == [0]` at the same time

With use_diff_of_y, TimeseriesMLP apparently does not allow predict_ahead = [0]. There are multiple checks for this in the code, and removing the first of these checks leads to predicting a straight line. Using use_diff_of_y with any other predict_ahead works as expected.
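A possible explanation, assuming use_diff_of_y trains on the difference y[t + h] - y[t] for horizon h. For h = 0 that target is identically zero, so the model has nothing to learn, which is consistent with the straight line described above. The helper below is hypothetical and only illustrates this assumed differencing scheme.

```python
import numpy as np


def diff_target(y, horizon):
    """Assumed differencing scheme: target y[t + horizon] - y[t]."""
    y = np.asarray(y, dtype=float)
    n = len(y) - horizon
    return y[horizon:horizon + n] - y[:n]
```

If this is the cause, supporting predict_ahead = [0] would need a different reference value, for example the previous observation y[t - 1].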

Update documentation for release

We should update the documentation for the release and also fix the autodoc functionality, which currently seems broken on https://sam-rhdhv.readthedocs.io/en/latest/

  • Move important information from General documents to a new "Introduction" section
  • Remove the General documents section
  • Replace the example notebooks with the new examples
  • Fix autodoc errors so the docs build on readthedocs
  • Fix the tensorflow import so autodoc works for the metrics, models and visualization modules
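One possible fix for the last point, assuming the autodoc failures come from tensorflow not being installed in the readthedocs build environment. Sphinx's `autodoc_mock_imports` setting replaces the listed modules with mock objects while the docs are built. The file location and the exact list of modules to mock are assumptions.

```python
# docs/conf.py (excerpt; path assumed)
extensions = ["sphinx.ext.autodoc"]

# Let autodoc import sam.metrics, sam.models and sam.visualization without
# tensorflow installed: Sphinx substitutes mock modules during the build.
autodoc_mock_imports = ["tensorflow"]
```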