wildboar-foundation / wildboar

wildboar is a Python module for temporal machine learning

Home Page: https://wildboar.dev

License: BSD 3-Clause "New" or "Revised" License

Python 62.36% Cython 30.20% C 7.39% Makefile 0.05%

python timeseries machine-learning distance-measures dtw euclidean-distances cython numpy scipy dynamic-time-warping

wildboar's Issues

Transform of SAX and PAA does not check if fitted
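
A minimal sketch of the kind of guard this issue asks for, assuming a toy transformer rather than wildboar's actual SAX or PAA classes. `TinyPAA` and its `n_timestep_in_` attribute are hypothetical names used only for illustration; the point is that `transform` refuses to run before `fit` has set its state.

```python
import numpy as np


class TinyPAA:
    """Hypothetical piecewise aggregate approximation, for illustration only."""

    def __init__(self, n_intervals=4):
        self.n_intervals = n_intervals

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        # Attributes ending in "_" are set only by fit.
        self.n_timestep_in_ = x.shape[-1]
        return self

    def transform(self, x):
        # Refuse to transform before fit instead of failing later on.
        if not hasattr(self, "n_timestep_in_"):
            raise RuntimeError("TinyPAA is not fitted yet; call fit first.")
        x = np.asarray(x, dtype=float)
        if x.shape[-1] != self.n_timestep_in_:
            raise ValueError("x has a different number of timesteps than seen in fit.")
        # Split the time axis into n_intervals chunks and average each chunk.
        chunks = np.array_split(x, self.n_intervals, axis=-1)
        return np.stack([chunk.mean(axis=-1) for chunk in chunks], axis=-1)
```

In a scikit-learn based estimator the same check is usually delegated to `sklearn.utils.validation.check_is_fitted`.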

`ShapeletForestClassifier` abstraction

Is your feature request related to a problem? Please describe.
A ShapeletForestClassifier class would spare users from having to use BaggingClassifier. It could also hide the force_dim parameter of ShapeletTreeClassifier.

Describe the solution you'd like
A classifier interface similar to RandomForestClassifier

Describe alternatives you've considered
None

Additional context
None

Improve examples

Is your feature request related to a problem? Please describe.
It is not clear how the code can be used

Describe the solution you'd like
Clear examples

Describe alternatives you've considered
Not applicable


Extraction of Shapelets and return as a list

Dear Isak Samsten,

is it possible to retrieve/obtain all the shapelets extracted by ShapeletForestClassifier?
Which part of the code has to be modified? I could do it alone.

Thanks in advance,

Unneeded comment (not WIP)

https://github.com/isakkarlsson/wildboar/blob/4c4adaff354bc34f01de4aa6bd94e92df83ad5f3/wildboar/_dtw_distance.pyx#L160

Cannot run the examples

Describe the bug
I want to run shapelet_forest_example.py, but it fails with the following traceback:

Traceback (most recent call last):
  File "D:/MachineLearningDeveloper/wildboar/examples/shapelet_forest_example.py", line 5, in <module>
    from wildboar.ensemble import ExtraShapeletTreesClassifier, ShapeletForestClassifier
  File "D:\MachineLearningDeveloper\wildboar\wildboar\ensemble\__init__.py", line 18, in <module>
    from ._ensemble import ShapeletForestClassifier
  File "D:\MachineLearningDeveloper\wildboar\wildboar\ensemble\_ensemble.py", line 32, in <module>
    from ..tree import ExtraShapeletTreeClassifier
  File "D:\MachineLearningDeveloper\wildboar\wildboar\tree\__init__.py", line 18, in <module>
    from ._tree import ShapeletTreeClassifier
  File "D:\MachineLearningDeveloper\wildboar\wildboar\tree\_tree.py", line 24, in <module>
    from ._tree_builder import ClassificationShapeletTreeBuilder
ModuleNotFoundError: No module named 'wildboar.tree._tree_builder'

To Reproduce
Steps to reproduce the behavior:
Run shapelet_forest_example.py.

Expected behavior
Expect to print out results in the console.

Setup (please complete the following information):

OS: Windows 10 64-bit
Python version: 3.8.6
NumPy version: 1.18.5
Cython version: 0.29.21

Additional context
None

Support pairwise_distance(2d-array, 3d-array)

Given a single multivariate sample (n_dims, n_timestep) and a dataset of (n_samples, n_dims, n_timestep), pairwise_distance should return an array of shape (n_samples) (if dim="mean" or int) or (n_dims, n_samples) if dim="full".
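
The requested shapes can be sketched with plain NumPy. This is only an illustration of the shape contract described above, using the Euclidean distance per dimension; `pairwise_distance_sketch` is a hypothetical helper and not wildboar's implementation.

```python
import numpy as np


def pairwise_distance_sketch(x, Y, dim="mean"):
    """Euclidean distance between one sample x and every sample in Y.

    x: (n_dims, n_timestep); Y: (n_samples, n_dims, n_timestep).
    Hypothetical helper, not wildboar's implementation.
    """
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if x.ndim != 2 or Y.ndim != 3 or Y.shape[1:] != x.shape:
        raise ValueError(
            "expected x of shape (n_dims, n_timestep) and "
            "Y of shape (n_samples, n_dims, n_timestep)"
        )
    # Broadcast x over the sample axis; per_dim has shape (n_samples, n_dims).
    per_dim = np.sqrt(((Y - x[np.newaxis]) ** 2).sum(axis=2))
    if dim == "full":
        return per_dim.T  # (n_dims, n_samples)
    if dim == "mean":
        return per_dim.mean(axis=1)  # (n_samples,)
    return per_dim[:, dim]  # a single dimension: (n_samples,)
```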

Inaccurate computation of standard deviation

The standard deviation in ScaledDTW and ScaledEuclidean uses a simple one pass algorithm which is inaccurate and unstable.

We should investigate if there are more accurate one-pass algorithms.
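
One well-known candidate is Welford's update, sketched below in pure Python. The `naive_std` function is an assumption about what a "simple one pass algorithm" looks like (sum and sum of squares); the issue does not show the actual code. With values that share a large offset, the naive formula suffers from cancellation while Welford's update stays accurate.

```python
import math


def welford_std(values):
    """Population standard deviation in one pass (Welford's update)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for value in values:
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    if count == 0:
        raise ValueError("at least one value is required")
    return math.sqrt(m2 / count)


def naive_std(values):
    """Textbook one-pass formula sqrt(E[x^2] - E[x]^2); prone to cancellation."""
    n = len(values)
    ex = sum(values) / n
    ex2 = sum(v * v for v in values) / n
    return math.sqrt(max(ex2 - ex * ex, 0.0))


data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(welford_std(data), naive_std(data))
```

The same update translates directly to a Cython loop, since it only needs three scalars of state.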

CI builds fail on Windows.

The Windows builds seem to fail randomly.

The issue is reported upstream and there is a pull-request to resolve it: pypa/cibuildwheel#1740

Waiting for the fix to update the workflow.

Unit tests

Is your feature request related to a problem? Please describe.
Unit tests

Describe the solution you'd like
A test suite to detect bugs

Describe alternatives you've considered
Not applicable


Windows specified cache path doesn't exist.

Describe the bug
In \wildboar\datasets\__init__.py, _os_cache_path is set to %LOCALAPPDATA%\Caches. In fact, this path does not exist; the default path is %LOCALAPPDATA%\cache.

To Reproduce
Steps to reproduce the behavior:

In a Windows 10 environment, run shapelet_forest_example.py; a FileNotFoundError occurs.

Expected behavior
We should get RSF and Extra metrics.

Setup (please complete the following information):

OS: Windows 10 64-bit
Python version: 3.8.6
NumPy version: 1.18.5
Cython version: 0.29.21

Additional context

Traceback (most recent call last):
  File "D:/MachineLearningDeveloper/wildboar/examples/shapelet_forest_example.py", line 7, in <module>
    x, y = datasets.load_gun_point()
  File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\__init__.py", line 119, in load_gun_point
    return load_dataset(
  File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\__init__.py", line 292, in load_dataset
    x, y, n_train_samples = repository.load(
  File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\_repository.py", line 164, in load
    with self._download_repository(
  File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\_repository.py", line 249, in _download_repository
    os.mkdir(cache_dir)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\Hephaest\\AppData\\Local\\Caches\\wildboar'

Process finished with exit code 1
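
The traceback shows `os.mkdir(cache_dir)` failing because the parent directory is missing. A minimal sketch of one possible fix, assuming the cache directory may be created recursively; `ensure_cache_dir` is a hypothetical helper name, not a function in wildboar.

```python
import os
import tempfile


def ensure_cache_dir(cache_root, name="wildboar"):
    """Create cache_root/name, including any missing parent directories."""
    cache_dir = os.path.join(cache_root, name)
    # os.mkdir fails with FileNotFoundError when cache_root is missing;
    # os.makedirs creates the intermediate directories as well.
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir


with tempfile.TemporaryDirectory() as tmp:
    missing_root = os.path.join(tmp, "Caches")  # does not exist yet
    print(os.path.isdir(ensure_cache_dir(missing_root)))
```

With `exist_ok=True` the call is also safe to repeat on later runs, so no separate existence check is needed.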

Multivariate Case

Hi!
Thanks a lot for providing the code alongside the paper!
I managed to get the examples running but all the datasets are univariate, if I'm not mistaken.

I am wondering what's the format expected for a multivariate dataset?
Could you provide a (minimal) working example?
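
Other issues in this tracker describe multivariate input as a 3D array of shape (n_samples, n_dims, n_timestep). Assuming that layout, a synthetic dataset can be built with NumPy as sketched below; the sizes and random data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_dims, n_timestep = 20, 3, 50

# One synthetic multivariate dataset: samples x dimensions x timesteps.
x = rng.normal(size=(n_samples, n_dims, n_timestep))
y = rng.integers(0, 2, size=n_samples)  # one label per sample

# x[i] is one multivariate series, x[i, d] is dimension d of that series.
print(x.shape, x[0].shape, x[0, 1].shape)
```

An array like `x` would then be passed to an estimator's `fit(x, y)` in the same way as the univariate 2D case.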

n_features_in_ not set

BaseEnsemble from scikit-learn no longer sets n_features_in_.

Workaround

Downgrade sklearn to version < 1.0

version missed

When I try to build this repository from source, I get the following error:

    from .version import version as __version__
ModuleNotFoundError: No module named 'wildboar.version'

So I think the version should be assigned properly in __init__.py.
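
One way a package can avoid a hard failure here is to fall back to the installed distribution metadata, or to a placeholder when running from an uninstalled source tree. The sketch below assumes this approach; `resolve_version` and the `"0+unknown"` default are hypothetical, and it does not show how wildboar actually generates its version module.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(distribution, default="0+unknown"):
    """Return the installed version of a distribution, or a default."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # Happens when running from a source tree that was never installed.
        return default


print(resolve_version("wildboar"))
```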

ShapeletForestClassifier error with multivariate time-series of different lengths

Dear Isak,

when I try to fit the ShapeletForestClassifier (scaled_dtw) on a nested dataframe containing multivariate time-series of different lengths, I get the following error:

"ValueError: setting an array element with a sequence."

If I try to fit on a Multiindex Dataframe, I get the following error instead:

"ValueError: Number of labels=712 does not match number of samples=509628"

It seems that it cannot handle multivariate time series of different lengths, yet the scaled DTW metric should work on such data.
I kindly ask for your help.

Thanks in advance.
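
A common way to get a regular array out of series with different lengths is to pad them to a common length. The sketch below does this with NaN as the fill value; `pad_to_3d` is a hypothetical helper, and whether a given wildboar estimator accepts NaN-padded input is an assumption that should be checked against the documentation of the installed version.

```python
import numpy as np


def pad_to_3d(series, fill_value=np.nan):
    """Stack (n_dims, length_i) arrays into (n_samples, n_dims, max_length)."""
    series = [np.atleast_2d(np.asarray(s, dtype=float)) for s in series]
    n_dims = series[0].shape[0]
    if any(s.shape[0] != n_dims for s in series):
        raise ValueError("all samples must have the same number of dimensions")
    max_length = max(s.shape[1] for s in series)
    out = np.full((len(series), n_dims, max_length), fill_value)
    for i, s in enumerate(series):
        out[i, :, : s.shape[1]] = s  # left-align, pad the tail
    return out
```

The result has one entry per sample along the first axis, which also avoids the label/sample count mismatch that a flattened MultiIndex DataFrame produces.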

Multivariate support for ShapeletForestCounterfactual

As mentioned in discussions, is it possible to implement multivariate support in the explainability module?

In particular, I'm interested in the ShapeletForestCounterfactual method. Thanks in advance!

Unnecessary splits in shapelet trees

I have a hypothesis about what can go wrong. Here we skip shapelets that produce constant distances to all samples. The bug would present itself if all shapelets produce such constant distances and the best split is not initialised. Perhaps that's what's happening. I will try to implement a sanity check for that case and see if I can reproduce the bug.

Originally posted by @isaksamsten in #87 (reply in thread)

Segmentation Faults

Hello,
I installed wildboar on Ubuntu 18.04 LTS, with Python 3.6.8 using pip. The examples run perfectly fine, but whenever I try to run it on my data I get a segmentation fault. How can I resolve this?

Thanks

_check_estimator_name no longer included in sklearn.utils.validation

Hi Isak,

from sklearn.utils.validation import (
    _check_estimator_name,
    _check_y,
    check_consistent_length,
    warnings,
)

This import is no longer working with the most recent version of scikit-learn.

The same is happening with '_fit_context' from 'sklearn.base'. Any suggestions? Thank you!

Bug when using dict or list scoring in importances

https://github.com/isaksamsten/wildboar/blob/bd2560f94ca5e44067e57365eb64b10be26bc2de/src/wildboar/explain/_importance.py#L148

Raises ModuleNotFoundError when importing the datasets module

We use the packaging module to parse version numbers.

The module is almost always installed (it's used by setuptools), so I don't think we should specify it as dependency, but use soft_dependency to warn users and suggest them to install the correct package.
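
A minimal sketch of the soft-dependency idea described above: probe for the module and warn with an install hint instead of failing at import time. `has_soft_dependency` is a hypothetical helper written for this illustration, not the library's own `soft_dependency`.

```python
import importlib.util
import warnings


def has_soft_dependency(module, install_hint=None):
    """Return True if a top-level module is importable; otherwise warn."""
    if importlib.util.find_spec(module) is not None:
        return True
    hint = install_hint or f"pip install {module}"
    warnings.warn(
        f"'{module}' is required for this feature; install it with `{hint}`.",
        UserWarning,
        stacklevel=2,
    )
    return False


if has_soft_dependency("packaging"):
    from packaging.version import Version

    print(Version("1.2.0") > Version("1.1.1"))
```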

Datasets documentation is outdated

Will it work for multivariate time series classification, for example with a mixture of categorical and continuous data?

Is your feature request related to a problem? Please describe.

Will it work for multivariate time series classification, for example with a mixture of categorical and continuous data?
For example, at time t1 we observe: red, 2.4, 5, 12.456; at t2: green, 3.5, 2, 45.78; at t3: black, 5.6, 7, 23.56; at t4: red, 2.1, 5, 12.6.

Update What's New for Version 1.2

Correct the module of CastorTransform

_joblib_parallel_args removed from scikit-learn

Describe the bug
You're importing _joblib_parallel_args from sklearn.utils.fixes; however, it has been removed.

To Reproduce

$ pip install "scikit-learn>=0.21.3"
Requirement already satisfied: scikit-learn>=0.21.3 in ./.venv/lib/python3.8/site-packages (from wildboar) (1.1.1)

from sklearn.utils.fixes import _joblib_parallel_args
ImportError: cannot import name '_joblib_parallel_args' from 'sklearn.utils.fixes'

$ pip install "scikit-learn<=1.1.0"
Installing collected packages: scikit-learn
Attempting uninstall: scikit-learn
Found existing installation: scikit-learn 1.1.1
Uninstalling scikit-learn-1.1.1:
Successfully uninstalled scikit-learn-1.1.1
Successfully installed scikit-learn-1.0.2

from sklearn.utils.fixes import _joblib_parallel_args

Expected behavior
Compatible with latest version(s) of scikit-learn (>1.0.2)

Setup (please complete the following information):

OS: Ubuntu 20.04
Python version: 3.8
Wildboar version: 1.0.12

RocketClassifier does not have a classes_-property

ShapeletForestClassifier does not support single dimension 3D input (i.e. shape (10, 1, 20))

Describe the bug

ShapeletForestClassifier crashes with a single dimension 3D numpy array input as sklearn cannot process it.

Traceback (most recent call last):
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\local_code.py", line 18, in <module>
    rf.fit(x, y)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\wildboar\ensemble\_ensemble.py", line 186, in fit
    self._fit(
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\wildboar\ensemble\_ensemble.py", line 240, in _fit
    return super()._fit(X, y, max_samples, max_depth, sample_weight, False)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\sklearn\ensemble\_bagging.py", line 434, in _fit
    all_results = Parallel(
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 1085, in __call__
    if self.dispatch_one_batch(iterator):
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 901, in dispatch_one_batch
    self._dispatch(tasks)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\_parallel_backends.py", line 597, in __init__
    self.results = batch()
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\sklearn\utils\fixes.py", line 117, in __call__
    return self.function(*args, **kwargs)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\sklearn\ensemble\_bagging.py", line 84, in _parallel_build_estimators
    n_samples, n_features = X.shape
ValueError: too many values to unpack (expected 2)

Process finished with exit code 1

To Reproduce

import warnings
import numpy as np
from wildboar.datasets import load_dataset
from wildboar.ensemble import ShapeletForestClassifier

if __name__ == "__main__":
    warnings.simplefilter(action='ignore', category=FutureWarning)

    x, y = load_dataset("Beef")
    print(x.shape)
    rf = ShapeletForestClassifier()
    rf.fit(x, y)
    print(rf.score(x, y))

    x = np.reshape(x, (60, 1, 470))
    print(x.shape)
    rf = ShapeletForestClassifier()
    rf.fit(x, y)
    print(rf.score(x, y))

More than 1 dimension is fine, i.e. if you were to reshape it to (60, 2, 235).

Expected behavior

The classifier processes single dimension 3D input.

Setup (please complete the following information):

OS: Windows
Python version: 3.9
NumPy version: 1.22.4
Wildboar version: 1.1.1

Additional context

If this is a deliberate design decision, feel free to close.
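
Until single-dimension 3D input is handled, one possible workaround is to drop the singleton dimension before calling `fit`. `squeeze_single_dim` below is a hypothetical helper for that purpose; it leaves 2D input and genuinely multivariate input untouched.

```python
import numpy as np


def squeeze_single_dim(x):
    """Turn (n_samples, 1, n_timestep) into (n_samples, n_timestep)."""
    x = np.asarray(x)
    if x.ndim == 3 and x.shape[1] == 1:
        return x[:, 0, :]
    return x  # already 2D, or genuinely multivariate


print(squeeze_single_dim(np.zeros((60, 1, 470))).shape)
```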

Impurity at forest nodes is always -1 in Multivariate Shapelet Forest Classifier

Dear Isak Samsten,

While I was checking the internal attributes of ShapeletForestClassifier after calling the fit method, I noticed that the impurity attribute for each node of one hundred trees was always -1.

I think it is an error, maybe due to the bad structure of my data. Do you have any suggestions?

Thanks in advance!

Array length invariant not checked in constant_lower_bound

If length < 3, undefined memory will be read.

Timeout too low

OK, I updated scikit-learn to 14.0, but now I'm facing this error:

"requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='datasets.wildboar.dev', port=443): Read timed out. (read timeout=3)"

Originally posted by @lorebon in #92 (comment)
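
The error message shows a fixed read timeout of 3 seconds. The sketch below only illustrates making the timeout a caller-controlled parameter. It uses the standard library's `urllib` rather than `requests` (which the traceback indicates the library uses), and the `download` helper and its 30 second default are assumptions for illustration.

```python
import urllib.request


def download(url, timeout=30.0):
    """Fetch url and return the body; timeout is in seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

With `requests`, the equivalent knob is the `timeout` argument of `requests.get`, which also accepts a `(connect, read)` tuple.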

Windows use_math

Describe the bug
Cannot compile on Windows.

To Reproduce
Compile on windows

Expected behavior
Should compile on windows

Setup (please complete the following information):

OS: Windows
Python version: 3.7
NumPy version:
Cython version:

Additional context
Fix: remove "m"

Load multiple datasets with merge_train_test=False

Describe the bug
Runtime error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/isak/.miniconda3/envs/wbweb/lib/python3.11/site-packages/wildboar/datasets/__init__.py", line 235, in load_datasets
    x, y = load_dataset(dataset, repository=repository, **kwargs)

To Reproduce

load_datasets("wildboar/ucr-tiny", merge_train_test=False)

Expected behavior

Load train/test parts for each dataset
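
The traceback points at the line `x, y = load_dataset(...)`, which presumably fails because four parts come back when merge_train_test=False. A minimal sketch of a loop that does not assume the number of returned parts is shown below. `load_dataset_stub` is a hypothetical stand-in with made-up data, not wildboar's loader.

```python
def load_dataset_stub(name, merge_train_test=True):
    """Hypothetical stand-in for a loader with two possible return shapes."""
    x_train, x_test = [[0.0, 1.0]], [[1.0, 0.0]]
    y_train, y_test = [0], [1]
    if merge_train_test:
        return x_train + x_test, y_train + y_test
    return x_train, x_test, y_train, y_test


def load_datasets_sketch(names, **kwargs):
    """Yield (name, parts) without assuming how many parts the loader returns."""
    for name in names:
        # Unpacking `x, y = ...` here would fail when four parts come back.
        yield name, load_dataset_stub(name, **kwargs)


for name, (x_train, x_test, y_train, y_test) in load_datasets_sketch(
    ["a", "b"], merge_train_test=False
):
    print(name, len(x_train), len(x_test))
```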

Assigned leaf node in decision_path()

Hi Isak,

I have a question about the RandomShapeletForest method "decision_path": I was wondering if it's possible to modify the boolean 2D matrix in order to keep track of the assigned leaf node for each sample.

Is there another way to recover such information? Suppose that there are at least two leaves with the same class.
Thank you in advance!

'IsolationShapeletForest' object has no attribute 'n_dims_'

Describe the bug
n_dims_ is no longer set.

To Reproduce
set bootstrap=True and contamination_set="oob" in IsolationShapeletForest

Expected behavior
n_dims_in_ should be used instead

ShapeletForestEmbedding returns a matrix of all ones

Edit: that does fix the import error but I cannot get the unsupervised example to work (either with scikit-learn==1.1.1 and v1.1.0rc4 or the previous release). The ShapeletForestEmbedding returns a matrix of all ones which causes issues in the remaining steps

tree.plot_tree(clf) for ShapeletDecisionTree class

Dear Isak,

I was wondering if it's possible to plot a ShapeletDecisionTree structure the way scikit-learn's tree.plot_tree does for its decision trees.

It would be nice to plot the corresponding shapelet for each split. I don't think it's hard, since the ShapeletDecisionTree class is very similar to the classic one.

Do you have any ideas? Thanks in advance!

RocketClassifier with n_jobs > 1 while using cross_validation does not work

Describe the bug
RocketClassifier with n_jobs > 1 while using cross_validation does not work

RocketClassifier predict_proba is not working

Describe the bug
RocketClassifier predict_proba is not working

you request scikit-learn>=1.2rc1,<1.3a0 but pip install -U wildboar installs very old scikit-learn-1.1.3

Describe the bug
wildboar requests scikit-learn>=1.2rc1,<1.3a0, but pip install -U wildboar installs the much older scikit-learn 1.1.3, and the installation fails.

The stated requirements are:
Cython>=0.29.24
numpy>=1.21.0
scikit-learn>=1.2rc1,<1.3a0
scipy>=1.3.2

Installation error
Installing collected packages: scikit-learn, wildboar
Attempting uninstall: scikit-learn
WARNING: Ignoring invalid distribution -orch (c:\my_py_environments\py310_env_apr2023\lib\site-packages)
Found existing installation: scikit-learn 1.2.2
Uninstalling scikit-learn-1.2.2:
Successfully uninstalled scikit-learn-1.2.2
WARNING: Ignoring invalid distribution -orch (c:\my_py_environments\py310_env_apr2023\lib\site-packages)
WARNING: Ignoring invalid distribution -orch (c:\my_py_environments\py310_env_apr2023\lib\site-packages)
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyts 0.13.0 requires scikit-learn>=1.2.0, but you have scikit-learn 1.1.3 which is incompatible.
Successfully installed scikit-learn-1.1.3 wildboar-1.1.1

To Reproduce
Steps to reproduce the behavior:
pip install -U wildboar


Setup (please complete the following information):

OS: Windows 11
Python version: 3.10.11
NumPy version: 1.23.5
Cython version: 0.29.35
Wildboar version: 1.1.1


Will it work on Windows?

Universal binaries are compiled for GNU/Linux and Python 3.8, 3.9, 3.10.

Will it work on Windows?

Can’t create cache_dir if root folder does not exist

Relicense project as LGPL

The project is licensed under GPL (v3). While great, this limits the library's inclusion in projects with other licenses (e.g., MIT or BSD).

To resolve this, the project will be relicensed under LGPL, which allows dynamic linking with the library (e.g., after downloading it from PyPI).

Wildboar ShapeletForestClassifier not working?

Hello,
I was trying to use your package, so I ran pip install wildboar and started out with the tutorial code as follows:

from wildboar.datasets import load_synthetic_control, load_two_lead_ecg
x_train, x_test, y_train, y_test = load_two_lead_ecg(merge_train_test=False)
from wildboar.ensemble import ShapeletForestClassifier
clf = ShapeletForestClassifier()
clf.fit(x_train, y_train)
print(clf.predict(x_test[-1:, :]))

But rather than giving 6 as expected, it gives 2 as output.
I tried many other UCR datasets, since I have used an older version of ShapeletForestClassifier before to great effect, but the accuracy on all the ones I tried (FordA, Ham, Gun_Point, CBF, Cricket_X, ...) is very poor. Perhaps something is broken in the release, since I know ShapeletForestClassifier performs very well?

Thanks in advance

Throw MemoryError if `malloc` returns `NULL`

https://github.com/isakkarlsson/wildboar/blob/4c4adaff354bc34f01de4aa6bd94e92df83ad5f3/wildboar/_dtw_distance.pyx#L435-L442
