wildboar's People

Contributors

hephaest, isaksamsten, pre-commit-ci[bot]

wildboar's Issues

`ShapeletForestClassifier` abstraction

Is your feature request related to a problem? Please describe.
A ShapeletForestClassifier class, so that users are not forced to use BaggingClassifier. It could also hide the force_dim parameter of ShapeletTreeClassifier.

Describe the solution you'd like
A classifier interface similar to RandomForestClassifier

Describe alternatives you've considered
None

Additional context
None

Improve examples

Is your feature request related to a problem? Please describe.
It is not clear how the code can be used

Describe the solution you'd like
Clear examples

Describe alternatives you've considered
Not applicable

Extraction of Shapelets and return as a list

Dear Isak Samsten,

is it possible to retrieve/obtain all the shapelets extracted by ShapeletForestClassifier?
Which part of the code has to be modified? I could do it alone.

Thanks in advance,

LB

Cannot run the examples

Describe the bug
I want to run shapelet_forest_example.py, but it fails with the following traceback:

Traceback (most recent call last):
  File "D:/MachineLearningDeveloper/wildboar/examples/shapelet_forest_example.py", line 5, in <module>
    from wildboar.ensemble import ExtraShapeletTreesClassifier, ShapeletForestClassifier
  File "D:\MachineLearningDeveloper\wildboar\wildboar\ensemble\__init__.py", line 18, in <module>
    from ._ensemble import ShapeletForestClassifier
  File "D:\MachineLearningDeveloper\wildboar\wildboar\ensemble\_ensemble.py", line 32, in <module>
    from ..tree import ExtraShapeletTreeClassifier
  File "D:\MachineLearningDeveloper\wildboar\wildboar\tree\__init__.py", line 18, in <module>
    from ._tree import ShapeletTreeClassifier
  File "D:\MachineLearningDeveloper\wildboar\wildboar\tree\_tree.py", line 24, in <module>
    from ._tree_builder import ClassificationShapeletTreeBuilder
ModuleNotFoundError: No module named 'wildboar.tree._tree_builder'

To Reproduce
Steps to reproduce the behavior:
run the shapelet_forest_example.py

Expected behavior
Expect to print out results in the console.

Setup (please complete the following information):

  • OS: Windows 10 64-bit
  • Python version: 3.8.6
  • NumPy version: 1.18.5
  • Cython version: 0.29.21

Additional context
None

Support pairwise_distance(2d-array, 3d-array)

Given a single multivariate sample of shape (n_dims, n_timestep) and a dataset of shape (n_samples, n_dims, n_timestep), pairwise_distance should return an array of shape (n_samples,) if dim="mean" or dim is an int, or of shape (n_dims, n_samples) if dim="full".
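
The requested output shapes can be illustrated with a small NumPy sketch. It uses plain Euclidean distance per dimension and a hypothetical helper, pairwise_distance_sketch, which is not wildboar's implementation; only the shape contract is taken from the description above.

```python
import numpy as np


def pairwise_distance_sketch(x, Y, dim="mean"):
    """Hypothetical helper illustrating the requested shapes only.

    x: one multivariate sample, shape (n_dims, n_timestep)
    Y: dataset, shape (n_samples, n_dims, n_timestep)
    """
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Euclidean distance per sample and dimension: shape (n_samples, n_dims)
    per_dim = np.sqrt(((Y - x[np.newaxis]) ** 2).sum(axis=-1))
    if dim == "full":
        return per_dim.T  # (n_dims, n_samples)
    if dim == "mean":
        return per_dim.mean(axis=1)  # (n_samples,)
    return per_dim[:, dim]  # a single dimension, (n_samples,)


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 20))
Y = rng.normal(size=(5, 3, 20))
print(pairwise_distance_sketch(x, Y, dim="mean").shape)
print(pairwise_distance_sketch(x, Y, dim=0).shape)
print(pairwise_distance_sketch(x, Y, dim="full").shape)
```

The sample is broadcast against the dataset along a new leading axis, so no explicit loop over samples is needed.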

Inaccurate computation of standard deviation

The standard deviation in ScaledDTW and ScaledEuclidean uses a simple one-pass algorithm that is inaccurate and numerically unstable.

We should investigate if there are more accurate one-pass algorithms.
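
One well-known candidate is Welford's algorithm, which updates the mean and the sum of squared deviations incrementally in a single pass. The sketch below is pure Python and compares it with the sum / sum-of-squares formula. The function names are illustrative, and whether this fits the existing ScaledDTW and ScaledEuclidean code is an open question.

```python
import math


def naive_std(values):
    """One-pass sum / sum-of-squares formula; prone to cancellation."""
    n = 0
    s = 0.0
    s2 = 0.0
    for v in values:
        n += 1
        s += v
        s2 += v * v
    mean = s / n
    # The subtraction below can lose most significant digits, and may even
    # go slightly negative, when the mean is large relative to the spread.
    return math.sqrt(max(s2 / n - mean * mean, 0.0))


def welford_std(values):
    """One-pass population standard deviation using Welford's update."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
    return math.sqrt(m2 / n)


# Large offset, small spread: the exact population variance is 22.5.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
print(naive_std(data))
print(welford_std(data))
```

Both functions read the data once, so the Welford variant keeps the one-pass property while avoiding the subtraction of two nearly equal large numbers.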

Unit tests

Is your feature request related to a problem? Please describe.
Unit tests

Describe the solution you'd like
A test suite to detect bugs

Describe alternatives you've considered
Not applicable

Windows specified cache path doesn't exist.

Describe the bug
In \wildboar\datasets\__init__.py, _os_cache_path is set to %LOCALAPPDATA%\Caches. This path does not exist; the default path is %LOCALAPPDATA%\cache.

To Reproduce
Steps to reproduce the behavior:

  1. Use a Windows 10 environment.
  2. Run shapelet_forest_example.py; a FileNotFoundError occurs.

Expected behavior
We should get RSF and Extra metrics.

Setup (please complete the following information):

  • OS: Windows 10 64-bit
  • Python version: 3.8.6
  • NumPy version: 1.18.5
  • Cython version: 0.29.21

Additional context

Traceback (most recent call last):
  File "D:/MachineLearningDeveloper/wildboar/examples/shapelet_forest_example.py", line 7, in <module>
    x, y = datasets.load_gun_point()
  File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\__init__.py", line 119, in load_gun_point
    return load_dataset(
  File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\__init__.py", line 292, in load_dataset
    x, y, n_train_samples = repository.load(
  File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\_repository.py", line 164, in load
    with self._download_repository(
  File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\_repository.py", line 249, in _download_repository
    os.mkdir(cache_dir)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\Hephaest\\AppData\\Local\\Caches\\wildboar'

Process finished with exit code 1
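
The traceback shows os.mkdir(cache_dir) failing because the parent directory is missing. A minimal sketch of one way to avoid this, assuming it is acceptable to create missing parents, is os.makedirs with exist_ok=True. The helper name ensure_cache_dir is hypothetical and is not wildboar's code; the example runs against a temporary directory.

```python
import os
import tempfile


def ensure_cache_dir(base_dir, name="wildboar"):
    """Hypothetical helper: create base_dir/name including missing parents."""
    cache_dir = os.path.join(base_dir, name)
    # Unlike os.mkdir, os.makedirs creates intermediate directories, and
    # exist_ok=True makes repeated calls harmless.
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir


with tempfile.TemporaryDirectory() as tmp:
    missing_parent = os.path.join(tmp, "Caches")  # does not exist yet
    created = ensure_cache_dir(missing_parent)
    created_ok = os.path.isdir(created)
    print(created_ok)
```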

Multivariate Case

Hi!
Thanks a lot for providing the code alongside the paper!
I managed to get the examples running but all the datasets are univariate, if I'm not mistaken.

I am wondering what's the format expected for a multivariate dataset?
Could you provide a (minimal) working example?
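
As a hedged sketch of the layout: other issues in this tracker describe multivariate input as a 3D array of shape (n_samples, n_dims, n_timestep). The example below only builds such an array with NumPy from random data; the sizes and variable names are illustrative, and the resulting x and y would then be passed to an estimator's fit method.

```python
import numpy as np

n_samples, n_dims, n_timestep = 6, 2, 50
rng = np.random.default_rng(0)

# Two univariate datasets describing the same samples, one per dimension.
dim0 = rng.normal(size=(n_samples, n_timestep))
dim1 = rng.normal(size=(n_samples, n_timestep))

# Stack along a new axis 1 to get (n_samples, n_dims, n_timestep).
x = np.stack([dim0, dim1], axis=1)
y = np.array([0, 0, 0, 1, 1, 1])  # one label per sample

print(x.shape)  # (6, 2, 50)
```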

n_features_in_ not set

BaseEnsemble from scikit-learn no longer sets n_features_in_

Workaround

Downgrade sklearn to version < 1.0

__version__ missed

When I try to build this repository from source, I get the following error:

     from .version import version as __version__
ModuleNotFoundError: No module named 'wildboar.version'

So I think the version should be properly assigned in __init__.py.

ShapeletForestClassifier error with multivariate time-series of different lengths

Dear Isak,

when I try to fit the ShapeletForestClassifier (scaled_dtw) on a nested dataframe containing multivariate time-series of different lengths, I get the following error:

"ValueError: setting an array element with a sequence."

If I try to fit on a Multiindex Dataframe, I get the following error instead:

"ValueError: Number of labels=712 does not match number of samples=509628"

It seems that it cannot handle multivariate time-series of different lengths, yet the scaled DTW metric should work on such items.
I kindly ask for your help.

Thanks in advance.
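
Both error messages appear consistent with the input not being a regular numeric array: a nested or long-format DataFrame is not an (n_samples, n_dims, n_timestep) array. One possible preprocessing step is to pad every series to the longest length with NaN. The sketch below does only the padding. The helper name pad_to_3d is hypothetical, and whether a given wildboar version treats trailing NaN values as the end of a shorter series is an assumption to check against its documentation.

```python
import numpy as np


def pad_to_3d(samples, fill_value=np.nan):
    """Hypothetical helper: pad ragged multivariate series to one 3D array.

    samples: list of samples; each sample is a list of 1D sequences,
    one per dimension, whose lengths may differ between samples.
    """
    n_dims = len(samples[0])
    max_len = max(len(series) for sample in samples for series in sample)
    out = np.full((len(samples), n_dims, max_len), fill_value, dtype=float)
    for i, sample in enumerate(samples):
        for d, series in enumerate(sample):
            out[i, d, : len(series)] = series  # the tail stays fill_value
    return out


ragged = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],  # sample 0: two dimensions, length 3
    [[1.0, 2.0], [3.0, 4.0]],            # sample 1: two dimensions, length 2
]
x = pad_to_3d(ragged)
print(x.shape)  # (2, 2, 3)
```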

Unnecessary splits in shapelet trees

I have a hypothesis about what can go wrong. We skip shapelets that produce constant distances to all samples. The bug would present itself if all shapelets produce such constant distances and the best split is never initialised. Perhaps that is what is happening. I will try to implement a sanity check for that case and see if I can reproduce the bug.

Originally posted by @isaksamsten in #87 (reply in thread)

Segmentation Faults

Hello,
I installed wildboar on Ubuntu 18.04 LTS, with Python 3.6.8 using pip. The examples run perfectly fine, but whenever I try to run it on my data I get a segmentation fault. How can I resolve this?

Thanks

_check_estimator_name no longer included in sklearn.utils.validation

Hi Isak,

from sklearn.utils.validation import (
    _check_estimator_name,
    _check_y,
    check_consistent_length,
    warnings,
)

This import is no longer working with the most recent version of scikit-learn.

The same is happening with '_fit_context' from 'sklearn.base'. Any suggestions? Thank you!

Raises ModuleNotFoundError when importing the datasets module

We use the packaging module to parse version numbers.

The module is almost always installed (it is used by setuptools), so I don't think we should specify it as a dependency. Instead, we should use soft_dependency to warn users and suggest that they install the correct package.
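
A minimal sketch of the general soft-dependency idea, written with a plain try/except rather than wildboar's soft_dependency helper (whose signature is not shown here): import packaging if it is available, otherwise warn and fall back to a naive parser. The function name parse_version and the fallback are illustrative assumptions.

```python
import warnings

try:
    from packaging.version import Version

    def parse_version(text):
        return Version(text)

except ImportError:
    warnings.warn(
        "The 'packaging' module is not installed; falling back to a naive "
        "version parser. Install it with: pip install packaging"
    )

    def parse_version(text):
        # Naive fallback: handles plain dotted integers such as "1.10.0" only.
        return tuple(int(part) for part in text.split("."))


# Comparing as versions, not as strings, so that 1.10.0 sorts after 1.2.0.
print(parse_version("1.2.0") < parse_version("1.10.0"))
```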

_joblib_parallel_args removed from scikit-learn

Describe the bug
You're importing _joblib_parallel_args from sklearn.utils.fixes, but it has been removed.

To Reproduce

$ pip install "scikit-learn>=0.21.3"
Requirement already satisfied: scikit-learn>=0.21.3 in ./.venv/lib/python3.8/site-packages (from wildboar) (1.1.1)

from sklearn.utils.fixes import _joblib_parallel_args
ImportError: cannot import name '_joblib_parallel_args' from 'sklearn.utils.fixes'

$ pip install "scikit-learn<=1.1.0"
Installing collected packages: scikit-learn
Attempting uninstall: scikit-learn
Found existing installation: scikit-learn 1.1.1
Uninstalling scikit-learn-1.1.1:
Successfully uninstalled scikit-learn-1.1.1
Successfully installed scikit-learn-1.0.2

from sklearn.utils.fixes import _joblib_parallel_args

Expected behavior
Compatible with latest version(s) of scikit-learn (>1.0.2)

Setup (please complete the following information):

  • OS: Ubuntu 20.04
  • Python version: 3.8
  • Wildboar version: 1.0.12

ShapeletForestClassifier does not support single dimension 3D input (i.e. shape (10, 1, 20))

Describe the bug

ShapeletForestClassifier crashes with a single dimension 3D numpy array input as sklearn cannot process it.

Traceback (most recent call last):
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\local_code.py", line 18, in <module>
    rf.fit(x, y)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\wildboar\ensemble\_ensemble.py", line 186, in fit
    self._fit(
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\wildboar\ensemble\_ensemble.py", line 240, in _fit
    return super()._fit(X, y, max_samples, max_depth, sample_weight, False)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\sklearn\ensemble\_bagging.py", line 434, in _fit
    all_results = Parallel(
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 1085, in __call__
    if self.dispatch_one_batch(iterator):
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 901, in dispatch_one_batch
    self._dispatch(tasks)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\_parallel_backends.py", line 597, in __init__
    self.results = batch()
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\sklearn\utils\fixes.py", line 117, in __call__
    return self.function(*args, **kwargs)
  File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\sklearn\ensemble\_bagging.py", line 84, in _parallel_build_estimators
    n_samples, n_features = X.shape
ValueError: too many values to unpack (expected 2)

Process finished with exit code 1

To Reproduce

import warnings
import numpy as np
from wildboar.datasets import load_dataset
from wildboar.ensemble import ShapeletForestClassifier

if __name__ == "__main__":
    warnings.simplefilter(action='ignore', category=FutureWarning)

    x, y = load_dataset("Beef")
    print(x.shape)
    rf = ShapeletForestClassifier()
    rf.fit(x, y)
    print(rf.score(x, y))

    x = np.reshape(x, (60, 1, 470))
    print(x.shape)
    rf = ShapeletForestClassifier()
    rf.fit(x, y)
    print(rf.score(x, y))

More than one dimension is fine, e.g. if you were to reshape it to (60, 2, 235).

Expected behavior

The classifier processes single dimension 3D input.

Setup (please complete the following information):

  • OS: Windows
  • Python version: 3.9
  • NumPy version: 1.22.4
  • Wildboar version: 1.1.1

Additional context

If this is a deliberate design decision, feel free to close.
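
Until this is handled inside the classifier, a possible workaround is to drop the singleton dimension before calling fit, so that the data is passed as a 2D array of shape (n_samples, n_timestep). The sketch below shows only that reshaping step; squeeze_single_dim is a hypothetical helper name.

```python
import numpy as np


def squeeze_single_dim(x):
    """Hypothetical helper: (n_samples, 1, n_timestep) -> (n_samples, n_timestep)."""
    x = np.asarray(x)
    if x.ndim == 3 and x.shape[1] == 1:
        return x[:, 0, :]
    return x  # 2D input and true multivariate input are left unchanged


x3d = np.zeros((60, 1, 470))
print(squeeze_single_dim(x3d).shape)  # (60, 470)
print(squeeze_single_dim(np.zeros((60, 2, 235))).shape)  # (60, 2, 235)
```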

Impurity at forest nodes is always -1 in Multivariate Shapelet Forest Classifier

Dear Isak Samsten,

while I was checking the internal attributes of ShapeletForestClassifier after calling the fit method, I noticed that the impurity attribute of every node in all one hundred trees was always -1.

I think it is an error, maybe due to the bad structure of my data. Do you have any suggestions?

Thanks in advance!

Timeout too low

Ok I updated scikit-learn to 14.0, but now I'm facing this error:

"requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='datasets.wildboar.dev', port=443): Read timed out. (read timeout=3)"

Originally posted by @lorebon in #92 (comment)

Windows use_math

Describe the bug
Cannot compile on Windows.

To Reproduce
Compile on Windows.

Expected behavior
Should compile on Windows.

Setup (please complete the following information):

  • OS: Windows
  • Python version: 3.7
  • NumPy version:
  • Cython version:

Additional context
Fix: remove "m"

Load multiple datasets with merge_train_test=False

Describe the bug
Runtime error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/isak/.miniconda3/envs/wbweb/lib/python3.11/site-packages/wildboar/datasets/__init__.py", line 235, in load_datasets
    x, y = load_dataset(dataset, repository=repository, **kwargs)

To Reproduce

  1. load_datasets("wildboar/ucr-tiny", merge_train_test=False)

Expected behavior

Load train/test parts for each dataset

Assigned leaf node in decision_path()

Hi Isak,

I have a question about the RandomShapeletForest method "decision_path": I was wondering if it's possible to modify the boolean 2D matrix in order to keep track of the assigned leaf node for each sample.

Is there another way to recover such information? Suppose that there are at least two leaves with the same class.
Thank you in advance!

tree.plot_tree(clf) for ShapeletDecisionTree class

Dear Isak,

I was wondering if it's possible to plot a ShapeletDecisionTree structure like the scikit-learn method. For example:

[image: output_10_0]

It would be nice to plot the corresponding shapelet for each split. I don't think it's hard, since the ShapeletDecisionTree class is very similar to the classic one.

Do you have any ideas? Thanks in advance!

you request scikit-learn>=1.2rc1,<1.3a0 but pip install -U wildboar installs very old scikit-learn-1.1.3

Describe the bug
wildboar requires scikit-learn>=1.2rc1,<1.3a0, but pip install -U wildboar installs the much older scikit-learn 1.1.3, and the installation fails.

Stated requirements:

  • Cython>=0.29.24
  • numpy>=1.21.0
  • scikit-learn>=1.2rc1,<1.3a0
  • scipy>=1.3.2

Installation error
Installing collected packages: scikit-learn, wildboar
Attempting uninstall: scikit-learn
WARNING: Ignoring invalid distribution -orch (c:\my_py_environments\py310_env_apr2023\lib\site-packages)
Found existing installation: scikit-learn 1.2.2
Uninstalling scikit-learn-1.2.2:
Successfully uninstalled scikit-learn-1.2.2
WARNING: Ignoring invalid distribution -orch (c:\my_py_environments\py310_env_apr2023\lib\site-packages)
WARNING: Ignoring invalid distribution -orch (c:\my_py_environments\py310_env_apr2023\lib\site-packages)
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyts 0.13.0 requires scikit-learn>=1.2.0, but you have scikit-learn 1.1.3 which is incompatible.
Successfully installed scikit-learn-1.1.3 wildboar-1.1.1

To Reproduce
Steps to reproduce the behavior:
pip install -U wildboar

Setup (please complete the following information):

  • OS: Windows 11
  • Python version: 3.10.11
  • NumPy version: 1.23.5
  • Cython version: 0.29.35
  • Wildboar version: 1.1.1

Relicense project as LGPL

The project is licensed under the GPL (v3). While great, this limits the library's inclusion in projects with other licenses (e.g., MIT or BSD).

To resolve this, the project will be relicensed under the LGPL, which allows dynamic linking with the library (e.g., after downloading it from PyPI).

Wildboar ShapeletForestClassifier not working?

Hello,
I was trying to use your package, so I ran pip install wildboar and started out with the tutorial code as follows:

from wildboar.datasets import load_synthetic_control, load_two_lead_ecg
x_train, x_test, y_train, y_test = load_two_lead_ecg(merge_train_test=False)
from wildboar.ensemble import ShapeletForestClassifier
clf = ShapeletForestClassifier()
clf.fit(x_train, y_train)
print(clf.predict(x_test[-1:, :]))

But rather than give 6 as expected, it gives 2 as output.
I tried many other UCR datasets, as I have used an older version of the ShapeletForestClassifier before to great effect, but the accuracy on all the ones I tried (FordA, Ham, Gun_Point, CBF, Cricket_X, ...) is very poor. Perhaps something is broken in the release, since I know the ShapeletForestClassifier performs very well?

Thanks in advance
