wildboar-foundation / wildboar
wildboar is a Python module for temporal machine learning
Home Page: https://wildboar.dev
License: BSD 3-Clause "New" or "Revised" License
Is your feature request related to a problem? Please describe.
A ShapeletForestClassifier class, so that users are not forced to use BaggingClassifier. It could also hide the force_dim parameter of the ShapeletTreeClassifier.
Describe the solution you'd like
A classifier interface similar to RandomForestClassifier
Describe alternatives you've considered
None
Additional context
None
Is your feature request related to a problem? Please describe.
It is not clear how the code can be used
Describe the solution you'd like
Clear examples
Describe alternatives you've considered
Not applicable
Additional context
Dear Isak Samsten,
is it possible to retrieve/obtain all the shapelets extracted by ShapeletForestClassifier?
Which part of the code has to be modified? I could do it alone.
Thanks in advance,
LB
Describe the bug
I want to run shapelet_forest_example.py, but it fails with the following traceback:
Traceback (most recent call last):
File "D:/MachineLearningDeveloper/wildboar/examples/shapelet_forest_example.py", line 5, in <module>
from wildboar.ensemble import ExtraShapeletTreesClassifier, ShapeletForestClassifier
File "D:\MachineLearningDeveloper\wildboar\wildboar\ensemble\__init__.py", line 18, in <module>
from ._ensemble import ShapeletForestClassifier
File "D:\MachineLearningDeveloper\wildboar\wildboar\ensemble\_ensemble.py", line 32, in <module>
from ..tree import ExtraShapeletTreeClassifier
File "D:\MachineLearningDeveloper\wildboar\wildboar\tree\__init__.py", line 18, in <module>
from ._tree import ShapeletTreeClassifier
File "D:\MachineLearningDeveloper\wildboar\wildboar\tree\_tree.py", line 24, in <module>
from ._tree_builder import ClassificationShapeletTreeBuilder
ModuleNotFoundError: No module named 'wildboar.tree._tree_builder'
To Reproduce
Steps to reproduce the behavior:
Run shapelet_forest_example.py.
Expected behavior
The example prints its results to the console.
Setup (please complete the following information):
Additional context
None
Given a single multivariate sample of shape (n_dims, n_timestep) and a dataset of shape (n_samples, n_dims, n_timestep), pairwise_distance should return an array of shape (n_samples) if dim="mean" or dim is an int, or of shape (n_dims, n_samples) if dim="full".
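The shape contract described above can be sketched in plain Python. This is only an illustration, not wildboar's pairwise_distance: the function name pairwise_distance_sketch is hypothetical, Euclidean distance is assumed as the metric, and dim="mean" is assumed to average the per-dimension distances.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def pairwise_distance_sketch(y, x, dim="mean"):
    """Distance from one multivariate sample to every sample in a dataset.

    y: nested list of shape (n_dims, n_timestep)
    x: nested list of shape (n_samples, n_dims, n_timestep)
    Hypothetical helper; the averaging used for dim="mean" is an assumption.
    """
    n_dims = len(y)
    # per_dim[d][i] is the distance between y and sample i in dimension d.
    per_dim = [[euclidean(y[d], sample[d]) for sample in x] for d in range(n_dims)]
    if dim == "full":
        return per_dim  # shape (n_dims, n_samples)
    if dim == "mean":
        return [sum(per_dim[d][i] for d in range(n_dims)) / n_dims for i in range(len(x))]
    if isinstance(dim, int):
        return per_dim[dim]  # shape (n_samples,)
    raise ValueError(f"unsupported dim: {dim!r}")


y = [[0.0, 0.0], [1.0, 1.0]]  # (n_dims=2, n_timestep=2)
x = [
    [[3.0, 4.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 4.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]  # (n_samples=3, n_dims=2, n_timestep=2)

full = pairwise_distance_sketch(y, x, dim="full")
print(len(full), len(full[0]))  # n_dims, n_samples
print(len(pairwise_distance_sketch(y, x, dim="mean")))  # n_samples
```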
The standard deviation in ScaledDTW and ScaledEuclidean uses a simple one pass algorithm which is inaccurate and unstable.
We should investigate if there are more accurate one-pass algorithms.
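One well-known candidate is Welford's algorithm, which updates the mean and the sum of squared deviations incrementally instead of accumulating a sum and a sum of squares. The sketch below is plain Python for illustration only; it is not the code used by ScaledDTW or ScaledEuclidean, and the function names and test data are made up for this comparison.

```python
import math


def naive_one_pass_std(values):
    """Sum / sum-of-squares formula; prone to catastrophic cancellation."""
    n = 0
    s = 0.0
    ss = 0.0
    for v in values:
        n += 1
        s += v
        ss += v * v
    # Clamp at zero: rounding error can make the variance slightly negative.
    return math.sqrt(max(ss / n - (s / n) ** 2, 0.0))


def welford_std(values):
    """Welford's one-pass update of the mean and the sum of squared deviations."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
    return math.sqrt(m2 / n)  # population standard deviation


# A small spread on top of a large offset; the exact population std is sqrt(22.5).
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
print(naive_one_pass_std(data), welford_std(data), math.sqrt(22.5))
```

Both functions read each value once, so the cost is comparable. Whether Welford's update fits the sliding-window setting of the scaled metrics would still need to be checked.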
The Windows builds seem to be failing randomly.
The issue is reported upstream and there is a pull-request to resolve it: pypa/cibuildwheel#1740
Waiting for the fix to update the workflow.
Is your feature request related to a problem? Please describe.
Unit tests
Describe the solution you'd like
A test suite to detect bugs
Describe alternatives you've considered
Not applicable
Additional context
Describe the bug
In \wildboar\datasets\__init__.py, _os_cache_path is set to %LOCALAPPDATA%\Caches. In fact, this path doesn't exist; the default path is %LOCALAPPDATA%\cache.
To Reproduce
Steps to reproduce the behavior:
Run shapelet_forest_example.py; a FileNotFoundError occurs.
Expected behavior
We should get RSF and Extra metrics.
Setup (please complete the following information):
Additional context
Traceback (most recent call last):
File "D:/MachineLearningDeveloper/wildboar/examples/shapelet_forest_example.py", line 7, in <module>
x, y = datasets.load_gun_point()
File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\__init__.py", line 119, in load_gun_point
return load_dataset(
File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\__init__.py", line 292, in load_dataset
x, y, n_train_samples = repository.load(
File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\_repository.py", line 164, in load
with self._download_repository(
File "D:\MachineLearningDeveloper\wildboar\wildboar\datasets\_repository.py", line 249, in _download_repository
os.mkdir(cache_dir)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\Hephaest\\AppData\\Local\\Caches\\wildboar'
Process finished with exit code 1
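The traceback above ends in os.mkdir, which creates only the last path component and fails when the parent directory is missing. A minimal sketch of the difference, using a temporary directory in place of %LOCALAPPDATA% (the directory layout here is a stand-in, and this is not necessarily how the project fixed the issue):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Stand-in cache layout: the parent "Caches" directory does not exist yet.
    cache_dir = os.path.join(root, "Caches", "wildboar")

    try:
        os.mkdir(cache_dir)  # creates only the last path component
        mkdir_failed = False
    except FileNotFoundError:
        mkdir_failed = True

    # makedirs creates missing parents; exist_ok makes repeated calls harmless.
    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(cache_dir, exist_ok=True)
    created = os.path.isdir(cache_dir)

print(mkdir_failed, created)
```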
Hi!
Thanks a lot for providing the code alongside the paper!
I managed to get the examples running but all the datasets are univariate, if I'm not mistaken.
I am wondering what's the format expected for a multivariate dataset?
Could you provide a (minimal) working example?
BaseEnsemble from scikit-learn no longer sets n_features_in_
Downgrade sklearn to version < 1.0
When I try to build this repository from source, I get the following error:
from .version import version as __version__
ModuleNotFoundError: No module named 'wildboar.version'
So I think the version should be properly assigned in __init__.py.
Dear Isak,
when I try to fit the ShapeletForestClassifier (scaled_dtw) on a nested dataframe containing multivariate time-series of different lengths, I get the following error:
"ValueError: setting an array element with a sequence."
If I try to fit on a Multiindex Dataframe, I get the following error instead:
"ValueError: Number of labels=712 does not match number of samples=509628"
It seems that it cannot handle multivariate time-series of different lengths, but yet the scaled dtw metric should work on such items.
I kindly ask for your help.
Thanks in advance.
As mentioned in discussions, is it possible to implement multivariate support in the explainability module?
In particular, I'm interested in the ShapeletForestCounterfactual method. Thanks in advance!
I have a hypothesis about what can go wrong. Here we skip shapelets that produce constant distances to all samples. The bug would present itself if all shapelets produce such constant distances and the best split is never initialised. Perhaps that's what's happening. I will try to implement a sanity check for that case and see if I can reproduce the bug.
Originally posted by @isaksamsten in #87 (reply in thread)
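To make the hypothesis concrete, here is a simplified pure-Python split search with the kind of sanity check described. It is a sketch under assumptions, not wildboar's tree builder: the names gini and find_best_split are hypothetical, Gini impurity is assumed as the criterion, and the point is only that the caller must handle the case where every candidate is skipped.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def find_best_split(candidate_distances, labels):
    """Return (candidate_index, threshold, impurity), or None if no split is valid.

    candidate_distances[k][i] is the distance from candidate shapelet k to sample i.
    """
    best = None  # stays None when every candidate is skipped
    n = len(labels)
    for k, distances in enumerate(candidate_distances):
        if max(distances) == min(distances):
            continue  # constant distances cannot separate the samples
        values = sorted(set(distances))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for d, y in zip(distances, labels) if d <= threshold]
            right = [y for d, y in zip(distances, labels) if d > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[2]:
                best = (k, threshold, impurity)
    return best


labels = ["a", "a", "b", "b"]
constant_only = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
mixed = [[1.0, 1.0, 1.0, 1.0], [0.1, 0.2, 0.8, 0.9]]

split = find_best_split(constant_only, labels)
if split is None:
    # Sanity check: never read an uninitialised split, make the node a leaf.
    print("no valid split")
print(find_best_split(mixed, labels))
```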
Hello,
I installed wildboar on Ubuntu 18.04 LTS, with Python 3.6.8 using pip. The examples run perfectly fine, but whenever I try to run it on my data I get a segmentation fault. How can I resolve this?
Thanks
Hi Isak,
"from sklearn.utils.validation import (
_check_estimator_name,
_check_y,
check_consistent_length,
warnings,
)"
This import is no longer working with the most recent version of scikit-learn.
The same is happening with '_fit_context' from 'sklearn.base'. Any suggestions? Thank you!
We use the packaging module to parse version numbers. The module is almost always installed (it's used by setuptools), so I don't think we should specify it as a dependency; instead, we should use soft_dependency to warn users and suggest that they install the correct package.
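The general pattern of an optional dependency that warns instead of failing could look roughly like this. The helper soft_import is hypothetical and written only for illustration; it is not wildboar's soft_dependency, whose signature is not shown in this issue.

```python
import importlib
import importlib.util
import warnings


def soft_import(module_name, package_name=None):
    """Import an optional top-level module, or warn and return None if missing.

    Hypothetical helper for illustration; not wildboar's soft_dependency.
    """
    if importlib.util.find_spec(module_name) is None:
        package = package_name or module_name
        warnings.warn(
            f"'{module_name}' is required for this feature; "
            f"install it with: pip install {package}",
            UserWarning,
            stacklevel=2,
        )
        return None
    return importlib.import_module(module_name)


packaging = soft_import("packaging")
if packaging is not None:
    from packaging.version import Version

    print(Version("1.2.0") >= Version("1.1.3"))
```

The trade-off is that a missing package is only discovered at call time, so the warning text should say exactly what to install.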
Is your feature request related to a problem? Please describe.
Will it work for multivariate time series classification with, for example, a mixture of categorical and continuous data?
For example, at time t1 we have the observation red, 2.4, 5, 12.456; at t2: green, 3.5, 2, 45.78; at t3: black, 5.6, 7, 23.56; at t4: red, 2.1, 5, 12.6.
CastorTransform
Describe the bug
You're importing _joblib_parallel_args from sklearn.utils.fixes, but it has been removed.
To Reproduce
$ pip install "scikit-learn>=0.21.3"
Requirement already satisfied: scikit-learn>=0.21.3 in ./.venv/lib/python3.8/site-packages (from wildboar) (1.1.1)
from sklearn.utils.fixes import _joblib_parallel_args
ImportError: cannot import name '_joblib_parallel_args' from 'sklearn.utils.fixes'
$ pip install "scikit-learn<=1.1.0"
Installing collected packages: scikit-learn
Attempting uninstall: scikit-learn
Found existing installation: scikit-learn 1.1.1
Uninstalling scikit-learn-1.1.1:
Successfully uninstalled scikit-learn-1.1.1
Successfully installed scikit-learn-1.0.2
from sklearn.utils.fixes import _joblib_parallel_args
Expected behavior
Compatible with latest version(s) of scikit-learn (>1.0.2)
Setup (please complete the following information):
RocketClassifier does not have a classes_-property
Describe the bug
ShapeletForestClassifier crashes with a single dimension 3D numpy array input as sklearn cannot process it.
Traceback (most recent call last):
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\local_code.py", line 18, in <module>
rf.fit(x, y)
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\wildboar\ensemble\_ensemble.py", line 186, in fit
self._fit(
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\wildboar\ensemble\_ensemble.py", line 240, in _fit
return super()._fit(X, y, max_samples, max_depth, sample_weight, False)
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\sklearn\ensemble\_bagging.py", line 434, in _fit
all_results = Parallel(
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 1085, in __call__
if self.dispatch_one_batch(iterator):
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 901, in dispatch_one_batch
self._dispatch(tasks)
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 819, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\_parallel_backends.py", line 597, in __init__
self.results = batch()
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 288, in __call__
return [func(*args, **kwargs)
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\joblib\parallel.py", line 288, in <listcomp>
return [func(*args, **kwargs)
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\sklearn\utils\fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "D:\CMP Machine Learning\tsmlPy-estimator-evaluation\venv\lib\site-packages\sklearn\ensemble\_bagging.py", line 84, in _parallel_build_estimators
n_samples, n_features = X.shape
ValueError: too many values to unpack (expected 2)
Process finished with exit code 1
To Reproduce
import warnings
import numpy as np
from wildboar.datasets import load_dataset
from wildboar.ensemble import ShapeletForestClassifier

if __name__ == "__main__":
    warnings.simplefilter(action='ignore', category=FutureWarning)

    # 2D input of shape (n_samples, n_timestep): works.
    x, y = load_dataset("Beef")
    print(x.shape)
    rf = ShapeletForestClassifier()
    rf.fit(x, y)
    print(rf.score(x, y))

    # The same data as 3D input with a single dimension: crashes in fit.
    x = np.reshape(x, (60, 1, 470))
    print(x.shape)
    rf = ShapeletForestClassifier()
    rf.fit(x, y)
    print(rf.score(x, y))
More than 1 dimension is fine, i.e. if you were to reshape it to (60, 2, 235).
Expected behavior
The classifier processes single dimension 3D input.
Setup (please complete the following information):
Additional context
If this is a deliberate design decision, feel free to close.
Dear Isak Samsten,
while I was checking the internal attributes of ShapeletForestClassifier after the fit method was called, I noticed that the Impurity attribute for each node of the one hundred trees was always -1.
I think it is an error, maybe due to the bad structure of my data. Do you have any suggestions?
Thanks in advance!
If length < 3, undefined memory will be read.
Ok I updated scikit-learn to 14.0, but now I'm facing this error:
"requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='datasets.wildboar.dev', port=443): Read timed out. (read timeout=3)"
Originally posted by @lorebon in #92 (comment)
Describe the bug
Can't compile on Windows.
To Reproduce
Compile on Windows.
Expected behavior
Should compile on Windows.
Setup (please complete the following information):
Additional context
Fix: remove "m"
Describe the bug
Runtime error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/isak/.miniconda3/envs/wbweb/lib/python3.11/site-packages/wildboar/datasets/__init__.py", line 235, in load_datasets
x, y = load_dataset(dataset, repository=repository, **kwargs)
To Reproduce
Expected behavior
Load train/test parts for each dataset
Hi Isak,
I have a question about the RandomShapeletForest method "decision_path": I was wondering if it's possible to modify the boolean 2D matrix in order to keep track of the assigned leaf node for each sample.
Is there another way to recover such information? Suppose that there are at least two leaves with the same class.
Thank you in advance!
Describe the bug
n_dims_ is no longer set.
To Reproduce
set bootstrap=True and contamination_set="oob" in IsolationShapeletForest
Expected behavior
n_dims_in_ should be used instead
Edit: that does fix the import error but I cannot get the unsupervised example to work (either with scikit-learn==1.1.1 and v1.1.0rc4 or the previous release). The ShapeletForestEmbedding returns a matrix of all ones which causes issues in the remaining steps
Dear Isak,
I was wondering if it's possible to plot a ShapeletDecisionTree structure like the scikit-learn method. For example:
It would be nice to plot the corresponding shapelet for each split. I don't think it's hard, since the ShapeletDecisionTree class is very similar to the classic one.
Do you have any ideas? Thanks in advance!
Describe the bug
RocketClassifier with n_jobs > 1 while using cross_validation does not work
Describe the bug
RocketClassifier predict_proba is not working
Describe the bug
wildboar requires scikit-learn>=1.2rc1,<1.3a0, but pip install -U wildboar installs the much older scikit-learn-1.1.3 and the installation fails.
The declared requirements are:
Cython>=0.29.24
numpy>=1.21.0
scikit-learn>=1.2rc1,<1.3a0
scipy>=1.3.2
Installation error
Installing collected packages: scikit-learn, wildboar
Attempting uninstall: scikit-learn
WARNING: Ignoring invalid distribution -orch (c:\my_py_environments\py310_env_apr2023\lib\site-packages)
Found existing installation: scikit-learn 1.2.2
Uninstalling scikit-learn-1.2.2:
Successfully uninstalled scikit-learn-1.2.2
WARNING: Ignoring invalid distribution -orch (c:\my_py_environments\py310_env_apr2023\lib\site-packages)
WARNING: Ignoring invalid distribution -orch (c:\my_py_environments\py310_env_apr2023\lib\site-packages)
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyts 0.13.0 requires scikit-learn>=1.2.0, but you have scikit-learn 1.1.3 which is incompatible.
Successfully installed scikit-learn-1.1.3 wildboar-1.1.1
To Reproduce
Steps to reproduce the behavior:
pip install -U wildboar
Expected behavior
Setup (please complete the following information):
OS: Windows 11
Python version: Python 3.10.11
NumPy version: 1.23.5
Cython version: 0.29.35
Wildboar version: 1.1.1
Additional context
Universal binaries are compiled for GNU/Linux and Python 3.8, 3.9, 3.10
Will it work on Windows?
The project is licensed as GPL (v3). While great, it limits the library's inclusion in projects with other licenses (e.g., MIT or BSD).
To resolve this, the project will be relicensed as LGPL, which allows dynamic linking with the library (e.g., by downloading it from PyPI).
Hello,
I was trying to use your package, so I ran pip install wildboar and started out with the tutorial code as follows:
from wildboar.datasets import load_synthetic_control, load_two_lead_ecg
x_train, x_test, y_train, y_test = load_two_lead_ecg(merge_train_test=False)
from wildboar.ensemble import ShapeletForestClassifier
clf = ShapeletForestClassifier()
clf.fit(x_train, y_train)
print(clf.predict(x_test[-1:, :]))
But rather than give 6 as expected, it gives 2 as output.
I tried many other UCR datasets, as I have used an older version of the ShapeletForestClassifier before to great effect, but the accuracy on all the ones I tried (FordA, Ham, Gun_Point, CBF, Cricket_X, ...) is very poor. Perhaps something is broken in the release, as I know the ShapeletForestClassifier performs very well?
Thanks in advance