Comments (9)
Looking a bit more closely at the logs would have shortened my reasoning...
The logs confirm that the predicted probabilities (`probs`) contain `np.nan` values.
from sktime.
Is there a way to automatically create issues from failures on the main
branch? If not, maybe we should track this in an issue (potentially a good first issue).
Re errors: AFAIK these are all sporadic. I hope they have already been reported.
Re automatic issue creation - I do not know.
Hmm, I do hope the first failure is sporadic. I would think so, but I cannot find a reference.
Skipping some of these in #6208; we should of course keep this issue open for tracking.
Regarding the classification failures:
My assumption is that, in some cases, an array derived from the probabilities `prob` ends up empty.
Take a look at `_predict` in `classification/deep_learning/base.py`:
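```python
import numpy as np
from sklearn.utils import check_random_state


def _predict(self, X, **kwargs):
    # class probabilities, one row per instance
    probs = self._predict_proba(X, **kwargs)
    rng = check_random_state(self.random_state)
    # for each row, pick a class with maximal probability, breaking ties at random
    return np.array(
        [
            self.classes_[int(rng.choice(np.flatnonzero(prob == prob.max())))]
            for prob in probs
        ]
    )
```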
The error `ValueError: 'a' cannot be empty unless no samples are taken` is caused by `rng.choice(np.flatnonzero(prob == prob.max()))`. For this error to occur, the array `np.flatnonzero(prob == prob.max())` has to be empty (the same error is raised by `rng.choice([])`). `prob` itself cannot be empty, because in that case `prob.max()` would already fail with `ValueError: zero-size array to reduction operation maximum which has no identity`. So the return value of `np.flatnonzero` must be empty, which can only happen if `prob == prob.max()` returns an array containing only `False`.
Since the outputs of the network are normalized before this calculation is performed, I assume that one of the network's outputs is either `np.inf` or `np.nan`. The normalization would then produce `np.nan` entries, and `np.array([np.nan, np.nan]) == np.array([np.nan, np.nan]).max()` is an array of `False`, because `np.nan` compares unequal to everything, including itself.
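A minimal, self-contained reproduction of this failure mode, assuming a softmax-style normalization of the network outputs (the actual normalization inside the network may differ). The single row of logits below is made up for illustration:

```python
import numpy as np

# Made-up network output for one instance: one logit overflowed to inf.
logits = np.array([np.inf, 1.0])

with np.errstate(over="ignore", invalid="ignore"):
    # softmax-style normalization: exp(inf) / sum = inf / inf -> nan
    exp = np.exp(logits)
    prob = exp / exp.sum()  # -> [nan, 0.]
    # max() propagates nan, and nan compares unequal to everything (even nan)
    mask = prob == prob.max()  # -> [False, False]

candidates = np.flatnonzero(mask)  # -> empty index array

rng = np.random.RandomState(0)  # the kind of object check_random_state returns
error_message = None
try:
    rng.choice(candidates)
except ValueError as err:
    error_message = str(err)

print(prob, candidates, error_message)
```

A single non-finite logit is enough to poison the whole row, so the tie-breaking step has no candidates left to choose from.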
If this makes sense, I would propose to:
- Check the network's output and raise an error if it contains `np.inf` or `np.nan`.
- Set the seed in `get_test_params` to a value for which we know the network does not output any `np.inf`.
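A minimal sketch of the first proposal. `check_finite_probs` is a hypothetical helper name, not an existing sktime function; the idea is to call it on the output of `_predict_proba` before the `rng.choice` tie-breaking, so that a broken model fails with an explicit message instead of the opaque `'a' cannot be empty` error:

```python
import numpy as np


def check_finite_probs(probs):
    """Hypothetical guard: raise if any predicted probability is nan or inf.

    Meant to run on the output of ``_predict_proba`` before the argmax
    tie-breaking in ``_predict``.
    """
    probs = np.asarray(probs, dtype=float)
    # rows (instances) that contain at least one non-finite value
    bad_rows = np.flatnonzero(~np.isfinite(probs).all(axis=1))
    if bad_rows.size > 0:
        raise ValueError(
            f"The network returned non-finite probabilities (nan or inf) for "
            f"{bad_rows.size} instance(s), at row indices {bad_rows.tolist()}. "
            "The fitted model is likely in a broken state."
        )
    return probs


# usage: a healthy row passes through, a nan row raises with a clear message
check_finite_probs([[0.3, 0.7]])
try:
    check_finite_probs([[0.3, 0.7], [np.nan, np.nan]])
except ValueError as err:
    print(err)
```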
I see.
One solution that springs to mind: would it make sense to override nans with the class frequencies from the training set?
That would ensure a prediction is always made.
For reproducibility or scientific cleanliness, it might be good to make this controllable via a parameter or config, but leave it on by default?
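A rough sketch of the fallback idea, assuming the training labels are still available at predict time (for example, stored during fit). `fill_nonfinite_probs_with_priors` is a hypothetical name, and whether to apply it would be gated by the parameter or config mentioned above:

```python
import numpy as np


def fill_nonfinite_probs_with_priors(probs, y_train, classes):
    """Hypothetical fallback: replace rows containing nan/inf by training class frequencies."""
    probs = np.array(probs, dtype=float)  # copy, so the caller's array is not modified
    y_train = np.asarray(y_train)
    # empirical class frequencies, in the same order as `classes`
    priors = np.array([np.mean(y_train == c) for c in classes])
    bad = ~np.isfinite(probs).all(axis=1)
    probs[bad] = priors  # broadcast the prior vector into every broken row
    return probs


probs = [[0.2, 0.8], [np.nan, np.nan]]
print(fill_nonfinite_probs_with_priors(probs, ["a", "a", "b", "a"], ["a", "b"]))
```

Rows that are already finite are left untouched; only broken rows fall back to the training priors.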
In my opinion that would be confusing, since the classifier would still produce valid-looking results even if the state of the model is broken...
But I would also be okay with your proposal.
P.S. I observed a few additional issues in the classifiers. For example, the SimpleRNN classifier uses a linear output activation instead of a softmax, and this is not configurable. So I am not sure whether we should fix these things while fixing this issue.
Regarding the forecasting errors: I assume that the sporadic failures are caused by the random inputs. Perhaps they are related to this issue: statsmodels/statsmodels#5459
A comment there says that `enforce_stationarity=False` can fix this error, although it might reduce performance. So perhaps we could add this parameter to our test params and hope that it helps avoid the sporadic forecasting failures.
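A sketch of what this could look like in a forecaster's `get_test_params`. The class name and the `order` values are hypothetical; `enforce_stationarity` is a real argument of statsmodels' `SARIMAX`, but whether the estimator under test forwards it is an assumption to check:

```python
class HypotheticalStatsmodelsForecaster:
    """Stand-in for a statsmodels-backed forecaster; the class name is made up."""

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        """Return test parameter sets, following sktime's get_test_params convention."""
        # enforce_stationarity=False tells statsmodels' SARIMAX not to transform the
        # AR parameters to enforce stationarity, as suggested in a comment on
        # statsmodels#5459
        params1 = {"order": (1, 0, 0), "enforce_stationarity": False}
        params2 = {"order": (1, 1, 0), "enforce_stationarity": False}
        return [params1, params2]


print(HypotheticalStatsmodelsForecaster.get_test_params())
```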