[BUG] sporadic failures on `main` branch - `SimpleRNNClassifier`, `SARIMAX`, `ProximityForest` about sktime HOT 9 OPEN

yarnabrina commented on July 18, 2024

[BUG] sporadic failures on `main` branch - `SimpleRNNClassifier`, `SARIMAX`, `ProximityForest`

from sktime.

Comments (9)

benHeid commented on July 18, 2024

Looking a bit more into the logs would have shortened my reasoning...

The logs confirm that the predicted probs contain np.nan values.

yarnabrina commented on July 18, 2024

Is there a way to automatically create issues from failures on the main branch? If not, maybe we should track this in an issue (potentially a good first issue).

fkiraly commented on July 18, 2024

Re errors: afaik these are all sporadic. I hope these are already reported.

Re automatic issue creation - I do not know.

fkiraly commented on July 18, 2024

hmmm, I do hope the first failure is sporadic. I'd think so, but I cannot find a reference.

fkiraly commented on July 18, 2024

skipping some of these in #6208 - we should of course keep this open to track.

benHeid commented on July 18, 2024

Regarding the classification:

My assumption is that the array of the probabilities prob is empty for some cases.

If you take a look at _predict in classification/deep_learning/base.py:

    def _predict(self, X, **kwargs):
        probs = self._predict_proba(X, **kwargs)
        rng = check_random_state(self.random_state)
        return np.array(
            [
                self.classes_[int(rng.choice(np.flatnonzero(prob == prob.max())))]
                for prob in probs
            ]
        )

The error ValueError: 'a' cannot be empty unless no samples are taken is raised by rng.choice(np.flatnonzero(prob == prob.max())). For that to happen, the array np.flatnonzero(prob == prob.max()) has to be empty (rng.choice([]) raises the same error). prob itself is not empty, because in that case prob.max() would already fail with ValueError: zero-size array to reduction operation maximum which has no identity. So the return of np.flatnonzero must be empty, which here can only be the case if prob == prob.max() is an array containing only False.

The outputs of the network are normalized before this calculation is performed, so I assume that one of the network's outputs is either np.inf or np.nan. The normalization would then lead to an array of np.nans, and np.array([np.nan, np.nan]) == np.array([np.nan, np.nan]).max() is an array of False.
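
The chain of reasoning above can be reproduced in isolation. The following is a minimal sketch using only NumPy; the two-element prob array is an assumed stand-in for one row of network output, and np.random.RandomState(0) stands in for what check_random_state would return.

```python
import numpy as np

# Assumed stand-in for one row of normalized network output that became NaN.
prob = np.array([np.nan, np.nan])

# NaN never compares equal to anything, itself included,
# so comparing against the (NaN) maximum gives an all-False mask.
mask = prob == prob.max()
candidates = np.flatnonzero(mask)  # empty index array

rng = np.random.RandomState(0)
try:
    rng.choice(candidates)
    error_message = None
except ValueError as exc:
    # Same error as reported in the failing test logs.
    error_message = str(exc)

print(mask, candidates.size, error_message)
```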

If this makes sense I would propose to:

Check the network's output and raise an error in that case.
Set the seed in get_test_params to a seed where we know that the network has no np.infs as output.
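
The first proposal could look roughly like the sketch below. The helper name check_finite_probs and its error message are hypothetical and not part of sktime; it assumes probs is a 2D array with one row per instance.

```python
import numpy as np


def check_finite_probs(probs):
    """Raise if any row of probs contains NaN or inf (hypothetical helper)."""
    probs = np.asarray(probs, dtype=float)
    # A row is bad if at least one of its entries is NaN or +/-inf.
    bad_rows = np.flatnonzero(~np.isfinite(probs).all(axis=1))
    if bad_rows.size > 0:
        raise ValueError(
            f"Network output contains non-finite values in {bad_rows.size} "
            f"row(s), first at index {bad_rows[0]}."
        )
    return probs
```

Calling such a check at the start of _predict would turn the obscure rng.choice error into a message that points at the broken model state.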

fkiraly commented on July 18, 2024

I see.

One solution that springs to mind: would it make sense to have nans overridden by the class frequencies in the training set?
That ensures a prediction is always made.

For reproducibility or scientific cleanness, it might be good to allow this to be controlled by a parameter or config, but leave it on as a default?
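
As a rough sketch of that idea, assuming a 2D probs array and access to the training labels: the function name fill_invalid_probs and the fallback parameter are hypothetical, chosen only for illustration.

```python
import numpy as np


def fill_invalid_probs(probs, y_train, classes, fallback="class_frequency"):
    """Replace non-finite rows of probs with training class frequencies.

    Hypothetical helper: with fallback=None, non-finite rows raise instead.
    """
    probs = np.array(probs, dtype=float)  # copy, so the input is not modified
    invalid = ~np.isfinite(probs).all(axis=1)
    if not invalid.any():
        return probs
    if fallback is None:
        raise ValueError("Predicted probabilities contain non-finite values.")
    if fallback != "class_frequency":
        raise ValueError(f"Unknown fallback: {fallback!r}")
    y_train = np.asarray(y_train)
    # Relative frequency of each class, in the same order as `classes`.
    freqs = np.array([(y_train == c).mean() for c in classes])
    probs[invalid] = freqs  # broadcast over all invalid rows
    return probs
```

With the fallback on by default a prediction is always made, while passing fallback=None keeps the strict behaviour for users who prefer a hard failure.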

benHeid commented on July 18, 2024

In my opinion that would be confusing, since the classifier would still produce valid results even if the state of the model is broken...
But I would also be okay with your proposal.

P.S. I observed a few additional issues in the classifiers. E.g., the SimpleRNN Classifier uses a linear activation function instead of a softmax, and this is not configurable. So I am not sure whether we should fix these things while fixing this issue.

benHeid commented on July 18, 2024

Regarding Forecasting Errors: I assume that the sporadic failures are caused by the random inputs. Perhaps it is related to this issue: statsmodels/statsmodels#5459

There is a comment that says enforce_stationarity=False can fix this error, but might reduce the performance. So perhaps we add this parameter to our test_params and hope that this helps to avoid the sporadic failures in the forecasting.
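
A sketch of what that could look like, using a stand-in class rather than the actual sktime estimator. The class name is hypothetical and the parameter set shown is an assumption, not the current test configuration; only the get_test_params classmethod pattern and the enforce_stationarity parameter name come from the libraries themselves.

```python
class SARIMAXTestParamsSketch:
    """Hypothetical stand-in showing only the get_test_params pattern."""

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        # Assumed parameter set: relax the stationarity constraint so that
        # random test inputs are less likely to trigger the sporadic failure.
        return [{"enforce_stationarity": False}]


params = SARIMAXTestParamsSketch.get_test_params()
print(params)
```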
