I have the following error when trying to call explain on an AnchorTabular fitted on a

Anchor Tabular - KeyError in get_features_index

pocman commented on June 10, 2024

Anchor Tabular - KeyError in get_features_index

Comments (5)

jklaise commented on June 10, 2024

Hey, issues with mixed data usually arise from a mis-specification of the categorical_names parameter. I would double check this first and make sure that all categorical columns are keyed and that the values of each key cover all categories.

Edit: I see the call to to_numpy(), which suggests that X is a pd.DataFrame. For categorical variables we use the convention that they need to be label-encoded (i.e. values per category should be [0, 1, ..., n-1], where n is the number of categories for that variable). I would also check whether that's the case.
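To illustrate the convention described above, here is a minimal sketch that label-encodes the categorical columns of a small array and builds a matching categorical_names dictionary (column index mapped to a list of names, where position k is the name for code k). The helper label_encode_columns and the toy data are hypothetical and not part of alibi; it only uses NumPy.

```python
import numpy as np


def label_encode_columns(X, categorical_columns):
    """Label-encode the given columns of a 2-D object array (hypothetical helper).

    Returns the encoded float array and a dict mapping each categorical
    column index to its list of category names; position k holds the name
    of code k.
    """
    X = np.asarray(X, dtype=object)
    X_encoded = np.empty(X.shape, dtype=float)
    categorical_names = {}
    for col in range(X.shape[1]):
        if col in categorical_columns:
            # np.unique returns the sorted unique values, and the inverse
            # indexes into them, so the codes are exactly 0, 1, ..., n-1.
            names, codes = np.unique(X[:, col].astype(str), return_inverse=True)
            X_encoded[:, col] = codes
            categorical_names[col] = names.tolist()
        else:
            X_encoded[:, col] = X[:, col].astype(float)
    return X_encoded, categorical_names


# Toy data: column 0 is numerical, columns 1 and 2 are categorical.
raw = np.array(
    [[25, "red", "S"], [40, "blue", "M"], [31, "red", "L"]], dtype=object
)
X_encoded, categorical_names = label_encode_columns(raw, categorical_columns={1, 2})
print(categorical_names)
```

With a pd.DataFrame, the same idea would apply to the result of to_numpy(), as long as the encoding is done before the data reaches the explainer and the model.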

pocman commented on June 10, 2024

> For categorical variables we use the convention that they need to be label-encoded (i.e. values per category should be [0, 1, ..., n-1] where n is the number of categories for that variable). Would also check if that's the case.

This is it, that's why my example is failing.
It would be great to add some assert check to enforce that convention in anchor_tabular init.

jklaise commented on June 10, 2024

> > For categorical variables we use the convention that they need to be label-encoded (i.e. values per category should be [0, 1, ..., n-1] where n is the number of categories for that variable). Would also check if that's the case.
>
> This is it, that's why my example is failing. It would be great to add some assert check to enforce that convention in anchor_tabular init.

There are multiple ways to go about validation, and it's usually fairly tricky to validate custom user data. I would be keen to hear if you have more specific suggestions, e.g.:

- Validate categorical_names: this doesn't give us much, as it wouldn't confirm whether the actual data is label-encoded or not.
- Validate X_train during fit: here we could cross-reference with categorical_names and check that the categorical columns are as expected.
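As a rough sketch of the second option, a check along the following lines could cross-reference the training data with categorical_names. The function check_label_encoded is hypothetical and is not an existing alibi function; it assumes categorical_names maps column indices to lists of category names.

```python
import numpy as np


def check_label_encoded(X_train, categorical_names):
    """Raise ValueError if a categorical column is not encoded as 0..n-1.

    Hypothetical helper: `categorical_names` is assumed to map a column
    index to the list of category names for that column.
    """
    X_train = np.asarray(X_train)
    for col, names in categorical_names.items():
        values = np.unique(X_train[:, col])
        allowed = np.arange(len(names))
        # Anything that is not one of the codes 0, 1, ..., n-1 is unexpected.
        unexpected = values[~np.isin(values, allowed)]
        if unexpected.size:
            raise ValueError(
                f"Column {col} has values {unexpected.tolist()} outside the "
                f"label-encoded range 0..{len(names) - 1}."
            )


# Passes silently: column 1 only contains the codes 0 and 1.
check_label_encoded(np.array([[0.5, 0], [1.5, 1]]), {1: ["a", "b"]})
```

A column that uses -1 for missing data would fail this check with an explicit message instead of surfacing later as a KeyError.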

pocman commented on June 10, 2024

I would suggest doing it in gen_category_map and maybe updating the description of the method.

On another note, AnchorTabular supporting only label-encoded values is a major blocker for my use case, since in my stack I represent missing data as -1. I believe minor changes in the explainer would allow it to support both encoded and raw label values.

jklaise commented on June 10, 2024

I understand that some of our conventions make it more difficult to cater for all use cases, but this is a trade-off we've had to make, at least for the time being. The alternative here would be having to consider any custom encoding scheme, e.g. even allowing label-encoding with arbitrary user-supplied integers for categories would be infeasible without every user providing even more metadata about their specific encoding.

As for your case, there are a couple of workarounds. Essentially missing data is another type of category (separate for each categorical data column). This gives two options:

- If changing your encoding is an option, you could encode the missing values as the last category for each column. E.g. instead of -1 for every missing value across all categories, for a column i with categories encoded as 0, 1, ..., n_i-1, a missing value would be encoded as n_i (i.e. as an extra category for column i).
- If changing the encoding for your model is not feasible, you could consider writing a wrapper prediction function similar to this. I.e. the wrapper function would expect the data as alibi expects it (label-encoded; you could use the same trick as above to encode missing data as an extra category), then a transform_input function would transform all those extra categories to -1 before feeding the data into the model.
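A minimal sketch of the two workarounds above, assuming the data is a 2-D NumPy array and that n_categories maps each categorical column index i to n_i, the number of real categories. The names encode_missing, make_wrapped_predictor, n_categories, and toy_model are hypothetical; only transform_input and the -1 marker come from the discussion.

```python
import numpy as np

MISSING = -1  # the model's own marker for missing data, as in the thread


def encode_missing(X, n_categories):
    """Replace MISSING in each categorical column i with the extra code n_i."""
    X = np.array(X, dtype=float, copy=True)
    for col, n in n_categories.items():
        X[X[:, col] == MISSING, col] = n
    return X


def make_wrapped_predictor(predict_fn, n_categories):
    """Wrap predict_fn so it accepts data with missing values encoded as n_i."""

    def transform_input(X):
        # Map the extra category n_i back to MISSING before calling the model.
        X = np.array(X, dtype=float, copy=True)
        for col, n in n_categories.items():
            X[X[:, col] == n, col] = MISSING
        return X

    def wrapped(X):
        return predict_fn(transform_input(X))

    return wrapped


def toy_model(X):
    # Stand-in for the real model: predicts 1 when column 0 is missing.
    return (X[:, 0] == MISSING).astype(int)


n_categories = {0: 3}  # column 0 has real categories 0, 1, 2
X_for_explainer = encode_missing(np.array([[0, 5.2], [-1, 1.1]]), n_categories)
predict = make_wrapped_predictor(toy_model, n_categories)
print(X_for_explainer[:, 0], predict(X_for_explainer))
```

Under this scheme the explainer would see only the codes 0, 1, ..., n_i, so the entry for each such column in categorical_names would presumably need one extra name (for example "missing") for the code n_i.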

> I believe minor changes in the explainer would allow it to support both encoded and raw label values.

I'm not sure I follow. Do you mean string labels by "raw label values"? It would be good to see what you have in mind.
