Comments (5)
Hey, issues with mixed data usually arise from a mis-specified `categorical_names` parameter. I would double-check this first and make sure that all categorical columns are keyed and that the values for each key cover all categories.
Edit: I see the call to `to_numpy()`, which suggests that `X` is a `pd.DataFrame`. For categorical variables we use the convention that they need to be label-encoded (i.e. values per category should be `[0, 1, ..., n-1]`, where `n` is the number of categories for that variable). I would also check whether that's the case.
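To make the convention concrete, here is a minimal sketch in plain Python of label-encoding one categorical column and building a matching `categorical_names` dictionary (column index mapped to the list of category names). The helper `label_encode` and the toy data are hypothetical and are not part of alibi; the sketch assumes the codes are assigned in sorted category order.

```python
def label_encode(rows, categorical_columns):
    """Label-encode the given columns of a list of rows.

    Hypothetical helper (not an alibi function). Returns the encoded rows and
    a dict mapping each categorical column index to its list of category
    names, ordered so that the name at position k has code k.
    """
    encoded = [list(row) for row in rows]  # copy so the input is not mutated
    categorical_names = {}
    for col in categorical_columns:
        # Sorted unique values define the codes 0, 1, ..., n-1.
        categories = sorted({row[col] for row in rows})
        code_of = {category: code for code, category in enumerate(categories)}
        categorical_names[col] = categories
        for row in encoded:
            row[col] = code_of[row[col]]
    return encoded, categorical_names


# Toy data: column 0 is categorical, column 1 is numerical.
rows = [["red", 1.5], ["blue", 2.0], ["red", 0.1], ["green", 3.2]]
X, categorical_names = label_encode(rows, categorical_columns=[0])
print(categorical_names)  # {0: ['blue', 'green', 'red']}
print(X)                  # [[2, 1.5], [0, 2.0], [2, 0.1], [1, 3.2]]
```

With this layout, every value in column 0 is a valid index into `categorical_names[0]`, which is what the convention above requires.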
from alibi.
> For categorical variables we use the convention that they need to be label-encoded (i.e. values per category should be `[0, 1, ..., n-1]`, where `n` is the number of categories for that variable). Would also check if that's the case.

This is it, that's why my example is failing. It would be great to add an assert check to enforce that convention in the `anchor_tabular` init.
from alibi.
> For categorical variables we use the convention that they need to be label-encoded (i.e. values per category should be `[0, 1, ..., n-1]`, where `n` is the number of categories for that variable). Would also check if that's the case.
> This is it, that's why my example is failing. It would be great to add an assert check to enforce that convention in the `anchor_tabular` init.

There are multiple ways to go about validation, and it's usually fairly tricky to validate custom user data. I would be keen to hear if you have more specific suggestions, e.g.:
- validate `categorical_names`: this doesn't give us much, as it wouldn't confirm whether the actual data is label-encoded or not
- validate `X_train` during `fit`: here we could cross-reference with `categorical_names` and check that the categorical columns are as expected
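As a rough illustration of the second option, a check along these lines could cross-reference the training data with `categorical_names`. This is only a sketch: `find_encoding_problems` is a hypothetical helper, not an existing alibi function, and it assumes `categorical_names` maps column indices to lists of category names. It accepts any sequence of rows (a list of lists or a 2-D NumPy array).

```python
def _is_valid_code(value, n):
    """True if `value` is a whole number in the range 0..n-1."""
    try:
        v = float(value)
    except (TypeError, ValueError):
        return False  # e.g. a raw string label
    return v.is_integer() and 0 <= v < n  # NaN fails is_integer()


def find_encoding_problems(X, categorical_names):
    """Map each offending column index to the invalid values found in it.

    Hypothetical helper: an empty dict means every categorical column only
    contains codes that index into its list of category names.
    """
    problems = {}
    for col, names in categorical_names.items():
        n = len(names)
        invalid = {row[col] for row in X if not _is_valid_code(row[col], n)}
        if invalid:
            problems[col] = sorted(invalid, key=repr)
    return problems


names = {0: ["a", "b", "c"]}
X_ok = [[0, 1.5], [1, 2.0], [2, 0.1]]
X_bad = [[-1, 1.5], [1, 2.0], [3, 0.7]]  # -1 and 3 are outside 0..2

print(find_encoding_problems(X_ok, names))   # {}
print(find_encoding_problems(X_bad, names))  # {0: [-1, 3]}
```

Note that this only confirms the codes are in range; it cannot tell whether code `k` really corresponds to the `k`-th name, which is part of why validating user data is tricky.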
from alibi.
I would suggest doing it in `gen_category_map` and maybe updating the description of the method.
On another note, `AnchorTabular` supporting label-encoded values only is a major blocker for my use case, since in my stack I represent missing data as -1. I believe minor changes in the explainer would allow it to support both encoded and raw label values.
from alibi.
I understand that some of our conventions make it more difficult to cater for all use cases, but this is a trade-off we've had to make, at least for the time being. The alternative here would be having to consider any custom encoding scheme; e.g. even allowing label-encoding with arbitrary user-supplied integers for categories would be infeasible without every user providing even more metadata about their specific encoding.
As for your case, there are a couple of workarounds. Essentially, missing data is another type of category (separate for each categorical data column). This gives two options:
- If changing your encoding is an option, you could encode the missing values as the last category for each column. E.g. instead of `-1` for every missing value across all categories, for a column `i` with categories encoded as `0, 1, ..., n_i-1`, a missing value would be encoded as `n_i` (i.e. as an extra category for column `i`).
- If changing the encoding for your model is not feasible, you could consider writing a wrapper prediction function similar to this. I.e. the wrapper function would expect the data as `alibi` expects it (label-encoded; you could use the same trick as above to encode missing data as an extra category), then a `transform_input` function would transform all those extra categories to `-1` before feeding into the model.
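The two workarounds can be sketched together as follows. This is an illustrative sketch only: `encode_missing_as_extra_category`, `make_predictor`, `n_categories` (a dict mapping each categorical column index to its number of real categories, excluding "missing") and the toy `predict_fn` are all hypothetical names, and the code assumes the data fits in a float NumPy array.

```python
import numpy as np


def encode_missing_as_extra_category(X, n_categories, missing_value=-1):
    """Replace `missing_value` in column i with n_i, i.e. an extra last category."""
    X = np.array(X, dtype=float)  # np.array copies, so the input is untouched
    for col, n in n_categories.items():
        X[X[:, col] == missing_value, col] = n
    return X


def make_predictor(predict_fn, n_categories, missing_value=-1):
    """Wrap `predict_fn` so it accepts label-encoded data with the extra category."""

    def transform_input(X):
        # Map the extra category n_i back to the model's own missing marker.
        X = np.array(X, dtype=float)
        for col, n in n_categories.items():
            X[X[:, col] == n, col] = missing_value
        return X

    def predictor(X):
        return predict_fn(transform_input(X))

    return predictor


# Toy stand-in for a real model: predicts 1 when column 0 is missing (-1).
def predict_fn(X):
    return (X[:, 0] == -1).astype(int)


n_categories = {0: 2}                      # column 0 has real categories 0 and 1
X_model = [[-1, 0.5], [0, 1.0], [1, 2.0]]  # the model's own encoding
X_explainer = encode_missing_as_extra_category(X_model, n_categories)
predictor = make_predictor(predict_fn, n_categories)

print(X_explainer[:, 0])       # [2. 0. 1.]
print(predictor(X_explainer))  # [1 0 0]
```

Under this scheme the list of names for each affected column would also need one extra entry (for example a "missing" label) appended at position `n_i`, so that `categorical_names` still covers every code in the data.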
> I believe minor changes in the explainer would allow it to support both encoded and raw label values.

I'm not sure I follow here; do you mean string labels by "raw label values"? It would be good to see what you have in mind.
from alibi.