carla-recourse / carla
CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms
License: MIT License
The current implementation of CEM does not allow training an autoencoder when calling it.
This leads to errors when no pretrained autoencoder is available.
Add __version__ to the package. Also check out this guide: https://realpython.com/pypi-publish-python-package/
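A minimal sketch, assuming the package root is carla/__init__.py and an illustrative version string:

# carla/__init__.py (assumed location)
__version__ = "0.1.0"  # hypothetical version string; bump on each release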
Similar to the benchmarking process at https://github.com/Philoso-Fish/Benchmarkin_Counterfactual_Examples, implement the evaluation to repeat the experiments from the KDD paper.
AR for linear models does not need LIME; coefficients and intercepts are passed directly as parameters.
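A minimal sketch of where those values come from, assuming a scikit-learn linear model (all variable names are illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(100, 4)
y = (X.sum(axis=1) > 2).astype(int)

clf = LogisticRegression().fit(X, y)
coefficients = clf.coef_[0]    # one weight per feature
intercept = clf.intercept_[0]
# Both can be handed to AR directly; no LIME approximation step is needed.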
The current output of CEM will not work with the benchmarking process.
Expand the DiCE method to work with a VAE
What about imputers?
Do you have any recommendations or best practices for addressing them?
Implement the CEM method similar to the implementation at https://github.com/Philoso-Fish/Benchmarkin_Counterfactual_Examples
There are two variations of CEM to implement
Add extensive test cases
The original GitHub repository can be obtained either by asking Martin or from the old repository at CF_Models/cem_ml.
Thank you for all the work you put into the CARLA package. It's a great help when trying to compare different counterfactual methods! My questions pertain to the parameters/specifications used in the CARLA paper.
What was the train/test split for the ann model? Do you have a way to access the train/test data in the package so that I can fit my own model with the same data?
In the paper, the adult data set seems to fix the features age, sex, and race. However, in the dataset class, Datacatalog("adult").immutables gives only "age" and "sex" as the immutables. Can you comment on whether there was a change since the simulations in the paper were run or whether the default in the package is wrong?
I am getting an error when running the cem/cem-vae models. When running:
mlmodel = MLModelCatalog(dataset, "ann", backend)
CEM(factuals, mlmodel, hyperparams)
where hyperparams comes from the experimental_setup.yaml file, I get the following error:
ValueError: For hidden_layer is no default value defined, please pass this key and its value in hyperparams
Changing the hyperparameters "ae_params" dictionary to:
hyperparams["ae_params"] = {'hidden_layer': [20, 10, 7], 'train_ae': True, 'epochs': 5}
does the trick. Can you let me know if this is the same hidden_layer dimension used in the paper?
The give_me_some_credit data set models the response "SeriousDlqin2yrs", which is a negative response. However, the predict_negative_instances() function returns factuals as those with a predicted probability of less than 0.5 (specifically 1675 rows). I would argue that these are the positive instances, and the negative instances are those with a probability greater than 0.5. Do you have any comment about this?
Finally, are the results in Table 2 of the paper based on all factuals with a predicted probability less than 0.5 (i.e., 1675 rows for give_me_some_credit and 39954 rows for adult)?
Thanks for all of your help!
Annabelle
Until now, every call of run_experiment saves its results in the cache. It would be good if we could define an argparse option for individual save paths.
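A minimal sketch of such an option; the flag name --save-path and the run_experiment signature are assumptions:

import argparse

parser = argparse.ArgumentParser(description="Run a CARLA experiment.")
parser.add_argument(
    "--save-path",
    default=None,
    help="Directory for the results; falls back to the cache when omitted.",
)
args = parser.parse_args()
# run_experiment(..., save_path=args.save_path)  # hypothetical signature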
Implement the DICE method similar to the implementation at https://github.com/Philoso-Fish/Benchmarkin_Counterfactual_Examples
Add extensive test cases
Similar to the data catalog, use a YAML file to configure the feature order.
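A minimal sketch of what the config and its loading could look like; the file name and keys are assumptions:

import yaml

# Contents of a hypothetical feature_order.yaml:
# adult:
#   - age
#   - workclass
#   - education-num
with open("feature_order.yaml") as f:
    feature_order = yaml.safe_load(f)
print(feature_order["adult"])  # ['age', 'workclass', 'education-num']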
Improve the graph search method of FACE so that it can handle continuous features.
The authors, however, do not describe how to go about this.
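One option, sketched below under the assumption that a k-nearest-neighbour graph with Euclidean distances is acceptable for continuous features (this is not the authors' prescription):

import numpy as np
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph

X = np.random.rand(200, 5)  # stand-in for continuous features
# Connect every point to its 10 nearest neighbours, weighted by Euclidean
# distance, so the graph no longer depends on discrete feature matches.
graph = kneighbors_graph(X, n_neighbors=10, mode="distance")
# FACE-style shortest-path search over the resulting weighted graph.
dist, predecessors = dijkstra(graph, indices=0, return_predecessors=True)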
Implement the FACE method similar to the implementation at https://github.com/Philoso-Fish/Benchmarkin_Counterfactual_Examples
There are two variations of FACE to implement
Add extensive test cases
The original GitHub repository can be obtained either by asking Martin or from the old repository at CF_Models/face_ml.
The L0 distance for CLUE is computed with factual and counterfactual lists of 21 entries, rather than 14. The one-hot encoding probably causes this.
The MLModelCatalog predict method has the following signature:
def predict(
self, x: Union[np.ndarray, pd.DataFrame, torch.Tensor, tf.Tensor]
) -> Union[np.ndarray, pd.DataFrame, torch.Tensor, tf.Tensor]:
however, if the MLModelCatalog pipeline is enabled, then x is also the input for
def perform_pipeline(self, df: pd.DataFrame) -> pd.DataFrame:
i.e. the predict function can accept input types that are incompatible with the possible model settings.
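One hedged fix: coerce every supported input type to a DataFrame before the pipeline runs. A minimal sketch, with a hypothetical helper name:

import numpy as np
import pandas as pd

def as_dataframe(x, columns):
    # Hypothetical helper: coerce any supported input type to a DataFrame
    # so that perform_pipeline always receives what it expects.
    if isinstance(x, pd.DataFrame):
        return x
    if hasattr(x, "numpy"):  # torch.Tensor / tf.Tensor in eager mode
        x = x.numpy()
    return pd.DataFrame(np.asarray(x), columns=columns)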
Some documentation is left over from the old function parameters.
In the get_counterfactuals method in CEM, binary_cols is used. Is this the same as data.categoricals?
There is also a method map_binary_backto_string; this is implemented differently in the new API, right?
We recently published Neural Additive Models (NAMs) at NeurIPS 2021, which combine the interpretability of generalized additive models with neural nets (see the architecture shown below). We also evaluated NAMs on some of the datasets here, including COMPAS and adult. Do you think they'd make for a good addition to this repo too?
The source code for NAMs is open-sourced in TensorFlow (official version) and PyTorch!
For a quick summary of NAMs, look at this thread.
Implement the AR method similar to the implementation at https://github.com/Philoso-Fish/Benchmarkin_Counterfactual_Examples
Add extensive test cases
Hi,
When I tried to use the benchmark to test the constraint violations, I found an inaccurate cast in violations.py at line 34. When casting a float to an int using astype, we can get inaccurate results, which break the constraint_violation check. Instead, we should use round before typecasting.
For example:
df_decoded_cfs[model.data.continous] = (
    df_decoded_cfs[model.data.continous].round().astype("int64")
)
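A quick illustration of why the round matters; astype truncates toward zero, so a value stored as 19.999... silently becomes 19:

import pandas as pd

s = pd.Series([19.9999999])
print(s.astype("int64")[0])          # 19 -- astype truncates toward zero
print(s.round().astype("int64")[0])  # 20 -- rounding first keeps the intended value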
In https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/examples.html some @property decorators are missing.
E.g.
def feature_input_order(self):
return [...]
should be
@property
def feature_input_order(self):
return [...]
Given the interplay between correct normalization, encoding, and feature order, which is specific to a particular, arbitrary black-box model, we need a setter method for the class properties encoded, normalized, and encoded_normalized.
This method can take the black-box model and its pipeline as input to build the required dataframes.
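A minimal sketch of the idea, not CARLA's actual API; the method name and the pipeline call are assumptions:

import pandas as pd

class Data:
    # Hypothetical sketch: derive the model-specific dataframe once,
    # using the black-box model's own pipeline.
    def set_encoded_normalized(self, raw: pd.DataFrame, mlmodel) -> None:
        self.encoded_normalized = mlmodel.perform_pipeline(raw)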
Construct a documentation page from our docstrings via Sphinx and integrate it via GitHub Pages.
If an mlmodel is trained with use_pipeline = False, then the normalization performed when calling predict_negative_instances seems to result in an error.
For example, if someone first loads the TensorFlow ANN model into the cache at '../Users/xxx/carla/models/ann/adult/ann.h5', then it is not possible to load the PyTorch model as ann.pt into the same directory.
Even though the directory already exists, the if-condition
if not os.path.exists(cache_path):
does not recognize this, since the new file ann.pt is not contained in it yet; creating the path then leads to an error.
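A hedged fix is to create the directory idempotently, assuming cache_path is meant to be the model directory:

import os

cache_path = os.path.expanduser("~/carla/models/ann/adult")  # hypothetical location
# exist_ok=True makes the call a no-op when the directory already exists,
# so ann.h5 and ann.pt can live side by side in the same cache directory.
os.makedirs(cache_path, exist_ok=True)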
Hi!
Just a quick clarification question about the code used to calculate yNN. I noticed that the labels of the closest neighbours (neighbour_label) in nearest_neighbours.py are compared to something called 'cf_label', which is defined as
cf_label = row[mlmodel.data.target]
If I'm not mistaken, cf_label is then the TRUE label of the test observation (not the predicted label).
This seems to be different from what is written under 4.2 "yNN" in the arXiv paper. Do you think the predicted label, rather than the TRUE label, should be compared instead?
Thanks for your help.
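For reference, a minimal sketch of yNN computed with predicted labels, as Section 4.2 of the paper suggests; the helper names are assumptions:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def ynn(counterfactuals, X_train, predict_label, k=5):
    # Agreement between each counterfactual's PREDICTED label and the
    # predicted labels of its k nearest training neighbours.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(counterfactuals)
    cf_labels = predict_label(counterfactuals)      # shape (n_cf,)
    neighbour_labels = predict_label(X_train)[idx]  # shape (n_cf, k)
    return 1 - np.abs(cf_labels[:, None] - neighbour_labels).mean()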
It is currently not possible to load a PyTorch model which was saved in the benchmarking repository.
To load such a model it is necessary to have access to the original class, as discussed here and here.
Loading it without access to the class causes the error ModuleNotFoundError: No module named 'ML_Model'.
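A common workaround is to save only the state_dict rather than the pickled module, so loading no longer needs the original class's import path. A minimal sketch with an illustrative architecture:

import torch
import torch.nn as nn

# Illustrative architecture; the real one must match the saved weights.
def build_ann():
    return nn.Sequential(nn.Linear(13, 18), nn.ReLU(), nn.Linear(18, 2))

model = build_ann()
torch.save(model.state_dict(), "ann.pt")  # weights only, no class reference pickled

reloaded = build_ann()                    # no import of the old 'ML_Model' module needed
reloaded.load_state_dict(torch.load("ann.pt"))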
The current adult dataset still contains NaN values.
Fix the preprocessing and drop the unknown data.
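A minimal sketch of the cleanup, assuming the raw UCI encoding where unknowns appear as '?' alongside genuine NaNs (the file path is illustrative):

import numpy as np
import pandas as pd

df = pd.read_csv("adult.csv")          # hypothetical location of the raw data
# Treat '?' as missing as well, then drop all incomplete rows.
df = df.replace("?", np.nan).dropna()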
Implement the remaining measurements used in https://github.com/Philoso-Fish/Benchmarkin_Counterfactual_Examples
An update in the h5py package causes errors in model loading for TensorFlow architectures (Link).
To successfully load our TensorFlow model we need h5py version 2.10.0.
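Pinning the dependency, e.g. in requirements.txt, avoids the issue:

# requirements.txt
h5py==2.10.0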
We want the functions to be immutable and deterministic, and not force the caller to use dataframes.
predict_negative_instances calls predict_label, which normalizes the input and then calls model.predict. If model.use_pipeline == True, then model.predict normalizes the data again, resulting in double normalization.
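A hedged sketch of the guard; the normalize helper is a hypothetical stand-in for CARLA's scaler:

import numpy as np

def normalize(x):
    # Hypothetical min-max scaling stand-in for CARLA's scaler.
    return (x - x.min(axis=0)) / (np.ptp(x, axis=0) + 1e-12)

def predict_label(model, x):
    # Scale exactly once: skip it here whenever the model's own pipeline
    # will normalize again inside model.predict.
    if not model.use_pipeline:
        x = normalize(x)
    return model.predict(x)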
Keep the model structure and training of the AE inside CARLA and train a model if it is needed inside a recourse method. After training, the model is saved in our cache and can be loaded from there if the recourse method is called another time.
The training doesn't take long, so we don't need to keep trained models in a repository.
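A minimal sketch of this train-or-load logic; the builder, trainer, and cache path are all hypothetical:

import os
import torch

def get_autoencoder(build_ae, train_ae, data, cache_file):
    # build_ae() and train_ae() are hypothetical helpers: one returns an
    # untrained torch module, the other fits it in place on `data`.
    model = build_ae()
    if os.path.exists(cache_file):
        model.load_state_dict(torch.load(cache_file))  # reuse cached weights
    else:
        train_ae(model, data)                          # training is cheap
        torch.save(model.state_dict(), cache_file)
    return model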
Implement the CLUE method similar to the implementation at https://github.com/Philoso-Fish/Benchmarkin_Counterfactual_Examples
Add extensive test cases
The original GitHub repository can be obtained either by asking Martin or from the old repository at CF_Models/clue_ml.
CLUE needs a PyTorch model; maybe solving issue #16 is crucial for this.
It's probably better to pair a Sigmoid output with BCELoss, or to feed raw logits to CrossEntropyLoss (which applies the softmax internally).
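The two consistent pairings in PyTorch, as a quick sketch:

import torch
import torch.nn as nn

logits = torch.randn(8, 2)               # raw network outputs
targets = torch.randint(0, 2, (8,))

# Pairing 1: sigmoid activation with BCELoss on a single-probability head.
probs = torch.sigmoid(logits[:, 1])
loss_bce = nn.BCELoss()(probs, targets.float())

# Pairing 2: raw logits into CrossEntropyLoss, which applies log-softmax
# internally -- adding an explicit Softmax layer before it would be a bug.
loss_ce = nn.CrossEntropyLoss()(logits, targets)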
With #81 we have documentation hosted on Read the Docs; now we want an automated build that pushes to it. According to @Philoso-Fish, this requires a GitHub webhook that can be retrieved via the admin interface.
The hyperparams parameter needs a generalized method that checks whether every important key is present in the input and assigns default values where keys are missing.
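A minimal sketch of such a check; the helper name is illustrative:

def merge_hyperparams(hyperparams, defaults, required):
    # Fail loudly on keys that have no sensible default ...
    missing = [key for key in required if key not in hyperparams]
    if missing:
        raise ValueError(
            f"For {missing} no default value is defined, "
            "please pass these keys and their values in hyperparams"
        )
    # ... and quietly fill in defaults for everything else.
    return {**defaults, **hyperparams}

params = merge_hyperparams({"hidden_layer": [20, 10, 7]}, {"epochs": 5}, ["hidden_layer"])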