
carla's People

Contributors

ah-ansari, aredelmeier, indyfree, johanvandenheuvel, philoso-fish, voulgaris-sot

carla's Issues

Autoencoder training for CEM

The current implementation of CEM does not allow training an autoencoder when it is called.
This leads to errors when no pretrained autoencoder is available.

  • Add training inside constructor
  • Add training flag in hyperparameter
  • Add tests

Put CARLA on PyPI

  • Remove all print statements and replace with logger
  • Update setup.py with meta information
  • Create setup.cfg that links README
  • Add __version__ to package
  • Publish as package to GitHub
  • Upload to PyPI
  • Put last 2 steps into CI/CD pipeline
  • Optional: Automatically increase version

Also check out this guide: https://realpython.com/pypi-publish-python-package/

Fix output for CEM

The current output of CEM will not work with the benchmarking process.

  • Refactor output of CEM
  • Integrate in benchmarking process

Best practice for "Imputers"

What about imputers?

Do you have any recommendations or best practices for handling them?

  • When they need to be applied to the training data as well, before producing predictions.
  • When they need to be distinct for categorical and continuous values.

Questions about parameters used in CARLA paper

Thank you for all the work you put into the CARLA package. It's a great help when trying to compare different counterfactual methods! My questions pertain to the parameters/specifications used in the CARLA paper.

  1. What was the train/test split for the ann model? Do you have a way to access the train/test data in the package so that I can fit my own model with the same data?

  2. In the paper, the adult data set seems to fix the features age, sex, and race. However, in the dataset class, Datacatalog("adult").immutables gives only "age" and "sex" as the immutables. Can you comment on whether there was a change since the simulations in the paper were run or whether the default in the package is wrong?

  3. I am getting an error when running the cem/cem-vae models. When running:

mlmodel = MLModelCatalog(dataset, "ann", backend)
CEM(factuals, mlmodel, hyperparams)

where hyperparams comes from the experimental_setup.yaml script, I get the following error:

ValueError: For hidden_layer is no default value defined, please pass this key and its value in hyperparams

Changing the hyperparameters "ae_params" dictionary to:
hyperparams["ae_params"] = {'hidden_layer': [20, 10, 7], 'train_ae': True, 'epochs': 5}

does the trick. Can you let me know if this is the same hidden_layer dimension used in the paper?

  4. The give_me_some_credit data set models the response "SeriousDlqin2yrs", which is a negative response. However, the predict_negative_instances() function returns factuals as those with a predicted probability of less than 0.5 (specifically 1675 rows). I would argue that these are the positive instances and that the negative instances have a probability higher than 0.5. Do you have any comment about this?

  5. Finally, are the results in Table 2 of the paper based on all factuals with a predicted probability less than 0.5 (i.e., 1675 rows for give_me_some_credit and 39954 rows for adult)?

Thanks for all of your help!
Annabelle

Add save path for experiment output

Until now, every call of run_experiment saves its results in the cache. It would be good if we could define a command-line argument (e.g. via argparse) for individual save paths.
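
A minimal sketch of such an argument, using argparse from the standard library. The flag name --path, the helper names, and the default location are assumptions made for illustration; they are not taken from the existing run_experiment script.

```python
import argparse
from pathlib import Path
from typing import List, Optional

# Hypothetical default; the issue only says results currently go to the cache.
DEFAULT_RESULT_FILE = Path.home() / "carla" / "results.csv"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the (hypothetical) --path option for the experiment output."""
    parser = argparse.ArgumentParser(description="Run a benchmarking experiment")
    parser.add_argument(
        "--path",
        type=Path,
        default=None,
        help="File to save the results to; defaults to the cache location.",
    )
    return parser.parse_args(argv)


def resolve_save_path(path: Optional[Path]) -> Path:
    """Fall back to the default location when no path was given."""
    return path if path is not None else DEFAULT_RESULT_FILE


args = parse_args(["--path", "out/adult_results.csv"])
print(resolve_save_path(args.path))
```

Keeping the fallback in one small function means the existing cache behaviour is unchanged when the flag is omitted.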

Improve Face method

Improve the graph search method of Face so that it can deal with continuous features.

However, the authors do not mention how to go about it.

MLModelCatalog predict method incompatible with pipeline

The MLModelCatalog predict method has the following signature:

def predict(
    self, x: Union[np.ndarray, pd.DataFrame, torch.Tensor, tf.Tensor]
) -> Union[np.ndarray, pd.DataFrame, torch.Tensor, tf.Tensor]:

However, if the MLModelCatalog pipeline is enabled, then x is also the input for

def perform_pipeline(self, df: pd.DataFrame) -> pd.DataFrame:

i.e., the predict function can take input types that are incompatible with some model settings: with the pipeline enabled, the input must be a pd.DataFrame.

Would NAMs be another interesting model to add?

We recently published Neural Additive Models (NAMs) at NeurIPS 2021, which combine the interpretability of generalized additive models with neural nets (see the architecture shown below). We also evaluated NAMs on some of the datasets here, including COMPAS and adult. Do you think they'd make for a good addition in this repo too?

The source code for NAMs is open-sourced in tensorflow (official version) and pytorch!
For a quick summary of NAMs, look at this thread.

[image: NAM architecture]

Inaccurate cast in constraint_violation check

Hi,

When I tried to use the benchmark to test the constraint violations, I found an inaccurate cast in violations.py, line 34. Casting a float to an int with astype truncates the value, so we can get inaccurate results that break the constraint_violation check. Instead, we should round before typecasting.

For example:

# Round first, then cast, so values such as 28.999999 become 29 rather than 28.
df_decoded_cfs[model.data.continous] = (
    df_decoded_cfs[model.data.continous].round().astype("int64")
)
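
To make the effect concrete, here is a small plain-Python illustration. It uses int() as a stand-in for the float-to-integer cast, since both truncate toward zero; the value 0.29 * 100 is just an example of a number that should be 29 but is stored slightly below it.

```python
# A continuous value that should be 29, e.g. after undoing a scaling step.
value = 0.29 * 100  # stored as 28.999999999999996 in binary floating point

truncated = int(value)       # truncates toward zero, like a float-to-int cast
rounded = int(round(value))  # rounds to the nearest integer first

print(truncated, rounded)  # 28 29
```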

Restructure data API/catalog

Because correct normalization, encoding, and feature order are specific to a particular, arbitrary black-box model, we need a setter method for the class properties encoded, normalized, and encoded_normalized.

This method can take the black-box model and its pipeline as input to build the required dataframes.

Add documentation

Build a documentation page from our docstrings with Sphinx and publish it via GitHub Pages.

Loading ML model from catalog if directory already exists

For example, if someone first loads the TensorFlow ann model into the cache at '../Users/xxx/carla/models/ann/adult/ann.h5', it is then not possible to load the PyTorch model as ann.pt into the same directory.

Even though the directory already exists, the if-condition
if not os.path.exists(cache_path):
does not recognize this, since the new file ann.pt is not in it yet. Creating the path then leads to an error.
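
One possible way around this, sketched under the assumption that cache_path points to the model file and that the directory is created right after the check: create the parent directory with exist_ok=True, which does nothing when the directory is already there. The helper name ensure_parent_dir and the temporary directory layout are illustrative only.

```python
import os
import tempfile


def ensure_parent_dir(file_path: str) -> None:
    """Create the directory that will hold file_path; do nothing if it exists."""
    os.makedirs(os.path.dirname(file_path), exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    model_dir = os.path.join(root, "models", "ann", "adult")
    for file_name in ("ann.h5", "ann.pt"):
        file_path = os.path.join(model_dir, file_name)
        if not os.path.exists(file_path):
            # The second iteration hits an already existing directory.
            ensure_parent_dir(file_path)
            with open(file_path, "wb") as handle:
                handle.write(b"placeholder")  # stands in for the downloaded model
    cached_files = sorted(os.listdir(model_dir))

print(cached_files)  # ['ann.h5', 'ann.pt']
```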

yNN computation

Hi!
Just a quick clarification question about the code used to calculate yNN. I noticed that the labels of the closest neighbours (neighbour_label) in nearest_neighbours.py are compared to something called 'cf_label', which is defined as

cf_label = row[mlmodel.data.target]

If I'm not mistaken, cf_label is then the TRUE label of the test observation (not the predicted label).

This seems to be different from what is written under 4.2 "yNN" in the arXiv paper. Do you think the predicted label, rather than the TRUE label, should be used in the comparison?

Thanks for your help.
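
For reference, a plain-Python sketch of the variant the question proposes, where the counterfactual's predicted label is compared with its neighbours' labels. It assumes yNN is one minus the average absolute label difference between each counterfactual and its k nearest neighbours, and that the neighbours' labels are also model predictions; both are our reading and should be checked against Section 4.2 of the paper. The function and the toy threshold model are hypothetical and not taken from nearest_neighbours.py.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def ynn(
    counterfactuals: Sequence[Point],
    population: Sequence[Point],
    predict: Callable[[Point], int],
    k: int,
) -> float:
    """One minus the mean label disagreement with the k nearest neighbours."""
    if not counterfactuals or not 1 <= k <= len(population):
        raise ValueError("need at least one counterfactual and 1 <= k <= len(population)")
    disagreement = 0
    for cf in counterfactuals:
        cf_label = predict(cf)  # predicted label, not the ground-truth target
        neighbours = sorted(population, key=lambda x: math.dist(cf, x))[:k]
        disagreement += sum(abs(cf_label - predict(n)) for n in neighbours)
    return 1 - disagreement / (len(counterfactuals) * k)


def threshold_model(x: Point) -> int:
    """Toy classifier: positive when the first feature exceeds 0.5."""
    return int(x[0] > 0.5)


score = ynn([(0.55,)], [(0.4,), (0.6,), (0.0,)], threshold_model, k=2)
print(score)  # 0.5
```

In the example, the two nearest neighbours of the counterfactual are (0.6,) and (0.4,); one shares its predicted label and one does not, hence 0.5.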

Loading PyTorch model

It is currently not possible to load a PyTorch model that was saved in the benchmarking repository.

To load such a model, it is necessary to have access to the original class, as discussed here and here.

Loading it without access to the class causes the error ModuleNotFoundError: No module named 'ML_Model'
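
The mechanism behind this error can be reproduced with the standard-library pickle module, which torch.save relies on when a whole model object is saved. The sketch below is a stand-in, not PyTorch code: it builds a throwaway module named ML_Model with a hypothetical class ANN, pickles an instance, and then tries to load it after the module is gone. Pickling only the plain state (comparable to saving a state_dict) does not depend on the class being importable.

```python
import pickle
import sys
import types

# Build a throwaway module named "ML_Model" to stand in for the original class file.
module = types.ModuleType("ML_Model")
exec(
    "class ANN:\n"
    "    def __init__(self):\n"
    "        self.weights = [0.1, 0.2]\n",
    module.__dict__,
)
sys.modules["ML_Model"] = module

model = module.ANN()
whole_object = pickle.dumps(model)      # stores a reference to ML_Model.ANN
state_only = pickle.dumps(vars(model))  # stores plain data only

# Simulate a different project where ML_Model cannot be imported.
del sys.modules["ML_Model"]

try:
    pickle.loads(whole_object)
    error = None
except ModuleNotFoundError as exc:
    error = str(exc)

restored_state = pickle.loads(state_only)
print(error)
print(restored_state)
```

With only the state saved, the loading side still has to define a compatible model class of its own and fill it with the restored values.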

Use typehinting

  • Use type hints for method parameters
  • Add automatic type-hint checks to the workflow

Refactor D2 and D3 distances

We want the functions to be immutable and deterministic, and we do not want to force the caller to use dataframes.

  • Calculate the range before calling the d2 and d3 distances and pass as parameter
  • Don't remove target labels within distance function. These functions should just calculate distances between any two lists/arrays
  • Write unit tests to test with empty and 1-element inputs
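
A sketch of what such functions could look like. The exact definitions of d2 and d3 are not given in this issue, so the code assumes range-scaled L1 and squared L2 distances purely for illustration; the function names are hypothetical. The ranges are computed by the caller and passed in, and the inputs are plain sequences that are never modified.

```python
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float], ranges: Sequence[float]) -> None:
    """Validate that all inputs line up and that every range is positive."""
    if not (len(a) == len(b) == len(ranges)):
        raise ValueError("a, b and ranges must have the same length")
    if any(r <= 0 for r in ranges):
        raise ValueError("every feature range must be positive")


def range_scaled_l1(
    a: Sequence[float], b: Sequence[float], ranges: Sequence[float]
) -> float:
    """Sum of absolute differences, each divided by its feature range."""
    _check(a, b, ranges)
    return float(sum(abs(x - y) / r for x, y, r in zip(a, b, ranges)))


def range_scaled_l2_squared(
    a: Sequence[float], b: Sequence[float], ranges: Sequence[float]
) -> float:
    """Sum of squared differences, each scaled by its feature range."""
    _check(a, b, ranges)
    return float(sum(((x - y) / r) ** 2 for x, y, r in zip(a, b, ranges)))


# Edge cases named in the checklist: empty and 1-element inputs.
print(range_scaled_l1([], [], []))                  # 0.0
print(range_scaled_l1([1.0], [3.0], [4.0]))         # 0.5
print(range_scaled_l2_squared([1.0], [3.0], [4.0])) # 0.25
```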

Write AE model structure and training

Keep the AE model structure and training code inside CARLA, and train a model whenever a recourse method needs one. After training, the model is saved in our cache and can be loaded from there the next time the recourse method is called.

The training doesn't take long, so we don't need to keep trained models in a repository.
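
The train-once, load-afterwards flow described above can be sketched as follows. Everything here is a placeholder: load_or_train, the JSON file standing in for a saved model, and fake_training are hypothetical and only illustrate the caching logic, not the actual autoencoder code.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Weights = Dict[str, List[float]]


def load_or_train(cache_file: Path, train_fn: Callable[[], Weights]) -> Weights:
    """Load weights from the cache if present; otherwise train and cache them."""
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    weights = train_fn()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(weights))
    return weights


calls = []


def fake_training() -> Weights:
    """Stand-in for the real autoencoder training loop."""
    calls.append("trained")
    return {"encoder": [0.1, 0.2], "decoder": [0.3]}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "autoencoders" / "adult" / "ae.json"
    first = load_or_train(cache_file, fake_training)   # trains and saves
    second = load_or_train(cache_file, fake_training)  # loads from the cache

print(len(calls))  # 1
```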

Automate docs building process

With #81 we have documentation hosted on Read the Docs; now we want an automated build that pushes to it. According to @Philoso-Fish, this requires a GitHub webhook that can be retrieved via the admin interface.
