
equalityml's People

Contributors

bjb2088, jamesng-dev, joaogranja, jzdavis66, nyujwc331, proinsights


equalityml's Issues

Independent thresholds for each method when using compare_mitigation_methods

Currently the compare_mitigation_methods() function seems to rely on a pre-defined threshold.

for mitigation_method in mitigation_methods:
    ml_model = self.model_mitigation(mitigation_method=mitigation_method, **kwargs)
    if self.mitigated_testing_data is not None:
        testing_data = self.mitigated_testing_data
    else:
        testing_data = self.testing_data if self.testing_data is not None else self.training_data
    score = binary_threshold_score(ml_model,
                                   testing_data[self.features],
                                   testing_data[self.target_variable],
                                   scoring=scoring,
                                   threshold=self.threshold,
                                   utility_costs=utility_costs)
    fairness_metric = self.fairness_metric(self._metric_name)
    comparison_df.loc[mitigation_method] = [score, fairness_metric]

It would be statistically more correct to select a new threshold for each mitigated model. This would require accepting a decision_maker from the user when calling compare_mitigation_methods(). I don't know if this is urgent, as the current approach seems to give a good approximate result.

Also, recall that the threshold function uses a random seed.
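For illustration, here is a minimal sketch of how a per-model threshold could be selected inside the loop. The select_threshold() helper and its signature are hypothetical (not part of the current EqualityML API); it simply sweeps candidate thresholds and keeps the one that maximizes F1 on the evaluation data.

import numpy as np
from sklearn.metrics import f1_score

def select_threshold(ml_model, X, y, candidate_thresholds=None):
    """Hypothetical helper: return the threshold that maximizes F1 on (X, y)."""
    if candidate_thresholds is None:
        candidate_thresholds = np.linspace(0.05, 0.95, 19)
    probs = ml_model.predict_proba(X)[:, 1]  # positive-class probabilities
    scores = [f1_score(y, probs >= t) for t in candidate_thresholds]
    return candidate_thresholds[int(np.argmax(scores))]

# Inside compare_mitigation_methods(), each mitigated model would then get its own threshold:
# threshold = select_threshold(ml_model, testing_data[self.features], testing_data[self.target_variable])
# score = binary_threshold_score(ml_model, ..., threshold=threshold, ...)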

Support for phenotype assessment tools?

Hi! We love your work @onefact and are happy to help if we can.

Work I helped develop during my postdoc is here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148336/

We have assessed several large language models for compliance with the Affordable Care Act non-discrimination clause (https://www.hhs.gov/about/leadership/melanie-fontes-rainer.html).

Specifically, the demographic parity metric is one I haven't found in your repository, and such an assessment is necessary prior to training machine learning/artificial intelligence algorithms using labels derived from clinical phenotypes. For example, the presence or absence of a disease could be computed as a SQL query executed against a clinical data repository such as the one we work with from the NIH, researchallofus.org (@all-of-us).
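For concreteness, here is a minimal sketch of the kind of demographic parity check we have in mind, assuming a pandas DataFrame with a binary prediction column and a protected-attribute column (the column names are placeholders, not EqualityML identifiers):

import pandas as pd

def demographic_parity_ratio(df, prediction_col="predicted_label", group_col="protected_attribute"):
    """Ratio of positive-prediction rates across groups; 1.0 means parity."""
    rates = df.groupby(group_col)[prediction_col].mean()  # P(prediction = 1 | group)
    return rates.min() / rates.max()

# Example on toy phenotype labels:
# df = pd.DataFrame({"predicted_label": [1, 0, 1, 1], "protected_attribute": ["a", "a", "b", "b"]})
# demographic_parity_ratio(df)  # 0.5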

Are such algorithmic fairness criteria for clinical phenotype assessment out of scope for @EqualityAI?

Please let us know as we will be releasing open source tools around this over the summer and don't want to duplicate your excellent work here!

Correlation remover doesn't show up in plots when using compare_mitigation_methods()


Suggest we add it to the list here.

def map_bias_mitigation(self):
    return {'treatment_equality_ratio': [''],
            'treatment_equality_difference': [''],
            'balance_positive_class': [''],
            'balance_negative_class': [''],
            'equal_opportunity_ratio': [''],
            'accuracy_equality_ratio': [''],
            'predictive_parity_ratio': [''],
            'predictive_equality_ratio': [''],
            'statistical_parity_ratio': ['disparate-impact-remover', 'resampling',
                                         'resampling-preferential', 'reweighing']}
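A possible edit, assuming correlation-remover should be surfaced alongside the other pre-processing mitigations for statistical parity (whether it also belongs under other metrics is a judgment call for the maintainers):

    'statistical_parity_ratio': ['disparate-impact-remover', 'resampling',
                                 'resampling-preferential', 'reweighing',
                                 'correlation-remover']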

self.threshold may mean self._threshold?

I didn't notice self.threshold being declared anywhere, but I did see self._threshold. Is it declared someplace I missed? If not, this is possibly a typo.

score = binary_threshold_score(self.orig_ml_model,
                               testing_data[self.features],
                               testing_data[self.target_variable],
                               scoring=scoring,
                               threshold=self.threshold,
                               utility_costs=utility_costs)
fairness_metric = self.fairness_metric(self._metric_name)
comparison_df.loc['reference'] = [score, fairness_metric]
# Iterate over mitigation methods list and re-evaluate score and fairness metric
for mitigation_method in mitigation_methods:
    ml_model = self.model_mitigation(mitigation_method=mitigation_method, **kwargs)
    if self.mitigated_testing_data is not None:
        testing_data = self.mitigated_testing_data
    else:
        testing_data = self.testing_data if self.testing_data is not None else self.training_data
    score = binary_threshold_score(ml_model,
                                   testing_data[self.features],
                                   testing_data[self.target_variable],
                                   scoring=scoring,
                                   threshold=self.threshold,
                                   utility_costs=utility_costs)

Dependency issue on Tkinter when using EqualityML on Databricks Jupyter notebooks

Hello,
I'm trying to install equalityml on Databricks, which is an environment that doesn't allow installing system libraries with something like apt install python-tk.

Here's a screenshot of the tkinter-related error I get: [screenshot omitted]

I started investigating the issue and found this ticket:
Trusted-AI/AIF360#415

From the above issue, I realized that tkinter is no longer needed as a dependency. However, AIF360 hasn't had a new release since September, and I've had no response from them so far when asking for a newer release.

I've tried installing AIF360 from source, but the latest commit has other, newer dependencies that created issues.

That's why I've created a fork of AIF360 as a quick fix for this issue until they make a new release that includes the fix.
https://github.com/lanterno/aif360

So, I'm opening this PR to see if anyone else is having a similar issue, and also to propose a solution.

I've tried modifying the EqualityML dependencies to use the fork I mentioned above, but it still didn't work because of the complex, interconnected dependencies that caused conflicts.

The solution I finally managed to get working came after switching dependency management to Poetry instead of pip.

Poetry has better dependency resolution, and I was lucky that it solved the dependency issue without problems.

I'm not sure whether there is a plan for EqualityML to switch to Poetry, but it does seem like a good alternative, since it gives more power than a classic requirements.txt and setup.py.

I will have a PR ready soon, and we can discuss it.

Add random_seed for resampling

Resampling is random so we need to make it reproducible.

I think dalex's resample uses NumPy; possibly we can just call np.random.seed(random_seed) somewhere in the code.

Let me know if we need to talk about this one.

def _resampling_data(self,
                     data,
                     mitigation_method):
    """
    Resample the input data using 'resample' function from dalex package.
    """
    # Uniform resampling
    idx_resample = 0
    if (mitigation_method == "resampling-uniform") or (mitigation_method == "resampling"):
        idx_resample = resample(data[self.protected_variable],
                                data[self.target_variable],
                                type='uniform',
                                verbose=False)
    # Preferential resampling
    elif mitigation_method == "resampling-preferential":
        _pred_prob = self._predict_binary_prob(self.orig_ml_model, data)
        idx_resample = resample(data[self.protected_variable],
                                data[self.target_variable],
                                type='preferential', verbose=False,
                                probs=_pred_prob)
    mitigated_data = data.iloc[idx_resample, :]
    return mitigated_data
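A minimal sketch of the change, assuming a random_seed argument is threaded through (the parameter name is a suggestion, not existing API) and that dalex's resample draws from NumPy's global random state:

import numpy as np

def _resampling_data(self, data, mitigation_method, random_seed=None):
    """Resample the input data, optionally seeding NumPy for reproducibility."""
    if random_seed is not None:
        np.random.seed(random_seed)  # makes the resampling deterministic if dalex uses np.random
    # ... existing uniform / preferential resampling logic unchanged ...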

Inform user of 1/metric calculation

Suggestion for documentation.
We should inform the user here that we return the fairness parity metric or 1/metric as they are equivalent.

Returns the fairness metric score for the input fairness metric name.

Suggested text: "Returns the fairness metric score for the input fairness metric name. Note that in cases where the fairness metric is > 1 we return 1/fairness metric score to allow for easy comparison. "
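For clarity, the normalization the suggested text describes amounts to something like this (a sketch, not the repository's exact code):

reported_metric = fairness_metric if fairness_metric <= 1 else 1.0 / fairness_metric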

Correlation remover uses fit_transform() instead of fit() and transform()

The training and testing sets should use the same correlation remover object. It should be fit() on only the training data, and should transform() both the training data and the testing data.

These lines of code show an example:

from fairlearn.preprocessing import CorrelationRemover

# Fit on the training features only, then transform both the training and testing features
cr = CorrelationRemover(sensitive_feature_ids=['sex'], alpha=1)
cr.fit(train_data.drop(['two_year_recid'], axis=1))
train2 = cr.transform(train_data.drop(['two_year_recid'], axis=1))
test2 = cr.transform(testing_data.drop(['two_year_recid'], axis=1))

These are the relevant GitHub references for review.

def _cr_removing_data(self,
                      data,
                      alpha=1.0):
    """
    Filters out sensitive correlations in a dataset using 'CorrelationRemover' function from fairlearn package.
    """
    # Getting correlation coefficient for mitigation_method 'correlation_remover'. The input alpha parameter is
    # used to control the level of filtering between the sensitive and non-sensitive features
    # remove the outcome variable and sensitive variable
    data_rm_columns = data.columns.drop([self.protected_variable, self.target_variable])
    cr = CorrelationRemover(sensitive_feature_ids=[self.protected_variable], alpha=alpha)
    data_std = cr.fit_transform(data.drop(columns=[self.target_variable]))
    train_data_cr = pd.DataFrame(data_std, columns=data_rm_columns, index=data.index)
    # Concatenate data after correlation remover
    mitigated_data = pd.concat(
        [pd.DataFrame(data[self.target_variable]),
         pd.DataFrame(data[self.protected_variable]),
         train_data_cr], axis=1)
    # Keep the same columns order
    mitigated_data = mitigated_data[data.columns]
    return mitigated_data

elif mitigation_method == "correlation-remover":
    mitigated_training_data = self._cr_removing_data(self.training_data, alpha)
    mitigated_dataset['training_data'] = mitigated_training_data
    self.mitigated_training_data = mitigated_training_data
    if self.testing_data is not None:
        mitigated_testing_data = self._cr_removing_data(self.testing_data, alpha)
        mitigated_dataset['testing_data'] = mitigated_testing_data
        self.mitigated_testing_data = mitigated_testing_data

Comment:

Disparate impact remover is coded to use only fit_transform(); AIF360 does not provide a transform() function (https://github.com/Trusted-AI/AIF360/blob/master/aif360/algorithms/preprocessing/disparate_impact_remover.py). They say: "In order to transform test data in the same manner as training data, the distributions of attributes conditioned on the protected attribute must be the same." We could technically make the same assumption and always use fit_transform() for correlation remover, since it seems to perform well, but I think it is bad practice: in deployment we sometimes make only one prediction at a time, and correlation can't be estimated from a single observation, so fit_transform() will error out. This is a weakness of the disparate impact remover.
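A minimal sketch of how the fit/transform split could look in _cr_removing_data(), assuming the fitted CorrelationRemover is cached on the instance when processing training data and reused for testing data (the fit flag and _fitted_cr attribute are suggestions, not existing code):

import pandas as pd
from fairlearn.preprocessing import CorrelationRemover

def _cr_removing_data(self, data, alpha=1.0, fit=True):
    """Remove sensitive correlations; fit on training data only, reuse the fitted remover for test data."""
    features = data.drop(columns=[self.target_variable])
    if fit:
        self._fitted_cr = CorrelationRemover(sensitive_feature_ids=[self.protected_variable], alpha=alpha)
        self._fitted_cr.fit(features)
    data_std = self._fitted_cr.transform(features)
    data_rm_columns = data.columns.drop([self.protected_variable, self.target_variable])
    mitigated_data = pd.DataFrame(data_std, columns=data_rm_columns, index=data.index)
    mitigated_data[self.target_variable] = data[self.target_variable]
    mitigated_data[self.protected_variable] = data[self.protected_variable]
    return mitigated_data[data.columns]  # keep the original column order

# Training data would be processed with fit=True and testing data with fit=False:
# mitigated_training_data = self._cr_removing_data(self.training_data, alpha, fit=True)
# mitigated_testing_data = self._cr_removing_data(self.testing_data, alpha, fit=False)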
