
equalityml's People

Contributors

bjb2088, jamesng-dev, joaogranja, jzdavis66, nyujwc331, proinsights


equalityml's Issues

Independent thresholds for each method when using compare_mitigation_methods

Currently the compare_mitigation_methods() function seems to rely on a pre-defined threshold.

for mitigation_method in mitigation_methods:
    ml_model = self.model_mitigation(mitigation_method=mitigation_method, **kwargs)
    if self.mitigated_testing_data is not None:
        testing_data = self.mitigated_testing_data
    else:
        testing_data = self.testing_data if self.testing_data is not None else self.training_data
    score = binary_threshold_score(ml_model,
                                   testing_data[self.features],
                                   testing_data[self.target_variable],
                                   scoring=scoring,
                                   threshold=self.threshold,
                                   utility_costs=utility_costs)
    fairness_metric = self.fairness_metric(self._metric_name)
    comparison_df.loc[mitigation_method] = [score, fairness_metric]

It would be statistically more correct to select a new threshold for each mitigated model. This would require accepting a decision_maker from the user when calling compare_mitigation_methods(). I don't know if this is urgent, as the current approach seems to give a good approximate result.

Also, recall that the threshold function uses a random seed.
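For illustration, here is a minimal sketch of how a per-model threshold could be selected inside the loop. The select_threshold() helper and its signature are hypothetical (not part of the current EqualityML API); it simply sweeps candidate thresholds and keeps the one that maximizes F1 on the evaluation data.

import numpy as np
from sklearn.metrics import f1_score

def select_threshold(ml_model, X, y, candidate_thresholds=None):
    """Hypothetical helper: return the threshold that maximizes F1 on (X, y)."""
    if candidate_thresholds is None:
        candidate_thresholds = np.linspace(0.05, 0.95, 19)
    probs = ml_model.predict_proba(X)[:, 1]  # positive-class probabilities
    scores = [f1_score(y, probs >= t) for t in candidate_thresholds]
    return candidate_thresholds[int(np.argmax(scores))]

# Inside compare_mitigation_methods(), each mitigated model would then get its own threshold:
# threshold = select_threshold(ml_model, testing_data[self.features], testing_data[self.target_variable])
# score = binary_threshold_score(ml_model, ..., threshold=threshold, ...)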

Support for phenotype assessment tools?

Hi! We love your work @onefact and are happy to help if we can.

Work I helped develop during my postdoc is here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148336/

We have assessed several large language models for compliance with the Affordable Care Act non-discrimination clause (https://www.hhs.gov/about/leadership/melanie-fontes-rainer.html).

Specifically, the demographic parity metric is one I haven't found in your repository, and such an assessment is necessary prior to training machine learning/artificial intelligence algorithms using labels derived from clinical phenotypes. For example, the presence or absence of a disease could be computed as a SQL query executed against a clinical data repository such as the one we work with from the NIH, researchallofus.org (@all-of-us).
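For concreteness, here is a minimal sketch of the kind of demographic parity check we have in mind, assuming a pandas DataFrame with a binary prediction column and a protected-attribute column (the column names are placeholders, not EqualityML identifiers):

import pandas as pd

def demographic_parity_ratio(df, prediction_col="predicted_label", group_col="protected_attribute"):
    """Ratio of positive-prediction rates across groups; 1.0 means parity."""
    rates = df.groupby(group_col)[prediction_col].mean()  # P(prediction = 1 | group)
    return rates.min() / rates.max()

# Example on toy phenotype labels:
# df = pd.DataFrame({"predicted_label": [1, 0, 1, 1], "protected_attribute": ["a", "a", "b", "b"]})
# demographic_parity_ratio(df)  # 0.5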

Are such algorithmic fairness criteria for clinical phenotype assessment out of scope for @EqualityAI?

Please let us know as we will be releasing open source tools around this over the summer and don't want to duplicate your excellent work here!

Correlation remover doesn't show up in plots when using compare_mitigation_methods()


Suggest we add it to the list here.

def map_bias_mitigation(self):
    return {'treatment_equality_ratio': [''],
            'treatment_equality_difference': [''],
            'balance_positive_class': [''],
            'balance_negative_class': [''],
            'equal_opportunity_ratio': [''],
            'accuracy_equality_ratio': [''],
            'predictive_parity_ratio': [''],
            'predictive_equality_ratio': [''],
            'statistical_parity_ratio': ['disparate-impact-remover', 'resampling',
                                         'resampling-preferential', 'reweighing']}
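A possible edit, assuming correlation-remover should be surfaced alongside the other pre-processing mitigations for statistical parity (whether it also belongs under other metrics is a judgment call for the maintainers):

    'statistical_parity_ratio': ['disparate-impact-remover', 'resampling',
                                 'resampling-preferential', 'reweighing',
                                 'correlation-remover']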

self.threshold may mean self._threshold?

I didn't notice self.threshold being declared anywhere, but I did see self._threshold. Is it declared someplace I missed? If not, this is possibly a typo.

score = binary_threshold_score(self.orig_ml_model,
                               testing_data[self.features],
                               testing_data[self.target_variable],
                               scoring=scoring,
                               threshold=self.threshold,
                               utility_costs=utility_costs)
fairness_metric = self.fairness_metric(self._metric_name)
comparison_df.loc['reference'] = [score, fairness_metric]
# Iterate over mitigation methods list and re-evaluate score and fairness metric
for mitigation_method in mitigation_methods:
    ml_model = self.model_mitigation(mitigation_method=mitigation_method, **kwargs)
    if self.mitigated_testing_data is not None:
        testing_data = self.mitigated_testing_data
    else:
        testing_data = self.testing_data if self.testing_data is not None else self.training_data
    score = binary_threshold_score(ml_model,
                                   testing_data[self.features],
                                   testing_data[self.target_variable],
                                   scoring=scoring,
                                   threshold=self.threshold,
                                   utility_costs=utility_costs)

Dependency issue on Tkinter when using EqualityML on Databricks Jupyter notebooks

Hello,
I'm trying to install equalityml on Databricks, which is an environment that doesn't allow installing system libraries with something like apt install python-tk.

Here's a screenshot of the tkinter-related error I get: [screenshot omitted]

I started investigating the issue and found this ticket:
Trusted-AI/AIF360#415

From the above issue, I realized that tkinter is no longer needed as a dependency. However, AIF360 hasn't had a new release since September, and I've had no response from them so far when asking for a newer release.

I've tried installing AIF360 from source, but the latest commit has other, newer dependencies that created issues.

That's why I've created a fork of AIF360 as a quick fix for this issue until they make a new release that includes the fix.
https://github.com/lanterno/aif360

So, I'm opening this PR to see if anyone else is having a similar issue, and also to propose a solution.

I've tried modifying the EqualityML dependencies to use the fork I mentioned above, but it still didn't work because of the complex, interconnected dependencies that caused conflicts.

The solution I finally managed to get working came after switching dependency management to Poetry instead of pip.

Poetry has better dependency resolution, and I was lucky that it solved the dependency issue without problems.

I'm not sure whether there is a plan for EqualityML to switch to Poetry, but it does seem like a good alternative, since it gives more power than a classic requirements.txt and setup.py.

I will have a PR ready soon, and we can discuss it.

Add random_seed for resampling

Resampling is random so we need to make it reproducible.

I think dalex's resample uses NumPy; possibly we can just call np.random.seed(random_seed) somewhere in the code.

Let me know if we need to talk about this one.

def _resampling_data(self,
                     data,
                     mitigation_method):
    """
    Resample the input data using 'resample' function from dalex package.
    """
    # Uniform resampling
    idx_resample = 0
    if (mitigation_method == "resampling-uniform") or (mitigation_method == "resampling"):
        idx_resample = resample(data[self.protected_variable],
                                data[self.target_variable],
                                type='uniform',
                                verbose=False)
    # Preferential resampling
    elif mitigation_method == "resampling-preferential":
        _pred_prob = self._predict_binary_prob(self.orig_ml_model, data)
        idx_resample = resample(data[self.protected_variable],
                                data[self.target_variable],
                                type='preferential', verbose=False,
                                probs=_pred_prob)
    mitigated_data = data.iloc[idx_resample, :]
    return mitigated_data
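A minimal sketch of the change, assuming a random_seed argument is threaded through (the parameter name is a suggestion, not existing API) and that dalex's resample draws from NumPy's global random state:

import numpy as np

def _resampling_data(self, data, mitigation_method, random_seed=None):
    """Resample the input data, optionally seeding NumPy for reproducibility."""
    if random_seed is not None:
        np.random.seed(random_seed)  # makes the resampling deterministic if dalex uses np.random
    # ... existing uniform / preferential resampling logic unchanged ...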

Inform user of 1/metric calculation

Suggestion for documentation.
We should inform the user here that we return the fairness parity metric or 1/metric as they are equivalent.

Returns the fairness metric score for the input fairness metric name.

Suggested text: "Returns the fairness metric score for the input fairness metric name. Note that in cases where the fairness metric is > 1 we return 1/fairness metric score to allow for easy comparison. "
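For clarity, the normalization the suggested text describes amounts to something like this (a sketch, not the repository's exact code):

reported_metric = fairness_metric if fairness_metric <= 1 else 1.0 / fairness_metric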

Correlation remover uses fit_transform() instead of fit() and transform()

The training and testing sets should use the same correlation remover object. It should be fit() on only the training data, and should transform() both the training data and the testing data.

These lines of code show an example:

from fairlearn.preprocessing import CorrelationRemover

# Fit on the training features only, then transform both the training and testing features
cr = CorrelationRemover(sensitive_feature_ids=['sex'], alpha=1)
cr.fit(train_data.drop(['two_year_recid'], axis=1))
train2 = cr.transform(train_data.drop(['two_year_recid'], axis=1))
test2 = cr.transform(testing_data.drop(['two_year_recid'], axis=1))

These are the relevant GitHub references for review.

def _cr_removing_data(self,
                      data,
                      alpha=1.0):
    """
    Filters out sensitive correlations in a dataset using 'CorrelationRemover' function from fairlearn package.
    """
    # Getting correlation coefficient for mitigation_method 'correlation_remover'. The input alpha parameter is
    # used to control the level of filtering between the sensitive and non-sensitive features
    # remove the outcome variable and sensitive variable
    data_rm_columns = data.columns.drop([self.protected_variable, self.target_variable])
    cr = CorrelationRemover(sensitive_feature_ids=[self.protected_variable], alpha=alpha)
    data_std = cr.fit_transform(data.drop(columns=[self.target_variable]))
    train_data_cr = pd.DataFrame(data_std, columns=data_rm_columns, index=data.index)
    # Concatenate data after correlation remover
    mitigated_data = pd.concat(
        [pd.DataFrame(data[self.target_variable]),
         pd.DataFrame(data[self.protected_variable]),
         train_data_cr], axis=1)
    # Keep the same columns order
    mitigated_data = mitigated_data[data.columns]
    return mitigated_data

elif mitigation_method == "correlation-remover":
    mitigated_training_data = self._cr_removing_data(self.training_data, alpha)
    mitigated_dataset['training_data'] = mitigated_training_data
    self.mitigated_training_data = mitigated_training_data
    if self.testing_data is not None:
        mitigated_testing_data = self._cr_removing_data(self.testing_data, alpha)
        mitigated_dataset['testing_data'] = mitigated_testing_data
        self.mitigated_testing_data = mitigated_testing_data

Comment:

Disparate impact remover is coded to use only fit_transform(); AIF360 does not provide a transform() function (https://github.com/Trusted-AI/AIF360/blob/master/aif360/algorithms/preprocessing/disparate_impact_remover.py). They say: "In order to transform test data in the same manner as training data, the distributions of attributes conditioned on the protected attribute must be the same." We could technically make the same assumption and always use fit_transform() for correlation remover, since it seems to perform well, but I think it is bad practice: in deployment we sometimes make only one prediction at a time, and correlation can't be estimated from a single observation, so fit_transform() will error out. This is a weakness of the disparate impact remover.
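A minimal sketch of how the fit/transform split could look in _cr_removing_data(), assuming the fitted CorrelationRemover is cached on the instance when processing training data and reused for testing data (the fit flag and _fitted_cr attribute are suggestions, not existing code):

import pandas as pd
from fairlearn.preprocessing import CorrelationRemover

def _cr_removing_data(self, data, alpha=1.0, fit=True):
    """Remove sensitive correlations; fit on training data only, reuse the fitted remover for test data."""
    features = data.drop(columns=[self.target_variable])
    if fit:
        self._fitted_cr = CorrelationRemover(sensitive_feature_ids=[self.protected_variable], alpha=alpha)
        self._fitted_cr.fit(features)
    data_std = self._fitted_cr.transform(features)
    data_rm_columns = data.columns.drop([self.protected_variable, self.target_variable])
    mitigated_data = pd.DataFrame(data_std, columns=data_rm_columns, index=data.index)
    mitigated_data[self.target_variable] = data[self.target_variable]
    mitigated_data[self.protected_variable] = data[self.protected_variable]
    return mitigated_data[data.columns]  # keep the original column order

# Training data would be processed with fit=True and testing data with fit=False:
# mitigated_training_data = self._cr_removing_data(self.training_data, alpha, fit=True)
# mitigated_testing_data = self._cr_removing_data(self.testing_data, alpha, fit=False)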
