equalityai / equalityml
Evidence-based tools and community collaboration to end algorithmic bias, one data scientist at a time.
License: Apache License 2.0
Currently the compare_mitigation_methods() function seems to rely on a pre-defined threshold.
Lines 714 to 728 in 90e0443
It is statistically more correct to select a new threshold for each model. This will require taking the decision_maker in from the user when calling compare_mitigation_methods(). I don't know if this is urgent, as the current approach seems to give a good approximation.
Also, recall that the threshold function uses a random seed.
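Per-model threshold selection could look something like the sketch below. This is illustrative only: the function names, the grid search, and the balanced-accuracy criterion are my assumptions, not EqualityML's API; presumably the user-supplied decision_maker would determine the actual criterion.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    # Mean of true-positive rate and true-negative rate
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return (tpr + tnr) / 2

def select_threshold(y_true, y_scores, grid=None):
    # Hypothetical helper: each mitigated model gets its own threshold,
    # chosen to maximize a criterion on held-out data, instead of all
    # models sharing one pre-defined threshold.
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    scores = [balanced_accuracy(y_true, (y_scores >= t).astype(int)) for t in grid]
    return grid[int(np.argmax(scores))]

# Toy usage: negatives score low, positives score high, with some overlap
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
p = np.concatenate([rng.uniform(0.0, 0.4, 50), rng.uniform(0.2, 1.0, 50)])
best_t = select_threshold(y, p)
```

Because the grid includes 0.5, the selected threshold can never do worse than a fixed 0.5 cutoff on the validation data used to pick it.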
Hi! We love your work @onefact and are happy to help if we can.
Work I helped develop during my postdoc is here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148336/
We have assessed several large language models for compliance with the Affordable Care Act non-discrimination clause (https://www.hhs.gov/about/leadership/melanie-fontes-rainer.html).
Specifically, the demographic parity metric is one I haven't found in your repository, and such an assessment is necessary prior to training machine learning/artificial intelligence algorithms using labels derived from clinical phenotypes. For example, the presence or absence of a disease could be computed as a SQL query executed against a clinical data repository such as the one we work with from the NIH, researchallofus.org (@all-of-us).
Are such algorithmic fairness criteria for clinical phenotype assessment out of scope for @EqualityAI?
Please let us know as we will be releasing open source tools around this over the summer and don't want to duplicate your excellent work here!
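For reference, a demographic parity ratio can be computed in a few lines. This is an illustrative sketch, not a proposal for EqualityML's API; the function name and the min/max-ratio convention are my assumptions.

```python
import numpy as np

def demographic_parity_ratio(y_pred, sensitive):
    # Selection rate is P(y_pred == 1) within each sensitive group;
    # the ratio of the smallest to the largest rate is 1.0 at parity.
    groups = np.unique(sensitive)
    rates = np.array([np.mean(y_pred[sensitive == g]) for g in groups])
    return rates.min() / rates.max()

# Toy example: group "a" is selected 75% of the time, group "b" 50%
y_pred = np.array([1, 1, 1, 0, 1, 0, 1, 0])
sensitive = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
ratio = demographic_parity_ratio(y_pred, sensitive)
```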
Correlation remover doesn't show up in plots when using compare_mitigation_methods()
Suggest we add it to the list here.
Lines 172 to 182 in 90e0443
I didn't notice self.threshold being declared; I did see self._threshold. Is it declared someplace I missed? If not, it's possibly a typo.
Lines 704 to 726 in 90e0443
Results are not reproducible. Add a random_seed = ... parameter to the resampling method of bias mitigation, and possibly to other functions as well.
Lines 617 to 625 in 90e0443
Hello,
I'm trying to install equalityml on Databricks, an environment that doesn't allow us to install system libraries with something like apt install python-tk.
Here's a screenshot of the error I get:
I investigated the issue until I found this ticket:
Trusted-AI/AIF360#415
From the above issue, I realized that tkinter is no longer needed as a dependency. However, AIF360 hasn't had a new release since September, and so far I've had no response when asking for a newer release.
I've tried installing AIF360 from source, but the latest commit has other, newer dependencies that created issues.
That's why I've created a fork of AIF360 as a quick fix for this issue until they make a new release that includes the fix.
https://github.com/lanterno/aif360
So I'm opening this PR to see if anyone else is having a similar issue, and also to propose a solution.
I've tried pointing EqualityML's dependencies at the fork mentioned above, but it still didn't work, because the complex, interconnected dependencies caused conflicts.
The solution I finally got working came after switching dependency management from pip to Poetry.
Poetry has better dependency resolution, and I was lucky that it resolved the dependency issue without problems.
I'm not sure whether the plan is for EqualityML to switch to Poetry, but it does seem like a good alternative, since it gives more power than a classic requirements.txt and setup.py.
I will have a PR ready soon, and we can discuss it.
Resampling is random so we need to make it reproducible.
I think dalex.resample uses NumPy; possibly we can just put np.random.seed(random_seed) in the code someplace.
Let me know if we need to talk about this one.
Lines 270 to 294 in 90e0443
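If dalex's resampling does draw from NumPy's RNG, the fix could look like the sketch below. This is illustrative: resample_indices is a hypothetical stand-in for the library's resampling step, shown with a local Generator, which is generally safer than seeding the global state.

```python
import numpy as np

def resample_indices(n, random_seed=None):
    # Bootstrap-style resampling made reproducible via an explicit seed.
    # A local Generator avoids touching NumPy's global RNG; if the
    # underlying library uses the global RNG instead, calling
    # np.random.seed(random_seed) before the draw has the same effect.
    rng = np.random.default_rng(random_seed)
    return rng.integers(0, n, size=n)

# Same seed, same resample: results become reproducible
a = resample_indices(100, random_seed=42)
b = resample_indices(100, random_seed=42)
```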
Suggestion for documentation.
We should inform the user here that we return the fairness parity metric or 1/metric, as they are equivalent.
Line 414 in 90e0443
Suggested text: "Returns the fairness metric score for the input fairness metric name. Note that in cases where the fairness metric is > 1, we return 1/(fairness metric score) to allow for easy comparison."
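The convention described above could be sketched as follows (the helper name is hypothetical, not EqualityML's API):

```python
def parity_score(metric_value):
    # Map a ratio-style fairness metric onto a common scale: a ratio m
    # and its reciprocal 1/m describe the same disparity, so values
    # above 1 are inverted to allow easy comparison.
    return 1.0 / metric_value if metric_value > 1 else metric_value
```

With this convention, a disparity ratio of 1.25 and one of 0.8 both score 0.8.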
The training and testing sets should use the same correlation remover object. It should be fit() on only the training data, and should transform() both the training data and the testing data.
These lines of code show an example (CorrelationRemover comes from fairlearn.preprocessing):
from fairlearn.preprocessing import CorrelationRemover

# Fit on the training data only, then apply the same fitted transform to both sets
cr = CorrelationRemover(sensitive_feature_ids=['sex'], alpha=1)
cr.fit(train_data.drop(['two_year_recid'], axis=1))
train2 = cr.transform(train_data.drop(['two_year_recid'], axis=1))
test2 = cr.transform(testing_data.drop(['two_year_recid'], axis=1))
These are the relevant GitHub references for review.
Lines 219 to 244 in 90e0443
Lines 363 to 371 in 90e0443
Comment:
Disparate impact remover is coded to use only fit_transform(); AIF360 does not provide a transform() function (https://github.com/Trusted-AI/AIF360/blob/master/aif360/algorithms/preprocessing/disparate_impact_remover.py). They say: "In order to transform test data in the same manner as training data, the distributions of attributes conditioned on the protected attribute must be the same." We could technically make the same assumption and always use fit_transform() for the correlation remover, because it seems to perform well, but I think it is bad practice: in deployment we sometimes make only one prediction at a time, and since correlation cannot be estimated from a single observation, fit_transform() will error out. This is a weakness of the disparate impact remover.
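To illustrate the deployment argument, here is a toy remover (not fairlearn's or AIF360's implementation) with a separate fit()/transform(): because transform() uses only statistics learned during fit(), it can handle a single observation, which a fit_transform()-only API cannot.

```python
import numpy as np

class TinyCorrelationRemover:
    """Toy stand-in for a correlation remover: subtracts each feature's
    linear projection onto a sensitive column (not a real library class)."""

    def fit(self, X, sensitive):
        s_centered = sensitive - sensitive.mean()
        self.s_mean_ = sensitive.mean()
        # Per-column least-squares slope of the features on the sensitive value
        self.coef_ = (s_centered @ (X - X.mean(axis=0))) / (s_centered @ s_centered)
        return self

    def transform(self, X, sensitive):
        # Only learned statistics are used, so even one row can be transformed
        return X - np.outer(sensitive - self.s_mean_, self.coef_)

rng = np.random.default_rng(0)
s = rng.normal(size=200)
X = np.column_stack([2 * s + rng.normal(size=200), rng.normal(size=200)])
remover = TinyCorrelationRemover().fit(X, s)
one_row = remover.transform(X[:1], s[:1])  # deployment: one observation at a time
```

After fitting on the training data, the transformed features are linearly uncorrelated with the sensitive column, and the same fitted object serves both batch scoring and one-at-a-time prediction.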