This is a super helpful package; however, there is a small inconsistency with `outcome_base` in the `prop_parity` function.
To reproduce the issue, take the outcome for the Caucasian group only:

```r
data("compas")
pred <- factor(compas$predicted[compas$ethnicity == "Caucasian"])

# convert the outcome to binary
compas["Two_yr_Recidivism_01"] <- ifelse(compas$Two_yr_Recidivism == "yes", 1, 0)
ref <- factor(compas$Two_yr_Recidivism_01[compas$ethnicity == "Caucasian"])

# ground truth
caret::confusionMatrix(pred, ref, positive = "1")
```
Output:

```
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 997 434
         1 284 388

               Accuracy : 0.6586
                 95% CI : (0.6379, 0.6789)
    No Information Rate : 0.6091
    P-Value [Acc > NIR] : 1.557e-06

                  Kappa : 0.2588
 Mcnemar's Test P-Value : 2.688e-08

            Sensitivity : 0.4720
            Specificity : 0.7783
         Pos Pred Value : 0.5774
         Neg Pred Value : 0.6967
             Prevalence : 0.3909
         Detection Rate : 0.1845
   Detection Prevalence : 0.3195
      Balanced Accuracy : 0.6252

       'Positive' Class : 1
```
Note that the sensitivity for the Caucasian group is 0.4720 and the detection prevalence (i.e. the proportion used for proportional parity) is 0.3195.
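As a sanity check, the detection prevalence can be recomputed directly from the confusion-matrix counts printed above (a minimal sketch; the four counts are copied from the output):

```r
# counts from the Caucasian confusion matrix above
tp <- 388  # predicted 1, reference 1
fp <- 284  # predicted 1, reference 0
fn <- 434  # predicted 0, reference 1
tn <- 997  # predicted 0, reference 0

# detection prevalence = (TP + FP) / (TP + FP + TN + FN)
(tp + fp) / (tp + fp + tn + fn)
#> [1] 0.3195435
```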
Running `equal_odds` with `outcome_base = "1"`:

```r
equal_odds(data = compas,
           outcome = 'Two_yr_Recidivism_01',
           probs = 'probability',
           group = 'ethnicity',
           cutoff = 0.5,
           base = 'Caucasian',
           outcome_base = "1")$Metric
```
Output:

```
                  Caucasian African_American      Asian    Hispanic Native_American       Other
Sensitivity       0.4720195        0.7525587  0.2500000   0.4656085        0.600000   0.4193548
Equalized odds    1.0000000        1.5943383  0.5296392   0.9864179        1.271134   0.8884270
Group size     2103.0000000     3175.0000000 31.0000000 509.0000000       11.000000 343.0000000
```
The sensitivity is correct, i.e. 0.4720 for Caucasian. Note that `outcome_base = "1"` here: "1" is the positive class, matching the `positive = "1"` argument used to create the confusion matrix.
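For reference, the "Equalized odds" row is just each group's sensitivity divided by the base (Caucasian) sensitivity, which can be checked from the printed values alone:

```r
# sensitivities copied from the equal_odds output above
sens <- c(Caucasian = 0.4720195, African_American = 0.7525587,
          Asian = 0.2500000, Hispanic = 0.4656085,
          Native_American = 0.6000000, Other = 0.4193548)

# ratio of each group's sensitivity to the base group's
sens / sens["Caucasian"]
# matches the "Equalized odds" row printed above
```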
Using the same `outcome_base` with `prop_parity`:
```r
prop_parity(data = compas,
            outcome = 'Two_yr_Recidivism_01',
            probs = 'probability',
            group = 'ethnicity',
            cutoff = 0.5,
            base = 'Caucasian',
            outcome_base = "1")$Metric
```
Output:

```
                       Caucasian African_American      Asian    Hispanic Native_American       Other
Proportion             0.6804565        0.4081890  0.8709677   0.7072692       0.5454545   0.7521866
Proportional Parity    1.0000000        0.5998752  1.2799757   1.0394039       0.8016009   1.1054147
Group size          2103.0000000     3175.0000000 31.0000000 509.0000000      11.0000000 343.0000000
```
You can see that the proportion, i.e. (TP + FP) / (TP + FP + TN + FN), is incorrectly reported as 0.6805 instead of 0.3195. Note that 0.6805 = 1 − 0.3195, i.e. the proportion of *negative* predictions. This appears to happen because the function relevels the outcomes and predictions and then subtracts 1 after converting them to numeric, which effectively swaps the two classes.
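Here is a minimal, self-contained illustration of how relevel-then-subtract can flip a 0/1 coding (this is my reading of the internal logic, not the package's exact code):

```r
y <- factor(c(0, 1, 1, 0))

# relevel so "1" becomes the reference level, as outcome_base = "1" would
y_relev <- relevel(y, ref = "1")

# as.numeric() returns the (releveled) level codes, so subtracting 1
# maps the original 1 -> 0 and 0 -> 1: the classes are reversed
as.numeric(y_relev) - 1
#> [1] 1 0 0 1
```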
So, to get the correct result, `outcome_base` has to be set to the *negative* class in this function alone:
```r
prop_parity(data = compas,
            outcome = 'Two_yr_Recidivism_01',
            probs = 'probability',
            group = 'ethnicity',
            cutoff = 0.5,
            base = 'Caucasian',
            outcome_base = "0")$Metric
```
Output:

```
                       Caucasian African_American      Asian    Hispanic Native_American       Other
Proportion             0.3195435         0.591811  0.1290323   0.2927308       0.4545455   0.2478134
Proportional Parity    1.0000000         1.852051  0.4038018   0.9160907       1.4224838   0.7755232
Group size          2103.0000000      3175.000000 31.0000000 509.0000000      11.0000000 343.0000000
```
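These values now line up with the confusion matrix: the Caucasian proportion equals the detection prevalence (0.3195), and each parity value is the group's proportion divided by the base group's, as a quick check against the printed numbers shows:

```r
# proportions copied from the corrected prop_parity output above
prop <- c(Caucasian = 0.3195435, African_American = 0.591811,
          Asian = 0.1290323, Hispanic = 0.2927308,
          Native_American = 0.4545455, Other = 0.2478134)

# each group's proportion relative to the base group
prop / prop["Caucasian"]
# matches the "Proportional Parity" row printed above
```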
For the sake of consistency with the other parity functions, I'd recommend using Detection Prevalence from the confusion matrix to calculate this metric.
Thank you!