
localModel's Issues

Shiny App

A Shiny app would help compare different local explanations.

Refactoring of individual_surrogate_model

Thanks for the effort put into this package.

I'm getting an error from individual_surrogate_model at line 158.

After spending three hours debugging it, I haven't made any progress.

The main reason is the complexity of individual_surrogate_model.

What is the function in lines 160-179 supposed to do?

Could you please refactor it into a stand-alone function and add a unit test for easier debugging?
Moreover, individual_surrogate_model does many things. This makes it hard to (a) understand
and (b) test in isolation.

Thanks


Error in read.dcf(path) : Found continuation line starting ' person("Mateusz" ...' at begin of record.

Hi,

I have tried to install the package using devtools

devtools::install_github("ModelOriented/localModel")

but I got an error:

Error in read.dcf(path) :
Found continuation line starting ' person("Mateusz" ...' at begin of record.

Here is my sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7600)

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250    LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C                   LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        rstudioapi_0.7    magrittr_1.5      usethis_1.4.0     devtools_2.0.1   
 [6] pkgload_1.0.2     R6_2.4.0          rlang_0.3.3       tools_3.4.3       pkgbuild_1.0.2   
[11] sessioninfo_1.1.1 cli_1.1.0         withr_2.1.2       remotes_2.0.2     yaml_2.2.0       
[16] assertthat_0.2.1  digest_0.6.18     rprojroot_1.3-2   crayon_1.3.4      processx_3.2.0   
[21] callr_3.0.0       base64enc_0.1-3   fs_1.2.6          ps_1.2.1          curl_3.1         
[26] testthat_2.0.0    glue_1.3.1        memoise_1.1.0     compiler_3.4.3    desc_1.2.0       
[31] backports_1.1.3   prettyunits_1.0.2

Vignettes

Add links to the methodology vignettes from the other vignettes.

Improve unit tests

Add unit tests for smaller functions after refactoring individual_surrogate_model

Suspicious results from individual_surrogate_model

I'm using the Titanic dataset to explain the survival prediction for a random passenger.
To do that, I fit a GLM with the following coefficients, ordered by p-value:

term                 estimate std.error statistic p.value
GENDERmale              -2.66      0.20    -13.33    0.00
(Intercept)              4.70      0.48      9.76    0.00
CLASS3rd                -2.50      0.32     -7.91    0.00
AGE                     -0.04      0.01     -5.57    0.00
CLASS2nd                -1.51      0.31     -4.86    0.00
SIBSP                   -0.38      0.11     -3.47    0.00
EMBARKEDSouthampton     -0.66      0.23     -2.91    0.00
EMBARKEDQueenstown      -1.08      0.38     -2.88    0.00
FARE                     0.00      0.00     -0.43    0.67
PARCH                    0.01      0.11      0.10    0.92

We can see that GENDER is the most important variable. However, plotting the localModel::individual_surrogate_model output does not show this variable as important.
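For reference, a coefficient table like the one above can be produced with base R's glm() and summary(). This sketch uses the built-in Titanic contingency table as stand-in data; the uppercase column names in the issue come from a different copy of the dataset:

```r
# Fit a logistic regression for survival and order coefficients by p-value.
# The built-in Titanic table stands in for the dataset used in the issue.
df  <- as.data.frame(Titanic)
fit <- glm(Survived ~ Class + Sex + Age, family = binomial,
           data = df, weights = Freq)
m <- summary(fit)$coefficients          # Estimate, Std. Error, z, Pr(>|z|)
m <- m[order(m[, "Pr(>|z|)"]), ]        # sort rows by p-value
print(round(m, 2))
```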

[screenshot: individual_surrogate_model plot with no visible GENDER contribution]

Here are some comparisons with other explainers:

  1. ceterisParibus::ceteris_paribus
  2. breakDown::broken

[screenshot: ceterisParibus::ceteris_paribus plot]

[screenshot: breakDown::broken plot]

We see that the other two methods capture and report the impact of GENDER.

I think the following print points to a potential bug. This is the printed individual_surrogate_model output; notice the NA next to GENDER:

[screenshot: printed individual_surrogate_model output with NA next to GENDER]

Does that make any sense?

Error with my new example

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

set.seed(17)
faktor <- sample(c(0, 1), 500, prob = c(0.5, 0.5), replace = TRUE)
table(faktor)
set.seed(17)
numeric <- rnorm(500)
set.seed(17)
y <- (-1)^(faktor)*4*numeric + 0.5 + rnorm(500)
set.seed(17)
X <- data.frame(y = y,
                x1 = as.factor(faktor),
                x2 = numeric,
                x3 = runif(500))
library(randomForest)
library(dplyr)
X_train <- sample_frac(X, 0.8)
X_test <- setdiff(X, X_train)

rf_m <- randomForest(y ~., data = X_train, ntree = 1000)
mean((predict(rf_m, X_test[, -1]) - X_test[, 1])^2)
library(DALEX2)

dalex_explainer <- DALEX2::explain(rf_m, data = X_train[, -1])

library(localModel)
lm_explanation_4 <- individual_surrogate_model(dalex_explainer, X[4, -1],
                                               1000, 17, gaussian_kernel)
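This error comes from R's model-matrix machinery rather than from localModel itself: it is raised whenever a factor handed to a model formula has fewer than two observed levels. A plausible (unverified) trigger here is that the simulated neighborhood passed to the surrogate model contains only one level of x1. A minimal base-R reproduction:

```r
# A factor with a single observed level cannot be expanded into contrasts
# (dummy variables), so model fitting stops with the error from the issue.
d <- data.frame(y = rnorm(5), x = factor(rep("a", 5)))
msg <- tryCatch(lm(y ~ x, data = d), error = function(e) conditionMessage(e))
print(msg)  # "contrasts can be applied only to factors with 2 or more levels"
```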

Will version 0.5 be published on CRAN?

Thank you for this great package! Do you intend to publish the latest version on CRAN soon?
My colleagues and I would like to use the new functions, but we cannot install packages from GitHub because of security policies.

individual_surrogate_model returns counterintuitive results

For the titanic dataset, the random forest model should increase the probability of survival for young passengers.

However, this explainer predicts a negative influence of young age:

library("DALEX")
library("randomForest")

titanic <- archivist::aread("pbiecek/models/27e5c")
titanic_rf_v6 <- archivist::aread("pbiecek/models/31570")
johny_d <- archivist::aread("pbiecek/models/e3596")

library("localModel")
localModel_lok <- individual_surrogate_model(titanic_rf_v6, johny_d,
                                        size = 5000, seed = 1313)

plot(localModel_lok) 

This is not consistent with the ceteris paribus profiles (which confirm that young age increases survival):

plot_interpretable_feature(localModel_lok, "age")

Acknowledgments

Please add information about the source of funding to the README.md

## Acknowledgments

Work on this package is financially supported by the NCN Opus grant 2017/27/B/ST6/01307.

individual_surrogate_model yields incorrect results because column names are mixed up

The function transform_to_interpretable returns a data frame with the wrong column names.
I'm using the latest localModel version from Github (0.5).

Test case:

library(archivist)
library(DALEX)
library(DALEXtra)
library(randomForest)

titanic  <- archivist::aread("pbiecek/models/27e5c")
new_observation <- archivist::aread("pbiecek/models/e3596")
titanic_rf  <- archivist::aread("pbiecek/models/4e0fc")

x <- explain(model = titanic_rf,
             data = titanic[, -9],
             y = titanic$survived == "yes",
             label = "Random Forest",
             verbose = FALSE)
seed <- 1

I then run this code (taken from the individual_surrogate_model function):

  x$data <- x$data[, intersect(colnames(x$data), colnames(new_observation)), drop = F]
  predicted_names <- assign_target_names(x)

  # Create interpretable features
  feature_representations_full <- get_feature_representations(x, new_observation,
                                                              predicted_names, seed)
  encoded_data <- transform_to_interpretable(x, new_observation,
                                             feature_representations_full)

Contents of encoded_data:

          class       gender      age                            sibsp    parch     fare embarked
1 gender = male     baseline baseline embarked = Belfast, Southampton baseline baseline baseline
2 gender = male age <= 15.36 baseline embarked = Belfast, Southampton baseline baseline baseline
3 gender = male     baseline baseline embarked = Belfast, Southampton baseline baseline baseline

The column names are not correct: class should be gender, gender should be age, and so on.

I think this causes the counterintuitive LIME explanation for Johnny D's prediction from the random forest model found in section 9.6.2 of the book Explanatory Model Analysis:

http://ema.drwhy.ai/ema_files/figure-html/limeExplLocalModelTitanic-1.png

This problem is caused by a line of code in the function transform_to_interpretable:

transform_to_interpretable <- function(x, new_observation,
                                       feature_representations) {
  encoded_data <- as.data.frame(lapply(feature_representations,
                                       function(x) x[[1]]))
  # BUG: intersect() returns the names in new_observation's column order,
  # while encoded_data follows the column order of x$data.
  colnames(encoded_data) <- intersect(colnames(new_observation),
                                      colnames(x$data))
  encoded_data
}
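The mismatch is easy to see in isolation: intersect() preserves the order of its first argument, so whenever new_observation lists the columns in a different order than x$data (whose order encoded_data actually follows), the assigned names are shifted. A small illustration with hypothetical column-name vectors:

```r
# intersect() keeps the order of its FIRST argument, so the returned names
# follow new_observation's column order, not the order of x$data.
cols_new_obs <- c("class", "gender", "age", "embarked")  # order in new_observation
cols_data    <- c("gender", "age", "embarked", "class")  # order in x$data
intersect(cols_new_obs, cols_data)
```

Because encoded_data is built from feature_representations in x$data order, assigning the intersect() result as column names mislabels every column whenever the two orders differ.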

This solution works for my test case, but might not work for all cases:

transform_to_interpretable <- function(x, new_observation,
                                       feature_representations) {
  encoded_data <- as.data.frame(lapply(feature_representations,
                                       function(x) x[[1]]))
  colnames(encoded_data) <- colnames(x$data)
  encoded_data
}

With this change, the new_observation argument becomes unused.
