
modeloriented / treeshap


Compute SHAP values for your tree-based models using the TreeSHAP algorithm

Home Page: https://modeloriented.github.io/treeshap/

License: GNU General Public License v3.0

R 83.96% C++ 15.80% Rez 0.24%
explainability explainable-ai explainable-artificial-intelligence explanatory-model-analysis iml interpretability interpretable-machine-learning machine-learning responsible-ml shap

treeshap's Introduction

treeshap


Complex tree ensemble classifiers are now widely used, yet sometimes even the authors of the training algorithms do not know exactly how a given ensemble was built. This structural complexity is one reason most users treat such models as black boxes. How, then, can they tell whether a prediction made by the model is reasonable? treeshap offers an efficient answer. It implements TreeSHAP, an algorithm optimized for tree ensemble models that computes SHAP values in polynomial (instead of exponential) time. Currently, treeshap supports models produced with the xgboost, lightgbm, gbm, ranger, and randomForest packages. Support for catboost is available only in the catboost branch (see the CatBoost section below).

Installation

The package is available on CRAN:

install.packages('treeshap')

You can install the latest development version from GitHub using devtools with:

devtools::install_github('ModelOriented/treeshap')

Example

First, here is an example of how to represent an xgboost model as a unified model object:

library(treeshap)
library(xgboost)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
target <- fifa20$target
param <- list(objective = "reg:squarederror", max_depth = 6)
xgb_model <- xgboost::xgboost(as.matrix(data), params = param, label = target, nrounds = 200, verbose = 0)
unified <- unify(xgb_model, data)
head(unified$model)
#>   Tree Node   Feature Decision.type Split Yes No Missing Prediction Cover
#> 1    0    0   overall            <=  81.5   2  3       2         NA 18278
#> 2    0    1   overall            <=  73.5   4  5       4         NA 17949
#> 3    0    2   overall            <=  84.5   6  7       6         NA   329
#> 4    0    3   overall            <=  69.5   8  9       8         NA 15628
#> 5    0    4 potential            <=  79.5  10 11      10         NA  2321
#> 6    0    5 potential            <=  83.5  12 13      12         NA   221

With the unified object in hand, producing SHAP values for specific observations is straightforward. The treeshap() function takes two data arguments: the unified representation of the ensemble model and the observations to be explained. The observations must contain the same columns as the data used to build the model.

treeshap1 <- treeshap(unified,  data[700:800, ], verbose = 0)
treeshap1$shaps[1:3, 1:6]
#>            age height_cm weight_kg overall potential international_reputation
#> 700   297154.4  5769.186 12136.316 8739757  212428.8               -50855.738
#> 701 -2550066.6 16011.136  3134.526 6525123  244814.2                22784.430
#> 702   300830.3 -9023.299 15374.550 8585145  479118.8                 2374.351
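
For intuition about what these numbers represent, the following base-R sketch computes exact Shapley values by enumerating every feature subset, which is the exponential-time definition that TreeSHAP avoids. It is an illustration only: the function name shapley_exact, the toy linear model, and the marginal value function (averaging over a reference dataset) are assumptions made for this sketch. None of it is part of treeshap, whose tree-specific algorithm may give different numbers on real tree models.

```r
# Brute-force Shapley values (illustration only; not treeshap's algorithm).
# f: prediction function taking a numeric matrix, x: one observation,
# reference: numeric matrix used to average out "absent" features.
shapley_exact <- function(f, x, reference) {
  p <- length(x)
  # Value of coalition S: mean prediction with the features in S fixed to x.
  value <- function(S) {
    z <- reference
    for (k in S) z[, k] <- x[k]
    mean(f(z))
  }
  phi <- numeric(p)
  for (j in seq_len(p)) {
    others <- setdiff(seq_len(p), j)
    # Enumerate all 2^(p-1) subsets of the remaining features via bit masks.
    for (mask in 0:(2^length(others) - 1)) {
      in_S <- (mask %/% 2^(seq_along(others) - 1)) %% 2 == 1
      S <- others[in_S]
      w <- factorial(length(S)) * factorial(p - length(S) - 1) / factorial(p)
      phi[j] <- phi[j] + w * (value(c(S, j)) - value(S))
    }
  }
  phi
}

# Toy additive model (hypothetical), so the exact answer is known.
beta <- c(2, -1, 0.5)
f <- function(z) as.vector(z %*% beta)
set.seed(1)
reference <- matrix(rnorm(60), ncol = 3)
x <- c(1, 2, 3)

phi <- shapley_exact(f, x, reference)
# Local accuracy: contributions sum to prediction minus average prediction.
stopifnot(isTRUE(all.equal(sum(phi),
                           f(matrix(x, nrow = 1)) - mean(f(reference)))))
```

The inner loop runs 2^(p-1) times per feature, which is why this direct approach is only feasible for a handful of features.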

We can also compute SHAP interaction values. As an example, we calculate them for a model built on simpler data (only 5 columns), explaining the first 100 observations.

data2 <- fifa20$data[, 1:5]
xgb_model2 <- xgboost::xgboost(as.matrix(data2), params = param, label = target, nrounds = 200, verbose = 0)
unified2 <- unify(xgb_model2, data2)

treeshap_interactions <- treeshap(unified2,  data2[1:100, ], interactions = TRUE, verbose = 0)
treeshap_interactions$interactions[, , 1:2]
#> , , 1
#> 
#>                   age  height_cm  weight_kg     overall  potential
#> age       -1886241.70   -3984.09  -96765.97   -47245.92  1034657.6
#> height_cm    -3984.09 -628797.41  -35476.11  1871689.75   685472.2
#> weight_kg   -96765.97  -35476.11 -983162.25  2546930.16  1559453.5
#> overall     -47245.92 1871689.75 2546930.16 55289985.16 12683135.3
#> potential  1034657.61  685472.23 1559453.46 12683135.27   868268.7
#> 
#> , , 2
#> 
#>                  age  height_cm  weight_kg    overall  potential
#> age       -2349987.9  306165.41  120483.91 -9871270.0  960198.02
#> height_cm   306165.4  -78810.31  -48271.61  -991020.7  -44632.74
#> weight_kg   120483.9  -48271.61  -21657.14  -615688.2 -380810.70
#> overall   -9871270.0 -991020.68 -615688.21 57384425.2 9603937.05
#> potential   960198.0  -44632.74 -380810.70  9603937.1 2994190.74

Plotting results

The explanation results can be visualized using the shapviz package.

However, treeshap also provides 4 plotting functions:

Feature Contribution (Break-Down)

This plot shows how features contribute to the prediction for a single observation. It is similar to the Break Down plot from the iBreakDown package, which uses a different method to approximate SHAP values.

plot_contribution(treeshap1, obs = 1, min_max = c(0, 16000000))

Feature Importance

This plot shows the average absolute impact of each feature on the model's prediction.

plot_feature_importance(treeshap1, max_vars = 6)

Feature Dependence

This plot shows how a single feature contributes to the prediction depending on its value.

plot_feature_dependence(treeshap1, "height_cm")

Interaction Plot

A simple plot that visualizes the SHAP interaction value of two features depending on their values.

plot_interaction(treeshap_interactions, "height_cm", "overall")

How to use the unifying functions?

For your convenience, you can now simply use the unify() function by specifying your model and reference dataset. Behind the scenes, it uses one of the six functions from the .unify() family (xgboost.unify(), lightgbm.unify(), gbm.unify(), catboost.unify(), randomForest.unify(), ranger.unify()). Even though the objects produced by these functions are identical when it comes to the structure, due to different possibilities of saving and representing the trees among the packages, the usage of these model-specific functions may be slightly different. Therefore, you can use them independently or pass some additional parameters to unify().

library(treeshap)
library(gbm)
x <- fifa20$data[colnames(fifa20$data) != 'work_rate']
x['value_eur'] <- fifa20$target
gbm_model <- gbm::gbm(
  formula = value_eur ~ .,
  data = x,
  distribution = "laplace",
  n.trees = 200,
  cv.folds = 2,
  interaction.depth = 2
)
unified_gbm <- unify(gbm_model, x)
unified_gbm2 <- gbm.unify(gbm_model, x) # legacy API

Setting reference dataset

The dataset used as a reference for calculating SHAP values is stored in the unified model representation object. It can be changed at any time using the set_reference_dataset() function.

library(treeshap)
library(ranger)
data_fifa <- fifa20$data[!colnames(fifa20$data) %in%
                             c('work_rate', 'value_eur', 'gk_diving', 'gk_handling',
                              'gk_kicking', 'gk_reflexes', 'gk_speed', 'gk_positioning')]
data <- na.omit(cbind(data_fifa, target = fifa20$target))
rf <- ranger::ranger(target~., data = data, max.depth = 10, num.trees = 10)

unified_ranger_model <- unify(rf, data)
unified_ranger_model2 <- set_reference_dataset(unified_ranger_model, data[c(1000:2000), ])

Other functionalities

The package also implements a predict() function for calculating the model’s predictions from the unified representation.

How fast does it work?

The complexity of TreeSHAP is $\mathcal{O}(TLD^2)$, where $T$ is the number of trees, $L$ is the number of leaves in a tree, and $D$ is the depth of a tree.

Our implementation runs at a speed comparable to Lundberg’s original shap Python package, which is implemented in C and Python.

The complexity of SHAP interaction values computation is $\mathcal{O}(MTLD^2)$, where $M$ is the number of explanatory variables used by the explained model, $T$ is the number of trees, $L$ is the number of leaves in a tree, and $D$ is the depth of a tree.
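
To get a feel for how these bounds scale, the sketch below plugs illustrative sizes into the two formulas. The helper names treeshap_cost and interaction_cost are hypothetical, and the results are unitless operation-count estimates up to a constant factor, not measured run times.

```r
# Order-of-magnitude operation counts from the stated complexity bounds.
# n_trees = T, n_leaves = L, depth = D, n_features = M (hypothetical inputs).
treeshap_cost <- function(n_trees, n_leaves, depth) {
  n_trees * n_leaves * depth^2
}
interaction_cost <- function(n_features, n_trees, n_leaves, depth) {
  n_features * treeshap_cost(n_trees, n_leaves, depth)
}

# Example sizes: 200 trees of depth 6, assuming full binary trees
# with 2^6 = 64 leaves each, and 5 explanatory variables.
base  <- treeshap_cost(n_trees = 200, n_leaves = 64, depth = 6)
inter <- interaction_cost(n_features = 5, n_trees = 200,
                          n_leaves = 64, depth = 6)

base          # 200 * 64 * 36 = 460800
inter / base  # interaction values cost M = 5 times more
```

Doubling the depth multiplies the first bound by four even with the leaf count held fixed; in a full binary tree the number of leaves also grows with depth, so deep trees are the dominant cost factor.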

CatBoost

Originally, treeshap also supported CatBoost models from the catboost package, but because this package is not available on CRAN or R-universe (see catboost issues #439, #1846), we decided to remove support from the main version of our package.

However, you can still use the treeshap implementation for catboost by installing our package from the catboost branch.

This branch can be installed with:

devtools::install_github('ModelOriented/treeshap@catboost')

References

  • Lundberg, S.M., Erion, G., Chen, H. et al. “From local explanations to global understanding with explainable AI for trees”, Nature Machine Intelligence 2, 56–67 (2020).

treeshap's People

Contributors

hbaniecki, kapsner, konrad-komisarczyk, krzyzinskim, maksymiuks, mayer79, mikolajsp, pbiecek

treeshap's Issues

Error in ranger.unify()

I tried to unify my binary classification random forest model trained with ranger, and got the following warning:
Warning message:
In Ops.factor(get("Prediction"), n) : ‘/’ not meaningful for factors

I don't know how to get rid of it and make my code run. Any idea about this error would be appreciated.

treeshap ver. 1.0.0 goals

Critical:

  • Rebuild catboost.unify, and make the function pass tests and evoke examples without an error
  • Rebuild ranger.unify, and make the function pass tests and evoke examples without an error
  • Prepare extensive tests for all unifiers
  • Clean up dependencies, remove unnecessary ones
  • Interaction visualization

Optional:

  • Add experimental algorithm for tree shap O(TLD) computations
  • Refresh README, prepare hex
  • Extend documentation
  • add rpart support
  • implement recalculate covers in C++
  • make all unifiers work on DT

Each goal should be solved as a separate issue.

Support for Cubist package?

Hi,

I would like to know if there are any plans for supporting the Cubist package? It is also based on decision trees.

lightgbm.unify example erroring.

Following example in ?lightgbm.unify

library(treeshap)

library(lightgbm)
#> Loading required package: R6
param_lgbm <- list(objective = "regression", max_depth = 2,  force_row_wise = TRUE)
data_fifa <- fifa20$data[!colnames(fifa20$data) %in%
                           c('work_rate', 'value_eur', 'gk_diving', 'gk_handling',
                             'gk_kicking', 'gk_reflexes', 'gk_speed', 'gk_positioning')]
data <- na.omit(cbind(data_fifa, fifa20$target))
sparse_data <- as.matrix(data[,-ncol(data)])
x <- lightgbm::lgb.Dataset(sparse_data, label = as.matrix(data[,ncol(data)]))
lgb_data <- lightgbm::lgb.Dataset.construct(x)
lgb_model <- lightgbm::lightgbm(data = lgb_data, params = param_lgbm, save_name = "", verbose = 0)
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> Error in bst$save_model(filename = save_name): Model file  is not available for writes
# unified_model <- lightgbm.unify(lgb_model, sparse_data)
# shaps <- treeshap(unified_model, data[1:2, ])
# plot_contribution(shaps, obs = 1)
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19042)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] lightgbm_3.3.1 R6_2.5.1       treeshap_0.1.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.7        pillar_1.6.4      compiler_4.1.2    highr_0.9        
#>  [5] R.methodsS3_1.8.1 R.utils_2.11.0    tools_4.1.2       digest_0.6.29    
#>  [9] jsonlite_1.7.2    lattice_0.20-44   evaluate_0.14     lifecycle_1.0.1  
#> [13] tibble_3.1.2      gtable_0.3.0      R.cache_0.15.0    pkgconfig_2.0.3  
#> [17] rlang_0.4.10      Matrix_1.3-4      reprex_2.0.1      DBI_1.1.2        
#> [21] yaml_2.2.1        xfun_0.29         fastmap_1.1.0     dplyr_1.0.7      
#> [25] withr_2.4.3       styler_1.6.2      stringr_1.4.0     knitr_1.37       
#> [29] generics_0.1.1    fs_1.5.2          vctrs_0.3.8       tidyselect_1.1.1 
#> [33] grid_4.1.2        glue_1.4.2        data.table_1.14.2 fansi_0.4.2      
#> [37] rmarkdown_2.11    purrr_0.3.4       ggplot2_3.3.5     magrittr_2.0.1   
#> [41] backports_1.4.1   scales_1.1.1      ellipsis_0.3.2    htmltools_0.5.2  
#> [45] assertthat_0.2.1  colorspace_2.0-2  utf8_1.1.4        stringi_1.5.3    
#> [49] munsell_0.5.0     crayon_1.4.2      R.oo_1.24.0

Created on 2022-01-03 by the reprex package (v2.0.1)

Release treeshap 0.3.1

First release:

Prepare for release:

  • git pull
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

Error in ranger_surv.unify when type= "chf"

On treeshap version 0.3.0, there is an error thrown by ranger_surv.unify, when type= "chf":

unified_model_surv <- ranger_surv.unify(rf, train_x, times = c(23), type = "chf")

Error in ranger_surv.unify(rf, train_x, times = c(23), type = "chf") :
times must be a numeric vector and argument
type = 'survival' or type = 'chf' must be set.

I believe there is some issue here:

stopifnot("`times` must be a numeric vector and argument \n `type = 'survival'` or `type = 'chf'` must be set." = ifelse(!is.null(times), is.numeric(times) && type == "survival", TRUE))
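
A minimal sketch of a check that accepts both documented values of type is shown below. The helper name check_surv_args is hypothetical and this is not the package's actual fix; it only illustrates that the condition should test membership in c("survival", "chf") rather than equality with "survival".

```r
# Hypothetical argument check (not the package's code): accept either type.
check_surv_args <- function(times = NULL, type = "survival") {
  ok <- is.null(times) ||
    (is.numeric(times) && type %in% c("survival", "chf"))
  if (!ok) {
    stop("`times` must be a numeric vector and `type` must be ",
         "'survival' or 'chf'.")
  }
  invisible(TRUE)
}

check_surv_args(times = c(23), type = "chf")       # passes silently
check_surv_args(times = c(23), type = "survival")  # passes silently

# A non-numeric `times` is still rejected:
res <- tryCatch(check_surv_args(times = "23", type = "chf"),
                error = function(e) "rejected")
```

As in the original condition, a NULL times skips the check entirely, so other prediction types are unaffected.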

Support H2O models

Great work. Are you planing on adding support for H2O based tree models as well?

Questions: Applying classification model to 'treeshap' package..

Hello,
I tried to apply treeshap to a classification random forest (RF) model in R.
I built two RF classification models on the 'iris' data that give the same results.
For both RF models, the 'randomForest.unify' step failed with an error.

unified.rf1 <- randomForest.unify(iris.rf1, train.data)
Error in randomForest.unify(iris.rf1, train.data) :
Models built on data with categorical features are not supported - please encode them before training.

unified.rf2 <- randomForest.unify(iris.rf2, train.x)
Error in Prediction/n : non-numeric argument to binary operator

The parts of the 'randomForest.unify' code that raise these errors are:

  • if (any(attr(rf_model$terms, "dataClasses") != "numeric")) for iris.rf1
  • y[is.na(Feature), `:=`(Prediction, Prediction/n)] for iris.rf1 and iris.rf2

It seems that every classification model would hit the same error in the second part,
because 'Prediction' for a classification model is a categorical variable.
How can I solve this problem?
(Split the data by class and use class probabilities? I do not know the details...)
Or is there another way to run the 'randomForest.unify' function on a classification model?

Thank you in advance.

The code I used is as follows:

library(randomForest)
data(iris)

set.seed(123)
sample.num <- sample(1:dim(iris)[1], dim(iris)[1]*0.7) # data split
train.data <- iris[sample.num, ]
test.data <- iris[-sample.num, ]

set.seed(123)
iris.rf1 <- randomForest(Species ~ ., data=train.data, ntree=100)

names(train.data)
train.x <- train.data[,-5] # predictor columns (everything except Species)
train.y <- train.data[,5] # Species column

set.seed(123)
iris.rf2 <- randomForest(x =train.x, y= factor(train.y), ntree=100)

######## ERROR..
unified.rf1 <- randomForest.unify(iris.rf1 ,train.data)
unified.rf2 <- randomForest.unify(iris.rf2 ,train.x)
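
One way to act on the idea of splitting by class, sketched here in base R only, is to recode the factor target as one numeric 0/1 column per class, so that a separate regression forest could be trained on each column. Whether randomForest.unify then accepts such models is an assumption this sketch does not verify, and the helper name one_vs_rest is hypothetical.

```r
# Recode a factor target into numeric 0/1 indicator columns, one per class.
one_vs_rest <- function(y) {
  y <- factor(y)
  out <- sapply(levels(y), function(lvl) as.numeric(y == lvl))
  colnames(out) <- levels(y)
  out
}

data(iris)
targets <- one_vs_rest(iris$Species)
head(targets, 3)

# Each row has exactly one 1, so every column is a valid per-class target.
stopifnot(all(rowSums(targets) == 1))
```

Each column of targets could then serve as the response of its own numeric (regression-type) model, giving one set of explanations per class.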

treeshap.model_unified fails with data.table input

In treeshap.model_unified function, this line does not behave as expected if x is a data.table :

x <- x[,colnames(x) %in% unified_model$feature_names]

x=data.table(a=1:3,b=2:4)
x[,colnames(x)%in%colnames(x)]
[1] TRUE TRUE
x%>%as.data.frame%>%.[,colnames(x)%in%colnames(x)]
  a b
1 1 2
2 2 3
3 3 4

I suggest converting x to a data.frame when x is a data.table.
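
The suggested conversion can be sketched as a small helper. The name select_features is hypothetical; the only assumption is that as.data.frame() has a method for the input class, which holds for data.table objects. The example uses a plain data.frame so that it runs with base R alone.

```r
# Coerce to data.frame before logical column indexing, so that
# x[, <logical vector>] selects columns regardless of the input class.
select_features <- function(x, feature_names) {
  x <- as.data.frame(x)
  x[, colnames(x) %in% feature_names, drop = FALSE]
}

x <- data.frame(a = 1:3, b = 2:4, extra = 0)
selected <- select_features(x, feature_names = c("a", "b"))
colnames(selected)
```

Using drop = FALSE also keeps the result a data.frame when only one feature matches.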

SHAP interactions?

Fantastic start, thanks a lot! The big three (LGB, XGB, CatBoost) are all shipped with their own treeshap implementation (XGBoost via predict). However, XGBoost seems to be the only one with shap interaction decompositions, which are very useful. At least LightGBM won't implement them (microsoft/LightGBM#3127).

Thus as first issue: could we find a way to extend the current approach to include SHAP interactions, at least as long-term plan?

random survival forest

Good day. Is it possible to get SHAP values for random survival forest models with this package? For instance, those generated from randomForestSRC? Thank you very much.

ranger.unify fails on large models

Hi,

We are training a large random forest model (rf object size is ~270mb) on a large dataset (dim 1,670,000 x 267, object size 3.3gb) and are hitting errors. The machine tested on has 96 cpus/354Gb ram.

Here is a repro.

library(treeshap)
library(ranger)
library(tidyverse)

# Generate random training tibble of similar size to our data
m = matrix(nrow = 800000,ncol = 200,data = runif(n = 800000*200))
object.size(m)/1024^3 # 1.2 gb
trainM = m %>% as_tibble
srf <- ranger(V200 ~ ., data=trainM, num.trees = 5,verbose = TRUE)
object.size(srf)/1024^2 # 89.4 MB
rfu = treeshap::ranger.unify(srf, trainM)

We then got this error:

# *** caught segfault ***
#   address 0x55e43e173ed0, cause 'memory not mapped'
# 
# Traceback:
# 1: new_covers(x, is_na, roots, yes, no, missing, is_leaf, feature,     split, decision_type)
# 2: set_reference_dataset(ret, as.data.frame(data))
# 3: treeshap::ranger.unify(srf, trainM)
# An irrecoverable exception occurred. R is aborting now ...
# Segmentation fault (core dumped)

# R version 4.0.2 (2020-06-22)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 20.04 LTS

Any ideas as to what may be causing this issue? Is it a limitation of the current implementation of the package, or perhaps an issue related to our R environment?

Thanks.

missing decision types

I was trying to create a unified lightgbm model. I fit the model using the tidymodels framework.
Unfortunately I got this error: Error in ifelse(decision_type %in% c(">=", ">"), ret.second(split_index), : Unknown decision_type. My understanding is that there is a problem with decision_type. Checking the model, I noticed that there are thousands of missing values in the decision type column. Any idea why the decisions are missing and how to solve the issue?

Error: Should `unify` use the target variable?

This code returns an error:

n <- 1000
p <- 5
X <- as.data.frame(matrix(rnorm(n*p), nrow=n))
logit <- exp(X[, 1] * X[, 2])
y <- 1 / (1 + logit) > 0.5
plot(X[, 1], X[, 2], col = as.factor(y))
plot(X[, 1], X[, 3], col = as.factor(y))
df <- cbind(X, y = y)
            
library(gbm)
model <- gbm(y~., data=df, interaction.depth=2, n.trees=50)

library(treeshap)
uexp <- gbm.unify(model, X)
treeshap1 <- treeshap(uexp,  X, verbose = 0)
plot_feature_importance(treeshap1)

plot_contribution(treeshap1)

When passing df to gbm.unify and treeshap, the y variable shows on both plots, and it should not.
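
A simple guard, assuming the name of the target column is known, is to drop it before calling the unifier and treeshap(). The helper drop_target below is hypothetical and uses base R only.

```r
# Remove the target column so only predictors reach unify()/treeshap().
drop_target <- function(df, target) {
  df[, setdiff(colnames(df), target), drop = FALSE]
}

set.seed(42)
X <- as.data.frame(matrix(rnorm(20), nrow = 4))  # columns V1..V5
df <- cbind(X, y = X[, 1] > 0)

predictors <- drop_target(df, "y")
colnames(predictors)
```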

`max_vars` exceeding number of variables causes error in `plot_contribution`

reproducible example:

library(xgboost)
library(treeshap)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
target <- fifa20$target
param <- list(objective = "reg:squarederror", max_depth = 3)
xgb_model <- xgboost::xgboost(as.matrix(data), params = param, label = target, nrounds = 200)
unified_model <- xgboost.unify(xgb_model, as.matrix(data))
x <- head(data, 1)
shap <- treeshap(unified_model, x)
plot_contribution(shap, 1,  max_vars = 66)
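
A defensive pattern for this case is to cap max_vars at the number of available features before plotting. The sketch below is hypothetical (clamp_max_vars is not a treeshap function) and works on any data frame of SHAP values with one column per feature.

```r
# Cap a requested number of variables at the number of columns available.
clamp_max_vars <- function(shaps, max_vars) {
  min(max_vars, ncol(shaps))
}

# Stand-in for a one-row SHAP table with 3 features (made-up values).
shaps <- data.frame(age = 0.4, height_cm = -0.1, overall = 2.3)

clamp_max_vars(shaps, max_vars = 66)  # capped at the 3 available columns
clamp_max_vars(shaps, max_vars = 2)   # left unchanged
```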

Feature request: consolidated unify function

Not sure if there are other modeling functions to look out for, but maybe something like:

unify <- function(model, x){
  .cl <- class(model)
  if("randomForest" %in% .cl){
    ret <- randomForest.unify(model, x)
  }else if("ranger" %in% .cl){
    ret <- ranger.unify(model, x)
  }else if("gbm" %in% .cl){
    ret <- gbm.unify(model, x)
  }else if("xgb.Booster" %in% .cl){
    ret <- xgboost.unify(model, x)
  }else if("lgb.Booster" %in% .cl){
    ret <- lightgbm.unify(model, x)
  }else if("catboost.Model" %in% .cl){
    ret <- catboost.unify(model, x)
  }else
    stop("Model is not a treeshap supported model.")
  ret
}
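
The same dispatch could also be written with S3 generics, which avoids the if/else chain and lets new model classes be added as separate methods. This is a sketch only: unify_sketch and its methods are hypothetical names, and each method merely reports which unifier it would call instead of calling the real treeshap functions.

```r
# S3-based dispatch sketch; methods return the name of the unifier to call.
unify_sketch <- function(model, x, ...) UseMethod("unify_sketch")

unify_sketch.randomForest <- function(model, x, ...) "randomForest.unify"
unify_sketch.ranger       <- function(model, x, ...) "ranger.unify"
unify_sketch.gbm          <- function(model, x, ...) "gbm.unify"
unify_sketch.xgb.Booster  <- function(model, x, ...) "xgboost.unify"
unify_sketch.lgb.Booster  <- function(model, x, ...) "lightgbm.unify"
unify_sketch.default <- function(model, x, ...) {
  stop("Model is not a treeshap supported model.")
}

# Mock objects carrying only a class attribute, for demonstration.
mock_ranger <- structure(list(), class = "ranger")
mock_xgb    <- structure(list(), class = "xgb.Booster")

unify_sketch(mock_ranger, NULL)
unify_sketch(mock_xgb, NULL)
```

In a real implementation each method body would call the corresponding model-specific unifier and pass additional arguments through `...`.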

support categorical features

ranger.unify and gbm.unify don't support categorical features, which is essential when working with factors in R.

Release treeshap 0.3.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • Prepare GitHub release based on the CRAN release

Feature: ranger_surv.unify(type = "survival")

Hello @pbiecek,

I am currently trying to add functionality to treeshap that allows computing SHAP values depending on the survival time.
My goal is to provide SHAP values computed with treeshap that can then be used with the survex R package to compute SurvSHAP(t) values, as the current computation using kernelshap seems to be rather slow for ranger (https://github.com/ModelOriented/survex/blob/main/R/surv_shap.R#L148).

The current state of implementation is located here in my fork: kapsner@ead7d15

Before I open a PR here, I want to make sure that there are no major issues.

In one of the unit tests, I encountered some irregularities when comparing "unified" to "original" predictions:
https://github.com/kapsner/treeshap/blob/master/tests/testthat/test_ranger_surv.R#L142

Somehow, the difference between unified and original predictions grows with larger death_times, causing the original comparison (expect_true(all(abs((from_unified - original) / original) < 10**(-14)))) to fail.

Does that make sense? If this looks erroneous, would you have an idea where the error might be located?

Thanks a lot in advance.
Best, Lorenz

randomForest.unify error

I am trying to unify my random forest model using the available function but am consistently getting the same error, "Error in Prediction/n : non-numeric argument to binary operator"

The random forest model is a binary classifier. I have ensured that there are no categorical variables in the predictor data (i.e., the data is completely numeric) and that all NA values have been removed. I have also reconstructed the random forest so that the labels for the binary classification are "0" and "1" respectively, to see if that made a difference. What is perplexing to me is that, within the randomForest package, the "predict" function executes just fine and gives a correct output.

Any idea where the error is originating and how to fix it? Thanks!

Saabas Values

Dear authors,
thanks for a great package.
The original python version by Lundberg offers the Boolean approximate=True parameter which will compute Saabas scores instead of SHAP values.
Is that a possibility in treeshap?
Thx
