@MoonCapture There is no parameter to do that, so you would have to implement it yourself.
I would approach this by calculating a linear approximation, similarly to what is done in Generalized DeepSHAP.
You will need to get the SHAP predictions with `best_model_01.predict_contributions(df_test_normalized[0,:], background_frame=df_train_normalized, output_space=True, output_per_reference=True)`.
First you have to ensure that the SHAP values are in the same space as the predictions (i.e., if the model uses a link function, you might have to apply the inverse link function to the SHAP values); that is what `output_space=True` does.
Then you will need the contribution to the change of prediction against every single point from `background_frame`; that's what `output_per_reference` is for.
Relevant part of the doc string:
:param output_space: If True, linearly scale the contributions so that they sum up to the prediction.
NOTE: This will result only in approximate SHAP values even if the model supports exact SHAP calculation.
NOTE: This will not have any effect if the estimator doesn't use a link function.
:param output_per_reference: If True, return baseline SHAP, i.e., contribution for each data point for each reference from the background_frame.
If False, return TreeSHAP if no background_frame is provided, or marginal SHAP if background frame is provided.
Can be used only with background_frame.
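To make the shape of the per-reference output concrete, here is a toy frame in pandas; the column layout (`RowIdx`, `BackgroundRowIdx`, per-feature columns, `Bias`) follows the pseudocode later in this comment, and all numeric values are made up. With `output_space=True`, the contributions against each reference should sum (together with that reference's Bias) to the same prediction for the explained row:

```python
import pandas as pd

# Toy per-reference SHAP output: one test row (RowIdx=0) explained
# against two background references (BackgroundRowIdx 0 and 1).
# f1/f2 are hypothetical feature contributions; Bias is the model's
# prediction on the corresponding background point.
shap = pd.DataFrame({
    "RowIdx":           [0,     0],
    "BackgroundRowIdx": [0,     1],
    "f1":               [0.10,  0.25],
    "f2":               [-0.05, -0.10],
    "Bias":             [0.30,  0.20],
})

# Contributions differ per reference, but each row of
# contributions + Bias reconstructs the same prediction (~0.35 here).
row_sums = shap[["f1", "f2", "Bias"]].sum(axis=1)
print(row_sums.tolist())
```

The key invariant is that the per-reference rows disagree on how the prediction is attributed, not on what the prediction is.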
Next you denormalize the SHAP values. This depends on the way you normalize the data: if you can invert the normalization just by multiplication, then it's simple, just multiply all the values. If you need to use addition as well, then apply the addition only to the Bias, after the multiplication. If the normalization procedure you use is more complicated, use eq. 3 from "Explaining a series of models by propagating Shapley values" (or you can check my implementation of simplified G-DeepSHAP in our Stacked Ensembles; simplified because it is applied only on two layers, base models -> metalearner).
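For the common case where the target was z-score normalized (y_norm = (y - mu) / sigma; this specific scheme is an assumption for the sketch, not something stated above), multiplication distributes over every term of the SHAP sum, but the additive shift must go to the Bias only:

```python
import numpy as np

def denormalize_shap(contribs, bias, sigma, mu):
    """Map SHAP values from normalized target space back to the
    original space, assuming y was scaled as (y - mu) / sigma."""
    # The scale factor applies to every term of the sum...
    denorm_contribs = contribs * sigma
    # ...but the shift is added once, to the Bias only; adding mu to
    # every contribution would count it once per feature.
    denorm_bias = bias * sigma + mu
    return denorm_contribs, denorm_bias

# Sanity check with hypothetical values: the denormalized terms
# reconstruct the denormalized prediction.
contribs = np.array([0.10, -0.05])
bias, sigma, mu = 0.30, 2.0, 5.0
pred_norm = contribs.sum() + bias
dc, db = denormalize_shap(contribs, bias, sigma, mu)
assert abs((dc.sum() + db) - (pred_norm * sigma + mu)) < 1e-12
```

If your pipeline normalizes in several steps, apply the same reasoning per step in reverse order, or fall back to eq. 3 of the G-DeepSHAP paper.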
Next you should check that the Bias is the denormalized prediction on the corresponding background frame point.
Pseudocode:
abs(denormalize(best_model_01.predict(background_frame[i, :])) - denorm_shap_pred[denorm_shap_pred["BackgroundRowIdx"]==i, "Bias"]) < 1e-6
Then you can also check that the row sums of `denorm_shap_pred` (excluding `RowIdx` and `BackgroundRowIdx`) are roughly the same as the denormalized prediction (i.e., denormalized contributions + denormalized bias == denormalized prediction).
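Both checks can be written against a small pandas frame shaped like the hypothetical `denorm_shap_pred` above; the column names follow the pseudocode, and all predictions here are made-up stand-ins for `denormalize(best_model_01.predict(...))`:

```python
import pandas as pd

denorm_shap_pred = pd.DataFrame({
    "RowIdx":           [0,    0],
    "BackgroundRowIdx": [0,    1],
    "f1":               [1.2,  0.8],
    "f2":               [-0.4, 0.3],
    "Bias":             [5.0,  4.7],
})
# Hypothetical denormalized predictions on the background points and
# on the explained test row.
background_pred = {0: 5.0, 1: 4.7}
test_pred = {0: 5.8}

# Check 1: Bias equals the denormalized prediction on the background point.
for i, pred in background_pred.items():
    bias = denorm_shap_pred.loc[
        denorm_shap_pred["BackgroundRowIdx"] == i, "Bias"].iloc[0]
    assert abs(pred - bias) < 1e-6

# Check 2: contributions + Bias reconstruct the denormalized prediction
# of the explained row, for every reference.
value_cols = denorm_shap_pred.columns.difference(["RowIdx", "BackgroundRowIdx"])
row_sums = denorm_shap_pred[value_cols].sum(axis=1)
for idx, total in zip(denorm_shap_pred["RowIdx"], row_sums):
    assert abs(total - test_pred[idx]) < 1e-6
```

If either assertion fails, the denormalization step (or the epsilon for your model type) is the first thing to revisit.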
Next, if you're confident that those values are close enough (depending on the model, the epsilon can be anywhere from 1e-6 up to 1e-3; XGBoost in our implementation uses floats for predictions and doubles for contributions, so there the epsilon will be closer to 1e-3), you take the average contribution across the background frame. Something like:
denorm_shap_pred.drop("BackgroundRowIdx").groupby("RowIdx").mean()
And that should be the result you are looking for. It's not an exact SHAP value, since G-DeepSHAP gives only an approximation when there is some non-linearity, but at least you can compute it in reasonable time.
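In pandas (an H2O frame would first be converted, e.g. via as_data_frame(); the data below is hypothetical), the averaging step looks like this:

```python
import pandas as pd

denorm_shap_pred = pd.DataFrame({
    "RowIdx":           [0,   0,   1,    1],
    "BackgroundRowIdx": [0,   1,   0,    1],
    "f1":               [1.2, 0.8, -0.6, -0.2],
    "Bias":             [5.0, 4.7, 5.0,  4.7],
})

# Drop the reference index, then average each feature's contribution
# (and the Bias) over all background references, per explained row.
avg = denorm_shap_pred.drop(columns="BackgroundRowIdx").groupby("RowIdx").mean()
print(avg)
```

The result has one row per explained data point: the averaged per-feature contributions are the (approximate) SHAP values, and the averaged Bias is the expected prediction over the background frame.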
from h2o-3.