acv00's People

Contributors

cyrlemaire, salimamoukou


acv00's Issues

Getting `clang: error: unsupported option '-fopenmp'` when installing with pip on an M1 Mac

Hi!

I'm eager to try this library out. Unfortunately I get an error upon installation:

clang: error: unsupported option '-fopenmp'
  • I updated llvm using Homebrew (this did not solve the problem).

  • clang --help | grep fopenmp returns

      -fopenmp                Parse OpenMP pragmas and generate parallel code.
    

so it's strange that this flag is not recognized during installation.

Any idea how to solve this?

My specs are:

Apple M1 Pro (2021)
macOS 12.5.1
Python 3.10
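
One hedged diagnostic for this setup (an assumption about the default build configuration, not a confirmed cause): the clang that answers clang --help on PATH may be Homebrew's llvm, while pip's build step defaults to the compiler recorded in Python's sysconfig, typically Apple's /usr/bin/clang, whose driver rejects -fopenmp:

import sysconfig

# the compiler that distutils/setuptools builds C extensions with by default
print(sysconfig.get_config_var("CC"))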

Memory leak?

Running a simple example with CatBoostClassifier and a relatively small dataset (25000, 160), memory consumption just explodes (with Python 3.8).

The first step executes correctly:
acvtree = ACVTree(model, X_test_enc_top_extreme.fillna(0).values)

Then this step just kills my machine on the same small dataset (25000, 160); more than 100 GB of RAM was used before it crashed:
shap_val_acv = acvtree.shap_values(X_test_enc_top_extreme.fillna(0).values)
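
For reproduction purposes, a minimal hypothetical sketch (synthetic data far smaller than the reporter's; the ACVTree calls follow the README-style API, and every shape and hyper-parameter here is made up):

import numpy as np
from catboost import CatBoostClassifier
from acv_explainers import ACVTree

# synthetic stand-in for the (25000, 160) dataset
X = np.random.rand(2000, 160)
y = np.random.randint(0, 2, 2000)
model = CatBoostClassifier(n_estimators=50, depth=6, verbose=0)
model.fit(X, y)

acvtree = ACVTree(model, X)        # this step completes
sv = acvtree.shap_values(X[:100])  # memory use grows sharply with tree depth / leaf count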


acvtree.global_sdp_importance_clf error with LightGBM, but not RandomForest

Hello,

First of all, kudos for this lib; it's amazing how many models you already support (sklearn, skopt, {xgb,cat,light}gbm).

My test works for RandomForest, with basically the same performance limitations as before. Having looked at the code, the C extension (cext_acv), which should speed things up, may not be implemented yet for this function.

Basically, the very same call to global_sdp_importance_clf on a subset (due to the performance issue) works with a sklearn RandomForest but fails with LightGBM.

Since the syntax changed a little from the previous lib, I followed a notebook example for the C parameter (maybe I'm wrong there).

n = 100
C = [[]]
# columns = list of feature names
# model = already-fitted model of type "lightgbm.sklearn.LGBMClassifier"
acvtree = ACVTree(model, X_train[:n].values)
sdp_importance_m, sdp_importance, sdp_importance_proba, \
    sdp_importance_coal_count, sdp_importance_variable_count = \
    acvtree.global_sdp_importance_clf(
        data=X_test[:n].values[y_test[:n] < 1],
        data_bground=X_train[:n].values,
        columns_names=columns,
        global_proba=0.9, decay=0.7, threshold=0.6, proba=0.9,
        C=C, verbose=1)

leading to this error:

~/.virtualenvs/venv/lib/python3.8/site-packages/acv_explainers/acv_tree.py in global_sdp_importance_clf(self, data, data_bground, columns_names, global_proba, decay, threshold, proba, C, verbose)
     64                           proba, C, verbose):
     65
---> 66         return global_sdp_importance(data, data_bground, columns_names, global_proba, decay, threshold,
     67                           proba, C, verbose, self.compute_sdp_clf, self.predict)
     68

~/.virtualenvs/venv/lib/python3.8/site-packages/acv_explainers/py_acv.py in global_sdp_importance(data, data_bground, columns_names, global_proba, decay, threshold, proba, C, verbose, cond_func, predict)
    475             fx = predict(np.expand_dims(ind, 0))[0]
    476
--> 477         local_sdp(ind, fx, threshold, proba, index, data_bground, final_coal, decay,
    478                   C=C, verbose=verbose, cond_func=cond_func)
    479

~/.virtualenvs/venv/lib/python3.8/site-packages/acv_explainers/py_acv.py in local_sdp(x, f, threshold, proba, index, data, final_coal, decay, C, verbose, cond_func)
    405                 if c not in C_off:
    406
--> 407                     value = cond_func(x, f, threshold, S=chain_l(c), data=data)
    408                     c_value[size][str(c)] = value
    409

~/.virtualenvs/venv/lib/python3.8/site-packages/acv_explainers/acv_tree.py in compute_sdp_clf(self, x, fx, tx, S, data)
     37
     38     def compute_sdp_clf(self, x, fx, tx, S, data):
---> 39         sdp = cond_sdp_forest_clf(x, fx, tx, self.trees, S, data=data)
     40         return sdp
     41

~/.virtualenvs/venv/lib/python3.8/site-packages/acv_explainers/py_acv.py in cond_sdp_forest_clf(x, fx, tx, forest, S, data)
    239
    240         s = (mean_forest['all'] - mean_forest['down']) / (mean_forest['up'] - mean_forest['down'])
--> 241         sdp += 0 * (s[int(fx)] < 0) + 1 * (s[int(fx)] > 1) + s[int(fx)] * (0 <= s[int(fx)] <= 1)
    242     # sdp = 0 * (sdp[int(fx)] < 0) + 1 * (sdp[int(fx)] > 1) + sdp[int(fx)] * (0 <= sdp[int(fx)] <= 1)
    243     return sdp/n_trees

IndexError: index 1 is out of bounds for axis 0 with size 1
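
A hedged guess at the cause, inferred from the traceback rather than verified against the source: a binary sklearn tree stores one value per class at each leaf, while a binary LGBMClassifier stores a single raw score per leaf, so the per-class axis that s[int(fx)] indexes has size 1 and fx = 1 goes out of bounds. The shape difference is easy to see:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
rf = RandomForestClassifier(n_estimators=3).fit(X, y)
lgbm = LGBMClassifier(n_estimators=3).fit(X, y)

# sklearn leaves carry a value per class, even for binary problems
print(rf.estimators_[0].tree_.value.shape)      # (n_nodes, 1, 2)
# LightGBM's binary model keeps a single score per leaf
print(lgbm.booster_.dump_model()["num_class"])  # 1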

BTW, since you seem interested in multi-armed bandits, you may find this hyper-parameter search library interesting: it's a multi-armed-bandit Bayesian optimizer based on Gaussian processes.

Thanks!

Cheers

I have no actual issue at the moment; I just finished reading the papers and wanted to offer my praise for your work. It is great stuff.

I am also very much looking forward to the implementation of Swing Shapley Values for tree-based models.

I may have some real-world tests/comparisons between your methods and classic SHAP results that I can at least partially share in a few months.

Thank you again for sharing your work!

MemoryError while using category grouping in ACVTree

I wanted to use your TreeSHAP computation because some variables in my dataset should be linked.
I selected a small model (one tree... with a thousand leaves), used on a dataset with 23 features.
I do not understand the size of the arrays used in the computation: it produces MemoryErrors even with very small data (only one tree of my random forest, 20 samples as background for the explainer, and 3 samples to explain).

"tree = rf.estimators_[0]
acvtree = ACVTree(tree, train.iloc[:20])
cats = [[0],[1,4],[2,5],[3,6],
[7,15],[8,16],[9,17],[10,18],[11,19],[12,20],[13,21],[14,22]]
sv = acvtree.shap_values(test.iloc[:3],cats)"
"MemoryError: Unable to allocate 161. GiB for an array with shape (1196, 131072, 3, 23, 2) and data type float64"

I can explain every number in this shape except 131072 (1196 is the number of leaves, and 2 must be the number of outputs of the classifier).
Do you have any idea what I should do to mitigate this problem?
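
For reference, the reported allocation matches that shape exactly:

import numpy as np

shape = (1196, 131072, 3, 23, 2)
print(np.prod(shape) * 8 / 2**30)  # float64 is 8 bytes per entry -> ~161.2 GiB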

Upgrade numpy version

Hi,
Is it possible to modify ACV so that it is compatible with the newer versions of numpy? The numpy <1.22 pin is more than two years old.

We have ACV as a dependency in Shapash (https://github.com/MAIF/shapash), and it's hard to keep it if the numpy version is capped, because the other dependencies require newer numpy versions.

TypeError: unhashable type: 'list' in compute_local_sdp function

Hello,

Thank you for a great package. I've been trying out the code on the front page and ran into an issue when generating the local explanatory importance scores; I wondered if you might be able to help. I got the following error:


TypeError                                 Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 lximp = acv_explainer.compute_local_sdp(X_train.shape[1], sufficient_expl)

File ~/.local/lib/python3.9/site-packages/acv_explainers/acv_agnosticX.py:627, in ACXplainer.compute_local_sdp(d, sufficient_coal)
    625 flat = [item for sublist in sufficient_coal for item in sublist]
    626 flat = pd.Series(flat)
--> 627 flat = dict(flat.value_counts() / len(sufficient_coal))
    628 local_sdp = np.zeros(d)
    629 for key in flat.keys():

TypeError: unhashable type: 'list'
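
A minimal sketch of the failure mechanism, assuming sufficient_expl is nested one level deeper than compute_local_sdp expects: after the flattening step the items are still lists, and value_counts() cannot hash them:

import pandas as pd

flat = pd.Series([[0, 1], [2]])  # elements are lists, not scalars
flat.value_counts()              # raises TypeError: unhashable type: 'list'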

I tried to calculate the LEI manually based on your paper, since it's just the percentage of SEs in the A-SE that a feature appears in, but I also found that the sufficient_expl list contains negative values. Do they indicate a feature as well? It's worth noting that sometimes the only result I get for the A-SE is -1.

ValueError: Buffer dtype mismatch, expected 'long' but got 'long long'

If I convert the code in the Python notebook into something a plain Python script can run, it fails with the error in the title when calculating the SDP using compute_sdp_clf. I believe this has something to do with the Cython file: around line 238, something may have to be changed, perhaps long into long long?
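
A sketch of the likely mechanism (an assumption, not checked against the extension's source): C long is 32-bit on Windows but 64-bit on most 64-bit Linux/macOS builds, while long long is 64-bit everywhere, so a Cython buffer declared as long[:] rejects NumPy int64 arrays on Windows:

import numpy as np

print(np.dtype(np.int_).itemsize)      # C long: 4 on Windows, 8 on 64-bit Linux/macOS
print(np.dtype(np.longlong).itemsize)  # C long long: 8 everywhere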
