Hello! On my dataset SAGE values depend quite a lot on the train-test split. Would it

SAGE values on cross-validation (closed issue in sage)

iancovert commented on May 18, 2024

Hi there, that's an interesting situation. When you try a different train-test split, do you train a new model? Or do you use a different train-test split (with the same model) just when estimating SAGE values? And also, is the estimator running to convergence so that you get pretty narrow confidence intervals?

Assuming that the SAGE values are known with high confidence (narrow confidence intervals), here's what I think you can do.

If it's the first situation, then it may mean that your model depends quite a bit on the train-test split. Ideally that wouldn't happen, especially if there's enough data, but averaging the SAGE values is a reasonable approach. (For the confidence intervals, I would calculate the standard deviations by taking the square root of the average variance.)
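Here is a minimal plain-Python sketch of that averaging. It assumes you already have one array of per-feature SAGE values and one array of per-feature standard deviations for each split. For example, these could be the `values` and `std` arrays of each split's explanation object, though those attribute names are an assumption here. The function name `combine_sage_across_splits` is hypothetical. The sketch follows the suggestion literally: the combined standard deviation is the square root of the mean variance. It therefore describes the uncertainty of a typical single run and does not shrink as more splits are added.

```python
import math
from typing import List, Sequence, Tuple


def combine_sage_across_splits(
    values_per_split: Sequence[Sequence[float]],
    std_per_split: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: average SAGE values over splits and pool their stds.

    values_per_split[k][j] is the SAGE value of feature j from split k;
    std_per_split[k][j] is the matching standard deviation (assumed inputs).
    """
    n_splits = len(values_per_split)
    if n_splits == 0 or len(std_per_split) != n_splits:
        raise ValueError("need the same non-zero number of value and std arrays")
    n_features = len(values_per_split[0])
    if any(len(v) != n_features for v in values_per_split) or any(
        len(s) != n_features for s in std_per_split
    ):
        raise ValueError("every split must report the same number of features")

    mean_values: List[float] = []
    combined_std: List[float] = []
    for j in range(n_features):
        # Plain mean of feature j's SAGE value across splits.
        mean_values.append(sum(v[j] for v in values_per_split) / n_splits)
        # Square root of the average variance, as suggested in the thread.
        mean_var = sum(s[j] ** 2 for s in std_per_split) / n_splits
        combined_std.append(math.sqrt(mean_var))
    return mean_values, combined_std
```

For instance, two splits with values `[[0.2, 0.1], [0.4, 0.1]]` and standard deviations `[[0.03, 0.01], [0.04, 0.01]]` give means of about `[0.3, 0.1]` and combined standard deviations of about `[0.0354, 0.01]`.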

If it's the second situation, then I would put more trust in the SAGE values that are calculated using data that was not touched during training (the test data), because the loss values (and therefore the SAGE values) may be artificially changed by overfitting to the train set.
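The same principle applies when a new model is trained for each split. Each split's SAGE values should then be estimated only on the rows that split's model never saw. Below is a minimal stdlib sketch of the index bookkeeping for plain K-fold cross-validation, in which every row is held out exactly once. `kfold_splits` is a hypothetical helper. Training the per-fold model and calling the SAGE estimator are left to your own code, since both depend on your model and setup.

```python
import random
from typing import List, Tuple


def kfold_splits(
    n_samples: int, n_folds: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: return (train_idx, test_idx) pairs for K-fold CV.

    Every sample index appears in exactly one test fold, and a fold's
    test indices never appear in that fold's training indices.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle

    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    splits: List[Tuple[List[int], List[int]]] = []
    start = 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        test_idx = sorted(idx[start:start + size])
        test_set = set(test_idx)
        # Training rows are everything not held out in this fold.
        train_idx = [i for i in range(n_samples) if i not in test_set]
        splits.append((train_idx, test_idx))
        start += size
    return splits
```

A typical loop would be `for train_idx, test_idx in kfold_splits(len(X), 5):`. Inside it, you fit a fresh model on the `train_idx` rows, estimate SAGE values on the `test_idx` rows only, and then average the per-fold results as suggested above.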

Let me know how that sounds.

garkavem commented on May 18, 2024

Hello, thank you for the answer! It is the first situation. Maybe there is not enough data. I guess I will average values and calculate confidence intervals as you suggest. Thanks!
