Hi, If I use a custom metric like the brier score where lower is bet

Eval Metric directionality? about shap-hypetune HOT 11 CLOSED

cerlymarco commented on May 12, 2024

Eval Metric directionality?

from shap-hypetune.

Comments (11)

cerlymarco commented on May 12, 2024

Hi,

You are looking for the greater_is_better parameter:

    greater_is_better : bool, default=False
        Effective only when hyperparameters searching.
        Whether the quantity to monitor is a score function,
        meaning high is good, or a loss function, meaning low is good.
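
A minimal sketch of what the flag means in practice, assuming a search that simply keeps the best score seen. The helper `pick_best` is hypothetical and is not shap-hypetune's internal code; it only illustrates that a loss such as the Brier score needs `min`, while a score such as AUC needs `max`.

```python
def pick_best(scores, greater_is_better=False):
    """Return the index of the best score under the given direction.

    Hypothetical helper for illustration only.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # A loss (lower is better) uses min; a score (higher is better) uses max.
    chooser = max if greater_is_better else min
    return chooser(range(len(scores)), key=scores.__getitem__)


# Made-up values for three candidate configurations.
brier_scores = [0.21, 0.18, 0.25]   # loss: lower is better
auc_scores = [0.81, 0.86, 0.79]     # score: higher is better

best_brier = pick_best(brier_scores, greater_is_better=False)
best_auc = pick_best(auc_scores, greater_is_better=True)
```

With the wrong direction, the same helper would keep the worst Brier configuration instead of the best one.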

all the best

ericvoots commented on May 12, 2024

hmm I keep getting an error using Brier Score Loss (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.brier_score_loss.html).

I was able to get it working with the AUC metric fine.

Here is the error and the function:

ValueError: y_prob contains values less than 0.

def BRS(y_hat, dtrain):
    y_true = dtrain.get_label()
    return 'brs', brier_score_loss(y_true, y_hat)

I checked the data and there is a good mixture of both 1s and 0s and nothing else.

cerlymarco commented on May 12, 2024

Your boosting model is simply predicting negative values.
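
One way this can happen is that the custom metric receives raw margins (log-odds) rather than probabilities. The sketch below assumes that situation and maps the values through a sigmoid before computing the Brier score by hand. The function names are hypothetical, and the range check is only a heuristic: margins that all happen to fall inside [0, 1] would be wrongly treated as probabilities.

```python
import numpy as np


def sigmoid(z):
    """Map raw margins (log-odds) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def brier_from_scores(y_true, scores):
    """Brier score that tolerates raw margins (hypothetical helper).

    Assumption: values outside [0, 1] mean the model output was not
    transformed yet, so a sigmoid is applied first.
    """
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if scores.min() < 0.0 or scores.max() > 1.0:
        scores = sigmoid(scores)
    # Brier score: mean squared difference between probability and label.
    return float(np.mean((scores - y_true) ** 2))


# Probabilities are used as-is; margins are converted first.
from_probs = brier_from_scores([1, 0], [0.5, 0.5])
from_margins = brier_from_scores([0, 1], [-100.0, 100.0])
```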

ericvoots commented on May 12, 2024

When I checked it directly from the model object, all the probabilities were above 0. I also ran into issues using the balanced accuracy measure. Only AUC seems to work.

cerlymarco commented on May 12, 2024

This is a dummy working example that runs fine. I hope you find it helpful.

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss

from shaphypetune import BoostRFE

from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=6000, n_features=20, n_classes=2, 
                                   n_informative=4, n_redundant=6, random_state=0)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, shuffle=False)

# LightGBM custom metric: returns (name, value, is_higher_better)
def BRIER(y_true, y_hat):
    return 'brier', brier_score_loss(y_true, y_hat, pos_label=1), False

param_grid = {
    'learning_rate': [0.2, 0.1],
    'num_leaves': [25, 35],
    'max_depth': [10, 12]
}

model = BoostRFE(
    LGBMClassifier(n_estimators=150, random_state=0, metric="custom"), 
    param_grid=param_grid, min_features_to_select=1, step=1,
    greater_is_better=False
)
model.fit(
    X_train, y_train, 
    eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=1, 
    eval_metric=BRIER
)
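
The two custom-metric styles that appear in this thread differ in argument order and in return shape, which is easy to mix up. A minimal sketch of both, assuming the conventions used in the posts here: the LightGBM scikit-learn style takes `(y_true, y_pred)` and returns three values, while the XGBoost style takes `(y_pred, dtrain)` and returns two. `FakeDMatrix` is a hypothetical stand-in for `xgboost.DMatrix` so the example runs without either library, and the Brier score is computed by hand.

```python
import numpy as np


class FakeDMatrix:
    """Hypothetical stand-in for xgboost.DMatrix; only get_label() is mimicked."""

    def __init__(self, label):
        self._label = np.asarray(label, dtype=float)

    def get_label(self):
        return self._label


def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and label."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


def brier_lgbm(y_true, y_prob):
    # LightGBM scikit-learn style: labels first, then predictions;
    # returns (name, value, is_higher_better).
    return "brier", brier(y_true, y_prob), False


def brier_xgb(y_prob, dtrain):
    # XGBoost style as used in this thread: predictions first, then the
    # data container; returns (name, value).
    return "brier", brier(dtrain.get_label(), y_prob)


y = [0, 1, 1, 0]
p = [0.1, 0.8, 0.6, 0.3]
lgbm_result = brier_lgbm(y, p)
xgb_result = brier_xgb(p, FakeDMatrix(y))
```

Both calls compute the same value; only the calling convention changes.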

All the best

ericvoots commented on May 12, 2024

So BoostRFE can be used with classification models too? Most of the examples here show BoostRFE with regression models:

https://github.com/cerlymarco/shap-hypetune/blob/main/notebooks/XGBoost_usage.ipynb

cerlymarco commented on May 12, 2024

All the estimators available in shap-hypetune can be used for both classification and regression, with either XGBoost or LightGBM.

ericvoots commented on May 12, 2024

Ah got you. Okay I'm still getting errors on the brier score but also got this error on balanced accuracy:

raise ValueError("Classification metrics can't handle a mix of {0} "
ValueError: Classification metrics can't handle a mix of binary and continuous target

In both the original DB and the dataframe created for the target, all values are 0 or 1.

The regular clf_xgb fits fine and can compute both Brier and balanced accuracy without issue, but the code crashes on the BoostRFE model (and on Boruta too) at the '.fit' step. Here is the code:

clf_xgb = XGBClassifier(n_estimators=2000,
                        random_state=0,
                        verbosity=3,
                        n_jobs=-1,
                        scale_pos_weight=1,
                        use_label_encoder=False,
                        objective='binary:logistic',
                        eval_set=[(cv_x, cv_y)])

clf_xgb.fit(train_x, train_y)

class_pred = clf_xgb.predict(train_x)

balanced_accuracy = balanced_accuracy_score(class_pred, train_y)

brier_score = brier_score_loss(class_pred, train_y)

print(brier_score)

print(balanced_accuracy)

model = BoostRFE(clf_xgb, param_grid=param_dist, min_features_to_select=1, step=1, n_iter=8, sampling_seed=0)

model.fit(train_x, train_y, eval_set=[(cv_x, cv_y)], early_stopping_rounds=6, verbose=100, eval_metric=ACC)
print(model.estimator_, model.best_params_, model.best_score_, model.n_features_)

print(f"feature ranking {model.ranking_}")

model_ranking_list = list(model.ranking_)

print(model_ranking_list)

cerlymarco commented on May 12, 2024

It seems you are not using eval_metric=ACC in the regular clf_xgb.

Pay attention! I think you are passing probabilities (continuous values) to balanced_accuracy_score instead of predicted classes.
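
A minimal sketch of the distinction, assuming a binary problem with both classes present and a 0.5 cut-off: probabilities are thresholded into 0/1 classes before a classification metric sees them. The balanced accuracy here is computed by hand (mean of the two per-class recalls) only to keep the example self-contained; the helper names are hypothetical.

```python
import numpy as np


def to_classes(y_prob, threshold=0.5):
    """Turn continuous probabilities into hard 0/1 predictions."""
    return (np.asarray(y_prob, dtype=float) > threshold).astype(int)


def balanced_accuracy_binary(y_true, y_pred):
    """Mean of per-class recall for binary 0/1 labels (hypothetical helper)."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    recall_pos = np.mean(y_pred[y_true == 1] == 1)
    recall_neg = np.mean(y_pred[y_true == 0] == 0)
    return float((recall_pos + recall_neg) / 2)


y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.2, 0.6, 0.7, 0.9])   # continuous model output

y_pred = to_classes(y_prob)                # hard classes: 0/1 only
score = balanced_accuracy_binary(y_true, y_pred)
```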

ericvoots commented on May 12, 2024

I was using the balanced accuracy directly with the following and no crashes:

balanced_accuracy = balanced_accuracy_score(class_pred, train_y)

and printing the score out works too. Even when I modify clf_xgb to use the custom accuracy function, like so, there are no errors:

clf_xgb = XGBClassifier(n_estimators=2000,
                        random_state=0,
                        verbosity=3,
                        n_jobs=-1,
                        scale_pos_weight=1,
                        use_label_encoder=False,
                        objective='binary:logistic',
                        eval_set=[(cv_x, cv_y)],
                        eval_metric=ACC)

and I'm able to print both the balanced accuracy score (0.984741888307878) and the Brier score (0.02292) to the console.

cerlymarco commented on May 12, 2024

This is a dummy working example that runs fine. I hope you find it helpful.

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score

from shaphypetune import BoostRFE

from xgboost import XGBClassifier

X, y = make_classification(n_samples=6000, n_features=20, n_classes=2, 
                                   n_informative=4, n_redundant=6, random_state=0)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, shuffle=False)

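def ACC(y_pred, dtrain):
    y_true = dtrain.get_label()
    # threshold probabilities into 0/1 classes before scoring
    y_pred = (y_pred > 0.5).astype(int)
    # return an error (1 - score) so that lower is better
    err = 1 - balanced_accuracy_score(y_true, y_pred)
    return 'bal_acc', err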

param_grid = {
    'learning_rate': [0.2, 0.1],
    'num_leaves': [25, 35],
    'max_depth': [10, 12]
}

model = BoostRFE(
    XGBClassifier(n_estimators=150, random_state=0, metric="custom"), 
    param_grid=param_grid, min_features_to_select=1, step=1,
    greater_is_better=False
)
model.fit(
    X_train, y_train, 
    eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=1, 
    eval_metric=ACC
)

Sincerely, this is the best I can do. All the best, bye.
