Comments (11)

cerlymarco commented on May 12, 2024

Hi,

You are looking for the greater_is_better parameter:

    greater_is_better : bool, default=False
        Effective only when hyperparameters searching.
        Whether the quantity to monitor is a score function,
        meaning high is good, or a loss function, meaning low is good.

all the best
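
To make the flag concrete, here is a minimal sketch of how such a flag usually decides which candidate wins a search. The helper `pick_best` and the score lists are hypothetical and only for illustration; this is not shap-hypetune's internal code.

```python
def pick_best(scores, greater_is_better=False):
    """Return the index of the winning candidate.

    Hypothetical helper for illustration only.
    """
    # A score function is maximised, a loss function is minimised.
    chooser = max if greater_is_better else min
    return chooser(range(len(scores)), key=scores.__getitem__)


brier_scores = [0.21, 0.18, 0.25]  # a loss: lower is better
auc_scores = [0.81, 0.86, 0.79]    # a score: higher is better

best_loss_idx = pick_best(brier_scores, greater_is_better=False)
best_score_idx = pick_best(auc_scores, greater_is_better=True)
```

With a loss such as the Brier score, leaving the flag at False keeps the lowest value; with a score such as AUC, setting it to True keeps the highest.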

from shap-hypetune.

ericvoots commented on May 12, 2024

Hmm, I keep getting an error when using the Brier score loss (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.brier_score_loss.html).

I was able to get it working with the AUC metric fine.

Here is the error and the function:

ValueError: y_prob contains values less than 0.

def BRS(y_hat, dtrain):
    y_true = dtrain.get_label()
    return 'brs', brier_score_loss(y_true, y_hat)

I checked the data and there is a good mixture of both 1s and 0s and nothing else.

cerlymarco commented on May 12, 2024

Your boosting model is simply predicting negative values.
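
One possible explanation, offered as an assumption rather than a diagnosis: the custom metric may be receiving raw margins (log-odds) instead of probabilities, and margins can legitimately be negative. The sketch below maps the predictions through a sigmoid before computing the Brier score. The names `brs_from_margin` and `FakeDMatrix` are hypothetical, and the Brier score is computed by hand with NumPy to keep the example self-contained.

```python
import numpy as np


def sigmoid(z):
    """Map raw margins (log-odds) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def brs_from_margin(y_hat, dtrain):
    """Custom metric sketch; assumes y_hat holds raw margins, not probabilities."""
    y_true = np.asarray(dtrain.get_label(), dtype=float)
    y_prob = sigmoid(y_hat)
    # Brier score: mean squared difference between probability and outcome.
    return 'brs', float(np.mean((y_prob - y_true) ** 2))


class FakeDMatrix:
    """Hypothetical stand-in exposing only get_label(), for demonstration."""

    def __init__(self, labels):
        self._labels = np.asarray(labels)

    def get_label(self):
        return self._labels


margins = np.array([-2.0, 0.0, 3.0])  # negative values are valid margins
name, value = brs_from_margin(margins, FakeDMatrix([0, 1, 1]))
```

If the predictions are already probabilities, applying the sigmoid a second time would distort them, so it is worth printing the minimum and maximum of `y_hat` inside the metric before deciding.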

ericvoots commented on May 12, 2024

When I checked it directly from the model object, all the probabilities were above 0. I also ran into issues using the balanced accuracy measure. Only AUC seems to work.

cerlymarco commented on May 12, 2024

Here is a dummy example that works fine. I hope you find it helpful.

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss

from shaphypetune import BoostRFE

from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=6000, n_features=20, n_classes=2,
                           n_informative=4, n_redundant=6, random_state=0)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, shuffle=False)

def BRIER(y_true, y_hat):
    return 'brier', brier_score_loss(y_true, y_hat, pos_label=1), False

param_grid = {
    'learning_rate': [0.2, 0.1],
    'num_leaves': [25, 35],
    'max_depth': [10, 12]
}

model = BoostRFE(
    LGBMClassifier(n_estimators=150, random_state=0, metric="custom"), 
    param_grid=param_grid, min_features_to_select=1, step=1,
    greater_is_better=False
)
model.fit(
    X_train, y_train, 
    eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=1, 
    eval_metric=BRIER
)

All the best

ericvoots commented on May 12, 2024

So BoostRFE can also be used with classification models? Most of the examples here show BoostRFE with regression models:

https://github.com/cerlymarco/shap-hypetune/blob/main/notebooks/XGBoost_usage.ipynb

cerlymarco commented on May 12, 2024

All the estimators available in shap-hypetune can be used for both classification and regression, with either XGBoost or LightGBM.

ericvoots commented on May 12, 2024

Ah, got you. Okay, I'm still getting errors with the Brier score, but I also got this error with balanced accuracy:

raise ValueError("Classification metrics can't handle a mix of {0} "
ValueError: Classification metrics can't handle a mix of binary and continuous target

In both the original DB and the dataframe created for the target, all values are 0 and 1.

The regular clf_xgb fits fine and can compute both the Brier score and balanced accuracy without issue, but the code crashes with the BoostRFE model (and with Boruta too) at the '.fit' step. Here is the code:

clf_xgb = XGBClassifier(n_estimators=2000,
                        random_state=0,
                        verbosity=3,
                        n_jobs=-1,
                        scale_pos_weight=1,
                        use_label_encoder=False,
                        objective='binary:logistic',
                        eval_set=[(cv_x, cv_y)])

clf_xgb.fit(train_x, train_y)

class_pred = clf_xgb.predict(train_x)

balanced_accuracy = balanced_accuracy_score(class_pred, train_y)

brier_score = brier_score_loss(class_pred, train_y)

print(brier_score)

print(balanced_accuracy)

model = BoostRFE(clf_xgb, param_grid=param_dist, min_features_to_select=1, step=1, n_iter=8, sampling_seed=0)

model.fit(train_x, train_y, eval_set=[(cv_x, cv_y)], early_stopping_rounds=6, verbose=100, eval_metric=ACC)
print(model.estimator_, model.best_params_, model.best_score_, model.n_features_)

print(f"feature ranking {model.ranking_}")

model_ranking_list = list(model.ranking_)

print(model_ranking_list)

cerlymarco commented on May 12, 2024

It seems you are not using eval_metric=ACC in the regular clf_xgb.

Pay attention: I think you are passing probabilities (continuous values) to balanced_accuracy_score instead of predicted classes.

ericvoots commented on May 12, 2024

I was calling balanced accuracy directly, as follows, with no crashes:

balanced_accuracy = balanced_accuracy_score(class_pred, train_y)

Printing the score also worked. Even when I modify clf_xgb to use the custom accuracy function, as shown here, there are no errors:

clf_xgb = XGBClassifier(n_estimators=2000,
                        random_state=0,
                        verbosity=3,
                        n_jobs=-1,
                        scale_pos_weight=1,
                        use_label_encoder=False,
                        objective='binary:logistic',
                        eval_set=[(cv_x, cv_y)],
                        eval_metric=ACC)

I'm also able to print both the balanced accuracy score (0.984741888307878) and the Brier score (0.02292) to the console.

cerlymarco commented on May 12, 2024

Here is a dummy example that works fine. I hope you find it helpful.

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score

from shaphypetune import BoostRFE

from xgboost import XGBClassifier

X, y = make_classification(n_samples=6000, n_features=20, n_classes=2,
                           n_informative=4, n_redundant=6, random_state=0)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, shuffle=False)

def ACC(y_pred, dtrain):
    y_true = dtrain.get_label()
    y_pred = (y_pred > 0.5).astype(int)
    err = 1 - balanced_accuracy_score(y_true, y_pred)
    return 'bal_acc', err

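# Note: 'num_leaves' is a LightGBM parameter name; XGBoost's closest
# counterpart is 'max_leaves'.
param_grid = {
    'learning_rate': [0.2, 0.1],
    'num_leaves': [25, 35],
    'max_depth': [10, 12]
}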

model = BoostRFE(
    XGBClassifier(n_estimators=150, random_state=0, metric="custom"), 
    param_grid=param_grid, min_features_to_select=1, step=1,
    greater_is_better=False
)
model.fit(
    X_train, y_train, 
    eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=1, 
    eval_metric=ACC
)

Sincerely, this is the best I can do. All the best, bye.
