erdogant / hgboost

hgboost is a Python package for hyperparameter optimization of xgboost, catboost, or lightboost using cross-validation, with evaluation of the results on an independent validation set. hgboost can be applied to classification and regression tasks.

Home Page: http://erdogant.github.io/hgboost

License: Other

Topics: gridsearch, hyperoptimization, xgboost, catboost, lightboost, crossvalidation, machine-learning, python

hgboost's Introduction

hgboost - Hyperoptimized Gradient Boosting



hgboost is short for Hyperoptimized Gradient Boosting and is a Python package for hyperparameter optimization of xgboost, catboost, and lightboost using cross-validation, with evaluation of the results on an independent validation set. hgboost can be applied to classification and regression tasks.

hgboost is fun because:

* 1. It hyperoptimizes the parameter space using a Bayesian approach.
* 2. It determines the best-scoring model(s) using k-fold cross-validation.
* 3. It evaluates the best model on an independent evaluation set.
* 4. It fits the model on the entire input dataset using the best parameters.
* 5. It works for classification and regression.
* 6. It can create a super-hyperoptimized model from an ensemble of all individually optimized models.
* 7. It returns the model, the search space, and the test/evaluation results.
* 8. It makes insightful plots.

⭐️ Star this repo if you like it ⭐️


Blogs

Medium Blog 1: The Best Boosting Model using Bayesian Hyperparameter Tuning but without Overfitting.

Medium Blog 2: Create Explainable Gradient Boosting Classification models using Bayesian Hyperparameter Optimization.


The documentation pages contain detailed information about how hgboost works, along with many examples.


Colab Notebooks

  • Regression example: Open in Colab

  • Classification example: Open in Colab


Schematic overview of hgboost

Installation

Create and activate a conda environment:

conda create -n env_hgboost python=3.8
conda activate env_hgboost

Install from PyPI:

pip install hgboost
pip install -U hgboost # Force update

Import the hgboost package:

from hgboost import hgboost

Examples

Classification example for xgboost, catboost and lightboost:

# Load library
from hgboost import hgboost

# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)

# Fit xgboost by hyperoptimization and cross-validation
results = hgb.xgboost(X, y, pos_label='survived')

# [hgboost] >Start hgboost classification..
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total dataset: (891, 204) 
# [hgboost] >Hyperparameter optimization..
#  100% |----| 500/500 [04:39<05:21,  1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best performing [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models, Total nr. tests: 50
# 100%|██████████| 10/10 [00:42<00:00,  4.27s/it]
# [hgboost] >Evaluate best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameter settings.
# Plot the ensemble classification validation results
hgb.plot_validation()
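Regression works the same way. The sketch below is hedged: method='xgb_reg' and eval_metric='rmse' are assumed by analogy with the method='xgb_clf' call shown in the issues further down, and are not confirmed API (the docs may expose a dedicated regression method instead); X and y are your own data.

# Load library
from hgboost import hgboost

# Initialization
hgb = hgboost(max_eval=10, cv=5, test_size=0.2, val_size=0.2, random_state=42)

# Fit xgboost for regression; 'xgb_reg' is an assumption, mirroring 'xgb_clf'
results = hgb.xgboost(X, y, method='xgb_reg', eval_metric='rmse')

# Same plotting helpers as in the classification example
hgb.plot_validation()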


References

* http://hyperopt.github.io/hyperopt/
* https://github.com/dmlc/xgboost
* https://github.com/microsoft/LightGBM
* https://github.com/catboost/catboost

Maintainers

  • erdogant

Contribute

  • Contributions are welcome.

License

See LICENSE for details.

Coffee

  • If you wish to buy me a coffee for this work, it is very much appreciated :)

hgboost's People

Contributors

  • bneijt
  • erdogant


hgboost's Issues

Error in RMSE calculation

if self.eval_metric == 'rmse':
    loss = mean_squared_error(y_test, y_pred)

mean_squared_error in sklearn returns the MSE; use mean_squared_error(y_true, y_pred, squared=False) to obtain the RMSE.
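A runnable illustration of the difference (squared=False exists since scikit-learn 0.22; very recent releases deprecate it in favor of root_mean_squared_error):

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])

mse = mean_squared_error(y_true, y_pred)                  # mean squared error
rmse = mean_squared_error(y_true, y_pred, squared=False)  # root mean squared error
assert np.isclose(rmse, np.sqrt(mse))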

Specific data results problem

Your package is very convenient and effective, but if I want to redraw a plot, how can I get the underlying data?
Thanks for your help.
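For context: the fit methods return a results dictionary whose 'params' and 'model' keys are shown in the issues below; a hedged sketch of inspecting it for the raw numbers behind the plots (any further keys should be checked with results.keys()):

from hgboost import hgboost

hgb = hgboost(max_eval=10, cv=5, random_state=42)
results = hgb.xgboost(X, y, pos_label=1)   # X, y: your own data

print(results.keys())      # discover which raw results are available
print(results['params'])   # best hyperparameters behind the plots
model = results['model']   # refitted best model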

import error during import hgboost

After installing hgboost, importing it fails. Could you please help me out? Details are as follows:

ImportError                               Traceback (most recent call last)
----> 1 from hgboost import hgboost

C:\ProgramData\Anaconda3\lib\site-packages\hgboost\__init__.py
----> 1 from hgboost.hgboost import hgboost
      2
      3 from hgboost.hgboost import (
      4     import_example,
      5 )

C:\ProgramData\Anaconda3\lib\site-packages\hgboost\hgboost.py
      9 import classeval as cle
     10 from df2onehot import df2onehot
---> 11 import treeplot as tree
     12 import colourmap
     13

C:\ProgramData\Anaconda3\lib\site-packages\treeplot\__init__.py
----> 1 from treeplot.treeplot import (
      2     plot,
      3     randomforest,
      4     xgboost,
      5     lgbm,

C:\ProgramData\Anaconda3\lib\site-packages\treeplot\treeplot.py
     14 import numpy as np
     15 from sklearn.tree import export_graphviz
---> 16 from sklearn.tree.export import export_text
     17 from subprocess import call
     18 import matplotlib.image as mpimg

ImportError: cannot import name 'export_text' from 'sklearn.tree.export'

thanks a lot!
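For context: the sklearn.tree.export module was removed in scikit-learn 0.23, while export_text has been importable from sklearn.tree itself since 0.21, so upgrading treeplot (or patching its import) is the likely fix. A minimal check in a modern environment:

# Works on scikit-learn >= 0.21; sklearn.tree.export no longer exists
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(clf))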

Test:Validation:Train split

Shouldn't the new train-test split be test_size=self.test_size/(1-self.val_size) in def _HPOpt(self)? The shape of X was already reduced in _set_validation_set(self, X, y).

I'm assuming that the test, train, and validation set ratios are defined on the original data.
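The arithmetic behind the suggestion, with illustrative numbers (not hgboost code): with val_size=0.2 and test_size=0.2, the validation split removes 20% of the samples first, so taking test_size=0.2 of the remainder yields only 16% of the original data; dividing by 1 - val_size restores the intended 20%.

N = 1000
val_size, test_size = 0.2, 0.2

n_val = int(N * val_size)                               # 200 samples held out first
n_rest = N - n_val                                      # 800 samples remain

naive_test = int(n_rest * test_size)                    # 160 -> 16% of the original
fixed_test = int(n_rest * test_size / (1 - val_size))   # 200 -> 20% as intended
print(naive_test, fixed_test)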

Xgboost parameter

After calling hgb.plot_params(), the learning rate is reported as 796, which does not seem reasonable.
Can I see the actual model parameters found by the hyperparameter optimization?

(screenshot of the hgb.plot_params() output)
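A value of 796 is almost certainly a hyperopt index rather than a learning rate: hp.choice returns the position of the selected candidate, and hyperopt's space_eval maps it back to the actual value. A hedged sketch of that mapping (the search space below is hypothetical, not hgboost's actual one):

from hyperopt import hp, space_eval

# Hypothetical search space; hgboost's real space differs
space = {'learning_rate': hp.choice('learning_rate', [0.001 * i for i in range(1, 1000)])}

best = {'learning_rate': 796}    # an index, as reported by fmin / plot_params
print(space_eval(space, best))   # -> learning_rate of about 0.797: the actual value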

hgboost installation error

When installing in Colab, the following error occurs:
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Could you please help?

HP Tuning: best_model uses different parameters from those that were reported as best ones

I used hgboost for optimizing the hyper-parameters of my XGBoost model as described in the API References with the following parameters:

hgb = hgboost()
results = hgb.xgboost(X_train, y_train, pos_label=1, method='xgb_clf', eval_metric='logloss')

As noted in the documentation, results is a dictionary that, among other things, returns the best performing parameters (best_params) and the best performing model (model). However, the parameters that the best performing model uses are different from what the function returns as best_params:

best_params

'params': {'colsample_bytree': 0.47000000000000003,
  'gamma': 1,
  'learning_rate': 534,
  'max_depth': 49,
  'min_child_weight': 3.0,
  'n_estimators': 36,
  'subsample': 0.96}

model

'model': XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
               colsample_bynode=1, colsample_bytree=0.47000000000000003,
               enable_categorical=False, gamma=1, gpu_id=-1,
               importance_type=None, interaction_constraints='',
               learning_rate=0.058619090164329916, max_delta_step=0,
               max_depth=54, min_child_weight=3.0, missing=nan,
               monotone_constraints='()', n_estimators=200, n_jobs=-1,
               num_parallel_tree=1, predictor='auto', random_state=0,
               reg_alpha=0, reg_lambda=1, scale_pos_weight=0.5769800646551724,
               subsample=0.96, tree_method='exact', validate_parameters=1,
               verbosity=0),

As you can see, for example, max_depth=49 in the best_params, but the model uses max_depth=54 etc.

Is this a bug or the intended behavior? In case of the latter, I'd really appreciate an explanation!

My setup:

  • OS: WSL (Ubuntu)
  • Python: 3.9.7
  • hgboost: 1.0.0

Treeplot failure - missing graphviz dependency

I'm running through the example classification notebook now, and the treeplot fails to render, with the following warning:

(screenshot of the treeplot warning)

It seems that graphviz, being a compiled C library, is not bundled with pip (it is included when installing treeplot/graphviz via conda).

Since we have no way to add this to the pip requirements, perhaps a sentence could be added to the Installation instructions warning that graphviz must already be available or be installed separately.

(Note that the suggested apt command for Linux is not strictly necessary, because pydot does get installed with treeplot via pip.)
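Since pip cannot ship the compiled graphviz binaries, a runtime check along these lines could accompany such a sentence (shutil.which is standard library; 'dot' is the graphviz renderer that treeplot appears to shell out to, given the subprocess import in the traceback above):

import shutil

# Warn early if the graphviz 'dot' executable is missing from PATH
if shutil.which('dot') is None:
    print("graphviz not found; install it system-wide (OS package manager "
          "or conda) before rendering tree plots.")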

Hgboost on GPU

Does this package have the capability to run on a GPU? If so, where can the relevant parameter be specified?
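hgboost's documentation does not mention a GPU switch; the underlying boosters handle this themselves. As a hedged illustration with plain XGBoost (outside hgboost, since no documented pass-through is confirmed here), GPU training is enabled through the estimator's own parameters:

from xgboost import XGBClassifier

# XGBoost >= 2.0: select the device explicitly; older releases used tree_method='gpu_hist'
clf = XGBClassifier(device='cuda', tree_method='hist')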

TypeError: LGBMRegressor.fit() got an unexpected keyword argument 'early_stopping_rounds'

First of all, thank you for developing this software, which makes it easy to find the optimal parameters for gradient boosting algorithms.

It seems that LightGBM has been updated and 4.0.x is now available. I am using Google Colab, and installing the package, both from PyPI and directly from GitHub, produces the error in the title.

I hope it can be fixed easily. Best regards.
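For context, LightGBM 4.0 removed the early_stopping_rounds keyword from fit(); early stopping now goes through callbacks. A minimal sketch of the new-style call with synthetic data:

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
X_val, y_val = rng.normal(size=(50, 5)), rng.normal(size=50)

reg = lgb.LGBMRegressor(n_estimators=500)
# LightGBM >= 4.0: pass a callback instead of early_stopping_rounds=...
reg.fit(X, y, eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(stopping_rounds=20)])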
